Saturday, December 29, 2012

RFCs and Conflicts

This post is mostly a collection of notes I made while reading "The Tangled Web". It tries to list conflicting RFCs and implementations, as well as obscure RFCs which fail to specify certain edge cases.

Please note that a lot of the snippets have been taken "as is" from the book (TTW) -- I'm just trying to consolidate them for easier future reference. If there is any mistake in what you read below, it's probably because I screwed up while noting stuff down.

The RFC specification is in RED
RFC improvements (if any) are in light PURPLE
Browser implementation conflicts are in GREEN
If there is a personal note that I'd like to add, it's colored BLUE.

RFC 1738 - Uniform Resource Locators (URL) - page 25, TTW
Before browsers and web applications parse URLs, they need to be able to distinguish absolute URLs from relative ones. A valid scheme is meant to be the key difference. In a compliant absolute URL, only alphanumerics and the characters '+', '-' and '.' may appear before the required ':'.
All browsers ignore leading newlines and whitespace. IE ignores the range 0x01-0x1F. Chrome skips 0x00, the NUL character. Most browsers allow newlines and tabs in the middle of scheme names. Opera accepts high-bit characters in the string.
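
To make the gap between the spec and the browsers concrete, here is a rough Python sketch. The names strict_scheme and lenient_scheme are my own, and lenient_scheme only approximates the leniency described above (leading whitespace, tabs and newlines inside the scheme). It does not model any particular browser.

```python
import re

# Characters the RFC allows before the required ':'.
SCHEME_CHARS = r"[A-Za-z0-9+.-]+"


def strict_scheme(url: str):
    """Return the scheme if the URL starts with a compliant one, else None."""
    match = re.match(rf"({SCHEME_CHARS}):", url)
    return match.group(1).lower() if match else None


def lenient_scheme(url: str):
    """Hypothetical approximation of browser leniency: skip leading
    whitespace and drop tabs/newlines embedded in the scheme name."""
    head, sep, _rest = url.lstrip().partition(":")
    if not sep:
        return None  # no ':' at all, so treat it as a relative URL
    cleaned = re.sub(r"[\t\r\n]", "", head)
    if re.fullmatch(SCHEME_CHARS, cleaned):
        return cleaned.lower()
    return None


print(strict_scheme("http://example.com/"))            # http
print(strict_scheme(" java\tscript:alert(1)"))         # None
print(lenient_scheme(" java\tscript:alert(1)"))        # javascript
```

A filter that uses the strict check while the browser behaves like the lenient one will disagree about whether the last URL is absolute, which is exactly the kind of mismatch these notes are about.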

RFC 1738 - Uniform Resource Locators (URL) - page 25, TTW
Every absolute hierarchical URL is required to contain "//" right before the authority section.
The RFC, however, does not say how a "//" in a non-hierarchical URL should be handled.
RFC 3986 acknowledges this flaw and permits implementations to try to parse such URLs for compatibility reasons.
The address "" is treated as "". "javascript://" is interpreted as a valid non-hierarchical pseudo-URL in all modern browsers, and alert(1) will execute. "mailto://": IE accepts this URL as a valid non-hierarchical reference to an email address; other browsers disagree.
"mailto://" in Google Chrome opens up a Google search with that value. "mailto://" and "mailto://" open up the mail client.

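Browsers are not the only parsers that have to guess here. As a small illustration, this is how Python's generic urllib.parse.urlsplit handles a "//" after a non-hierarchical scheme. The address user@example.com is a placeholder of mine, not one of the examples from the book, and describe is just a helper name I made up.

```python
from urllib.parse import urlsplit


def describe(url: str):
    """Return (scheme, authority, path) as seen by a generic URL splitter."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# Without "//" the address is the path of a non-hierarchical URL.
plain = describe("mailto:user@example.com")

# With "//" the same text is parsed as an authority section instead.
slashed = describe("mailto://user@example.com")

print(plain)    # ('mailto', '', 'user@example.com')
print(slashed)  # ('mailto', 'user@example.com', '')
```

The splitter happily applies hierarchical rules to any scheme, so code that reads only the path or only the authority sees two different things for what a user would call the same address.
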
RFC 1738 - Uniform Resource Locators (URL) - page 26, TTW
The RFC permits only canonical notations for IP addresses. eg(
As the C libraries used by applications are more relaxed, "", "http://0x7f.1/" and "http://017700000001/" might be considered equivalent.
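
The relaxed parsing can be sketched in a few lines of Python. This is my own approximation of the rules C's inet_aton() follows (a 0x prefix means hex, a leading 0 means octal, and the last component fills whatever bytes remain). The function names are made up, and this is not the code any browser actually runs.

```python
def parse_component(text: str) -> int:
    """Parse one dotted component: 0x prefix = hex, leading 0 = octal."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def lenient_ipv4(host: str) -> str:
    """Normalize a relaxed IPv4 notation to canonical dotted-quad form."""
    parts = [parse_component(p) for p in host.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"too many components: {host!r}")
    *head, last = parts
    # Every component except the last must fit in a single byte.
    if any(not 0 <= p <= 0xFF for p in head):
        raise ValueError(f"component out of range: {host!r}")
    # The last component covers all remaining bytes of the address.
    if not 0 <= last < 256 ** (4 - len(head)):
        raise ValueError(f"component out of range: {host!r}")
    value = last
    for index, part in enumerate(head):
        value |= part << (8 * (3 - index))
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(lenient_ipv4("0x7f.1"))        # 127.0.0.1
print(lenient_ipv4("017700000001"))  # 127.0.0.1
```

Both of the odd-looking hosts from the note collapse to the same loopback address, so any check that compares host strings before the library normalizes them can be bypassed.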

RFC 1035 - Domain Names - Implementation and Specification - page 27, TTW
DNS labels need to conform to a very narrow character set (alphanumerics, '.', '-').
Most browsers will ask the underlying operating system to look up almost any DNS name.
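
For comparison, a strict check along the lines of the RFC could look roughly like this. The function name is my own. The character set is the one from the note above, and the 63-character label limit is an extra rule from RFC 1035 that the note does not mention. Treat it as a sketch, not a complete validator.

```python
import re

# Characters allowed inside a single label; '.' only separates labels.
LABEL = re.compile(r"[A-Za-z0-9-]+")


def is_strict_dns_name(name: str) -> bool:
    """Check that every dot-separated label uses only the narrow set."""
    for label in name.split("."):
        if len(label) > 63:  # RFC 1035 label length limit
            return False
        if LABEL.fullmatch(label) is None:  # also rejects empty labels
            return False
    return True


print(is_strict_dns_name("example.com"))   # True
print(is_strict_dns_name("exa_mple.com"))  # False
```

Since most browsers skip a check like this and hand the name to the operating system, names that fail it may still be looked up.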