Saturday, December 29, 2012

RFCs and Conflicts


This post is mostly a collection of notes I made while reading "The Tangled Web" (TTW) -- it tries to list conflicting RFCs and implementations, as well as obscure RFCs that fail to specify certain edge cases.

Please note that a lot of the snippets have been taken "as is" from TTW -- I'm just trying to consolidate them for easier future reference. If there is any mistake in what you read below, it's probably because I screwed up while noting stuff down.

The RFC specification is in RED
RFC improvements (if any) are in light PURPLE
Browser implementation conflicts are in GREEN
If there is a personal note that I'd like to add, it's colored BLUE.

RFC 1738 - Uniform Resource Locators (URL) - page 25, TTW
Before browsers and web applications parse URLs, they need to be able to distinguish absolute URLs from relative ones. A valid scheme is meant to be the key difference. In a compliant absolute URL, only alphanumerics, '+', '-' and '.' may appear before the required ':'.
All browsers ignore leading newlines and whitespace. IE ignores 0x01-0x1F. Chrome skips 0x00, the NUL character. Most browsers allow newlines and tabs in the middle of scheme names. Opera accepts high-bit characters in the string.
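
As a rough illustration of the strict rule above, the check can be written as a single regular expression. This is only a sketch in Python of the strict reading, not how any browser actually does it; the names SCHEME_RE and has_compliant_scheme are made up for this note.

```python
import re

# Strict reading of the rule: one or more alphanumerics, '+', '-' or '.',
# followed by the required ':'. Leading whitespace or control characters
# are NOT skipped here, unlike in the browsers described above.
SCHEME_RE = re.compile(r"[A-Za-z0-9+.\-]+:")

def has_compliant_scheme(url):
    """Return True if url starts with a scheme that follows the strict rule."""
    return SCHEME_RE.match(url) is not None

for candidate in ("http://example.com/",
                  "/some/relative/path",
                  "java\tscript:alert(1)",
                  " http://example.com/"):
    print(repr(candidate), has_compliant_scheme(candidate))
```

Under this strict check the last two candidates count as "not absolute", which is exactly where the browsers listed above disagree with the RFC.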

RFC 1738 - Uniform Resource Locators (URL) - page 25, TTW
Every absolute hierarchical URL is required to contain "//" right before the authority section.
The RFC, however, does not say how a "//" in a non-hierarchical URL should be handled.
RFC 3986 acknowledges this flaw and permits implementations to try to parse such URLs for compatibility reasons.
The address "http:example.com/" is treated as "http://example.com/". "javascript://example.com/%0Aalert(1)" is interpreted as a valid non-hierarchical pseudo-URL in all modern browsers, and alert(1) will execute. IE accepts "mailto://user@example.com" as a valid non-hierarchical reference to an email address; other browsers disagree.
"mailto://blah@asdf.com" in Google Chrome opens a Google search for that value, whereas "mailto://blah@gmail.com" and "mailto://blah@hotmail.com" open the mail client.

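To see why alert(1) runs in the javascript: example, it helps to look at what is left once the scheme is removed and the rest is percent-decoded. The Python sketch below assumes the browser decodes the body before handing it to the JavaScript engine; javascript_url_body is a hypothetical helper written for this note.

```python
from urllib.parse import unquote

def javascript_url_body(url):
    """Return the percent-decoded text after 'javascript:' (hypothetical helper)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "javascript":
        raise ValueError("not a javascript: pseudo-URL")
    return unquote(rest)

body = javascript_url_body("javascript://example.com/%0Aalert(1)")
lines = body.split("\n")

# First line starts with '//', so JavaScript reads it as a line comment.
# %0A decodes to a newline, so the call ends up on a line of its own.
for line in lines:
    print(repr(line))
```

The "//example.com/" part looks like an authority section to a human reader, but to the script engine it is just a comment followed by real code.
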
RFC 1738 - Uniform Resource Locators (URL) - page 26, TTW
The RFC permits only the canonical notation for IP addresses (e.g. http://127.0.0.1/).
Because the C libraries used by applications are more relaxed, "http://127.0.0.1/", "http://0x7f.1/" and "http://017700000001/" might be considered equivalent.
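
The equivalence of those three hosts can be checked with a small parser modeled loosely on the classic C inet_aton() rules: each dotted part may be decimal, octal (leading 0) or hex (leading 0x), and the last part fills all remaining bytes. This is a Python sketch under those assumptions, not the code any browser or libc actually runs; parse_component and lenient_ipv4 are hypothetical names.

```python
def parse_component(text):
    """Parse one dotted part roughly like C's strtoul(text, NULL, 0)."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)   # hexadecimal
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)        # a leading zero means octal
    return int(text, 10)           # plain decimal

def lenient_ipv4(host):
    """Normalize a relaxed IPv4 notation to canonical dotted-decimal form."""
    parts = [parse_component(p) for p in host.split(".")]
    if not 1 <= len(parts) <= 4 or any(p < 0 for p in parts):
        raise ValueError("not an IPv4 address: %r" % host)
    *head, last = parts
    remaining = 4 - len(head)      # bytes covered by the last part
    if any(p > 0xFF for p in head) or last >= 256 ** remaining:
        raise ValueError("component out of range: %r" % host)
    value = last
    for index, part in enumerate(head):
        value |= part << (8 * (3 - index))
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

for host in ("127.0.0.1", "0x7f.1", "017700000001"):
    print(host, "->", lenient_ipv4(host))
```

Any filter that compares host strings against a blacklist of canonical addresses would miss the second and third forms if the underlying library parses them this way.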

RFC 1035 - Domain Names - Implementation and Specification - page 27, TTW
DNS labels need to conform to a very narrow character set (alphanumerics, '.', '-').
Most browsers will ask the underlying operating system to look up almost any DNS name.
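
A strict check for that character set is easy to write, which makes it a handy pre-filter before a name ever reaches the resolver. The Python sketch below only tests the characters listed above and ignores other DNS rules such as length limits; uses_narrow_charset is a hypothetical name.

```python
import string

# The narrow set from the note above: alphanumerics, '.' and '-'.
ALLOWED = set(string.ascii_letters + string.digits + ".-")

def uses_narrow_charset(hostname):
    """Return True if hostname is non-empty and uses only the allowed characters."""
    return bool(hostname) and all(ch in ALLOWED for ch in hostname)

for name in ("example.com", "exa_mple.com", "example.com/../"):
    print(repr(name), uses_narrow_charset(name))
```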
