url-expand.el and url-parse.el not conforming to RFC3986
From: Alain Schneble (Realize IT GmbH)
Subject: url-expand.el and url-parse.el not conforming to RFC3986
Date: Fri, 27 Nov 2015 15:22:29 +0000
Hello,
url-expand.el and url-parse.el do not seem to follow RFC3986 "Uniform
Resource Identifier (URI): Generic Syntax" in some cases, although I
think they should. So I studied RFC3986 in more detail and wrote
tests against url-expand-file-name and url-generic-parse-url (see
attached patch).
The tests reveal the following issues:
1. resolving relative "fragment-only" URIs against a given absolute
base URI (see RFC3986, section 5. Reference Resolution, and
especially 5.2.2. Transform References):
(url-expand-file-name "#s" "http://a/b/c/d;p?q")
=> "#s" but should be "http://a/b/c/d;p?q#s"
(url-expand-file-name "#bar" "http://host")
=> "#bar" but should be "http://host#bar"
(url-expand-file-name "#bar" "http://host/")
=> "#bar" but should be "http://host/#bar"
(url-expand-file-name "#bar" "http://host/foo")
=> "#bar" but should be "http://host/foo#bar"
2. resolving relative "query-only" URIs against a given absolute base
URI (see RFC3986, same sections as mentioned in point 1.):
(url-expand-file-name "?y" "http://a/b/c/d;p?q")
=> "http://a/b/c/?y" but should be "http://a/b/c/d;p?y"
(url-expand-file-name "?y" "http://a/b/c/d")
=> "http://a/b/c/?y" but should be "http://a/b/c/d?y"
3. removing dot segments (see RFC3986, section 5.2.4. Remove Dot
Segments):
(url-expand-file-name "/./g" "http://a/b/c/d;p?q")
=> "http://a/./g" but should be "http://a/g"
(url-expand-file-name "/../g" "http://a/b/c/d;p?q")
=> "http://a/../g" but should be "http://a/g"
4. empty fragment information is lost after parsing a URI:
(equal (url-generic-parse-url "#")
       (url-parse-make-urlobj nil nil nil nil nil "" "" nil nil))
=> nil but should be t (the fragment component, i.e. the second ""
argument above, is actually nil instead of an empty string)
The same issue affects URLs that end with a number sign (#):
"/foo/bar#"
"/foo/bar/#"
"http://host#"
"http://host?#"
"http://host?query#"
"http://host/#"
"http://host/?#"
"http://host/?query#"
"http://host/foo#"
"http://host/foo?#"
"http://host/foo?query#"
... and so forth
The problem with this is that the inverse function url-recreate-url
won't be able to reconstruct exactly the same URI. For example:
(url-recreate-url (url-generic-parse-url "#"))
=> "" but should be "#"
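For reference, the dot-segment removal that point 3 refers to can be
sketched as follows. This is only a minimal illustration of the
algorithm in RFC3986, section 5.2.4, and not the code from the attached
patch; the function name my-rfc3986-remove-dot-segments is made up for
this example.

```elisp
;; Hypothetical helper, not part of url-expand.el or the attached patch.
(defun my-rfc3986-remove-dot-segments (path)
  "Return PATH with \".\" and \"..\" segments removed (RFC 3986, 5.2.4)."
  (let ((in path)   ; input buffer
        (out ""))   ; output buffer
    (while (> (length in) 0)
      (cond
       ;; Step A: drop a leading "../" or "./".
       ((string-prefix-p "../" in) (setq in (substring in 3)))
       ((string-prefix-p "./" in) (setq in (substring in 2)))
       ;; Step B: replace a leading "/./" (or a lone "/.") with "/".
       ((string-prefix-p "/./" in) (setq in (substring in 2)))
       ((string= in "/.") (setq in "/"))
       ;; Step C: replace a leading "/../" (or a lone "/..") with "/"
       ;; and remove the last segment already moved to the output.
       ((or (string-prefix-p "/../" in) (string= in "/.."))
        (setq in (if (string= in "/..") "/" (substring in 3)))
        (setq out (if (string-match "/[^/]*\\'" out)
                      (substring out 0 (match-beginning 0))
                    "")))
       ;; Step D: a lone "." or ".." is simply dropped.
       ((member in '("." "..")) (setq in ""))
       ;; Step E: move the first segment (with its leading "/", if any)
       ;; from the input buffer to the output buffer.
       (t
        (string-match "\\`/?[^/]*" in)
        (setq out (concat out (match-string 0 in))
              in (substring in (match-end 0))))))
    out))

(my-rfc3986-remove-dot-segments "/./g")   ; => "/g"
(my-rfc3986-remove-dot-segments "/../g")  ; => "/g"
```

Applied to the path of the resolved reference, this yields the "/g"
expected in both examples of point 3.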
To address these issues, I propose changes to url-parse.el and
url-expand.el, see attached patch. Here is the detailed summary:
- url-parse-tests.el: add tests for url-generic-parse-url
- url-expand-tests.el: add tests for url-expand-file-name
- url-generic-parse-url: keep empty fragment information in URL-struct
- url-path-and-query: do not artificially turn empty path and query
into nil path and query, respectively
- url-expander-remove-relative-links: do not turn empty path into an
absolute path ("/"). Remark: given the name of this function, would
it be better to handle this case at the call site instead?
- url-expand-file-name: properly resolve fragment-only URIs instead of
returning them unchanged. I think that this bug was due to a
misinterpretation of RFC3986, section 5.1. Establishing a Base URI:
"Aside from fragment-only references (Section 4.4), relative
references are only usable when a base URI is known."
To me, this does not mean that fragment-only references should not be
resolved properly, and the examples given in the RFC confirm this.
- url-default-expander: an empty path in the relative reference URI
should not drop the last segment.
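To give an idea of the kind of tests mentioned in the summary, here is
a rough ERT sketch built only from the examples listed in points 1 to
3. The test names are made up for this illustration and are not
necessarily those used in the attached patch; the tests are expected
to fail without the proposed changes.

```elisp
(require 'ert)
(require 'url-expand)

;; Hypothetical test names; expected values are the ones given above.
(ert-deftest my-url-expand-rfc3986-fragment-only ()
  "A fragment-only reference keeps the base path and query."
  (should (equal (url-expand-file-name "#s" "http://a/b/c/d;p?q")
                 "http://a/b/c/d;p?q#s")))

(ert-deftest my-url-expand-rfc3986-query-only ()
  "A query-only reference keeps the base path and replaces the query."
  (should (equal (url-expand-file-name "?y" "http://a/b/c/d;p?q")
                 "http://a/b/c/d;p?y")))

(ert-deftest my-url-expand-rfc3986-dot-segments ()
  "Dot segments are removed from the resolved path."
  (should (equal (url-expand-file-name "/./g" "http://a/b/c/d;p?q")
                 "http://a/g"))
  (should (equal (url-expand-file-name "/../g" "http://a/b/c/d;p?q")
                 "http://a/g")))
```

Tests like these can be run interactively with M-x ert.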
Please let me know if I should follow a different procedure to submit
these changes. I signed the copyright assignment "GNU EMACS" this year.
Thanks,
Alain
Attachment: 0001-Make-relative-URL-parsing-and-resolution-consistent-.patch