guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 39


From: Vivien Kraus
Subject: Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Mon, 02 Oct 2023 18:32:50 +0200
User-agent: Evolution 3.46.4

Hi!

Are there other things to fix?

Best regards,

Vivien

Le lundi 25 septembre 2023 à 18:48 +0200, Vivien Kraus a écrit :
> * module/web/uri.scm (remove-dot-segments): Implement algorithm
> 5.2.4.
> (merge-paths): Implement algorithm 5.2.3.
> (resolve-relative-reference): Implement algorithm 5.2.2.
> (module): Export resolve-relative-reference.
> * NEWS: Reference it here.
> * doc/ref/web.texi (URIs): Document it here.
> (Subtypes of URI): Add a @node declaration to cross-reference it.
> (HTTP Headers) [location]: Point to the section for different URI
> types.
> (Web Client) [http-request]: Indicate that no redirection is
> performed.
> ---
> 
> I clarified the situation about redirections. I don’t think it’s
> Guile’s
> job to do it. For permanent redirections (301), the application
> developer is supposed to edit the pages that point to the now-moved
> resource anyway. A handful of security issues must also be lurking in
> the shadows, and I don’t think it should be a responsibility for the
> Guile web client.
> 
> The specification uses the word "relative" both for the type of URI
> that
> is most likely to be found, and to express the asymmetric relation
> between both arguments of the algorithm. I think "base" and
> "dependent"
> are clearer, what do you think?
> 
> The semicolon and equal sign are both reserved characters, so it’s
> expected that Guile escapes them. If there’s a bug, it is in the 5.4
> section of the RFC. However, I understand that it would be desirable
> for
> the algorithm to accept such unescaped characters, since it works
> with
> URIs in isolation and not in an HTTP frame or web page.
> 
>  NEWS                          |   7 ++
>  doc/ref/web.texi              |  27 +++++-
>  module/web/uri.scm            | 161
> +++++++++++++++++++++++++++++++++-
>  test-suite/tests/web-uri.test |  68 ++++++++++++++
>  4 files changed, 261 insertions(+), 2 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index b319404d7..bdf75cb3c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
>  
>  * New interfaces and functionality
>  
> +** New function in (web uri): resolve-relative-reference
> +
> +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It
> may
> +be used to request a moved resource in case of a 301 or 302 HTTP
> +response, by resolving the Location value of the response on top of
> the
> +requested URI.
> +
>  ** New warning: unused-module
>  
>  This analysis, enabled at `-W2', issues warnings for modules that
> appear
> diff --git a/doc/ref/web.texi b/doc/ref/web.texi
> index 607c855b6..2267c9774 100644
> --- a/doc/ref/web.texi
> +++ b/doc/ref/web.texi
> @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs"
> "biscuits&gravy")} encodes
>  as @code{"scrambled%20eggs/biscuits%26gravy"}.
>  @end deffn
>  
> +@node Subtypes of URI
>  @subsubheading Subtypes of URI
>  
>  As we noted above, not all URI objects have a scheme.  You might
> have
> @@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while
> asserting that no scheme is
>  present.  Return @code{#f} if the string could not be parsed.
>  @end deffn
>  
> +@cindex resolve URI reference
> +In order to get a URI object from a base URI and a relative
> reference,
> +one has to use a @dfn{relative URI reference resolution} algorithm. 
> For
> +instance, given a base URI, @samp{https://example.com/over/here},
> and a
> +relative reference, @samp{../no/there}, it may seem easy to get an
> +absolute URI as @samp{https://example.com/over/../no/there}. It is
> +possible that the server at @samp{https://example.com} could serve
> the
> +same resource under this URL as
> +@samp{https://example.com/no/there}. However, a web cache, or a
> linked
> +data processor, must understand that the relative reference
> resolution
> +leads to @samp{https://example.com/no/there}.
> +
> +@deffn {Scheme procedure} resolve-relative-reference @var{base}
> @var{dependent}
> +Return a URI object representing @var{dependent}, using the
> components
> +of @var{base} if missing, as defined in section 5.2 in RFC 3986.
> This
> +function cannot return a relative reference (it can only return an
> +absolute URI object), if either @var{base} or @var{dependent} is an
> +absolute URI object.
> +@end deffn
>  
>  @node HTTP
>  @subsection The Hyper-Text Transfer Protocol
> @@ -1038,7 +1058,8 @@ The entity-tag of the resource.
>  @deftypevr {HTTP Header} URI-reference location
>  A URI reference on which a request may be completed.  Used in
>  combination with a redirecting status code to perform client-side
> -redirection.
> +redirection. @xref{Subtypes of URI, the distinction between types of
> +URI}, for more information on relative references.
>  @example
>  (parse-header 'location "http://example.com/other";)
>  @result{} #<uri ...>
> @@ -1501,6 +1522,10 @@ constants, such as @code{certificate-
> status/signer-not-found} or
>  Connect to the server corresponding to @var{uri} and make a request
> over
>  HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST},
> etc.).
>  
> +@code{http-request} does not follow redirections. If a redirection
> is
> +required, @code{http-request} returns a response object with an
> adequate
> +response code (e.g. 301 or 302).
> +
>  The following keyword arguments allow you to modify the requests in
>  various ways, for example attaching a body to the request, or
> setting
>  specific headers.  The following table lists the keyword arguments
> and
> diff --git a/module/web/uri.scm b/module/web/uri.scm
> index 8e0b9bee7..acec2d1e8 100644
> --- a/module/web/uri.scm
> +++ b/module/web/uri.scm
> @@ -47,7 +47,9 @@
>  
>              uri-reference? relative-ref?
>              build-uri-reference build-relative-ref
> -            string->uri-reference string->relative-ref))
> +            string->uri-reference string->relative-ref
> +
> +            resolve-relative-reference))
>  
>  (define-record-type <uri>
>    (make-uri scheme userinfo host port path query fragment)
> @@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’
> as a delimiter.
>  For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
>  encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
>    (string-join (map uri-encode parts) "/"))
> +
> +(define (remove-dot-segments path)
> +  "Remove the @samp{./} and @samp{../} segments in @var{path}, as
> + RFC3986, section 5.2.4."
> +  (let scan ((input
> +              (let ((components (split-and-decode-uri-path path)))
> +                (if (string-suffix? "/" path)
> +                    `(,@components "")
> +                    components)))
> +             (input-path-absolute? (string-prefix? "/" path))
> +             (output '())
> +             (output-absolute? #f)
> +             (output-ends-in-/? (string-suffix? "/" path)))
> +    (cond
> +     ((and input-path-absolute?
> +           (null? input))
> +      ;; Transfer the initial "/" from the input to the end of the
> +      ;; output.
> +      (scan '() #f output output-absolute? #t))
> +     ((null? input)
> +      (string-append
> +       (if output-absolute? "/" "")
> +       (encode-and-join-uri-path
> +        (reverse output))
> +       (if output-ends-in-/? "/" "")))
> +     ((and (not input-path-absolute?)
> +           (or (equal? (car input) "..")
> +               (equal? (car input) ".")))
> +      (scan (cdr input) #f output output-absolute? output-ends-in-
> /?))
> +     ((and input-path-absolute?
> +           (equal? (car input) "."))
> +      (scan (cdr input) #t output output-absolute? output-ends-in-
> /?))
> +     ((and input-path-absolute?
> +           (equal? (car input) ".."))
> +      (scan (cdr input) #t
> +            (if (null? output)
> +                output
> +                (cdr output))
> +            ;; Remove the last segment, including the preceding /.
> So,
> +            ;; if there is 0 or 1 segment, remove the root / too.
> +            (if (or (null? output) (null? (cdr output)))
> +                #f  ;; remove the /
> +                #t) ;; keep it
> +            #f))
> +     (else
> +      (scan (cdr input)
> +            ;; If there is only 1 item in input, then it does not
> end in
> +            ;; /, so the recursive call does not start with
> +            ;; /. Otherwise, the recursive call starts with /.
> +            (not (null? (cdr input)))
> +            (cons (car input) output)
> +            ;; If the output is empty and the input path is
> absolute,
> +            ;; the / of the transferred path is transferred as well.
> +            (or output-absolute?
> +                (and (null? output)
> +                     input-path-absolute?))
> +            #f)))))
> +
> +(define (merge-paths base-has-authority? base dependent)
> +  "Return @samp{@var{base}/@var{dependent}}, with the subtelties of
> absolute
> + paths explained in RFC3986, section 5.2.3. If the base URI has an
> +authority (userinfo, host, port), then the processing is a bit
> +different."
> +  (if (and base-has-authority?
> +           (equal? base ""))
> +      (string-append "/" dependent)
> +      (let ((last-/ (string-rindex base #\/)))
> +        (if last-/
> +            (string-append (substring base 0 last-/) "/" dependent)
> +            dependent))))
> +
> +(define (resolve-relative-reference base dependent)
> +  "Resolve @var{dependent} on top of @var{base}, as RFC3986, section
> +5.2. Both @var{dependent} and @var{base} may be URI or relative
> +references. The return value is a URI if either @var{dependent} or
> +@var{base} is a URI."
> +  ;; As opposed to RFC 3986, we use "dependent" instead of
> "relative" to
> +  ;; avoid confusion between "URI" and "relative reference", the
> +  ;; dependent URI may be either.
> +  (let ((b-scheme (uri-scheme base))
> +        (b-userinfo (uri-userinfo base))
> +        (b-host (uri-host base))
> +        (b-port (uri-port base))
> +        (b-path (uri-path base))
> +        (b-query (uri-query base))
> +        (b-fragment (uri-fragment base))
> +        (r-scheme (uri-scheme dependent))
> +        (r-userinfo (uri-userinfo dependent))
> +        (r-host (uri-host dependent))
> +        (r-port (uri-port dependent))
> +        (r-path (uri-path dependent))
> +        (r-query (uri-query dependent))
> +        (r-fragment (uri-fragment dependent))
> +        (t-scheme #f)
> +        (t-userinfo #f)
> +        (t-host #f)
> +        (t-port #f)
> +        (t-path "")
> +        (t-query #f)
> +        (t-fragment #f))
> +    ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
> +
> +    ;;The programming style uses mutations to better adhere to the
> +    ;;algorithm specification.
> +    (if r-scheme
> +        (begin
> +          (set! t-scheme r-scheme)
> +          (set! t-userinfo r-userinfo)
> +          (set! t-host r-host)
> +          (set! t-port r-port)
> +          (set! t-path (remove-dot-segments r-path))
> +          (set! t-query r-query))
> +        ;; r-scheme is not defined:
> +        (begin
> +          (if r-host
> +              (begin
> +                (set! t-userinfo r-userinfo)
> +                (set! t-host r-host)
> +                (set! t-port r-port)
> +                (set! t-path (remove-dot-segments r-path))
> +                (set! t-query r-query))
> +              ;; r-scheme is not defined, r-authority is not
> defined:
> +              (begin
> +                (if (equal? r-path "")
> +                    (begin
> +                      (set! t-path b-path)
> +                      (if r-query
> +                          ;; r-scheme, r-authority, r-path are not
> +                          ;; defined:
> +                          (set! t-query r-query)
> +                          ;; r-scheme, r-authority, r-path, r-query
> are
> +                          ;; not defined:
> +                          (set! t-query b-query)))
> +                    ;; r-scheme, r-authority not defined, r-path
> defined:
> +                    (begin
> +                      (if (string-prefix? "/" r-path)
> +                          ;; r-scheme, r-authority not defined, r-
> path
> +                          ;; absolute:
> +                          (set! t-path (remove-dot-segments r-path))
> +                          ;; r-scheme, r-authority not defined, r-
> path
> +                          ;; dependent:
> +                          (set! t-path
> +                                (remove-dot-segments
> +                                 (merge-paths b-host b-path r-
> path))))
> +                      (set! t-query r-query)))
> +                (set! t-userinfo b-userinfo)
> +                (set! t-host b-host)
> +                (set! t-port b-port)))
> +          (set! t-scheme b-scheme)))
> +    (set! t-fragment r-fragment)
> +    (build-uri-reference #:scheme t-scheme
> +                         #:userinfo t-userinfo
> +                         #:host t-host
> +                         #:port t-port
> +                         #:path t-path
> +                         #:query t-query
> +                         #:fragment t-fragment)))
> diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-
> uri.test
> index 95fd82f16..c453bf60f 100644
> --- a/test-suite/tests/web-uri.test
> +++ b/test-suite/tests/web-uri.test
> @@ -20,6 +20,7 @@
>  (define-module (test-web-uri)
>    #:use-module (web uri)
>    #:use-module (ice-9 regex)
> +  #:use-module (ice-9 string-fun)
>    #:use-module (test-suite lib))
>  
>  
> @@ -693,3 +694,70 @@
>    (pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
>    (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
>    (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
> +
> +(with-test-prefix "resolve relative reference"
> +  ;; Test suite in RFC3986, section 5.4.
> +  (let ((base (string->uri "http://a/b/c/d;p?q";))
> +        (equal/encoded?
> +         ;; The test suite checks for ';' characters, but Guile
> escapes
> +         ;; them in URIs. Same for '='.
> +         (let ((escape-colon
> +                (lambda (x)
> +                  (string-replace-substring x ";" "%3B")))
> +               (escape-equal
> +                (lambda (x)
> +                  (string-replace-substring x "=" "%3D"))))
> +         (lambda (x y)
> +           (equal? (escape-colon (escape-equal x))
> +                   (escape-colon (escape-equal y)))))))
> +    (let ((resolve
> +           (lambda (relative)
> +             (let* ((relative-uri
> +                     (string->uri-reference relative))
> +                    (resolved-uri
> +                     (resolve-relative-reference base relative-uri))
> +                    (resolved (uri->string resolved-uri)))
> +               resolved))))
> +      (with-test-prefix "normal"
> +        (pass-if (equal/encoded? (resolve "g:h") "g:h"))
> +        (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g";))
> +        (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g";))
> +        (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/";))
> +        (pass-if (equal/encoded? (resolve "/g") "http://a/g";))
> +        (pass-if (equal/encoded? (resolve "//g") "http://g";))
> +        (pass-if (equal/encoded? (resolve "?y")
> "http://a/b/c/d;p?y";))
> +        (pass-if (equal/encoded? (resolve "g?y")
> "http://a/b/c/g?y";))
> +        (pass-if (equal/encoded? (resolve "#s")
> "http://a/b/c/d;p?q#s";))
> +        (pass-if (equal/encoded? (resolve "g?y#s")
> "http://a/b/c/g?y#s";))
> +        (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x";))
> +        (pass-if (equal/encoded? (resolve "g;x?y#s")
> "http://a/b/c/g;x?y#s";))
> +        (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q";))
> +        (pass-if (equal/encoded? (resolve ".") "http://a/b/c/";))
> +        (pass-if (equal/encoded? (resolve "./") "http://a/b/c/";))
> +        (pass-if (equal/encoded? (resolve "..") "http://a/b/";))
> +        (pass-if (equal/encoded? (resolve "../") "http://a/b/";))
> +        (pass-if (equal/encoded? (resolve "../g") "http://a/b/g";))
> +        (pass-if (equal/encoded? (resolve "../..") "http://a/";))
> +        (pass-if (equal/encoded? (resolve "../../") "http://a/";))
> +        (pass-if (equal/encoded? (resolve "../../g") "http://a/g";)))
> +      (with-test-prefix "abnormal"
> +        (pass-if (equal/encoded? (resolve "../../../g")
> "http://a/g";))
> +        (pass-if (equal/encoded? (resolve "../../../../g")
> "http://a/g";))
> +        (pass-if (equal/encoded? (resolve "/./g") "http://a/g";))
> +        (pass-if (equal/encoded? (resolve "/../g") "http://a/g";))
> +        (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g.";))
> +        (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g";))
> +        (pass-if (equal/encoded? (resolve "g..")
> "http://a/b/c/g..";))
> +        (pass-if (equal/encoded? (resolve "..g")
> "http://a/b/c/..g";))
> +        (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g";))
> +        (pass-if (equal/encoded? (resolve "./g/.")
> "http://a/b/c/g/";))
> +        (pass-if (equal/encoded? (resolve "g/./h")
> "http://a/b/c/g/h";))
> +        (pass-if (equal/encoded? (resolve "g/../h")
> "http://a/b/c/h";))
> +        (pass-if (equal/encoded? (resolve "g;x=1/./y")
> "http://a/b/c/g;x=1/y";))
> +        (pass-if (equal/encoded? (resolve "g;x=1/../y")
> "http://a/b/c/y";))
> +        (pass-if (equal/encoded? (resolve "g?y/./x")
> "http://a/b/c/g?y/./x";))
> +        (pass-if (equal/encoded? (resolve "g?y/../x")
> "http://a/b/c/g?y/../x";))
> +        (pass-if (equal/encoded? (resolve "g#s/./x")
> "http://a/b/c/g#s/./x";))
> +        (pass-if (equal/encoded? (resolve "g#s/../x")
> "http://a/b/c/g#s/../x";))
> +        (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
> +        
> 
> base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb


reply via email to

[Prev in Thread] Current Thread [Next in Thread]