bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?


From: Eli Zaretskii
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Thu, 27 Apr 2023 20:08:14 +0300

> Date: Fri, 28 Apr 2023 00:19:22 +0800
> From:  Ruijie Yu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> I'm trying out the function `libxml-parse-html-region' as recommended
> by a thread in help-gnu-emacs.  However, I discovered that the last
> argument of this function does not help me normalize a relative URL.
> 
> Reproducer:
> 
> Visit the attached toy HTML file.  I imagine that it is hosted at
> "https://example.com/good/day".
> 
> Run this snippet:
> 
>     (pp (libxml-parse-html-region
>          (point-min) (point-max)
>          "https://example.com/good/day"))
> 
> Compare it with this snippet:
> 
>     (pp (libxml-parse-html-region
>          (point-min) (point-max)))
> 
> What I get is this result for both snippets (which is shown twice, once
> "pretty-printed", and once returned as a string):
> 
> --8<---------------cut here---------------start------------->8---
> (html nil
>       (body nil "\n    "
>             (a
>              ((href . "/hello"))
>              "1")
>             "\n    "
>             (a
>              ((href . "../world"))
>              "2")
>             "\n    "
>             (a
>              ((href . "good"))
>              "3")
>             "\n    "
>             (a
>              ((href . "morning/or/night"))
>              "4")
>             "\n  "))
> --8<---------------cut here---------------end--------------->8---
> 
> Notice that the href values are not normalized: they are copied
> verbatim from the original HTML file.
> 
> If I understand the docstring correctly, the last argument of
> `libxml-parse-html-region', when specified as a URL string, should be
> used as the "base point" for resolving relative paths found within the
> HTML document.  But the <a href=xxx> paths are not resolved at the
> moment.

If you look at xml.c, you will see that we just call a libxml function
passing it this URL.  So if anything isn't as expected, the answer is
in libxml, not in Emacs.
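
If the goal is simply to end up with absolute links, one possible
workaround is to expand the href values after parsing.  What follows is
only a sketch, not anything taken from xml.c or libxml: it assumes that
the href values arrive verbatim in the parsed DOM, as in the output
quoted above, and it relies on `url-expand-file-name' and the dom.el
accessors.  The helper name `my/absolutize-hrefs' is made up for
illustration.

```elisp
;; Sketch of a post-processing workaround; `my/absolutize-hrefs' is a
;; hypothetical helper, not part of Emacs.
(require 'dom)
(require 'url-expand)

(defun my/absolutize-hrefs (dom base-url)
  "Expand the href of every <a> node in DOM against BASE-URL.
DOM is modified in place and returned."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      ;; Only touch anchors that actually carry an href attribute.
      (when href
        (dom-set-attribute node 'href
                           (url-expand-file-name href base-url)))))
  dom)
```

It could then be used along the lines of the reproducer:

    (pp (my/absolutize-hrefs
         (libxml-parse-html-region (point-min) (point-max))
         "https://example.com/good/day"))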




