Re: wget2 | html hex entities are not correctly decoded (#637)

wget-dev

From:	@rockdaboot
Subject:	Re: wget2 | html hex entities are not correctly decoded (#637)
Date:	Sun, 27 Aug 2023 18:46:29 +0000



Tim Rühsen commented: 
https://gitlab.com/gnuwget/wget2/-/issues/637#note_1531241091

I had to read up on it, it was too long ago :smile:

So yes, URLs from HTML/XML documents are supposed to contain HTML/XML entities 
including the `&#dddd;` and the `&#xhhhh;` forms.

The latter was not implemented in `wget_xml_decode_entities_inline()`. Now it 
is (pushed to master) :).
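
To illustrate the two numeric forms, here is a minimal Python sketch. It is 
not the wget2 code (that is C and decodes in place); the function name 
`decode_numeric_entities` and the choice to leave out-of-range references 
untouched are assumptions made for this example only.

```python
import re

# Matches &#dddd; (decimal) and &#xhhhh; (hexadecimal) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_entities(text):
    """Replace numeric character references with the characters they denote.

    Hypothetical helper for illustration; named entities such as &amp; are
    deliberately not handled here.
    """
    def _replace(match):
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            code_point = int(hex_digits, 16)   # the &#xhhhh; form
        else:
            code_point = int(dec_digits, 10)   # the &#dddd; form
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the reference as written.
            return match.group(0)

    return _NUMERIC_REF.sub(_replace, text)
```

With this sketch, `decode_numeric_entities("a&#47;b")` and 
`decode_numeric_entities("a&#x2F;b")` both yield `a/b`.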

The IRI unescape does URI/IRI unescaping, which is something different. So 
there are two layers of unescaping when reading+parsing a URL from an HTML or 
XML document.
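
The two layers can be sketched with the Python standard library. This is 
only an illustration of the ordering, not how wget2 does it: 
`decode_href` is a hypothetical name, and a real client would split the URL 
into components before percent-decoding rather than unquoting the whole 
string.

```python
from html import unescape
from urllib.parse import unquote


def decode_href(raw_attribute):
    """Return (after_entity_decoding, after_percent_decoding).

    Hypothetical helper: layer 1 undoes HTML/XML entity escaping, layer 2
    undoes URI/IRI percent-escaping on the result of layer 1.
    """
    url = unescape(raw_attribute)   # layer 1: &#47; and &#x2F; become "/"
    return url, unquote(url)        # layer 2: %20 becomes " "


# The entity layer runs first, so an entity can even produce the "%" that
# the second layer then interprets.
print(decode_href("/files/a%20b&#x2F;c.html"))
print(decode_href("&#x25;41"))
```

For the first input, layer 1 gives `/files/a%20b/c.html` and layer 2 gives 
`/files/a b/c.html`; for the second, layer 1 gives `%41` and layer 2 gives 
`A`.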


Current Thread

[Next in Thread]

wget2 | html hex entities are not correctly decoded (#637), Michael Roosz (@michaelroosz), 2023/08/23
- Re: wget2 | html hex entities are not correctly decoded (#637), @rockdaboot, 2023/08/26
- Re: wget2 | html hex entities are not correctly decoded (#637), @rockdaboot <=
- Re: wget2 | html hex entities are not correctly decoded (#637), @rockdaboot, 2023/08/27
- Re: wget2 | html hex entities are not correctly decoded (#637), @rockdaboot, 2023/08/27
