From: Barry Allard
Subject: Re: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, whereas -m --no-iri works
Date: Mon, 28 Sep 2015 02:12:26 -0700

Stable is definitely broken (look closely at the 404s and the missing
%-encoded files) and the latest head is not, so cutting a release will
resolve this issue.

Full logs of stable (1.16.3) and latest head (e51076e6) runs:

https://gist.github.com/564ab530f5a18703ea1a

Regards,
Barry Allard


> On Sep 28, 2015, at 1:19 AM, Juaristi Álamos, Ander <address@hidden> wrote:
> 
> Hi there,
> 
> I'm afraid I cannot reproduce it in the latest git snapshot.
> 
> The resulting link is exactly the same in the website (online) and in
> the downloaded content:
> 
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
> 
> vs
> 
> file:///home/aja/codebase/wget/www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
> 
> When opening 'reference.html' on my browser and clicking on the link,
> it's true that the browser itself converts it from %2F to %252F, but I
> didn't get any 404 in any case. What's more, if the downloaded content
> looks exactly the same as the online one, I don't think we can consider
> this a bug. Additionally, we had a similar problem a while ago, which was
> (apparently) resolved in commit
> b0820d553b6bef4400c493474d38930fee461b45. However, such changes have not
> been released, yet. So, which Wget version are you using? Could you
> please confirm that the issue persists in the latest git snapshot?
> 
> Thanks.
> - AJ
> 
> On Sun, 2015-09-27 at 14:29 -0700, Barry Allard wrote:
>> # skips all double-encoded [ui]ris because it reinterprets them outside
>> # url.c:reencode_escapes(), probably in iri.c.
>> wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>> 
>> # works
>> wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>> 
>> Correct [ui]ri: 
>> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html 
>> (200)
>> Incorrect [ui]ri:
>> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html 
>> (404)
>> # pcnt_decode("%252F") -> "%2F", then pcnt_decode("%2F") -> "/"
>> 
>> Simple-but-incomplete hackaround: use --no-iri
>> 
>> To improve compatibility with mirroring international sites, the iri code 
>> path could approximate behavior of url.c/url_parse() by avoiding unnecessary 
>> modification to --mirror extracted [ui]ris, possibly around the time it 
>> adds/dequeues them to/from the queue.
>> 
>> Best,
>> Barry Allard
> 
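For anyone following along, the double decode described in the original
report can be illustrated with a minimal sketch. This uses Python's
urllib.parse.unquote purely as a stand-in for a percent-decoder; it is not
Wget's code, and the variable names are made up for illustration.

```python
from urllib.parse import unquote

# Link as it appears in the mirrored page (the one reported to return 200).
correct = ("http://www.liteirc.net/mirrors/siyobik.info/"
           "instruction/XLAT%252FXLATB.html")

# One decode pass turns %25 into a literal '%', leaving %2F behind.
# This matches the URL reported to return 404.
once = unquote(correct)

# A second pass turns %2F into '/', which changes the path structure.
twice = unquote(once)

print(once)
print(twice)
```

The point is only that each extra decode pass changes which resource the
URL names, so a mirror has to request the link exactly as extracted.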


