Re: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectl
From: Barry Allard
Subject: Re: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, whereas -m --no-iri works
Date: Mon, 28 Sep 2015 02:12:26 -0700
Stable is definitely broken (look closely at the 404s and the missing
%-encoded files), so cutting a release will resolve this issue.
Full logs of stable (1.16.3) and latest head (e51076e6) runs:
https://gist.github.com/564ab530f5a18703ea1a
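
To make the failure mode concrete, here is a minimal sketch of what one
extra percent-decoding pass does to the path from the original report. It
uses Python's urllib.parse.unquote purely as an illustration; it is not
wget's code path, and decode_times is a hypothetical helper introduced only
for this example.

```python
from urllib.parse import unquote


def decode_times(path: str, times: int) -> str:
    """Apply percent-decoding `times` times (hypothetical helper)."""
    for _ in range(times):
        path = unquote(path)
    return path


# Path as it appears in the link on the mirrored page (double-escaped).
linked = "instruction/XLAT%252FXLATB.html"

as_linked = decode_times(linked, 0)     # requested untouched: the 200 case
decoded_once = decode_times(linked, 1)  # one pass: the 404 case
decoded_twice = decode_times(linked, 2) # two passes: "/" becomes a separator

print(as_linked)
print(decoded_once)
print(decoded_twice)
```

Each pass strips one layer of escaping, so a client that decodes and
re-requests ends up asking the server for a different resource than the one
the page links to.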
Regards,
Barry Allard
> On Sep 28, 2015, at 1:19 AM, Juaristi Álamos, Ander <address@hidden> wrote:
>
> Hi there,
>
> I'm afraid I cannot reproduce it in the latest git snapshot.
>
> The resulting link is exactly the same in the website (online) and in
> the downloaded content:
>
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
>
> vs
>
> file:///home/aja/codebase/wget/www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
>
> When opening 'reference.html' on my browser and clicking on the link,
> it's true that the browser itself converts it from %2F to %252F, but I
> didn't get any 404 in any case. What's more, if the downloaded content
> looks exactly the same as the online one, I don't think we can consider
> this a bug. Additionally, we had a similar problem a while ago, which was
> (apparently) resolved in commit
> b0820d553b6bef4400c493474d38930fee461b45. However, such changes have not
> been released, yet. So, which Wget version are you using? Could you
> please confirm that the issue persists in the latest git snapshot?
>
> Thanks.
> - AJ
>
> On Sun, 2015-09-27 at 14:29 -0700, Barry Allard wrote:
>> # skips all double-encoded [ui]ris because it reinterprets them outside
>> # url.c:reencode_escapes(), probably in iri.c.
>> wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>>
>> # works
>> wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
>>
>> Correct [ui]ri:
>> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html
>> (200)
>> Incorrect [ui]ri:
>> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html
>> (404)
>> # pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
>>
>> Simple-but-incomplete hackaround: use --no-iri
>>
>> To improve compatibility with mirroring international sites, the iri code
>> path could approximate behavior of url.c/url_parse() by avoiding unnecessary
>> modification to --mirror extracted [ui]ris, possibly around the time it
>> adds/dequeues them to/from the queue.
>>
>> Best,
>> Barry Allard
>