Re: [Bug-wget] Redirect containing %2B behaves differently depending on

bug-wget

From:	Ander Juaristi
Subject:	Re: [Bug-wget] Redirect containing %2B behaves differently depending on locale
Date:	Fri, 03 Apr 2015 12:26:09 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 03/13/2015 11:48 PM, Adam Sampson wrote:

Hi,

I've just found a case where wget 1.16.3 responds to a 302 redirect
differently depending on whether it's in an ASCII or UTF-8 locale.

This works:
LC_ALL=en_GB.UTF-8 wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2

This doesn't work:
LC_ALL=C wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.5.0-src.tar.bz2

I've attached logs with -d showing what's actually going on. The
initial request gives a 302 response with a Location: that contains:
   ....tar.bz2?Signature=up6%2BtTpSF...

In the UTF-8 locale, wget correctly redirects to that location.

In the ASCII locale, wget -d prints a "converted: '...' -> '...'" line
(from iri.c's do_conversion), then redirects to:
   ....tar.bz2?Signature=up6+tTpSF...

(If you try it yourself you'll get a slightly different URL, but at
least for me it usually contains %2B somewhere.)

This appears to be because do_conversion calls url_unescape on the
input string it's given -- even though that input string is a _const_
char * in the code that calls it (main -> retrieve_url -> url_parse ->
remote_to_utf8 -> do_conversion). It's not immediately obvious to me
whether that's intentional or not; at the very least, it's a surprising
bit of behaviour.

That call to url_unescape() is necessary because iconv() needs the raw
multibyte characters rather than their percent-encoded form. My first
approach, by the way, was to remove that call, but that caused
Test-iri-percent.px to fail, which makes the need for it pretty clear.
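
To illustrate why the unescaping has to happen before the charset
conversion, here is a minimal Python sketch. It is a model of the idea
only, not wget's C code: the helper name to_utf8_escaped and the sample
input "caf%E9" are made up for illustration, and Python's codecs stand in
for iconv().

```python
from urllib.parse import quote, unquote_to_bytes


def to_utf8_escaped(url: str, from_charset: str) -> str:
    """Hypothetical helper: convert a percent-encoded URL part to UTF-8."""
    # Step 1: turn %XX sequences back into raw bytes (the url_unescape role).
    raw = unquote_to_bytes(url)
    # Step 2: convert the raw bytes between charsets (the iconv role).
    utf8 = raw.decode(from_charset).encode("utf-8")
    # Step 3: percent-encode the result again.
    return quote(utf8)


# Without step 1 the converter only ever sees ASCII, so nothing changes:
# "%E9" is three ASCII characters, not the Latin-1 byte 0xE9.
unconverted = "caf%E9".encode("latin-1").decode("latin-1").encode("utf-8")

print(unconverted)
print(to_utf8_escaped("caf%E9", "latin-1"))
```

With the unescape step, the Latin-1 byte 0xE9 becomes the two-byte UTF-8
sequence C3 A9. Without it, the string passes through the conversion
untouched.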

The issue seems to be at the call to reencode_escapes(), just after
remote_to_utf8() returns. By then %2B has been decoded to a literal "+".
That is indistinguishable from the reserved character "+", so
reencode_escapes() treats it as a reserved character and leaves it as-is.
The same happens with other characters, such as "=" (%3D).
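
The loss of information can be reproduced with a small Python model. This
is only a sketch under assumptions: urllib's unquote() and quote() stand in
for url_unescape() and reencode_escapes(), the RESERVED set is taken from
RFC 3986, and the helper name unescape_then_reencode is made up. The sample
string is the truncated fragment from the report above.

```python
from urllib.parse import quote, unquote

# Reserved characters per RFC 3986 (gen-delims plus sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def unescape_then_reencode(url: str) -> str:
    """Hypothetical model of the decode-then-reencode round trip."""
    # Decode every %XX sequence, including escaped reserved characters.
    decoded = unquote(url)
    # Re-encode, leaving reserved characters alone. At this point a "+"
    # that came from "%2B" looks exactly like a "+" that was literal.
    return quote(decoded, safe=RESERVED)


print(unescape_then_reencode("Signature=up6%2BtTpSF"))
```

The round trip turns "%2B" into "+", which matches the redirect seen in the
ASCII locale.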

What I propose is to tag the characters that url_unescape() has decoded,
and then have reencode_escapes() check whether a tagged character
coincides with a reserved character, so that it can be escaped again.
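
A rough Python sketch of the tagging idea follows. It is not a patch for
wget: the function names unescape_tagged and reencode_tagged, the
(character, was_escaped) pair representation, and the ASCII-only input are
all assumptions made for illustration.

```python
import string

RESERVED = set(":/?#[]@!$&'()*+,;=")
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
HEX = set(string.hexdigits)


def unescape_tagged(url: str) -> list:
    """Decode %XX sequences, remembering which characters were escaped."""
    out = []
    i = 0
    while i < len(url):
        if url[i] == "%" and len(url) - i >= 3 and set(url[i + 1:i + 3]) <= HEX:
            out.append((chr(int(url[i + 1:i + 3], 16)), True))
            i += 3
        else:
            out.append((url[i], False))
            i += 1
    return out


def reencode_tagged(pairs: list) -> str:
    """Re-encode, escaping reserved characters that were escaped on input."""
    parts = []
    for ch, was_escaped in pairs:
        if ch in UNRESERVED or (ch in RESERVED and not was_escaped):
            parts.append(ch)  # safe to emit literally
        else:
            parts.append("%{:02X}".format(ord(ch)))  # restore or add escape
    return "".join(parts)


print(reencode_tagged(unescape_tagged("Signature=up6%2BtTpSF")))
```

In this model the literal "=" stays literal, while the "+" that arrived as
"%2B" is written back as "%2B", so the query string keeps its meaning.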

What do you guys think?

--
Regards,
- AJ
