bug-wget

Re: [Bug-wget] Save 3 byte utf8 url


From: Ángel González
Subject: Re: [Bug-wget] Save 3 byte utf8 url
Date: Thu, 07 Feb 2013 21:53:50 +0100
User-agent: Thunderbird

On 07/02/13 15:06, bes wrote:
> Hi,
>
> I found a bug in how wget interprets and saves percent-encoded 3-byte
> UTF-8 URLs.
>
> example:
> 1. Create a URL containing "—". This is U+2014 (EM DASH); its
> percent-encoded UTF-8 form is "%E2%80%94".
> 2. Fetch it with wget: wget "http://example.com/abc—d" or, directly,
> wget "http://example.com/abc%E2%80%94d"
> 3. Wget saves this URL to the file "abc\342%80%94d". The expected name is
> "abc%E2%80%94d". This is a bug.

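The problem is that wget decides which bytes to escape by checking
whether each one is a printable character in Latin-1. The byte 0xE2
(octal 342) is printable there, so it is written raw, while 0x80 and
0x94 fall in the Latin-1 control range and stay percent-encoded.
This is tracked at https://savannah.gnu.org/bugs/index.php?37564
One workaround is to use --restrict-file-names=nocontrol, which puts the
em dash itself in the filename instead of the percent-encoded version.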
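
To illustrate why only the first byte of the sequence ends up raw, here
is a rough Python model of a per-byte Latin-1 check. It is a simplified
sketch, not wget's actual code: the function names is_latin1_control and
model_filename are made up for this example, and wget's real escaping
rules also cover other characters (such as "/") that are ignored here.

```python
from urllib.parse import unquote_to_bytes


def is_latin1_control(b: int) -> bool:
    # Assumption: a byte counts as "control" if it is a C0 control, DEL,
    # or a C1 control (0x80-0x9F) when read as ISO-8859-1.
    return b < 0x20 or b == 0x7F or 0x80 <= b <= 0x9F


def model_filename(path_segment: str) -> bytes:
    # Decode percent-escapes to raw bytes (non-ASCII input is UTF-8 encoded),
    # then re-escape only the bytes the Latin-1 check treats as control.
    raw = unquote_to_bytes(path_segment)
    out = bytearray()
    for b in raw:
        if is_latin1_control(b):
            out += ("%%%02X" % b).encode("ascii")
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    # 0xE2 is printable in Latin-1 and is kept raw; 0x80 and 0x94 are
    # in the C1 range and are percent-encoded again.
    print(model_filename("abc%E2%80%94d"))
```

Under this model both forms of the URL from the report produce the same
name, with the raw byte 0xE2 (octal 342) followed by "%80%94".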




