Re: [Bug-wget] GNU wget 1.15 released
From: Andries E. Brouwer
Subject: Re: [Bug-wget] GNU wget 1.15 released
Date: Sat, 25 Jan 2014 18:31:38 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Jan 22, 2014 at 06:50:55PM +0100, Giuseppe Scrivano wrote:
> Hello,
>
> I am pleased to announce the new version of GNU wget.
Good!
> ftp://ftp.gnu.org/gnu/wget/wget-1.15.tar.gz
Testing shows that wget still, by default, creates unusable filenames
on a UTF-8 system when downloading files with UTF-8 filenames.
(It mistakenly treats bytes in the middle of certain UTF-8 characters as
"control" and escapes them, which is terrible. Not escaping them would be correct.)
Presently, byte values 0-31 and 127-159 are considered "control".
Since ASCII is a subset of almost every character set in use,
this is reasonable for 0-31 and 127.
Since more and more systems use UTF-8, it is definitely
unreasonable for 128-159. These are just internal (continuation)
bytes inside a UTF-8 multibyte character.
Escaping these internal bytes yields illegal filenames,
difficult or impossible to handle on the local system.
This means that one probably wants to split the concept "control"
into "control" and "highcontrol", say, in url.c:
...
#define D filechr_highcontrol
...
D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, /* 128-143 */
D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, /* 144-159 */
...
#undef D
where highcontrol is considered control unless LC_CTYPE
contains UTF-8 or UTF8 or utf-8 or utf8, in which case
highcontrol characters are ordinary.
Andries