bug-wget

From: Vincent Lefevre
Subject: [Bug-wget] wget converts escape sequences into non-ASCII characters in filenames
Date: Sat, 1 Nov 2008 14:51:14 +0100
User-agent: Mutt/1.5.18-vl-r25739 (2008-10-30)

GNU Wget 1.11.4 converts percent-escape sequences such as %E9 into
non-ASCII bytes in filenames (%E9 becomes the single byte 0xE9). Under
GNU/Linux, this can make filenames unreadable, because there is no
standard encoding for non-ASCII characters in filenames. Worse, when the
filesystem does not accept such non-ASCII data (e.g. HFS+ under Mac OS X,
which expects UTF-8 only), wget fails.

Moreover, wget's output can be invalid with respect to the current locale.
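To make the mechanism concrete, here is a minimal Python sketch. It illustrates only the byte-level behavior and is not wget's own C code. It percent-decodes the path component from the URL below, shows that the result contains the lone byte 0xE9, which is not valid UTF-8, and shows what a UTF-8 terminal displays in its place. Reading 0xE9 as ISO-8859-1 "é" is an assumption: the URL does not say which charset the server intended.

```python
from urllib.parse import quote, unquote_to_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Path component taken from the URL in the report.
component = "R%E9glementation"

# Percent-decoding turns %E9 into the single raw byte 0xE9.
raw = unquote_to_bytes(component)
print(raw)                                     # b'R\xe9glementation'

# 0xE9 announces a 3-byte UTF-8 sequence, but 'g' is not a continuation
# byte, so the resulting name is not valid UTF-8.
print(is_valid_utf8(raw))                      # False

# A UTF-8 terminal can only show U+FFFD for the stray byte.
print(raw.decode("utf-8", errors="replace"))   # R\ufffdglementation

# Assumption: the server meant ISO-8859-1. The intended name and its
# UTF-8 percent-encoding would then be:
print(raw.decode("latin-1"))                   # Réglementation
print(quote(raw.decode("latin-1")))            # R%C3%A9glementation
```

This is also why the transcripts show "�" in the "Saving to:" and error lines: the byte written by wget cannot be displayed in a UTF-8 locale.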

The bug can be reproduced with:

  wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100'

For example, under Linux with a UTF-8 locale:

ay:~> wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100' 
--2008-11-01 14:46:00--  http://www.bruit.fr/FR/print/R%E9glementation/01030100
Resolving www.bruit.fr... 217.19.48.132
Connecting to www.bruit.fr|217.19.48.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `www.bruit.fr/FR/print/R�glementation/01030100'
[...]

and under Mac OS X:

prunille:~> wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100'
--2008-11-01 14:50:12--  http://www.bruit.fr/FR/print/R%E9glementation/01030100
Resolving www.bruit.fr... 217.19.48.132
Connecting to www.bruit.fr|217.19.48.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
www.bruit.fr/FR/print/R�glementation: Invalid argumentwww.bruit.fr/FR/print/R�glementation/01030100: No such file or directory

Cannot write to `www.bruit.fr/FR/print/R�glementation/01030100' (No such file or directory).
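One way to avoid both failures is to never turn an escape into a non-ASCII byte in the first place. The Python sketch below shows such a policy. It is hypothetical: the function name `ascii_local_name` and the rule itself are assumptions for illustration, not wget's behavior or an official fix. A %XX escape is decoded only when it yields a printable ASCII byte other than '/' or '%'; every other escape is kept verbatim in the local filename.

```python
import re

# Matches one percent-escape such as %E9 (hex digits in either case).
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def ascii_local_name(component: str) -> str:
    """Hypothetical naming policy, not wget's actual code.

    Decode a %XX escape only when it stands for a printable ASCII byte
    other than '/' or '%'; keep every other escape verbatim. The local
    filename then stays pure ASCII, whatever the locale or filesystem.
    """
    def decode_if_safe(match: re.Match) -> str:
        byte = int(match.group(1), 16)
        if 0x20 <= byte < 0x7F and chr(byte) not in "/%":
            return chr(byte)
        return match.group(0)  # e.g. %E9 stays as the three characters "%E9"

    return _ESCAPE.sub(decode_if_safe, component)


for name in ["R%E9glementation", "R%C3%A9glementation", "a%20b", "a%2Fb", "01030100"]:
    print(name, "->", ascii_local_name(name))
```

Under this rule, the URL from the report would be saved as `www.bruit.fr/FR/print/R%E9glementation/01030100`, which is valid in any locale and on HFS+. Keeping %2F escaped also stops an encoded slash from creating an unexpected subdirectory. The trade-off is that names the server intended to be non-ASCII stay percent-encoded even on systems that could have stored them.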

-- 
Vincent Lefèvre <address@hidden> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)



