From: Vincent Lefevre
Subject: [Bug-wget] wget converts escape sequences into non-ASCII characters in filenames
Date: Sat, 1 Nov 2008 14:51:14 +0100
User-agent: Mutt/1.5.18-vl-r25739 (2008-10-30)

GNU Wget 1.11.4 converts escape sequences such as %E9 into non-ASCII
characters in filenames. Under GNU/Linux, this can make filenames
unreadable (because there's no standard for non-ASCII characters in
filenames). Worse, when the filesystem doesn't support such non-ASCII
data (e.g. HFS+ under Mac OS X, which expects UTF-8 only), wget fails.
Moreover, the output from wget can be invalid with respect to the current locale.
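
To illustrate why the resulting filename is a problem, here is a minimal Python sketch (not wget's code; the helper names decode_segment and is_valid_utf8 are hypothetical). It assumes the escape sequence is simply decoded to the raw byte it denotes, and shows that this byte is readable as ISO-8859-1 but is not valid UTF-8:

```python
from urllib.parse import unquote_to_bytes


def decode_segment(segment: str) -> bytes:
    """Percent-decode one URL path segment into raw bytes (hypothetical helper)."""
    return unquote_to_bytes(segment)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = decode_segment("R%E9glementation")

# %E9 becomes the single byte 0xE9.
print(raw)

# Interpreted as ISO-8859-1, 0xE9 is "e" with an acute accent.
print(raw.decode("latin-1"))

# 0xE9 followed by an ASCII letter is not a valid UTF-8 sequence, so a
# UTF-8 terminal or a UTF-8-only filesystem cannot interpret this name.
print(is_valid_utf8(raw))
```

Keeping the escape sequence as-is in the filename (plain ASCII "R%E9glementation") would avoid the invalid byte; that is only one possible approach, given here as an assumption.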
The bug can be reproduced with:
wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100'
e.g. under Linux with UTF-8 locales:
ay:~> wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100'
--2008-11-01 14:46:00-- http://www.bruit.fr/FR/print/R%E9glementation/01030100
Resolving www.bruit.fr... 217.19.48.132
Connecting to www.bruit.fr|217.19.48.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `www.bruit.fr/FR/print/R�glementation/01030100'
[...]
and under Mac OS X:
prunille:~> wget -kp 'http://www.bruit.fr/FR/print/R%E9glementation/01030100'
--2008-11-01 14:50:12-- http://www.bruit.fr/FR/print/R%E9glementation/01030100
Resolving www.bruit.fr... 217.19.48.132
Connecting to www.bruit.fr|217.19.48.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
www.bruit.fr/FR/print/R�glementation: Invalid argument
www.bruit.fr/FR/print/R�glementation/01030100: No such file or directory
Cannot write to `www.bruit.fr/FR/print/R�glementation/01030100' (No such file or directory).
--
Vincent Lefèvre <address@hidden> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)