
Re: [Bug-wget] wget alpha release 1.14.96-38327


From: Andries E. Brouwer
Subject: Re: [Bug-wget] wget alpha release 1.14.96-38327
Date: Tue, 7 Jan 2014 21:14:27 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jan 07, 2014 at 09:54:46PM +0530, Darshit Shah wrote:
> Anything still blocking the release?
> 
> 12 month release cycle sounds good to me. I'm trying to replicate the
> aforementioned issues, but no luck still.

wget still saves filenames in a buggy way:

$ echo $LC_CTYPE
en_US.UTF8

$ wget -r -np http://jinix.sourceforge.net/go/sgf/01.诘棋总动员/育苗工程手筋300题/index.html
...
Total wall clock time: 42s
Downloaded: 301 files, 106K in 0.2s (427 KB/s)

$ ls jinix.sourceforge.net/go/sgf/01.诘棋总动员
ls: cannot access jinix.sourceforge.net/go/sgf/01.诘棋总动员: No such file or directory
$ ls jinix.sourceforge.net/go/sgf
01.??%98??%8B?%80??%8A??%91%98

The filename here is garbled and cannot be typed on this system. It is
UTF-8, but in the middle of the UTF-8 characters the bytes in the range
0x80-0x9F (%98, %8B, %80, %8A, %91) have been escaped as if they were
high ISO-8859-1 control bytes. The result is valid in no character set.
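A minimal shell sketch of the escaping described above, under stated assumptions: it applies only the control-character part of wget's default "unix" rule (the wget manual says bytes 0-31 and 128-159 are escaped) to the UTF-8 bytes of the directory name, then replaces every other high byte with `?`, roughly as `ls` does on a terminal. The function name `show_unix_escaped` is hypothetical, and this is an illustration, not wget's own code.

```shell
# Hypothetical helper: mimic the control-character escaping of wget's
# default "unix" mode on the raw bytes of "$1", then render the result the
# way a terminal ls would (remaining high bytes shown as '?').
show_unix_escaped() {
  # Dump each byte as an unsigned decimal; -v disables '*' line folding.
  printf '%s' "$1" | od -An -v -tu1 |
  awk '{
    for (i = 1; i <= NF; i++) {
      b = $i + 0
      if (b < 32 || (b >= 128 && b <= 159))
        printf "%%%02X", b        # escaped as if a Latin-1 control byte
      else if (b >= 128)
        printf "?"                # other high byte: shown as ? by ls on a tty
      else
        printf "%c", b            # plain ASCII passes through unchanged
    }
  }
  END { printf "\n" }'
}

show_unix_escaped '01.诘棋总动员'
# prints 01.??%98??%8B?%80??%8A??%91%98
```

The printed name matches the `ls` output above: every UTF-8 continuation byte that happens to fall in 0x80-0x9F is escaped, which is what splits the characters apart.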

The only thing one can do is

$ rm -r jinix.sourceforge.net
$ wget --restrict-file-names=nocontrol ...

throwing away this default wget output, finding the option wget needs
to do the right thing, and starting all over again.
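If the workaround is wanted on every run, the same setting can presumably be made persistent through the `restrict_file_names` wgetrc command, which the wget manual lists as the counterpart of `--restrict-file-names`. This sketch assumes the per-user file is `$HOME/.wgetrc` and that the installed wget accepts `nocontrol` there just as it does on the command line; check the manual of the installed version before relying on it.

```shell
# Assumption: restrict_file_names in wgetrc takes the same values as
# --restrict-file-names, so "nocontrol" keeps bytes 128-159 unescaped.
printf '%s\n' 'restrict_file_names = nocontrol' >> "$HOME/.wgetrc"
```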

$ ls jinix.sourceforge.net/go/sgf/01.诘棋总动员
育苗工程手筋300题

Now it works.
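To check whether a download tree still contains names like the garbled one above, a small sketch can test each path for valid UTF-8. It assumes an `iconv` that exits non-zero when it meets an invalid input sequence (glibc's does), and `is_valid_utf8` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the bytes of "$1" form valid UTF-8.
# Relies on iconv failing on an invalid input sequence.
is_valid_utf8() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Report every path under the download tree whose name is not valid UTF-8.
find jinix.sourceforge.net -print 2>/dev/null | while IFS= read -r path; do
  is_valid_utf8 "$path" || printf 'not UTF-8: %s\n' "$path"
done
```

Run after the default download, this should flag the escaped directory; after the `nocontrol` download it should print nothing.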


Andries


