Re: [Bug-wget] bad filenames (again)

bug-wget

From:	Eli Zaretskii
Subject:	Re: [Bug-wget] bad filenames (again)
Date:	Wed, 19 Aug 2015 17:38:39 +0300

> Date: Wed, 19 Aug 2015 02:52:57 +0200
> From: "Andries E. Brouwer" <address@hidden>
> Cc: address@hidden
> 
> Look at the remote filename.
> 
> Assign a character set as follows:
> - if the user specified a from-charset, use that
> - if the name is printable ASCII (in 0x20-0x7f), take ASCII
> - if the name is non-ASCII and valid UTF-8, take UTF-8
> - otherwise take Unknown.

I think this is simpler and produces the same results:
 - if the user specified a from-charset, use that
 - otherwise assume UTF-8
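To compare the two proposals, Andries' classification rule can be sketched in C roughly as follows. This is not wget code. The names remote_name_charset() and is_valid_utf8() are hypothetical, "Unknown" is represented by a NULL return, and the printable range is taken as 0x20-0x7e, since 0x7f is DEL, a control character. Under the simpler rule, the whole function would reduce to returning the user's from-charset or "UTF-8".

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if S[0..LEN) is well-formed UTF-8 per RFC 3629: no overlong
   forms, no UTF-16 surrogates, nothing above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;              /* number of continuation bytes expected */
        unsigned long cp;      /* decoded code point */
        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;          /* 0x80-0xC1 and 0xF5-0xFF never start a character */
        }
        if (len - i <= n)
            return 0;          /* sequence truncated at end of name */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;      /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;          /* overlong, surrogate, or out of range */
        i += n + 1;
    }
    return 1;
}

/* Hypothetical helper, not wget's API.  Returns the charset to assume for
   the remote NAME: the user's from-charset if given, else "ASCII" or
   "UTF-8" by inspection, else NULL for "Unknown". */
const char *remote_name_charset(const char *name, const char *user_from_charset)
{
    const unsigned char *p = (const unsigned char *) name;
    size_t len = strlen(name);
    int printable = 1, non_ascii = 0;

    if (user_from_charset != NULL && user_from_charset[0] != '\0')
        return user_from_charset;
    for (size_t i = 0; i < len; i++) {
        if (p[i] < 0x20 || p[i] > 0x7e)
            printable = 0;     /* control byte, DEL, or non-ASCII */
        if (p[i] >= 0x80)
            non_ascii = 1;
    }
    if (printable)
        return "ASCII";
    if (non_ascii && is_valid_utf8(p, len))
        return "UTF-8";
    return NULL;               /* Unknown */
}
```

Since ASCII is a subset of UTF-8, the two rules only differ for names that are not valid UTF-8. Andries' version labels them Unknown up front, while the simpler version leaves it to the later iconv step to reject them.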

> Determine a local character set as follows:
> - if the user specified a to-charset, use that
> - if the locale uses UTF-8, use that
> - otherwise take ASCII

I suggest this instead:
 - if the user specified a to-charset, use that
 - otherwise, call nl_langinfo(CODESET) to find out the current
   locale's encoding
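Here is a minimal sketch of that lookup, using a hypothetical helper name, local_charset(). nl_langinfo(CODESET) reports the codeset of the current LC_CTYPE locale. It therefore only reflects the user's environment after the program has called setlocale(LC_ALL, "") (or at least LC_CTYPE). Before that call it reports the "C" locale's codeset. nl_langinfo is a POSIX function, so platforms that lack it, such as native Windows, would need a replacement.

```c
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper, not part of wget: pick the local (target) charset.
   Assumes setlocale(LC_ALL, "") was already called at program startup. */
const char *local_charset(const char *user_to_charset)
{
    /* An explicit to-charset from the user takes precedence. */
    if (user_to_charset != NULL && user_to_charset[0] != '\0')
        return user_to_charset;

    /* Otherwise ask the C library for the current locale's encoding,
       e.g. "UTF-8" or "ISO-8859-1".  The returned string is owned by
       the library and may be overwritten by later calls. */
    return nl_langinfo(CODESET);
}
```

On glibc the returned name is generally one that iconv_open() accepts, so it can be passed straight through as the target encoding.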

> Convert the name from from-charset to to-charset:
> - if the user asked for unmodified filenames, do nothing
> - if the name is ASCII, do nothing
> - if the name is UTF-8 and the locale uses UTF-8, do nothing
> - convert from Unknown by hex-escaping the entire name
> - convert to ASCII by hex-escaping the entire name
> - otherwise invoke iconv(); upon failure, escape the illegal bytes

My suggestion:
 - if the user asked for unmodified filenames, do nothing
 - else invoke 'iconv' to convert from remote to local encoding
 - if 'iconv' fails, convert to ASCII by hex-escaping
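The three-step conversion could look something like this in C. It is only a sketch, and it rests on several assumptions. convert_remote_name() and hex_escape_name() are hypothetical names. Hex-escaping uses wget-style %XX triplets and is applied to every byte outside printable ASCII, plus '%' itself so the result stays unambiguous. Any iconv failure falls back to escaping the whole name. That includes the non-reversible substitution that POSIX allows an implementation to make.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: escape every byte outside printable ASCII (and '%')
   as %XX.  Returns a malloc'd string or NULL on allocation failure. */
static char *hex_escape_name(const char *name)
{
    static const char hexdigits[] = "0123456789ABCDEF";
    char *out = malloc(strlen(name) * 3 + 1);   /* worst case: all bytes escaped */
    if (out == NULL)
        return NULL;
    char *q = out;
    for (const unsigned char *p = (const unsigned char *) name; *p; p++) {
        if (*p < 0x20 || *p > 0x7e || *p == '%') {
            *q++ = '%';
            *q++ = hexdigits[*p >> 4];
            *q++ = hexdigits[*p & 0x0F];
        } else {
            *q++ = (char) *p;
        }
    }
    *q = '\0';
    return out;
}

/* Hypothetical: convert NAME from charset FROM to charset TO.
   Returns a malloc'd string or NULL on allocation failure. */
char *convert_remote_name(const char *name, const char *from,
                          const char *to, int keep_unmodified)
{
    /* Step 1: the user asked for unmodified filenames. */
    if (keep_unmodified) {
        size_t n = strlen(name) + 1;
        char *copy = malloc(n);
        if (copy != NULL)
            memcpy(copy, name, n);
        return copy;
    }

    /* Step 2: try iconv from the remote to the local encoding. */
    iconv_t cd = iconv_open(to, from);
    if (cd != (iconv_t) -1) {
        size_t inleft = strlen(name);
        size_t outsize = inleft * 4 + 16;       /* generous, not exact */
        char *out = malloc(outsize);
        if (out != NULL) {
            char *in = (char *) name;           /* iconv takes char **, not const */
            char *op = out;
            size_t outleft = outsize - 1;       /* reserve room for the NUL */
            size_t r = iconv(cd, &in, &inleft, &op, &outleft);
            /* r == 0: everything converted reversibly.  (size_t) -1 means
               EILSEQ, EINVAL or E2BIG; a positive r means the implementation
               substituted characters.  All of these count as failure here.
               The second call writes any final shift sequence. */
            if (r == 0 && iconv(cd, NULL, NULL, &op, &outleft) != (size_t) -1) {
                *op = '\0';
                iconv_close(cd);
                return out;
            }
            free(out);
        }
        iconv_close(cd);
    }

    /* Step 3: the pair is unsupported or iconv failed: hex-escape. */
    return hex_escape_name(name);
}
```

Because E2BIG is also treated as failure, a production version would need to grow the output buffer rather than rely on the fixed size used here.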

Hex-escaping only the bytes that fail 'iconv' is better than
hex-escaping all of them, but it's more complex, and I'm not sure it's
worth the hassle.  But if it can be implemented without undue trouble,
I'm all for it, as it will make wget more user-friendly in those
cases.
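Escaping only the bytes iconv rejects may not need much more code if iconv is called in a loop. On EILSEQ or EINVAL, iconv leaves the input pointer at the start of the offending sequence. One byte can therefore be escaped and the conversion resumed right after it. This is again a sketch. convert_escaping_bad_bytes() is a hypothetical name. Writing raw "%XX" bytes into the output also assumes the target charset is ASCII-compatible and stateless, as UTF-8 and ISO-8859-x are. That rules out targets such as UTF-16 or ISO-2022-JP.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not wget's API: convert NAME through the open
   descriptor CD, replacing only the bytes iconv rejects with %XX.
   Assumes an ASCII-compatible, stateless target charset.
   Returns a malloc'd string, or NULL on allocation failure. */
char *convert_escaping_bad_bytes(iconv_t cd, const char *name)
{
    static const char hexdigits[] = "0123456789ABCDEF";
    size_t inleft = strlen(name);
    size_t cap = inleft * 4 + 16;
    size_t used = 0;
    char *in = (char *) name;          /* iconv takes char **, not const */
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    iconv(cd, NULL, NULL, NULL, NULL); /* start from the initial state */
    while (inleft > 0) {
        char *op = out + used;
        size_t outleft = cap - used - 1;   /* keep one byte for the NUL */
        size_t r = iconv(cd, &in, &inleft, &op, &outleft);
        used = (size_t) (op - out);
        if (r != (size_t) -1)
            break;                         /* all remaining input converted */

        if (errno == E2BIG || cap - used - 1 < 3) {
            /* Output buffer full (or too full for a %XX triplet): grow it
               and retry from the same input position. */
            char *bigger = realloc(out, cap * 2);
            if (bigger == NULL) {
                free(out);
                return NULL;
            }
            out = bigger;
            cap *= 2;
            continue;
        }

        /* EILSEQ (invalid or unconvertible sequence) or EINVAL (sequence
           truncated at the end): iconv left IN pointing at the offending
           sequence, so escape its first byte and resume after it. */
        unsigned char b = (unsigned char) *in;
        out[used++] = '%';
        out[used++] = hexdigits[b >> 4];
        out[used++] = hexdigits[b & 0x0F];
        in++;
        inleft--;
    }
    out[used] = '\0';
    return out;
}
```

Take a UTF-8 name containing "é €" converted to ISO-8859-1. The "é" becomes the single Latin-1 byte E9, and only the euro sign, which Latin-1 lacks, ends up as %E2%82%AC. Whole-name escaping would have turned the readable "é" into %C3%A9 as well.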

> Once we know what we want it is trivial to write the code,
> but it may take a while to figure out what we want.
> I think we should start applying the current patch.

Tim says he has some/most of that coded on a branch, so I think we
should start by merging that branch, and then take it from there.
