Re: [Bug-wget] bad filenames (again)
From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Wed, 19 Aug 2015 23:52:12 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Aug 19, 2015 at 10:46:30PM +0300, Eli Zaretskii wrote:
> OK, then let me explain my line of reasoning. Plain ASCII is valid
> UTF-8, and if converting with iconv assuming it's UTF-8 fails, you
> know it's not valid UTF-8. So the last 3 possibilities in your
> suggestion boil down to "try converting as if it were UTF-8, and if
> that fails, you know it's Unknown".
Yes, although I would not invoke iconv to actually convert from UTF-8 to
UTF-8. Unicode is a complicated beast, and it is not certain that
conversion from UTF-8 to UTF-8 is the identity transformation.
(For example, implementations may prefer either NFC or NFD.
MacOS has its own NFD-like version for filenames.)
But you are right, one can use it as a test.
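
A minimal sketch of the "test, do not convert" idea, in Python rather
than wget's C. Python's strict UTF-8 decoder stands in for an iconv
attempt here, and the function name is_valid_utf8 is made up for this
illustration. The second half shows why UTF-8 to UTF-8 need not be the
identity: the same visible name has different bytes in NFC and NFD.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (plain ASCII included).

    The decoded result is thrown away: the bytes are only tested,
    never rewritten.
    """
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The same visible name, "e with acute accent" + ".txt", in two forms.
nfc = unicodedata.normalize("NFC", "\u00e9.txt").encode("utf-8")
nfd = unicodedata.normalize("NFD", "\u00e9.txt").encode("utf-8")

print(is_valid_utf8(b"plain.txt"))  # ASCII is valid UTF-8
print(is_valid_utf8(b"\xe9.txt"))   # a lone Latin-1 byte is not
print(nfc, nfd, nfc == nfd)         # both valid, but different bytes
```

Both nfc and nfd pass the test, so a converter that normalizes would
silently change one of them.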
After finding out that the charset is unknown I want to hex-encode
the entire filename. On the other hand, if the appropriate thing
is to invoke iconv to convert from one charset to another, I want
to hex-encode only the failing bytes.
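
The two strategies could look roughly like this. It is a sketch in
Python, not wget code: the %XX escape format, the function names and the
error-handler name "hexescape_demo" are all assumptions made for the
example, and Python's codecs stand in for iconv.

```python
import codecs


def hex_escape_all(raw: bytes) -> str:
    """Charset unknown: escape every byte, ASCII bytes included."""
    return "".join(f"%{b:02X}" for b in raw)


def _escape_failing_bytes(err):
    """Decode-error handler: emit %XX for the failing bytes, then resume."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(f"%{b:02X}" for b in bad), err.end


# "hexescape_demo" is a hypothetical handler name used only in this sketch.
codecs.register_error("hexescape_demo", _escape_failing_bytes)


def convert_escaping_failures(raw: bytes, from_charset: str) -> str:
    """Charset known: convert what converts, escape only the failing bytes."""
    return raw.decode(from_charset, errors="hexescape_demo")


# 0xC4 0xE3 is a valid GB18030 character; 0xFF is not a valid lead byte.
print(convert_escaping_failures(b"\xc4\xe3\xff.txt", "gb18030"))
print(hex_escape_all(b"a\x95\\"))
```

A real implementation would also have to escape a literal "%" to keep
the mapping reversible; that is left out here.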
The two cases are treated differently because (a) if there is reason
to expect that conversion should be possible, for example because the
user specified the from-charset as GB18030, and it nevertheless fails,
then it often fails only in a few isolated places where Microsoft
extensions are used, and it is more user-friendly to do the conversion
where possible; but (b) if nothing is known, then the character set
may be a multibyte one like SJIS, where ASCII bytes occur as second
halves of symbols, and not escaping such ASCII bytes is confusing
and sometimes leads to strange problems.
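
To illustrate point (b) with a concrete case (a Python sketch; the
helper names and the %XX format are assumptions for the example): in
Shift_JIS the kanji U+8868 is encoded as the bytes 0x95 0x5C, and 0x5C
is the ASCII backslash. Escaping only the non-ASCII bytes leaves that
backslash behind as if it were a character of its own.

```python
def escape_non_ascii_only(raw: bytes) -> str:
    """Escape bytes >= 0x80 and pass ASCII bytes through unchanged."""
    return "".join(chr(b) if b < 0x80 else f"%{b:02X}" for b in raw)


def hex_escape_all(raw: bytes) -> str:
    """Escape every byte, so no half of a multibyte symbol survives."""
    return "".join(f"%{b:02X}" for b in raw)


# U+8868 followed by ".txt", encoded as Shift_JIS: b'\x95\\.txt'
name = "\u8868.txt".encode("shift_jis")

print(escape_non_ascii_only(name))  # %95\.txt  (stray backslash)
print(hex_escape_all(name))         # %95%5C%2E%74%78%74
```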
Andries