Re: [Bug-wget] bad filenames (again)
From: Tim Ruehsen
Subject: Re: [Bug-wget] bad filenames (again)
Date: Thu, 20 Aug 2015 10:47:35 +0200
User-agent: KMail/4.14.2 (Linux/4.1.0-1-amd64; KDE/4.14.2; x86_64; ; )
On Wednesday 19 August 2015 17:38:39 Eli Zaretskii wrote:
> > Date: Wed, 19 Aug 2015 02:52:57 +0200
> > From: "Andries E. Brouwer" <address@hidden>
> > Cc: address@hidden
> >
> > Look at the remote filename.
> >
> > Assign a character set as follows:
> > - if the user specified a from-charset, use that
> > - if the name is printable ASCII (in 0x20-0x7f), take ASCII
> > - if the name is non-ASCII and valid UTF-8, take UTF-8
> > - otherwise take Unknown.
>
> I think this is simpler and produces the same results:
> - if the user specified a from-charset, use that
> - otherwise assume UTF-8
>
> > Determine a local character set as follows:
> > - if the user specified a to-charset, use that
> > - if the locale uses UTF-8, use that
> > - otherwise take ASCII
>
> I suggest this instead:
> - if the user specified a to-charset, use that
> - otherwise, call nl_langinfo(CODESET) to find out the current
> locale's encoding
>
> > Convert the name from from-charset to to-charset:
> > - if the user asked for unmodified filenames, do nothing
> > - if the name is ASCII, do nothing
> > - if the name is UTF-8 and the locale uses UTF-8, do nothing
> > - convert from Unknown by hex-escaping the entire name
> > - convert to ASCII by hex-escaping the entire name
> > - otherwise invoke iconv(); upon failure, escape the illegal bytes
>
> My suggestion:
> - if the user asked for unmodified filenames, do nothing
> - else invoke 'iconv' to convert from remote to local encoding
> - if 'iconv' fails, convert to ASCII by hex-escaping
>
> Hex-escaping only the bytes that fail 'iconv' is better than
> hex-escaping all of them, but it's more complex, and I'm not sure it's
> worth the hassle. But if it can be implemented without undue trouble,
> I'm all for it, as it will make wget more user-friendly in those
> cases.
>
> > Once we know what we want it is trivial to write the code,
> > but it may take a while to figure out what we want.
> > I think we should start applying the current patch.
>
> Tim says he has some/most of that coded on a branch, so I think we
> should start by merging that branch, and then take it from there.
It is in branch 'tim/wget2'. Wget2 is a rewrite from scratch, so you can't just
'click on the merge button' to merge.
Basically, I keep track of the charset of each URL input (command line, input
file, stdin, downloaded and scanned documents). So when generating the filename
we know both the source and the target charset. When iconv fails here (e.g.
Chinese input, ASCII output), the filename is escaped instead.
Tim
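
For illustration only, here is a minimal sketch of the simplified procedure
discussed above: default the remote charset to UTF-8, take the local charset
from nl_langinfo(CODESET), convert, and fall back to hex-escaping when the
conversion fails. It is not code from wget or from the 'tim/wget2' branch.
Python's codecs stand in for iconv, and the names local_filename, hex_escape
and locale_charset, as well as the %XX escape format, are assumptions made for
this example.

```python
import locale


def locale_charset() -> str:
    """Return the current locale's encoding, or ASCII if it cannot be determined."""
    try:
        # Unix only; corresponds to nl_langinfo(CODESET) in C.
        return locale.nl_langinfo(locale.CODESET) or "ASCII"
    except AttributeError:
        return "ASCII"


def hex_escape(raw: bytes) -> str:
    """Keep printable ASCII (0x20-0x7e) and escape every other byte as %XX.

    '%' itself is escaped too, so that the result stays unambiguous.
    """
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b != 0x25 else f"%{b:02X}"
        for b in raw
    )


def local_filename(remote: bytes, from_charset=None, to_charset=None,
                   unmodified=False) -> bytes:
    """Convert a remote filename to the local charset, escaping on failure."""
    if unmodified:
        # The user asked for unmodified filenames: do nothing.
        return remote
    src = from_charset or "UTF-8"           # otherwise assume UTF-8
    dst = to_charset or locale_charset()    # otherwise use the locale's encoding
    try:
        # Stand-in for iconv(): decode from the remote charset,
        # then encode to the local one.
        return remote.decode(src).encode(dst)
    except UnicodeError:
        # Conversion failed (e.g. Chinese input, ASCII output):
        # fall back to an ASCII-only, hex-escaped name.
        return hex_escape(remote).encode("ascii")
```

With a UTF-8 remote name and to_charset="ISO-8859-1", the name is simply
re-encoded; with to_charset="ASCII", a name containing non-ASCII characters
comes back in hex-escaped form. Escaping only the bytes that fail conversion,
while converting the rest, would need a loop around the conversion call and is
not shown here.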