Re: [Bug-wget] Patch: Make url_file_name also convert remote path to loc
From: Yuxi Hao
Subject: Re: [Bug-wget] Patch: Make url_file_name also convert remote path to local encoded
Date: Tue, 14 Nov 2017 19:59:55 +0800
Dear Eli and Tim,
First, I should say that my last two patches address different problems.
Next, let's make it clear:
'Make url_file_name also convert remote path to local encoded' converts
all characters from the URL (from the server, mostly UTF-8) to the locale's
encoding (GBK, for example), and only then appends them to the local path
specified with '-P'. Otherwise, if we run iconv on a mixed-encoding string,
an error occurs. Right? :)
It is for iconv.
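
To illustrate the iconv point, here is a minimal sketch; it is not the code from the patch. The helper name convert_charset and the output-buffer bound are assumptions made for this example. It converts a string that is entirely in one encoding. A string that mixes encodings (say, a GBK '-P' prefix followed by a UTF-8 remote path) contains bytes that are not valid in the declared source charset, so iconv fails and the helper returns NULL.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not wget's actual code: convert the NUL-terminated
   string IN from charset FROM to charset TO.  Returns a malloc'ed string
   that the caller must free, or NULL if the conversion is unavailable,
   IN contains bytes that are invalid in FROM, or the output does not fit.  */
char *
convert_charset (const char *in, const char *from, const char *to)
{
  assert (in != NULL && from != NULL && to != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;                /* this pair of charsets is not supported */

  size_t in_left = strlen (in);
  /* Assumed bound: 4 output bytes per input byte is enough for common
     multibyte charsets; if it is not, iconv reports E2BIG and we fail.  */
  size_t out_size = in_left * 4 + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) in;   /* iconv takes char **, but only reads the input */
  char *out_ptr = out;
  size_t out_left = out_size - 1;   /* keep one byte for the terminator */

  /* First call converts the text; second call (NULL input) flushes any
     pending shift state for stateful encodings.  */
  if (iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t) -1
      || iconv (cd, NULL, NULL, &out_ptr, &out_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }

  *out_ptr = '\0';
  iconv_close (cd);
  return out;
}
```

With this shape, the remote path would be converted on its own, for example with convert_charset (remote_path, "UTF-8", "GBK"), and only the result appended to the '-P' directory. Whether a charset name such as "GBK" is accepted depends on the iconv implementation in use.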
'Fix printing multibyte characters as unprintable characters on Windows': this
one needs 'setlocale' to be called on Windows when 'ENABLE_NLS' is not defined,
so that non-ASCII characters are displayed correctly in the console, :) as Eli
said. Please refer to https://msdn.microsoft.com/en-us/library/x99tb11d.aspx.
It is for displaying in the console.
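
The setlocale point can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the function names are hypothetical, and the choice of LC_CTYPE rather than LC_ALL is an assumption made for the example. The first helper shows the state left by the implicit startup call; the second replaces it with the locale from the environment.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report whether the character-handling locale is
   still "C", which is what the implicit setlocale (LC_ALL, "C") at program
   startup leaves behind.  */
int
ctype_locale_is_c (void)
{
  /* A NULL locale argument only queries the current setting.  */
  return strcmp (setlocale (LC_CTYPE, NULL), "C") == 0;
}

/* Hypothetical startup helper for a build without ENABLE_NLS: select the
   environment's locale for character handling only.  Returns the name of
   the locale now in effect; the string belongs to the C library and may
   be overwritten by later setlocale calls.  */
const char *
init_ctype_locale (void)
{
  /* "" asks the runtime for its default; per the MSDN page cited above,
     on Windows that is the user-default ANSI code page.  */
  const char *name = setlocale (LC_CTYPE, "");

  if (name == NULL)
    /* The requested locale is unavailable; the previous one stays active,
       so query it instead.  */
    name = setlocale (LC_CTYPE, NULL);

  assert (name != NULL);
  return name;
}
```

Restricting the call to LC_CTYPE leaves the other categories, such as number formatting, in the "C" locale, which may or may not be what the real patch wants.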
Best Regards,
YX Hao
> -----Original Message-----
> From: Eli Zaretskii [mailto:address@hidden]
> Sent: November 14, 2017 0:33
> To: Tim Rühsen <address@hidden>
> Cc: address@hidden; address@hidden
> Subject: Re: [Bug-wget] Patch: Make url_file_name also convert remote path to
> local encoded
>
> > Cc: address@hidden, address@hidden
> > From: Tim Rühsen <address@hidden>
> > Date: Mon, 13 Nov 2017 16:36:39 +0100
> >
> > > I don't think it's a Gnulib issue. The problem is that on Windows,
> > > the implicit call at the beginning of Wget
> > >
> > > setlocale (LC_ALL, "C");
> >
> > Why is there an explicit call with "C" ? There is an explicit call with "".
>
> I said "implicit", not "explicit". Such an implicit call is made at the
> beginning of every C program, per the ANSI C Standard. Right?
>
> The MSDN documentation says it clearly:
>
> At program startup, the equivalent of the following statement is executed:
>
> setlocale( LC_ALL, "C" );
>
> > From the man page:
> > "If locale is an empty string, "", each part of the locale that should
> > be modified is set according to the environment variables."
>
> The call with a locale of "" is only done in a build that has ENABLE_NLS
> defined. I was talking about a build which didn't define ENABLE_NLS.
>
> > > is not good enough to work in multibyte locales of the Far East,
> > > because the Windows runtime assumes a single-byte locale after that
> > > call. And since Wget happens to need to display text and create
> > > files with non-ASCII characters, it gets hit more than other programs.
> >
> > I (hopefully) can understand why this doesn't work. NTFS uses UTF-16
> > for the filenames. If your environment specifies a single-byte
> > encoding (e.g. C) and we use at some point a multibyte encoding (e.g.
> > UTF-8), then any automatic conversion to UTF-16 filenames is likely
> > to fail. For me the question is: a) does wget have a bug (e.g. creating
> > a filename from a wrongly encoded name string), or b) does the Windows
> > API have a problem?
> >
> > > The proposed solution is to add a special call to setlocale which
> > > gets this right on Windows.
> >
> > Why can't we just convert the filename string into the correct
> > encoding and then create the file ? What do I miss ?
>
> I guess you are missing a short introduction to the Windows l10n/i18n mess.
> Let me try.
>
> First, the fact that NTFS uses UTF-16 is not really relevant. Wget uses
> 'char *' strings, not 'wchar_t *' strings, to store file names and call C
> library functions that accept file names. So we cannot use the UTF-16
> encoding of non-ASCII file names directly. Instead, we use the locale's
> codepage (the C library and the OS APIs then convert to UTF-16 before
> hitting the disk, but that's not important now).
>
> Next, creating and opening file names is not the only problem: we need also to
> display these file names and URLs, and that also needs to use the encoding
> expected by the Windows console.
>
> Now, in any locale which uses single-byte encoding of non-ASCII characters,
> the C locale will support those characters, both for I/O and for functions
> like strcmp, strlen, strcoll, etc. But not in double-byte locales of the
> Far East: there, you must explicitly call setlocale with the correct
> codepage, to have the local character set supported. This support includes
> manipulating file names, calling C library functions to access files, and
> displaying non-ASCII text, such as file names and URLs, on the console.
>
> IOW, this is a Windows runtime subtlety that unfortunately needs to be fixed
> in the application code.
>
> (UTF-8 is not relevant at all here, because Windows doesn't support
> UTF-8 as the locale's codeset; if you try to call setlocale to set
> UTF-8 as the codeset, setlocale will simply fail. So if we have a
> UTF-8 encoded URL or file name inside wget, we must convert it to the current
> codepage by calling libiconv functions.)
>
> Does the above make sense? Let me know if I have to explain some more.