Re: [Bug-wget] bad filenames (again)

From:	Ángel González
Subject:	Re: [Bug-wget] bad filenames (again)
Date:	Wed, 19 Aug 2015 01:43:51 +0200
User-agent:	Thunderbird

On 18/08/15 22:28, Andries E. Brouwer wrote:

On Tue, Aug 18, 2015 at 10:31:31PM +0300, Eli Zaretskii wrote:

No, it's not possible.  Windows does have a UTF-8 codepage, but it
doesn't allow setting that as the system codepage.

What is needed to have a full Unicode support in wget on Windows is to
provide replacements for all the file-name related libc functions
('fopen', 'open', 'stat', 'access', etc.) which will accept file names
encoded in UTF-8, convert them internally into UTF-16, and call the
wchar_t equivalents of those functions ('_wfopen', '_wopen', '_wstat',
'_waccess', etc.) with the converted file name.  Another thing that is
needed is similar replacements for 'printf', 'puts', 'fprintf',
etc. when they are used for writing file names to the console --
because we cannot write UTF-8 sequences to the Windows console.

Aha. That reminds me of a patch by I think Aleksey Bykov.
Yes - see http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00080.html

There we had a similar discussion, and he wrote mswindows.diff with

+int
+wc_utime (unsigned char *filename, struct _utimbuf *times)
+{
+  wchar_t *w_filename;
+  int buffer_size;
+
+  buffer_size = sizeof (wchar_t) * MultiByteToWideChar(65001, 0, filename, -1, w_filename, 0);
+  w_filename = alloca (buffer_size);
+  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
+  return _wutime (w_filename, times);
+}

and similar for stat, open, etc. Is something like this what would be needed on Windows?
Is his patch usable? Maybe I also commented a little in
http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00081.html
but after that nothing happened, it seems.

Andries

That would probably work, but it would need a review. On a quick look, some of the functions have memory leaks (it seems he first used malloc, then changed only some of them to alloca).
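For illustration only, here is a rough sketch of how one such wrapper could pair every allocation with a release. The names utf8_to_wide and utf8_fopen are hypothetical and do not come from the quoted patch or from wget; the Windows branch relies on MultiByteToWideChar and _wfopen, and the #else branch is only there so the sketch also builds on other systems.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
# include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into a
   malloc'ed UTF-16 string.  Returns NULL if the input is not valid
   UTF-8 or if allocation fails.  The caller must free the result.  */
static wchar_t *
utf8_to_wide (const char *s)
{
  wchar_t *w;
  /* First call: ask how many wchar_t units are needed, including the
     terminating NUL.  The result is a count, not a byte size.  */
  int n = MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);

  if (n <= 0)
    return NULL;
  w = malloc ((size_t) n * sizeof (wchar_t));
  if (w == NULL)
    return NULL;
  /* Second call: convert, passing the capacity in wchar_t units.  */
  if (MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0)
    {
      free (w);
      return NULL;
    }
  return w;
}
#endif

/* Hypothetical replacement for fopen that accepts a UTF-8 file name.  */
FILE *
utf8_fopen (const char *name, const char *mode)
{
#ifdef _WIN32
  FILE *fp = NULL;
  wchar_t *wname = utf8_to_wide (name);
  wchar_t *wmode = utf8_to_wide (mode);

  if (wname != NULL && wmode != NULL)
    fp = _wfopen (wname, wmode);
  /* Both buffers are released on every path; free (NULL) is a no-op.  */
  free (wname);
  free (wmode);
  return fp;
#else
  /* Assumption: elsewhere the file system takes the bytes as they are.  */
  return fopen (name, mode);
#endif
}
```

With this shape, a failed conversion simply makes utf8_fopen return NULL. Note that errno is not set in that case, so a real patch would have to decide how to report the error.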

And of course, there's the question of what to do if the filename we are trying to convert to UTF-16 is not in fact valid UTF-8.
