On Tue, Aug 18, 2015 at 10:31:31PM +0300, Eli Zaretskii wrote:
No, it's not possible. Windows does have a UTF-8 codepage, but it
doesn't allow setting that as the system codepage.
What is needed to have full Unicode support in wget on Windows is to
provide replacements for all the file-name related libc functions
('fopen', 'open', 'stat', 'access', etc.) which will accept file names
encoded in UTF-8, convert them internally into UTF-16, and call the
wchar_t equivalents of those functions ('_wfopen', '_wopen', '_wstat',
'_waccess', etc.) with the converted file name. Another thing that is
needed is similar replacements for 'printf', 'puts', 'fprintf',
etc. when they are used for writing file names to the console --
because we cannot write UTF-8 sequences to the Windows console.
Aha. That reminds me of a patch by, I think, Aleksey Bykov.
Yes - see http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00080.html
There we had a similar discussion, and he wrote mswindows.diff with
+int
+wc_utime (unsigned char *filename, struct _utimbuf *times)
+{
+ wchar_t *w_filename;
+ int buffer_size;
+
+ buffer_size = sizeof (wchar_t) * MultiByteToWideChar(65001, 0, filename, -1, w_filename, 0);
+ w_filename = alloca (buffer_size);
+ MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
+ return _wutime (w_filename, times);
+}
and similar wrappers for stat, open, etc. Is something like this what would
be needed on Windows?
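
For illustration, here is a minimal sketch of what such a wrapper could look
like for fopen. The name utf8_fopen is hypothetical; it is not taken from
wget or from the quoted patch. It uses CP_UTF8 (the symbolic name for code
page 65001), passes NULL for the sizing call, and passes the output size to
MultiByteToWideChar as a count of wchar_t units rather than bytes, which the
quoted wc_utime does not do in its second call. It also uses malloc instead
of alloca, on the assumption that file names may be long. On non-Windows
systems it simply falls through to fopen.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>
#endif

/* Hypothetical helper, not part of wget or of the quoted patch:
   open a file whose name is encoded in UTF-8.  */
FILE *
utf8_fopen (const char *filename, const char *mode)
{
#ifdef _WIN32
  wchar_t *w_filename, *w_mode;
  int n_name, n_mode;
  FILE *fp = NULL;

  /* With a zero-sized output buffer, MultiByteToWideChar returns the
     required size in wchar_t units (including the terminating NUL,
     because the input length is given as -1).  */
  n_name = MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS,
                                filename, -1, NULL, 0);
  n_mode = MultiByteToWideChar (CP_UTF8, MB_ERR_INVALID_CHARS,
                                mode, -1, NULL, 0);
  if (n_name <= 0 || n_mode <= 0)
    return NULL;                /* invalid UTF-8 or other failure */

  w_filename = malloc (n_name * sizeof (wchar_t));
  w_mode = malloc (n_mode * sizeof (wchar_t));

  /* The last argument is a count of wchar_t units, not bytes.  */
  if (w_filename && w_mode
      && MultiByteToWideChar (CP_UTF8, 0, filename, -1,
                              w_filename, n_name) > 0
      && MultiByteToWideChar (CP_UTF8, 0, mode, -1, w_mode, n_mode) > 0)
    fp = _wfopen (w_filename, w_mode);

  free (w_filename);
  free (w_mode);
  return fp;
#else
  /* Elsewhere the C library is assumed to take UTF-8 names as-is.  */
  return fopen (filename, mode);
#endif
}
```

A caller would pass the UTF-8 name directly, e.g. utf8_fopen (file, "wb").
Wrappers for stat, open, access and so on would presumably follow the same
pattern with _wstat, _wopen and _waccess.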
Is his patch usable? Maybe I also commented a little in
http://lists.gnu.org/archive/html/bug-wget/2014-04/msg00081.html
but after that nothing happened, it seems.
Andries