bug-wget

Re: [Bug-wget] Problem with ÅÄÖ and wget


From: Bykov Aleksey
Subject: Re: [Bug-wget] Problem with ÅÄÖ and wget
Date: Mon, 16 Sep 2013 00:36:03 +0300
User-agent: Opera Mail/12.14 (Win32)

Greetings

Thanks for the corrections.
Sorry for the messy code and for the trouble.

> - Make wget recognise UTF-8 URLs and accept them without nocontrol when the filesystem encoding is UTF-8.
Are you sure? A UTF-8 name can contain a colon (I remember seeing such files), and at least on Windows the colon is still a restricted character.
I think it is possible to keep the current --restrict-file-names logic and just add conversion to wide characters (and back), check only the characters with codes below 256, and in the corresponding places change the type from "char" to "wchar_t". This needs checking; sorry, it will take me some time.

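A rough sketch of that idea, assuming the name has already been converted to wide characters: only code points below 256 can be restricted, and a letter such as "Å" is then one code point (U+00C5) instead of two bytes. Byte-wise, UTF-8 "Å", "Ä" and "Ö" are C3 85, C3 84 and C3 96, and those second bytes fall in the 128-159 control range, which is presumably why the default check currently needs nocontrol. The restricted set below follows the wget manual's description of --restrict-file-names=windows (\ | / : ? " * < > plus controls 0-31 and 128-159); the function names are hypothetical, and the sketch only counts offending characters rather than escaping them as wget does.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical: nonzero if wide character C would have to be escaped in a
   Windows file name.  Only code points below 256 can be restricted; as
   whole code points, letters such as U+00C5 no longer fall into the
   128-159 control range that their UTF-8 bytes hit. */
static int
w_restricted_char (wchar_t c)
{
  if (c >= 256)
    return 0;
  if (c < 32 || (c >= 128 && c <= 159))   /* C0 and C1 control characters */
    return 1;
  /* c is never 0 here, so wcschr cannot match the terminator. */
  return wcschr (L"\\|/:?\"*<>", c) != NULL;
}

/* Hypothetical: number of characters in NAME that would need escaping. */
static size_t
w_count_restricted (const wchar_t *name)
{
  size_t n = 0;

  assert (name != NULL);
  for (; *name != L'\0'; name++)
    if (w_restricted_char (*name))
      n++;
  return n;
}
```

With this check a name like "a:b" still has its colon flagged, while "ÅÄÖ" passes through untouched.
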
> What happens if the filename has more than 1024 characters?
The file name is simply truncated. Now buffer_size is determined by MultiByteToWideChar. I'm not sure whether it still needs to be multiplied by sizeof(wchar_t).
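Whether the multiplication is still needed can be checked without windows.h. The portable sketch below mimics the documented contract of MultiByteToWideChar for a NUL-terminated source (cbMultiByte = -1): called with an output size of 0, it returns the required length in UTF-16 units, including the terminating NUL. That length counts characters while malloc() counts bytes, so the sizeof factor is still needed. utf8_to_utf16() and utf8_to_utf16_alloc() are hypothetical names; uint16_t stands in for the 16-bit wchar_t of Windows so the sketch behaves the same where wchar_t is 32 bits wide, and the decoder does only minimal validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, portable stand-in for
     MultiByteToWideChar (CP_UTF8, 0, src, -1, dst, dst_len)
   limited to the NUL-terminated case.  With dst_len == 0 it only counts;
   otherwise it writes at most dst_len UTF-16 units.  Returns the number of
   units *including* the terminating NUL, or 0 on invalid input or if dst
   is too small.  Overlong forms and encoded surrogates are not rejected. */
static size_t
utf8_to_utf16 (const char *src, uint16_t *dst, size_t dst_len)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t n = 0;

  assert (src != NULL && (dst != NULL || dst_len == 0));
  for (;;)
    {
      unsigned char c = *s++;
      uint32_t cp;
      int extra, i;

      if (c < 0x80)                { cp = c;        extra = 0; }
      else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
      else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
      else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
      else
        return 0;                  /* stray continuation or invalid lead byte */

      for (i = 0; i < extra; i++, s++)
        {
          if ((*s & 0xC0) != 0x80)
            return 0;              /* truncated sequence */
          cp = (cp << 6) | (*s & 0x3F);
        }
      if (cp > 0x10FFFF)
        return 0;

      if (cp >= 0x10000)           /* outside the BMP: surrogate pair */
        {
          if (dst_len != 0)
            {
              if (n + 2 > dst_len)
                return 0;
              dst[n] = (uint16_t) (0xD800 + ((cp - 0x10000) >> 10));
              dst[n + 1] = (uint16_t) (0xDC00 + ((cp - 0x10000) & 0x3FF));
            }
          n += 2;
        }
      else
        {
          if (dst_len != 0)
            {
              if (n + 1 > dst_len)
                return 0;
              dst[n] = (uint16_t) cp;
            }
          n += 1;
        }

      if (c == 0)
        return n;                  /* the NUL terminator has just been counted */
    }
}

/* Two-pass conversion: ask for the size in UTF-16 units first, then
   allocate units * sizeof *dst bytes, because malloc counts bytes. */
static uint16_t *
utf8_to_utf16_alloc (const char *src)
{
  uint16_t *dst;
  size_t units = utf8_to_utf16 (src, NULL, 0);

  if (units == 0)
    return NULL;
  dst = malloc (units * sizeof *dst);
  if (dst != NULL && utf8_to_utf16 (src, dst, units) != units)
    {
      free (dst);
      dst = NULL;
    }
  return dst;
}
```

Since no UTF-8 sequence yields more UTF-16 units than it has bytes, strlen (filename) + 1 wide characters is always a safe upper bound for valid input; the two-pass query simply allocates the exact amount.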

> Big bug. The sixth argument is the space available for w_filename *in characters*, not bytes. Why bother allocating memory if you are using a fixed size? Another option would be to use alloca().
> I guess rename() would also need a wrapper.
Thanks.

> This code should be in mswindows.c
I simply forgot about mswindows.*. Yes, it is a much more suitable place.

> What makes w_fopen() different, so that it is in utils.h instead of the .c?
Sorry, I don't know; I have too little experience to understand it.
Could you please take a look and tell me what I am doing wrong?
I remember (believe?) that NAME.h must contain the function declaration and
NAME.c the function body, and that other files can reach the function body
only if the declaration exists in NAME.h (included directly or indirectly).
But with that structure my code does not work.
Right now all the wrapper functions (except w_fopen()) work in other files
without a declaration in utils.h, while w_fopen() works only when its body is
in utils.h. The attached diffs contain the working and non-working variants
(sorry, they are based on utils.* because in mswindows.h it did not work at
all; it should just be a matter of appending the code at the end).

--
Best regards, Alex

On Sun, 15 Sep 2013 03:54:07 +0300, Ángel González <address@hidden> wrote:

On 15/09/13 00:59, Bykov Aleksey wrote:
Greetings

Many thanks for pushing me in the right direction.

With the attached patch, Wget on Windows can work with UTF-8 names, but again only with "--restrict-file-names=nocontrol"...
I think there are two issues:
- Make wget recognise UTF-8 URLs and accept them without nocontrol when the filesystem encoding is UTF-8.
- Correctly store the filenames on Windows.

I would have started with the first one, and then treated Windows as a UTF-8-enabled filesystem, which is what this patch does. Also, isn't there already a library that does this?

diff --git a/src/utils.c b/src/utils.c
index 2ec9601..6307c88 100644
--- a/src/utils.c
+++ b/src/utils.c
@@ -2544,3 +2544,42 @@ test_dir_matches_p()

  #endif /* TESTING */

+#ifdef WINDOWS
+/* UTF-8 support on Windows: route the standard fopen(), utime(), stat(), lstat() and mkdir() through their wide-character
+   analogs. w_fopen() is declared in utils.h; w_utime(), w_stat() and w_mkdir() in utils.c. */

This code should be in mswindows.c
What makes w_fopen() different, so that it is in utils.h instead of the .c?

Commenting on just one function, as they all follow the same template:

+int
+w_stat (const char *filename, struct_stat *buffer )
+{
+  wchar_t *w_filename;
+  int buffer_size = 1024; /* I can't get it to work with strlen() */
What happens if the filename has more than 1024 characters?
+  w_filename = malloc (buffer_size);
+  MultiByteToWideChar(65001, 0, filename, -1, w_filename, buffer_size);
Using CP_UTF8 instead of 65001 would be preferable IMHO.

Big bug. The sixth argument is the space available for w_filename *in characters*, not bytes. I would multiply buffer_size by sizeof(wchar_t) in the malloc (although you could instead divide here, too).

+  int res = _wstati64 (w_filename, buffer);
It would be better to declare res at the beginning of the function.

+  free (w_filename);
+  return res;
+}
Why bother allocating memory if you are using a fixed size? Another option would be to use alloca().


I guess rename() would also need a wrapper.
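Applying the review comments above to the template could look roughly like this: CP_UTF8 instead of 65001, res declared at the top, and a buffer sized by a first MultiByteToWideChar call (in wide characters, multiplied by sizeof (wchar_t) for malloc), plus the rename() wrapper. w_from_utf8() and w_rename() are hypothetical names, and struct_stat is typedef'd locally as an assumption about what wget uses on Windows (struct _stati64); the non-Windows branches just call stat() and rename() so the sketch builds and can be tested anywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#ifdef _WIN32
# include <windows.h>
typedef struct _stati64 struct_stat;   /* assumption: wget's type on Windows */
#else
typedef struct stat struct_stat;
#endif

#ifdef _WIN32
/* Hypothetical helper: malloc'd wide copy of UTF-8 string S, or NULL.
   The first call returns the length in wchar_t units (including the NUL);
   malloc needs that length times sizeof (wchar_t). */
static wchar_t *
w_from_utf8 (const char *s)
{
  wchar_t *w;
  int len = MultiByteToWideChar (CP_UTF8, 0, s, -1, NULL, 0);

  if (len <= 0)
    return NULL;
  w = malloc (len * sizeof (wchar_t));
  if (w != NULL && MultiByteToWideChar (CP_UTF8, 0, s, -1, w, len) == 0)
    {
      free (w);
      w = NULL;
    }
  return w;
}
#endif

int
w_stat (const char *filename, struct_stat *buffer)
{
  int res = -1;                   /* declared at the top, C89 style */

  assert (filename != NULL && buffer != NULL);
#ifdef _WIN32
  {
    wchar_t *w_filename = w_from_utf8 (filename);
    if (w_filename != NULL)       /* on conversion failure res stays -1 */
      {
        res = _wstati64 (w_filename, buffer);
        free (w_filename);
      }
  }
#else
  res = stat (filename, buffer);  /* UTF-8 names pass through as bytes */
#endif
  return res;
}

int
w_rename (const char *from, const char *to)
{
  int res = -1;

  assert (from != NULL && to != NULL);
#ifdef _WIN32
  {
    wchar_t *w_from = w_from_utf8 (from);
    wchar_t *w_to = w_from_utf8 (to);
    if (w_from != NULL && w_to != NULL)
      res = _wrename (w_from, w_to);
    free (w_from);
    free (w_to);
  }
#else
  res = rename (from, to);
#endif
  return res;
}
```

A fixed stack buffer (for example MAX_PATH wide characters) or alloca() would avoid the malloc() call, but a fixed buffer silently caps the name length; the size query avoids both the guess and the truncation.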

Attachment: non_work.diff
Description: Binary data

Attachment: non_work_err.txt
Description: Text document

Attachment: work.diff
Description: Binary data

