bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale


From: Ángel González
Subject: Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale
Date: Thu, 20 Nov 2014 20:17:29 +0100
User-agent: Thunderbird

Darshit Shah wrote:
In short, I;m not sure if we want to use c_strcasecmp for in http.c or
recur.c since domain names allowed to be non-ASCII. And if I'm not
wrong, even the HTTP headers may contain non-ASCII data. In such cases
would c_strcasecmp work correctly?

I expect them to be in punycode at that point. Maybe I read it too fast.
And if it isn't, it would only work -by chance- if the page uses the same
encoding as your locale.

I don't think the headers would be a problem. Note that most cases only
deal with the ascii case-unsensitiveness (eg. html5 explicitely notes that).


Tim Ruehsen wrote:
On Thursday 20 November 2014 00:12:08 Ángel González wrote:
I am attaching it here fwiw. I generally changed them on a few more
places, although I think
some of your edits to init.c are incorrect, as well as those on
progress.c: as they are
user-parameters, they _might_ be introduced in the user locale (they
would misteriously fail
when run under C locale in cron, though. I'm not so sure it should be
supported).
Please be more specific.
I would have included the diff of the diffs, but that would have been really ugly to read. :)

Seemed better to provide the other patch.


Imaging user input --level=INF (or --level=inf) will be compared with "inf".
The turkish people will be used to enter the correct char in this case, namely
'I' or 'i' and not 'İ' or 'ı'. Else most programs would simply break. In this
case a ASCII comparison (c_str...) is absolutely ok. Using locale-aware
comparison would not work (well, the user could try it out since he gets
immediate response by Wget).
Then we can get rid of the strcasecmp comparisons left.


Notwithstanding with keeping parameters in user-locale case, I made the
accepts list C-case.
That's the most arguable one, but doesn't seem sensible to change the
code to support that.
I think this is not correct.
This comment was related to in_acclist. That function can perform comparisons
in several ways:
- Using fnmatch_nocase (this function explicitely uses C locale)
- Using match_tail() (I had changed it to c_strcasecmp, as it was used for eg. cookies)
- Using a strcasecmp

Thus, I changed the third branch to c_strcasecmp

The accepts and regexes are filename related.
Filenames are not limited to ASCII. What we have to do here is a normalization
to UTF-8 (using the users locale). Filenames/pathes found in HTML or CSS also
have to be converted to UTF-8 (using the page's locale). These UTF-8 strings
have to be compared with an appropriate function. str(n)casecmp would not be
correct here, a byte-by-byte comparison like c_str(n)casecmp is better but not
perfect. libunistring has functions for that.
You are probably right, which means we should redo this part of the code.


I would suggest that I push my patch.
We still have two weeks to inspect the changes... if in doubt, let's set up a
test case. Just give an example of what could go wrong and we can simply try
it out.
Ok, I rebased my patch on top of what you committed.

Changes:
-Remove strcase(n)cmp from cmpt.c -> Shouldn't be a problem
- Domain comparisons on cookies.c -> Perhaps not ok
- css-url.c and ftp-basic.c ->  hardcoded ascii text , shall be c_strcasecmp
- ftp.c -> it's a user-provided glob, we can keep strcasecmp
- hash -> Mixed. make_nocase_string_hash_table is used for hosts, tags and attributes
- html-parse.c -> the tags are only C-insensitive. Shall be changed
- html-url.c -> same with attrs
- parse_content_range  -> this is a header reply. Change
- persistent_available_p (1) -> maybe not ok
- persistent_available_p (2-3) -> these are C constants. Change
- ensure_extension -> only called with either .html or .css. I would change it.
- netrc -> Domain. Maybe not ok
- recur -> idem
- openssl.c -> clarifies a comment
- url.c -> We are comapring with a scheme, thus C comparison.
- utils.c -> Discussed above

So while some are completely clear, we should clarify hosts comparison.

Regards

Attachment: 0001-Replace-strcasecmp-and-strncasecmp-with-c_strcasecmp.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]