Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale

bug-wget


From:	Tim Ruehsen
Subject:	Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale
Date:	Thu, 20 Nov 2014 10:43:54 +0100
User-agent:	KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

On Thursday 20 November 2014 00:12:08 Ángel González wrote:
> On 18/11/14 17:12, Tim Ruehsen wrote:
> > I amended three tests to fail when run with a Turkish locale.
> > I fixed these issues (using c_strcasecmp/c_strncasecmp) and also replaced
> > strcasecmp/strncasecmp by c_strcasecmp/c_strncasecmp at places where we
> > definitely want an ASCII comparison instead of a locale-dependent one.
> > 
> > There are still some places left where we use strcasecmp/strncasecmp, e.g.
> > domain/host and filename comparisons.
> > 
> > Please have a look...
> > 
> > Tim
> 
> I had pretty much coded the same thing when I realized that your patch
> was still unapplied.
> 
> I am attaching it here fwiw. I generally changed them in a few more
> places, although I think some of your edits to init.c are incorrect, as
> well as those in progress.c: as they are user parameters, they _might_ be
> entered in the user's locale (they would mysteriously fail when run under
> the C locale in cron, though. I'm not so sure it should be supported).

Please be more specific.
Imagine user input --level=INF (or --level=inf) being compared with "inf".
Turkish users will be used to entering the correct character in this case,
namely 'I' or 'i' and not 'İ' or 'ı'; otherwise most programs would simply
break. In this case an ASCII comparison (c_str...) is absolutely OK. A
locale-aware comparison would not work (well, the user could try it out,
since he gets immediate feedback from Wget).

> Notwithstanding keeping parameters in the user's locale case, I made the
> accepts list C-case. That's the most arguable one, but it doesn't seem
> sensible to change the code to support that.

I think this is not correct. The accepts list and regexes are filename related.
Filenames are not limited to ASCII. What we have to do here is normalize them
to UTF-8 (using the user's locale). Filenames/paths found in HTML or CSS also
have to be converted to UTF-8 (using the page's encoding). These UTF-8 strings
have to be compared with an appropriate function. str(n)casecmp would not be
correct here; a byte-by-byte comparison like c_str(n)casecmp is better but not
perfect. libunistring has functions for that.

I would suggest that I push my patch. 
We still have two weeks to inspect the changes... if in doubt, let's set up a 
test case. Just give an example of what could go wrong and we can simply try 
it out.

Tim


Current Thread

Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale, (continued)
- Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale, Ángel González, 2014/11/19
  - Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale, Tim Ruehsen <=
    - Re: [Bug-wget] [PATCH] Fix possible issues running in a turkish locale, Ángel González, 2014/11/20
