bug-gnu-utils
Re: PATCH: tolower-toupper (gawk)


From: Stepan Kasal
Subject: Re: PATCH: tolower-toupper (gawk)
Date: Sat, 23 Oct 2004 16:36:53 +0200
User-agent: Mutt/1.4.1i

Hello,

On Fri, Oct 22, 2004 at 01:26:14PM +0300, Baris Metin wrote:
> tr_TR and tr_TR.UTF-8 locales [...]
> In Turkish, the upper-case version of i is "I with dot above" (0130;LATIN
> CAPITAL LETTER I WITH DOT ABOVE). A single-byte character is converted to
> a multi-byte character.
> The same problem arises when lowercasing I. The lower-case version of I in
> Turkish is "i without the dot above" (0131;LATIN SMALL LETTER DOTLESS I).

thank you very much for reporting the problem.

> I've attached a patch which solves the problem.

Unfortunately, I see several bugs in your patch:

1) You left the following
                        *cp = TOLOWER(*cp);
at the end of do_tolower().  It should be
                        *cp3 = TOLOWER(*cp);

2) When your implementation of tolower encounters a non-uppercase wide char,
it copies only the first byte, so the next call of mbrtowc() will see the
remaining bytes of that char.

3) The test  isalpha(*cp3)  is not what you want.  You have to examine the
return value of the preceding wcrtomb().

4) You cannot call tmp_string() because you don't know the length of the
output string in advance.

[Bugs 2) and 3) cannot bite with UTF-8, but, hey, there are other encodings
too.]

The last one is the hardest to fix.
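
To illustrate points 2) to 4), here is a minimal sketch of a multibyte-aware
lowercasing loop.  It is not the code from either patch and does not use
gawk's internals: the function name mb_tolower_dup() and its interface are
made up for this example.  It assumes that one input character never produces
more than MB_CUR_MAX output bytes, so len * MB_CUR_MAX bytes is a safe upper
bound for the result.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of gawk): return a newly allocated,
 * NUL-terminated lower-cased copy of the len bytes at src, interpreted in
 * the current locale's encoding.  The result length is stored in *out_len.
 * The caller frees the result.
 */
char *mb_tolower_dup(const char *src, size_t len, size_t *out_len)
{
    mbstate_t in_state, out_state;
    /* Point 4: the output length is unknown in advance, so reserve the
     * worst case: every input byte is a character that grows to
     * MB_CUR_MAX bytes. */
    size_t cap = len * MB_CUR_MAX + MB_CUR_MAX;
    char *dst = malloc(cap + 1);
    size_t i = 0, o = 0;

    if (dst == NULL)
        return NULL;
    memset(&in_state, 0, sizeof in_state);
    memset(&out_state, 0, sizeof out_state);

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src + i, len - i, &in_state);
        size_t m;

        if (n == (size_t) -1 || n == (size_t) -2) {
            /* Invalid or truncated sequence: pass one byte through
             * unchanged and restart from the initial shift state. */
            dst[o++] = src[i++];
            memset(&in_state, 0, sizeof in_state);
            continue;
        }
        if (n == 0)
            n = 1;          /* embedded NUL character */

        /* Point 3: decide by the return value of wcrtomb(), not by
         * inspecting the output bytes with isalpha(). */
        m = wcrtomb(dst + o, (wchar_t) towlower((wint_t) wc), &out_state);
        if (m == (size_t) -1) {
            /* Point 2: when falling back, copy ALL n bytes of the
             * character, not just the first one. */
            memcpy(dst + o, src + i, n);
            m = n;
            memset(&out_state, 0, sizeof out_state);
        }
        i += n;             /* advance by the whole input character */
        o += m;             /* output may be shorter or longer */
    }
    dst[o] = '\0';
    *out_len = o;
    return dst;
}
```

A caller would select the locale first, e.g. with setlocale(LC_ALL, ""), and
then call mb_tolower_dup(str, strlen(str), &n).  Allocating the worst case up
front wastes some memory, but it avoids having to grow the buffer in the
middle of the loop.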

I made a patch (against gawk-3.1.4) which should solve your problem too.
I cannot test it, as my glibc seems to believe that tolower("I") is "I" under
tr_TR.UTF-8.

Could you please test the patch and mail the result?

If it works, Arnold (the gawk maintainer, also on this list) can consider
whether to accept my patch.

With kind regards,
        Stepan Kasal

Attachment: gawk-3.1.4-wide_tolower.patch
Description: Text document

