bug-gnu-utils

Re: major gawk bug


From: Aharon Robbins
Subject: Re: major gawk bug
Date: Tue, 8 Jun 2004 15:10:43 +0300

Greetings. Re this:

> Date: Tue, 8 Jun 2004 15:51:19 +0400
> From: Stanislav Ievlev <address@hidden>
> To: address@hidden
> Cc: address@hidden, address@hidden
> Subject: major gawk bug
>
> Hello friends!
>
> Why does gawk use setlocale() but still have a hardcoded table
> (const char casetable[]) for case-independent regexp matching?

The hard-coded table predates, by many years, all the locale-related code
in gawk.  No one ever noticed until now that it was an issue.

> This table is correct for the latin1 charset only, but incorrect for others,
> e.g. for KOI8-R (Russian).
>
> The KOI8-R encoding is fully compatible with 7-bit ASCII (so gawk compiles
> well), but has other symbols for codes greater than 128.
>
> So gawk supports only latin1, but ignores cp1251, koi8-r, koi8-u, etc.
>
> As I understand it, it's not a problem to fill this table with
> locale-specific symbols at startup.

Code changes welcome.  I have no idea how to do that in a manner that
is correct for all 8-bit ASCII-compatible locales.  If you (or someone
else) wishes to contribute a patch, sometime soon would be a good time,
as I'm back in development mode, at least for the next little while.
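
For illustration only, here is a minimal sketch of the idea under
discussion: filling a 256-entry case-folding table from the current
locale at startup instead of hard-coding it.  This is not gawk's actual
code or a proposed patch.  The names casetable_sketch and
casetable_init are hypothetical, and the sketch assumes a single-byte,
ASCII-compatible locale (KOI8-R, CP1251, Latin-1 and the like) that has
already been selected with setlocale().

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical stand-in for a hard-coded case table: entry c holds the
 * lowercase form of byte c in the current LC_CTYPE locale. */
static unsigned char casetable_sketch[UCHAR_MAX + 1];

/* Fill the table from the current locale.  Assumes the caller has
 * already run setlocale(LC_ALL, "") or setlocale(LC_CTYPE, ""), and
 * that the locale is single-byte; multibyte encodings cannot be
 * handled with a per-byte table like this. */
static void casetable_init(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++) {
        /* tolower() is defined for every value representable as
         * unsigned char, so passing 0..UCHAR_MAX is valid. */
        int lower = tolower(c);

        /* In a single-byte locale the result must also be a byte. */
        assert(lower >= 0 && lower <= UCHAR_MAX);
        casetable_sketch[c] = (unsigned char) lower;
    }
}
```

The intended call order would be setlocale() first, then
casetable_init(), then any matching code that consults the table.  In
the default "C" locale only the ASCII letters are folded, so bytes of
128 and above map to themselves.  Whether this approach is right for
every 8-bit locale is exactly the open question raised above.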

> With best regards
> Stanislav Ievlev
>
> ALT Linux Team.

Thanks,

Arnold



