Re: major gawk bug
From: Stanislav Ievlev
Subject: Re: major gawk bug
Date: Thu, 10 Jun 2004 12:14:58 +0400
On Wed, Jun 09, 2004 at 03:08:49PM +0300, Aharon Robbins wrote:
> I'm glad my patches work. I may send you some further patches
> for testing.
Yes, I can test them. Thank you.
>
> Code using tolower() is marginally slower for things like
>
> BEGIN {
>     IGNORECASE = 1
>     for (i = 1; i < 10000000; i++)
>         val += ("ONE STRING" == "one string")
>     print val
> }
>
> I have a fast machine, making it hard for me to judge whether the difference
> is worth keeping the current code. I need to think about it some more.
>
> I do believe that just using RE_ICASE will work, and I will probably make that
> the main solution for re.c.
>
> I am also concerned about portability issues; while GLIBC tolower() is
> highly functional etc, GLIBC and Linux are not my entire customer base. :-)
>
> Arnold
>
> > Date: Wed, 9 Jun 2004 15:20:54 +0400
> > From: Stanislav Ievlev <address@hidden>
> > To: Aharon Robbins <address@hidden>
> > Cc: Stepan Kasal <address@hidden>, address@hidden
> > Subject: Re: major gawk bug
> >
> > Hello,
> >
> > On Tue, Jun 08, 2004 at 06:59:48PM +0300, Aharon Robbins wrote:
> > > > I believe the right fix for regexes is to use the RE_ICASE flag instead
> > > > of the translate table.
> > > > The hard-coded table is also used in gawk for various case-insensitive
> > > > comparisons; these should be replaced by a call to tolower().
> > > > The hard-coded table should be then removed.
> > >
> > > I have some tentative changes in place that work this way. It passes
> > > `make check'. I am still concerned about performance, especially
> > > the use of tolower().
> > >
> > > If you or Mr. Ievlev can test them and give me some feedback, let
> > > me know and I'll send them to you.
> > Arnold, your patch works well.
> > (a small improvement:
> > - if (strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
> > + if (!cp || strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
> > )
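
The point of the extra !cp test can be sketched in isolation. This is only
an illustration, not the gawk source: the function name is_c_locale() is
hypothetical, and I assume cp holds a locale name that may be a null
pointer (where cp comes from is not shown in this thread).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when the locale name denotes the plain
 * "C"/"POSIX" locale. A null name is treated the same way, which is the
 * effect of the added !cp test.
 */
int is_c_locale(const char *cp)
{
    /* The null test must come first: passing NULL to strcmp() is undefined. */
    if (!cp || strcmp(cp, "C") == 0 || strcmp(cp, "POSIX") == 0)
        return 1;
    return 0;
}
```

Without the first test, a null cp would reach strcmp() and most likely
crash.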
> >
> > As I understand it, we also have a solution that uses the
> > toupper()/tolower() functions.
> >
> > I agree with Stepan that these functions are already well optimized in
> > glibc. The toupper()/tolower() solution is better because we currently
> > have two translation tables (one in glibc and one in gawk) and copy
> > one into the other during initialization (load_ignorecase), which
> > looks strange.
> >
> > If the contents of these two tables are interpreted identically in gawk's
> > algorithms, it is easy to replace one with the other.
> >
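
To make the two approaches concrete, here is a rough sketch that compares
strings through a private table filled from tolower() at startup, and then
by calling tolower() directly. All names here (build_casetable,
icase_equal_table, icase_equal_tolower) are made up for the example and
are not taken from gawk; it handles single-byte characters only.

```c
#include <ctype.h>
#include <stddef.h>

/* Fill a private 256-entry table from the C library, once at startup. */
void build_casetable(unsigned char table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char) tolower(c);
}

/* Case-insensitive comparison of len bytes through the private table. */
int icase_equal_table(const unsigned char table[256],
                      const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (table[(unsigned char) a[i]] != table[(unsigned char) b[i]])
            return 0;
    return 1;
}

/* The same comparison calling tolower() directly, with no private table. */
int icase_equal_tolower(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    return 1;
}
```

Both functions give the same answers as long as the table is rebuilt
whenever the locale changes; the second one simply drops the copy. Which
one is faster would have to be measured.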
> > --
> > With best regards
> > Stanislav Ievlev
> >
> > ALT Linux Team.
> >
> >