bug-gnu-utils

Re: gawk: improvements for IGNORECASE and FS/RS


From: Stepan Kasal
Subject: Re: gawk: improvements for IGNORECASE and FS/RS
Date: Wed, 16 Oct 2002 14:52:57 +0200
User-agent: Mutt/1.2.5.1i

Good afternoon.

On Tue, Oct 15, 2002 at 06:34:18PM +0200, Aharon Robbins wrote:
> It turns out that gawk has both some memory leaks, and some gross
> inefficiencies when IGNORECASE is toggled a lot for each record.

Unfortunately, your patch introduces another inefficiency: each regex
is compiled twice, even in programs which don't ever touch IGNORECASE.

Even worse, a bug crept in:

>       * eval.c (set_IGNORECASE): Call set_RS() instead of
>       set_FS_if_not_FIELDWIDTHS().  The former calls the latter
>       for us, [...]

No, it doesn't: there is a ``return'' hiding under an ``if'' near the
beginning of set_RS, waiting for the opportunity to bite you:

$ ./gawk 'BEGIN{IGNORECASE=1;FS="c";IGNORECASE=0;$0="aCa";print $1}'
a

Amazingly, when you put it the other way round:
        ... IGNORECASE=0;FS="c";IGNORECASE=1 ...
it works as it should.
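The shape of the problem can be modelled in a few lines of C. This is a deliberately simplified, hypothetical model: `model_set_RS`, `model_set_RS_fixed`, `model_set_FS` and the `fs_rebuilds` counter are invented names, and gawk's real set_RS differs in detail. It shows how an early return taken because RS is unchanged also skips the FS update at the end, and one way to keep that update reachable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; not gawk's real functions or data structures. */
static const char *saved_rs = NULL;  /* RS value seen by the previous call */
static int fs_rebuilds = 0;          /* how many times FS was rebuilt */

static void model_set_FS(void)
{
    fs_rebuilds++;                   /* stands in for rebuilding the FS splitter */
}

/* Buggy shape: when only IGNORECASE changed, RS compares equal and the
   early return skips the FS update at the end. */
static void model_set_RS(const char *rs)
{
    assert(rs != NULL);
    if (saved_rs != NULL && strcmp(saved_rs, rs) == 0)
        return;
    saved_rs = rs;
    /* ... recompile the RS regex here ... */
    model_set_FS();
}

/* Fixed shape: skip only the RS work, always fall through to FS. */
static void model_set_RS_fixed(const char *rs)
{
    assert(rs != NULL);
    if (saved_rs == NULL || strcmp(saved_rs, rs) != 0) {
        saved_rs = rs;
        /* ... recompile the RS regex here ... */
    }
    model_set_FS();
}
```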

The explanation is somewhat tricky:

When you do ``IGNORECASE=1;FS="c"'', set_FS builds the regex ``[c]''
and compiles it.  In this case re_parse_field is used, and it cannot
dynamically adapt to IGNORECASE.

OTOH, IGNORECASE=0;FS="c";IGNORECASE=1 is still handled by sc_parse_field,
and I believe it would stay that way even if set_RS did call set_FS.
Fortunately, sc_parse_field checks the current value of IGNORECASE each
time it's called.
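The difference between the two splitters can be reproduced with a small C model. Here `ignorecase` is a hypothetical stand-in for IGNORECASE, and `sc_first_field_len` and `re_first_field_len` are invented helpers in the spirit of sc_parse_field and re_parse_field, not their real signatures. The regex side uses the POSIX `regcomp`/`regexec` API, where case folding is chosen once, via `REG_ICASE`, when the pattern is compiled.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for gawk's IGNORECASE variable. */
static bool ignorecase = false;

/* Single-character splitter (in the spirit of sc_parse_field): it reads
   `ignorecase` on every call, so it follows later changes.
   Returns the length of the first field of `rec`. */
static size_t sc_first_field_len(const char *rec, char sep)
{
    assert(rec != NULL);
    for (size_t i = 0; rec[i] != '\0'; i++) {
        bool hit = ignorecase
            ? tolower((unsigned char)rec[i]) == tolower((unsigned char)sep)
            : rec[i] == sep;
        if (hit)
            return i;
    }
    return strlen(rec);
}

/* Regex splitter (in the spirit of re_parse_field): case sensitivity was
   fixed when `re` was compiled, i.e. when FS was assigned. */
static size_t re_first_field_len(const regex_t *re, const char *rec)
{
    regmatch_t m;
    assert(re != NULL && rec != NULL);
    if (regexec(re, rec, 1, &m, 0) == 0)
        return (size_t)m.rm_so;
    return strlen(rec);
}
```

Compiling ``[c]'' with the flag set and then clearing it leaves the regex splitter case-insensitive (on "aCa" it still yields a first field of length 1, i.e. "a"), while the single-character splitter follows the new value and yields the whole record.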

Oh, typing the last sentence helped me discover another bug:

$ ./gawk 'BEGIN{FS="c";$0="aCa";IGNORECASE=1;print $1}'
a

This bug is present in 3.1.1 and is still there.  I think it should be
fixed by removing the IGNORECASE feature from sc_parse_field.
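Lazy field splitting explains why the order of statements matters here: fields are only computed when first accessed, so a splitter that reads IGNORECASE at that moment sees the value set after $0 was assigned. The model below is hypothetical (`fs_char`, `ignorecase`, `struct lazy_record` and its helpers are invented) and reproduces only the buggy behaviour, not the proposed fix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical globals standing in for a single-character FS and IGNORECASE. */
static char fs_char = ' ';
static bool ignorecase = false;

struct lazy_record {
    const char *text;
    bool split_done;     /* fields are computed on first access */
    size_t first_len;    /* cached length of $1 */
};

static void set_record(struct lazy_record *r, const char *text)
{
    assert(r != NULL && text != NULL);
    r->text = text;
    r->split_done = false;   /* defer splitting until a field is used */
}

/* Splitting happens only here, so it sees IGNORECASE as it is at access
   time, not as it was when $0 was assigned: the shape of the second bug. */
static size_t lazy_field1_len(struct lazy_record *r)
{
    if (!r->split_done) {
        size_t i = 0;
        for (; r->text[i] != '\0'; i++) {
            unsigned char c = (unsigned char)r->text[i];
            bool hit = ignorecase
                ? tolower(c) == tolower((unsigned char)fs_char)
                : r->text[i] == fs_char;
            if (hit)
                break;
        }
        r->first_len = i;
        r->split_done = true;
    }
    return r->first_len;
}
```

With FS set to "c", $0 set to "aCa" and only then IGNORECASE set to 1, the model reports a first field of length 1, matching the "a" printed above.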

Patches for all mentioned problems are attached to this mail.
They are relative to 3.1.1 plus your patch from yesterday.

I've tested my patches with ``make check''.
I apologize for not providing new test cases for these bugs.

Enjoy,
        Stepan Kasal

[BTW: One of the factors that contributed to the second bug was the somewhat
messy implementation of "lazy field evaluation" in field.c.  I believe that
a better approach would be a kind of object-oriented one: several structs,
each holding complete parsing information, with one struct "alive", another
waiting for the next record, ...
I'd love to prove this concept by implementing it, without any performance
sacrifice, of course.  But it won't happen anytime soon, sorry.]
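A minimal sketch of the "alive"/"waiting" idea, assuming (as the second example implies) that a change to FS or IGNORECASE should only affect records assigned afterwards. `struct parse_info`, `struct splitter` and all helpers are hypothetical, and only a single-character FS is modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: everything needed to split one record. */
struct parse_info {
    char sep;          /* single-character FS */
    bool ignorecase;   /* IGNORECASE captured with this info */
};

struct splitter {
    struct parse_info alive;    /* used for the current record */
    struct parse_info pending;  /* assignments land here, wait for next record */
};

static void assign_FS(struct splitter *s, char sep) { s->pending.sep = sep; }
static void assign_IGNORECASE(struct splitter *s, bool on) { s->pending.ignorecase = on; }

/* A new record makes the pending settings current. */
static void new_record(struct splitter *s) { s->alive = s->pending; }

/* Length of $1 of `rec`, using only the settings captured for it. */
static size_t field1_len(const struct splitter *s, const char *rec)
{
    assert(s != NULL && rec != NULL);
    size_t i = 0;
    for (; rec[i] != '\0'; i++) {
        unsigned char c = (unsigned char)rec[i];
        bool hit = s->alive.ignorecase
            ? tolower(c) == tolower((unsigned char)s->alive.sep)
            : rec[i] == s->alive.sep;
        if (hit)
            break;
    }
    return i;
}
```

Because assignments only touch `pending`, the splitting code never has to ask whether the current record was assigned before or after a change.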




