bug-gnu-utils

Re: gawk bug with RS="^..."


From: Aharon Robbins
Subject: Re: gawk bug with RS="^..."
Date: Mon, 20 Dec 2004 11:48:35 +0200

In article <address@hidden> you write:
>Let me use my old example:
>
>$ gawk 'BEGIN{for(i=1;i<=100;i++) print "Axxxxxx"}' >file
>$ gawk 'BEGIN{RS="^A"} END{print NR}' file
>2
>
>This is correct: the first record is empty, the second contains most of
>the file.
>
>$ gawk 'BEGIN{RS="^Ax*\n"} END{print NR}' file
>100
>
>This is incorrect: the first RT is the first line of the file, but there
>should be no other match of RS, so the second record should be the rest
>of the file.
>
>(It is obvious what happened: RS matched the second line, as it happened
>to be at the beginning of the buffer, etc.)

That's not the case.  The file is only 800 bytes (100 lines of 8 bytes),
so by that time gawk has sucked the entire file into memory.  More
likely, gawk isn't using whatever regex magic tells the matcher that the
starting position is in the middle of a buffer, so that ^ fails to match.
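A minimal sketch of that distinction, written as a Python model rather than gawk's actual C code (the function split_records and its anchor_aware flag are hypothetical names). Searching a fresh slice of the buffer lets ^ match wherever the previous record ended; searching the whole buffer from an offset does not. In Python's re module, without re.MULTILINE, Pattern.search(string, pos) lets ^ match only at index 0 of the string, which stands in for the "start is mid-buffer" flag. POSIX regexec() exposes the same idea as REG_NOTBOL. Whether gawk would fix it that way is an assumption, not something this post settles.

```python
import re


def split_records(data: str, rs: str, anchor_aware: bool) -> list:
    """Split data into records separated by matches of the regex rs.

    Hypothetical model, not gawk's implementation.
    anchor_aware=False: search a fresh slice starting at the current
    position, so '^' can match at the start of every remaining chunk.
    anchor_aware=True: search the whole buffer from an offset, so '^'
    can only match at the true start of the data.
    """
    pattern = re.compile(rs)
    records = []
    pos = 0
    while pos < len(data):
        if anchor_aware:
            # Without re.MULTILINE, '^' matches only at index 0 of data,
            # even when the search starts at a later offset.
            m = pattern.search(data, pos)
            span = m.span() if m else None
        else:
            # The slice's index 0 is wherever the last record ended.
            m = pattern.search(data[pos:])
            span = (pos + m.start(), pos + m.end()) if m else None
        if span is None or span[0] == span[1]:
            # No separator left (or a zero-length match, which would
            # never advance): the rest of the data is the last record.
            records.append(data[pos:])
            break
        records.append(data[pos:span[0]])
        pos = span[1]
    return records


data = "Axxxxxx\n" * 100  # the 800-byte file from the report
for rs in ("^A", "^Ax*\n"):
    naive = len(split_records(data, rs, anchor_aware=False))
    aware = len(split_records(data, rs, anchor_aware=True))
    print(repr(rs), "naive:", naive, "anchor-aware:", aware)
```

On the 800-byte file from the report, the slice-based model reproduces both observed counts (2 records for RS="^A", 100 for RS="^Ax*\n"), while the offset-based model gives 2 for both. RS="^A" comes out right either way because, after the first match, the next chunk starts with "x", so ^A has nothing to match.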

I need to think about this a bit more, and maybe poke at the code.  I'm
still not sure I will actually do anything about it beyond updating the
doc, which I have already done.

Also, apart from "correctness in weird cases", I don't see this being a
problem for people in practice.

Arnold
