bug-gnu-utils

Re: gawk bug with RS="^..."


From: Stepan Kasal
Subject: Re: gawk bug with RS="^..."
Date: Mon, 20 Dec 2004 09:10:06 +0100
User-agent: Mutt/1.4.1i

Hello Arnold,

On Sun, Dec 19, 2004 at 07:20:25PM +0200, Aharon Robbins wrote:
> The semantics in awk of ^ and $ are always beginning and
> end of *string* not beginning and end of *line*.  As a result,
> gawk views the input file as one long string that happens
> to contain newline characters in it.

this explanation sounds very nice.  I think this is the right
behaviour.

> [...] to add some clarification to the documentation.

That would be very nice, of course.  Please forgive me that I don't
volunteer for that.

> I do not believe that the code should be changed.
> If anyone believes otherwise, I'd open to hearing why.

I do, see below.

> FWIW, mawk behaves identically to awk.

Indeed, mawk contains exactly the same bug.

Let me use my old example:

$ gawk 'BEGIN{for(i=1;i<=100;i++) print "Axxxxxx"}' >file
$ gawk 'BEGIN{RS="^A"} END{print NR}' file
2

This is correct: the first record is empty, the second contains most of
the file.

$ gawk 'BEGIN{RS="^Ax*\n"} END{print NR}' file
100

This is incorrect: the first RT is the first line of the file, but there
should be no other match of RS.  So the second record should be the rest
of the file, and NR should be 2.
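
The expected result can be checked with a small model of the "one long
string" semantics quoted above.  This is only a sketch in Python, not
gawk code: the function name split_records is made up for illustration,
and Python's regex syntax is assumed to be close enough to awk's for
these two patterns.  Python's "^" (without re.MULTILINE) matches only at
the real start of the string, even when the search begins at a later
offset, which is exactly the behaviour argued for here.

```python
import re

def split_records(text, rs):
    """Split text into records, treating it as ONE string.

    '^' in rs can match only at offset 0 of the whole input,
    never at the start of a later line or buffer.
    """
    pattern = re.compile(rs)
    records = []
    pos = 0
    while pos < len(text):
        # search(text, pos) keeps '^' anchored to the real start of text
        m = pattern.search(text, pos)
        if m is None or m.end() == m.start():
            # No (non-empty) terminator left: the rest is the last record.
            records.append(text[pos:])
            break
        records.append(text[pos:m.start()])
        pos = m.end()
    return records

# Same input as the example file: 100 lines of "Axxxxxx".
text = "Axxxxxx\n" * 100
print(len(split_records(text, r"^A")))       # 2
print(len(split_records(text, r"^Ax*\n")))   # 2
```

Under this model both RS values give two records: an empty first record
and one record holding everything after the first match.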

(It is obvious what happened: RS matched the second line, as it happened
to be at the beginning of the buffer, etc.)
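
A toy model of that suspected failure mode, again as a Python sketch and
not gawk's actual reader code (the function name
split_records_reanchored is hypothetical, and the real buffering is
certainly more involved): if the regex is applied afresh to whatever
input remains after each record, "^" is re-anchored at the start of the
remainder every time.

```python
import re

def split_records_reanchored(text, rs):
    """Split text into records, matching rs against the REMAINING input.

    Because each search sees a fresh string, '^' matches at the start
    of every remainder, not just at the start of the file.
    """
    pattern = re.compile(rs)
    records = []
    rest = text
    while rest:
        m = pattern.search(rest)      # '^' anchors at the start of rest
        if m is None or m.end() == m.start():
            # No (non-empty) terminator left: the rest is the last record.
            records.append(rest)
            break
        records.append(rest[:m.start()])
        rest = rest[m.end():]         # drop the record and its terminator
    return records

hundred = "Axxxxxx\n" * 100
ten = "Axxxxxx\n" * 10
print(len(split_records_reanchored(hundred, r"^A")))      # 2
print(len(split_records_reanchored(hundred, r"^Ax*\n")))  # 100
print(len(split_records_reanchored(ten, r"^Ax*\n")))      # 10
```

This reproduces the record counts seen in the gawk runs in this message
(2, 100 and 10), which is consistent with the explanation, though it
does not prove that gawk's code works this way.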

$ head file | gawk 'BEGIN{RS="^Ax*\n"} END{print NR}'
10

Again, the correct output would be 2.

I realize that my first bug report wasn't as clear as it should have
been, sorry.

Regards,
        Stepan



