bug-gnu-utils

Re: gawk bug with RS="^..."


From: Aharon Robbins
Subject: Re: gawk bug with RS="^..."
Date: Sun, 19 Dec 2004 19:20:25 +0200

Hi.  I've just spent a little time looking at this.

The semantics in awk of ^ and $ are always beginning and
end of *string* not beginning and end of *line*.  As a result,
gawk views the input file as one long string that happens
to contain newline characters in it.
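
The same anchoring rule can be seen on an ordinary string,
independent of RS.  The following is only a minimal sketch: the
variable names (out, s, n) are made up for illustration, and it
assumes a POSIX-style awk is on the PATH.  The string holds two
"lines" that both start with A, but ^A matches only at the very
start of the string, so gsub() reports a single substitution.

```shell
# ^ anchors at the beginning of the whole string, not after each
# embedded newline, so only the first "A" is replaced.
out=$(awk 'BEGIN {
    s = "Axx\nAyy"             # one string with an embedded newline
    n = gsub(/^A/, "B", s)     # n is the number of substitutions made
    print n
    print s
}')
printf '%s\n' "$out"
# prints:
# 1
# Bxx
# Ayy
```

With RS set to a regexp beginning with ^, gawk applies this
start-of-string rule to the input it is scanning rather than to
each line.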

FWIW, mawk behaves identically to awk.

I believe that the correct course of action is simply to
add some clarification to the documentation. I do not
believe that the code should be changed.

If anyone believes otherwise, I'd be open to hearing why.

Thanks,

Arnold

> Date: Tue, 14 Dec 2004 14:48:58 +0100
> From: Stepan Kasal <address@hidden>
> Subject: gawk bug with RS="^..."
> To: address@hidden
>
> Hello,
>   I've noticed a problem with "^" in RS in gawk.  In most cases, it seems
> to match only the beginning of the file.  But in fact it matches the
> beginning of gawk's internal buffer.
>
> Observe the following example:
>
> $ gawk 'BEGIN{for(i=1;i<=100;i++) print "Axxxxxx"}' >file
> $ gawk 'BEGIN{RS="^A"} END{print NR}' file
> 2
> $ gawk 'BEGIN{RS="^Ax*\n"} END{print NR}' file
> 100
> $ head file | gawk 'BEGIN{RS="^Ax*\n"} END{print NR}'
> 10
> $
>
> I think this calls for some clarification/fix.  But I don't have any
> fixed opinion on what the solution should look like.
>
> Have a nice day,
>         Stepan Kasal
>
> PS: See also the discussion of the issue in the comp.lang.awk newsgroup.



