Re: gawk bug with RS="^..."
From: Aharon Robbins
Subject: Re: gawk bug with RS="^..."
Date: Sun, 19 Dec 2004 19:20:25 +0200
Hi. I've just spent a little time looking at this.
The semantics in awk of ^ and $ are always beginning and
end of *string*, not beginning and end of *line*. As a result,
gawk views the input file as one long string that happens
to contain newline characters.
FWIW, mawk behaves identically to awk.
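The string-versus-line distinction can be seen without involving RS at all. The following is only a minimal sketch: the function name anchor_demo and the sample string are made up for illustration, and it assumes a POSIX-style awk is installed as "awk".

```shell
# Hypothetical helper: shows that ^ anchors to the start of the whole
# string, even when that string contains embedded newlines.
anchor_demo() {
  awk 'BEGIN {
    s = "foo\nbar"                                   # one string, two lines
    print ((s ~ /^foo/) ? "match" : "no match")      # foo begins the string
    print ((s ~ /^bar/) ? "match" : "no match")      # bar begins a line only
  }'
}
anchor_demo
```

The second test fails to match because "bar" follows a newline inside the string rather than starting the string itself, which is the same rule that applies when the regexp is used as RS.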
I believe that the correct course of action is simply to
add some clarification to the documentation. I do not
believe that the code should be changed.
If anyone believes otherwise, I'd be open to hearing why.
Thanks,
Arnold
> Date: Tue, 14 Dec 2004 14:48:58 +0100
> From: Stepan Kasal <address@hidden>
> Subject: gawk bug with RS="^..."
> To: address@hidden
>
> Hello,
> I've noticed a problem with "^" in RS in gawk. In most cases, it seems
> to match only the beginning of the file. But in fact it matches the
> beginning of gawk's internal buffer.
>
> Observe the following example:
>
> $ gawk 'BEGIN{for(i=1;i<=100;i++) print "Axxxxxx"}' >file
> $ gawk 'BEGIN{RS="^A"} END{print NR}' file
> 2
> $ gawk 'BEGIN{RS="^Ax*\n"} END{print NR}' file
> 100
> $ head file | gawk 'BEGIN{RS="^Ax*\n"} END{print NR}'
> 10
> $
>
> I think this calls for some clarification/fix. But I don't have a
> fixed opinion on what the solution should look like.
>
> Have a nice day,
> Stepan Kasal
>
> PS: See also the discussion of the issue in the comp.lang.awk newsgroup.