bug-gawk

Re: difference in RS handling for equivalent regexps with unending input


From: Ed Morton
Subject: Re: difference in RS handling for equivalent regexps with unending input stream
Date: Wed, 3 Jul 2024 05:07:28 -0500
User-agent: Mozilla Thunderbird

There seem to be a couple of additional issues here. If no other character follows the final `;` (a newline in my original example, but it could be any character), then the output is a record behind the input, even with the bracket expression regexp that otherwise works best:

   $ printf 'A;B;C;' > file

   $ cat file - | awk -v RS='(;|=)' '{print $0; fflush()}'

   $ cat file - | awk -v RS=';|=' '{print $0; fflush()}'
   A

   $ cat file - | awk -v RS='[;=]' '{print $0; fflush()}'
   A
   B

and if we change RS to add an additional possible separator to the bracket expression we get this very odd behavior:

   $ cat file - | awk -v RS='[;|=]' '{print $0; fflush()}'
   A

   $ cat file - | awk -v RS='[;a=]' '{print $0; fflush()}'
   A
   B
   C
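
For anyone reading along: inside a bracket expression `|` is a literal character, not alternation, so `[;|=]` really is `[;=]` plus one more separator. A minimal sketch on finite input, assuming an awk that accepts a regexp RS (gawk does); the function name `split_demo` and the sample string are made up for illustration:

```shell
# Hypothetical helper: split a short, finite string on ';', '|' or '='.
# The pipe closes after printf, so awk sees EOF and nothing is left waiting.
split_demo() {
    printf 'A|B;C=D' | awk -v RS='[;|=]' '{print NR, $0}'
}

split_demo
```

With finite input like this all four fields should come out as separate records, so the odd behavior above seems specific to the case where awk is still waiting on the stream.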


On 7/3/2024 4:10 AM, Ed Morton wrote:
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1 -fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1 -DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64 2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin

Gawk Version: 5.3.0

Attestation 1:
        I have read https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
        Yes

Attestation 2:
        I have not modified the sources before building gawk.
        True

Description:

   Someone asked a question on SO about handling unending input from
   netcat with a regexp delimiter that's just 2 possible chars, see
   https://stackoverflow.com/q/78700014/1745001, where gawk seems to be
   a record behind in its processing. I'm using bash on cygwin, they
   used zsh on MacOS.

Repeat-By:

   I can reproduce the problem with this (hitting control-C to stop
   each command when it stops to wait for more input):

   $ printf 'A;B;C;\n' > file

   $ cat file - | awk -v RS='(;|=)' '{print NR, $0}'
   1 A

   $ cat file - | awk -v RS=';|=' '{print NR, $0}'
   1 A
   2 B

   $ cat file - | awk -v RS='[;=]' '{print NR, $0}'
   1 A
   2 B
   3 C

   Obviously that's 3 supposedly equivalent regexps producing 3
   different results.
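
   As a control, the same data can be fed through a pipe that closes, so
   awk sees EOF instead of waiting for more input. This is only a sketch,
   assuming an awk with regexp RS support such as gawk; the function name
   `run_control` is made up for illustration, and the `NF` condition is
   added just to skip the last record, which holds only the trailing
   newline:

```shell
# Hypothetical control run: finite input, so EOF arrives and awk has no
# reason to hold a record back while waiting for more data.
run_control() {
    for rs in '(;|=)' ';|=' '[;=]'; do
        printf 'RS=%s\n' "$rs"
        # NF is 0 for the final newline-only record, so it is not printed
        printf 'A;B;C;\n' | awk -v RS="$rs" 'NF {print NR, $0}'
    done
}

run_control
```

   If all three regexps report the same three records here, the
   difference above comes from how each one is handled while the input
   is still open, not from what the regexps match.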


