Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: cygwin
Compiler: gcc
Compilation CFLAGS: -ggdb -O2 -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong
--param=ssp-buffer-size=4
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/build=/usr/src/debug/gawk-5.3.0-1
-fdebug-prefix-map=/cygdrive/d/a/scallywag/gawk/gawk-5.3.0-1.x86_64/src/gawk-5.3.0=/usr/src/debug/gawk-5.3.0-1
-DNDEBUG
uname output: CYGWIN_NT-10.0-22631 TournaMart_2023 3.5.3-1.x86_64
2024-04-03 17:25 UTC x86_64 Cygwin
Machine Type: x86_64-pc-cygwin
Gawk Version: 5.3.0
Attestation 1:
I have read
https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
Yes
Attestation 2:
I have not modified the sources before building gawk.
True
Description:
Someone asked a question on Stack Overflow about handling unending
input from netcat with a regexp delimiter that's just 2 possible
chars, see https://stackoverflow.com/q/78700014/1745001, where gawk
seems to be a record behind in its processing. I'm using bash on
Cygwin; they used zsh on macOS.
Repeat-By:
I can reproduce the problem with this (hitting control-C to stop
each command when it stops to wait for more input):
$ printf 'A;B;C;\n' > file
$ cat file - | awk -v RS='(;|=)' '{print NR, $0}'
1 A
$ cat file - | awk -v RS=';|=' '{print NR, $0}'
1 A
2 B
$ cat file - | awk -v RS='[;=]' '{print NR, $0}'
1 A
2 B
3 C
Those 3 regexps should be equivalent, so I'd expect each command to
print all 3 records as the last one does, but they produce 3
different results.
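
For comparison, here is a minimal sketch (not part of the session
above, and its outcome is an expectation rather than captured output)
that runs the same 3 separators against the file alone, without the
trailing "-" that keeps stdin open. With finite input I would expect
all 3 to split the data identically; if they do, the difference only
shows up while awk is waiting for more input. The temporary file and
the out1/out2/out3 variable names are just for illustration.

```shell
# Sanity check on finite input: no "cat file -", so awk sees EOF.
tmp=$(mktemp)
printf 'A;B;C;\n' > "$tmp"

# Same three record separators as in the Repeat-By commands.
out1=$(awk -v RS='(;|=)' '{print NR, $0}' "$tmp")
out2=$(awk -v RS=';|=' '{print NR, $0}' "$tmp")
out3=$(awk -v RS='[;=]' '{print NR, $0}' "$tmp")
rm -f "$tmp"

# Report whether the three separators split the input identically.
if [ "$out1" = "$out2" ] && [ "$out2" = "$out3" ]; then
    echo "same"
else
    echo "differ"
fi
```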