bug-gnu-utils

RS re processing bug


From: Ron Rechenmacher
Subject: RS re processing bug
Date: Thu, 17 Jan 2008 12:26:05 -0600

Hi,

BACKGROUND:
   I wanted to filter some of the ' [0-9]* files...\r' "records" out of
   "rsync --progress ..." output. This worked, except that the output
   always seemed to be one line behind (see the simple example below). A
   Google search for "gawk rs problem" showed that others would be
   interested in this bug fix or feature.

SIMPLEST BUG RECREATION ENV.:
   (sleep 3; date +%s;\
    sleep 3; date +%s;\
    sleep 3) | gawk \
  'BEGIN { RS="[\n]" } {printf "%d %s%s",systime(),$0,RT; fflush();}'

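 Compare the output with RS="[\n]" against the output with RS="\n". With
 RS="\n" each line is printed as soon as it arrives, so the two numbers on
 each line should match; with RS="[\n]" each record seems to be held until
 the next chunk of input (or EOF) arrives, so the printed time lags by about
 3 seconds. It would be nice if the times (numbers) on each line were the
 same in both cases.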

 A slightly more sophisticated example, with "\r"-terminated records like
 the ones rsync --progress writes:
   (sleep 3; date +%s;\
    sleep 3; printf '%s\r' "$(date +%s)";\
    sleep 3; printf '%s\r' "$(date +%s)";\
    sleep 3; printf '\n%s\n' "$(date +%s)";\
    sleep 3) | gawk \
  'BEGIN { RS="[\n\r]" } {printf "%d %s%s",systime(),$0,RT; fflush();}'

ENVIRONMENT:
   Built gawk-3.1.6 on SLF5 (which I think is similar to RHEL5).

ANALYSIS
   REs are tricky. The standard rule is to return the longest match.
   So, in general, if a match ends at the end of the input buffer, you
   cannot know whether that match would grow if more input were received
   (for example, RS="\n+" matching the final "\n" in the buffer), so the
   record cannot be returned yet. This appears to be why the output lags
   one record behind. BUT, often the RE IS SIMPLE and you do know that the
   match cannot grow (for example, RS="[\n]" always matches exactly one
   character)!

SUGGESTED PATCH:  (sorry if this is not in exactly the right format; I'm new
   at this and would appreciate any reply explaining the better way)
   See attached patch against gawk-3.1.6 source.
   I created the patch via:
      cd gawk-3.1.6
      for ff in awk.h io.c re.c;do diff -u $ff{.~1~,};done >../re_len.patch
   Note that, because the definition of "SIMPLE" is vague, the patched code
   behaves differently for RS="\n|\r" and RS="[\n\r]", even though both
   match the same single characters; this makes at least a little sense :)

Thanks,
Ron



Attachment: re_len.patch

