bug-gnu-utils

Re: Bug in gawk 3.1.4


From: Aharon Robbins
Subject: Re: Bug in gawk 3.1.4
Date: Thu, 25 Nov 2004 13:57:29 +0200

Greetings. Re this:

> From: Bruce Lilly <address@hidden>
> To: address@hidden
> Subject: Bug in gawk 3.1.4
> Date: Wed, 24 Nov 2004 21:11:06 -0500
>
>
> Hello,
>
> I found a bug when running Jon Bentley's dformat awk script (CSTR 142).
> All tests described below were run on SuSE Linux 9.1 Professional, with
> gawk built from source using gcc 3.4.3 and bison 1.875d (both also built
> from source). Gawk passes "make check" with "ALL TESTS PASSED".
>
> I have attached a simplified awk script and test data file.
>
> When run on the simple data file, the simplified script shows a bug in
> gawk's pattern matching.  The same script and data work fine with "the
> one true awk" from Brian Kernighan's web site, also built from source
> with the same compiler, etc.:
>
> marty:/src/gawk/gawk-3.1.4 # ./gawk -f awktest data
> line begins with non-whitespace: left
> line begins with whitespace:  space
> line begins with whitespace:    tab
> line begins with whitespace: left
> line begins with whitespace:  space
> line begins with whitespace:    tab
> marty:/src/gawk/gawk-3.1.4 # /usr/bin/awk -f awktest data
> line begins with non-whitespace: left
> line begins with whitespace:  space
> line begins with whitespace:    tab
> line begins with non-whitespace: left
> line begins with whitespace:  space
> line begins with whitespace:    tab
>
> I haven't determined precisely where the bug is, but it's clear that
> there is a bug.  Note that gawk fails to match the second line that
> begins with a non-whitespace character (the second "left") against
> the pattern /^[^ \t]/.
>
> Best regards,
>   Bruce Lilly
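As a reference point for the expected behaviour, here is a minimal C sketch that applies the same pattern, ^[^ \t], to the three kinds of input line using the system's POSIX regex library (regcomp/regexec). It does not use gawk's dfa.c, so it does not exercise the buggy code. The helper name `begins_with_nonwhitespace` is hypothetical, and the sample lines are reconstructed from the printed output above because the attached data file is not reproduced here. POSIX bracket expressions do not interpret `\t` themselves; the C compiler turns the `\t` in the string literal into a real tab before regcomp sees it, which matches how awk treats `\t` inside a regexp constant.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if LINE matches ^[^ \t] (its first
   character is neither a space nor a tab), 0 if it does not, and -1 if
   the pattern fails to compile.  The "\t" below is converted to a real
   tab character by the C compiler, because POSIX bracket expressions do
   not process backslash escapes. */
int begins_with_nonwhitespace(const char *line)
{
    regex_t re;
    int rc;

    assert(line != NULL);
    if (regcomp(&re, "^[^ \t]", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);   /* 0 means "matched" */
    regfree(&re);
    return rc == 0;
}
```

With a correct matcher, "left" yields 1 and " space" and "\ttab" yield 0 every time they appear, which is what the one true awk printed for both copies of the data.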

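If you run `export LC_ALL=C' first, the problem is hidden, because the
faulty caching code is only reached when MB_CUR_MAX > 1, i.e. in
multibyte locales such as UTF-8.  Otherwise, you can apply this patch.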
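The workaround depends on the `if (MB_CUR_MAX > 1)` test that guards the caching code in the first hunk of the patch: in the "C" locale MB_CUR_MAX is 1, so that branch is never taken. The sketch below, using a hypothetical helper `mb_cur_max_for`, temporarily switches the process locale and reports MB_CUR_MAX, so "C" can be compared with a UTF-8 locale such as the de_DE.UTF-8 used in the patch comment, if that locale is installed. It returns 0 when the requested locale is not available.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: switch to locale NAME, read MB_CUR_MAX (the
   maximum number of bytes per character in that locale), then restore
   the previous locale.  Returns 0 if NAME is not available. */
size_t mb_cur_max_for(const char *name)
{
    char saved[256];
    const char *current;
    size_t result = 0;

    assert(name != NULL);
    /* setlocale's return value may be overwritten by later calls,
       so copy it before switching. */
    current = setlocale(LC_ALL, NULL);
    snprintf(saved, sizeof saved, "%s", current != NULL ? current : "C");

    if (setlocale(LC_ALL, name) != NULL)
        result = MB_CUR_MAX;

    /* Best-effort restore; a composite locale string longer than the
       buffer would be truncated and might not restore cleanly. */
    setlocale(LC_ALL, saved);
    return result;
}
```

`mb_cur_max_for("C")` returns 1, while a UTF-8 locale, when installed, reports a value greater than 1, which is exactly the case in which the disabled cache would have been used.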

Thanks,

Arnold

--- ../gawk-3.1.4/dfa.c 2004-07-26 17:11:41.000000000 +0300
+++ dfa.c       2004-10-21 17:12:19.000000000 +0200
@@ -2871,6 +2871,14 @@
   if (MB_CUR_MAX > 1)
     {
       int remain_bytes, i;
+#if 0
+      /*
+       * This caching can get things wrong:
+
+      printf "ab\n\tb\n" | LC_ALL=de_DE.UTF-8 ./gawk '/^[ \t]/ { print }'
+
+       * should print \tb but doesn't
+       */
       buf_begin -= buf_offset;
       if (buf_begin <= (unsigned char const *)begin && (unsigned char const *) end <= buf_end) {
        buf_offset = (unsigned char const *)begin - buf_begin;
@@ -2878,6 +2886,7 @@
        buf_end = end;
        goto go_fast;
       }
+#endif
 
       buf_offset = 0;
       buf_begin = begin;
@@ -2916,7 +2925,9 @@
       mblen_buf[i] = 0;
       inputwcs[i] = 0; /* sentinel */
     }
+#if 0
 go_fast:
+#endif
 #endif /* MBS_SUPPORT */
 
   for (;;)
@@ -2930,7 +2941,7 @@
             s1 = s;
            if (d->states[s].mbps.nelem != 0)
              {
-               /* Can match with a multibyte character( and multi character
+               /* Can match with a multibyte character (and multi character
                   collating element).  */
                unsigned char const *nextp;
 
@@ -3668,9 +3679,9 @@
  done:
   if (strlen(result))
     {
-      dm = (struct dfamust *) malloc(sizeof (struct dfamust));
+      MALLOC(dm, struct dfamust, 1);
       dm->exact = exact;
-      dm->must = malloc(strlen(result) + 1);
+      MALLOC(dm->must, char, strlen(result) + 1);
       strcpy(dm->must, result);
       dm->next = dfa->musts;
       dfa->musts = dm;
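The last hunk replaces two unchecked malloc calls with dfa.c's MALLOC macro. Assuming that MALLOC expands to a call to a checked allocator that reports an error instead of returning NULL (check the definition in gawk's own dfa.c), the practical difference is that a failed allocation can no longer lead to a NULL dereference in the lines that follow. The sketch below imitates that pattern with hypothetical names (`xmalloc_sketch`, `MALLOC_SKETCH`, `struct must_sketch`, `push_must`); it is not gawk's actual definition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked allocator: never returns NULL, exits instead. */
void *xmalloc_sketch(size_t n)
{
    void *p = malloc(n ? n : 1);
    if (p == NULL) {
        fputs("memory exhausted\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Allocate N objects of type T and assign the result to P. */
#define MALLOC_SKETCH(p, t, n) ((p) = (t *) xmalloc_sketch((n) * sizeof (t)))

/* Stand-in for struct dfamust: a linked list of required strings. */
struct must_sketch {
    int exact;
    char *must;
    struct must_sketch *next;
};

/* Mirrors the patched code path: push a copy of RESULT onto LIST. */
struct must_sketch *push_must(struct must_sketch *list,
                              const char *result, int exact)
{
    struct must_sketch *dm;

    assert(result != NULL);
    MALLOC_SKETCH(dm, struct must_sketch, 1);
    dm->exact = exact;
    MALLOC_SKETCH(dm->must, char, strlen(result) + 1);
    strcpy(dm->must, result);
    dm->next = list;
    return dm;
}
```

Routing every allocation through one macro keeps the error handling in a single place instead of requiring a NULL check after each call.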



