Re: [Fwd: gawk bug]
From: Aharon Robbins
Subject: Re: [Fwd: gawk bug]
Date: Tue, 23 Nov 2004 14:16:57 +0200
Greetings. Re this:
> Date: Tue, 23 Nov 2004 10:13:07 +0100
> From: Eiso AB <address@hidden>
> To: address@hidden
> Subject: [Fwd: gawk bug]
>
> hi again(?),
>
> not sure if this got through yesterday, and I didn't include output.
>
> the script produces
> H99 N|
> instead of the expected
> H99 N
>
> so in the second gsub not all |'s
> are removed.
>
> goodluck, Eiso
>
>
> -------- Original Message --------
> Subject: gawk bug
> Date: Mon, 22 Nov 2004 18:20:42 +0100
> From: Eiso AB <address@hidden>
> To: address@hidden
> CC: address@hidden
> References: <address@hidden> <address@hidden>
> <address@hidden>
>
> hi Arnold,
>
> this produces a bug in gawk 3.1.4
>
>
> put this in an awk script (bug.awk)
> {
> ( FILENAME ~ /[.]save$/ )
> h=$2
> gsub("[|]"," ",h)
> x=$1
> gsub("[|]"," ",x)
> print x
> }
>
> then run as
>
> echo "|H99|N| |H99|HN|" > t.save ; gawk -f bug.awk t.save
>
>
>
> removal of:
> * ( FILENAME ~ /[.]save$/ )
> * gsub("[|]"," ",h)
> and change of the filename(!)
> will make the error disappear.
>
>
> very strange!
> goodluck,
It turns out that my current development code doesn't show the
problem. Here is the patch to dfa.c that makes the difference.
I verified that applying it to 3.1.4 fixes the problem.
Whew, I'm glad it was this easy!
Arnold
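The patch below disables (with #if 0) a shortcut in dfa.c that skips
re-converting the input when the new [begin, end) range lies inside the
range converted on the previous call. The patch does not spell out the
failure mode. One plausible reading, and it is only an assumption here,
is that a cache keyed on addresses alone cannot notice when the same
buffer has been overwritten with different text. The sketch below
illustrates that general hazard; struct range_cache, convert,
lookup_cached and lookup_uncached are hypothetical names, and upper-casing
stands in for the multibyte conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache: remembers which address range it last converted
   and keeps the converted copy (upper-cased here, for illustration). */
struct range_cache {
    const char *begin;
    const char *end;
    char converted[64];
};

/* Convert [begin, end) into the cache and record the address range. */
static void convert(struct range_cache *c, const char *begin, const char *end)
{
    size_t n = (size_t)(end - begin);
    size_t i;

    if (n >= sizeof c->converted)
        n = sizeof c->converted - 1;
    for (i = 0; i < n; i++)
        c->converted[i] = (char)toupper((unsigned char)begin[i]);
    c->converted[n] = '\0';
    c->begin = begin;
    c->end = begin + n;
}

/* Cached lookup: if the requested range lies inside the remembered
   range, reuse the old conversion without looking at the bytes again.
   The result runs to the end of the cached text, which is enough for
   this illustration. This is the step that can return stale data. */
static const char *lookup_cached(struct range_cache *c,
                                 const char *begin, const char *end)
{
    if (c->begin != NULL && c->begin <= begin && end <= c->end)
        return c->converted + (begin - c->begin);
    convert(c, begin, end);
    return c->converted;
}

/* Uncached lookup: always convert again, as the code does once the
   shortcut is compiled out. */
static const char *lookup_uncached(struct range_cache *c,
                                   const char *begin, const char *end)
{
    convert(c, begin, end);
    return c->converted;
}
```

If a caller fills a buffer, calls lookup_cached, overwrites the same
buffer, and calls lookup_cached again with the same addresses, the second
call returns the conversion of the old text. lookup_uncached pays for the
conversion every time but cannot go stale.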
-----------------------
--- ../gawk-3.1.4/dfa.c 2004-07-26 17:11:41.000000000 +0300
+++ dfa.c 2004-10-21 17:12:19.000000000 +0200
@@ -2871,6 +2871,14 @@
if (MB_CUR_MAX > 1)
{
int remain_bytes, i;
+#if 0
+ /*
+ * This caching can get things wrong:
+
+printf "ab\n\tb\n" | LC_ALL=de_DE.UTF-8 ./gawk '/^[ \t]/ { print }'
+
+ * should print \tb but doesn't
+ */
buf_begin -= buf_offset;
if (buf_begin <= (unsigned char const *)begin && (unsigned char const *) end <= buf_end) {
buf_offset = (unsigned char const *)begin - buf_begin;
@@ -2878,6 +2886,7 @@
buf_end = end;
goto go_fast;
}
+#endif
buf_offset = 0;
buf_begin = begin;
@@ -2916,7 +2925,9 @@
mblen_buf[i] = 0;
inputwcs[i] = 0; /* sentinel */
}
+#if 0
go_fast:
+#endif
#endif /* MBS_SUPPORT */
for (;;)
@@ -2930,7 +2941,7 @@
s1 = s;
if (d->states[s].mbps.nelem != 0)
{
- /* Can match with a multibyte character( and multi character
+ /* Can match with a multibyte character (and multi character
collating element). */
unsigned char const *nextp;
@@ -3668,9 +3679,9 @@
done:
if (strlen(result))
{
- dm = (struct dfamust *) malloc(sizeof (struct dfamust));
+ MALLOC(dm, struct dfamust, 1);
dm->exact = exact;
- dm->must = malloc(strlen(result) + 1);
+ MALLOC(dm->must, char, strlen(result) + 1);
strcpy(dm->must, result);
dm->next = dfa->musts;
dfa->musts = dm;
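
The last hunk replaces two bare malloc() calls, whose results were used
without a NULL check, with a MALLOC(pointer, type, count) macro. The
macro's definition is not shown in the patch; the sketch below assumes it
wraps an allocator that exits on failure instead of returning NULL.
checked_malloc, struct must and push_must are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked allocator: never returns NULL, exits instead. */
static void *checked_malloc(size_t nbytes)
{
    void *p = malloc(nbytes ? nbytes : 1);

    if (p == NULL) {
        fputs("memory exhausted\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Assumed shape of the macro: allocate room for n objects of type t
   and assign the result to p. The real definition lives in dfa.c. */
#define MALLOC(p, t, n) ((p) = (t *) checked_malloc((n) * sizeof (t)))

/* Illustrative list node with the same fields the hunk touches. */
struct must {
    int exact;
    char *must;
    struct must *next;
};

/* Prepend a copy of text to the list, following the pattern in the
   patched code: allocate the node, allocate the string, then link. */
static struct must *push_must(struct must *head, const char *text, int exact)
{
    struct must *m;

    MALLOC(m, struct must, 1);
    m->exact = exact;
    MALLOC(m->must, char, strlen(text) + 1);
    strcpy(m->must, text);
    m->next = head;
    return m;
}
```

With an allocator of this kind, the writes to m->exact and m->must can
never go through a NULL pointer, which the original malloc() calls did
not guarantee.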