bug-gnu-utils

Re: Possible Bug


From: Aharon Robbins
Subject: Re: Possible Bug
Date: Thu, 25 Aug 2005 22:46:12 +0300

Greetings. Re this:

> Date: Fri, 12 Aug 2005 20:31:34 -0500
> From: address@hidden
> Subject: Possible Bug
> To: address@hidden
> Cc: address@hidden
>
> Dear Folks:
>  
> I use GAWK on a QNX4 platform and on a WinXP platform (the CYGWIN port); the
> QNX4 port is GAWK 3.0.3, and the CYGWIN port is 3.1.4.  Off and on, I have
> used this command-line script to filter out control characters so my console
> window isn't hosed:
>     gawk '{gsub(/[\x00-\x1f\x7e-\xff]/,"*");print}' x.x
>     -|gawk: fatal: Invalid range end: /[/
> The file involved does not matter.
>
> The same script on my QNX4 box successfully parses the entire file,
> regardless of the file.  If I replace the \x00 with \x01, the CYGWIN port is
> successful as well.  This script used to work when GAWK was at 3.0.3 on
> CYGWIN; has anything changed in <NUL> handling since then?
>  
> Thx, Phil Long

Interestingly enough, this problem does not show up on Fedora Core 3, but it
does show up on my Fedora Core 2 system, even with LC_ALL set to "C".

The problem seems to be that `btowc' can't handle values > 127.  Thus,
I just sidestep that function.  I think the following patch is the right
way to go.  If I were more ambitious, I'd add an Autoconf check, but I
just don't have the cycles.
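
For anyone who wants to check their own system, here is a minimal sketch
of the kind of probe such a check might run.  It is not part of the patch:
the function names are made up for illustration, and treating "WEOF for
every high byte in the C locale" as the symptom is an assumption based on
the behavior described above.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Count the byte values in [lo, hi] for which btowc() returns WEOF
 * in whatever locale is currently selected.
 */
static int
count_btowc_failures(int lo, int hi)
{
    int c, failures = 0;

    assert(lo >= 0 && lo <= hi && hi <= 255);
    for (c = lo; c <= hi; c++)
        if (btowc(c) == WEOF)
            failures++;
    return failures;
}

/*
 * Hypothetical helper: select the "C" locale, probe the high half of
 * the byte range, and print how many values btowc() rejected.
 */
static int
report_btowc_high_bytes(void)
{
    int failures;

    setlocale(LC_ALL, "C");
    failures = count_btowc_failures(128, 255);
    printf("btowc: %d of 128 high bytes map to WEOF\n", failures);
    return failures;
}
```

Calling report_btowc_high_bytes() from a small main() shows how many of
the bytes 128 through 255 are rejected; the result depends on the C
library, so no particular count is promised here.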

Please let me know if this solves the problem for you.

Thanks,

Arnold
-----------------------------------------------------------------------------
Thu Aug 25 22:40:40 2005  Arnold D. Robbins  <address@hidden>

        * regcomp.c (build_range_exp): Avoid `btowc' for single-byte
        characters. Fedora Core 2, maybe others, have a broken version
        that can't handle values > 127.

--- regcomp.c.save      2005-07-04 09:36:52.000000000 +0300
+++ regcomp.c   2005-08-25 22:40:21.380093044 +0300
@@ -2699,10 +2699,22 @@
     end_ch = ((end_elem->type == SB_CHAR) ? end_elem->opr.ch
              : ((end_elem->type == COLL_SYM) ? end_elem->opr.name[0]
                 : 0));
+#ifdef GAWK
+    /*
+     * Fedora Core 2, maybe others, have broken `btowc' that returns -1
+     * for any value > 127. Sigh. Note that `start_ch' and `end_ch' are
+     * unsigned, so we don't have sign extension problems.
+     */
+    start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
+               ? start_ch : start_elem->opr.wch);
+    end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
+             ? end_ch : end_elem->opr.wch);
+#else
     start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
                ? __btowc (start_ch) : start_elem->opr.wch);
     end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
              ? __btowc (end_ch) : end_elem->opr.wch);
+#endif
     if (start_wc == WEOF || end_wc == WEOF)
       return REG_ECOLLATE;
     cmp_buf[0] = start_wc;
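
The remark in the patch comment about sign extension can be illustrated
with a small standalone sketch.  The helper names below are hypothetical
and do not appear in regcomp.c.  Using the byte value directly as the wide
character also assumes a locale in which the two coincide, which the
patch takes for granted but does not spell out.

```c
#include <assert.h>
#include <wchar.h>

/* Widen a byte held in an unsigned char: the result is always 0..255. */
static wint_t
byte_to_wc(unsigned char ch)
{
    return (wint_t) ch;
}

/*
 * For contrast: a byte held in a signed char widens to a negative
 * value once the high bit is set.
 */
static long
widen_signed(signed char ch)
{
    return (long) ch;
}

/* Hypothetical self-check covering every byte value. */
static void
check_widening(void)
{
    int c;

    for (c = 0; c <= 255; c++) {
        /* The value is preserved and can never collide with WEOF. */
        assert(byte_to_wc((unsigned char) c) == (wint_t) c);
        assert(byte_to_wc((unsigned char) c) != WEOF);
    }
    /* A sign-extended high byte comes out negative. */
    assert(widen_signed(-1) == -1L);
}
```

Because the unsigned widening never produces WEOF, the REG_ECOLLATE test
that follows the assignments cannot fire for a single-byte range endpoint.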



