Re: Possible Bug
From: Aharon Robbins
Subject: Re: Possible Bug
Date: Thu, 25 Aug 2005 22:46:12 +0300
Greetings. Re this:
> Date: Fri, 12 Aug 2005 20:31:34 -0500
> From: address@hidden
> Subject: Possible Bug
> To: address@hidden
> Cc: address@hidden
>
> Dear Folks:
>
> I use GAWK on a QNX4 platform and on a WinXP platform (the CYGWIN port); the
> QNX4 port is GAWK 3.0.3, and the CYGWIN port is 3.1.4. Off and on, I have
> used this command-line script to filter out control characters so my console
> window isn't hosed:
> gawk '{gsub(/[\x00-\x1f\x7e-\xff]/,"*");print}' x.x
> -|gawk: fatal: Invalid range end: /[/
> The file involved does not matter.
>
> The same script on my QNX4 box successfully parses the entire file,
> regardless of the file. If I replace the \x00 with \x01, the CYGWIN port is
> successful as well. This script used to work when GAWK was at 3.0.3 on
> CYGWIN; has anything changed in <NUL> handling since then?
>
> Thx, Phil Long
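Phil's one-liner replaces every byte in two ranges with `*'. Here is a hypothetical C sketch of the same filter. The name `mask_controls' is invented for illustration. It works on a length-counted buffer, so a NUL byte is masked like any other byte instead of ending the string. It keeps the script's ranges exactly as written. Note that the second range begins at \x7e, which is `~', while DEL is \x7f.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring gsub(/[\x00-\x1f\x7e-\xff]/, "*"):
 * every byte in 0x00-0x1f or 0x7e-0xff is replaced by '*'.
 * The buffer is length-counted rather than NUL-terminated, so
 * embedded NUL bytes (0x00) are masked too.
 * As in the original script, 0x7e ('~') is included in the second range.
 */
static void mask_controls(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] <= 0x1f || buf[i] >= 0x7e)
            buf[i] = '*';
}
```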
Interestingly enough, this problem does not show up on Fedora Core 3, but it
does show up on my Fedora Core 2 system, even with LC_ALL set to "C".
The problem seems to be that `btowc' can't handle values > 127. Thus,
I just sidestep that function. I think the following patch is the right
way to go. If I were more ambitious, I'd add an Autoconf check, but I
just don't have the cycles.
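An Autoconf check would need a small probe program. Here is a rough, hypothetical sketch of the kind of probe it might run. The function names are invented, and nothing like this is in gawk. The code switches to the "C" locale and counts the byte values above 127 for which `btowc' returns WEOF. Whether a given result counts as "broken" depends on the C library and locale, so a nonzero count is a reason to look closer, not proof of a bug.

```c
#include <locale.h>
#include <wchar.h>

/*
 * Hypothetical probe: count the byte values in [lo, hi] for which
 * btowc() returns WEOF in the current locale.
 */
static int count_btowc_failures(int lo, int hi)
{
    int failures = 0;

    for (int c = lo; c <= hi; c++)
        if (btowc(c) == WEOF)
            failures++;
    return failures;
}

/*
 * Switch to the "C" locale (as in the LC_ALL=C experiment) and report
 * whether any byte above 127 is rejected by btowc().
 * Returns 1 if so, 0 if not, and -1 if the locale cannot be set.
 */
static int btowc_rejects_high_bytes(void)
{
    if (setlocale(LC_ALL, "C") == NULL)
        return -1;
    return count_btowc_failures(128, 255) > 0;
}
```

An Autoconf `AC_RUN_IFELSE' test could wrap a call like `btowc_rejects_high_bytes ()' in `main' and use the exit status to decide whether to define a macro selecting the workaround.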
Please let me know if this solves the problem for you.
Thanks,
Arnold
-----------------------------------------------------------------------------
Thu Aug 25 22:40:40 2005 Arnold D. Robbins <address@hidden>
* regcomp.c (build_range_exp): Avoid `btowc' for single-byte
characters. Fedora Core 2, maybe others, have a broken version
that can't handle values > 127.
--- regcomp.c.save 2005-07-04 09:36:52.000000000 +0300
+++ regcomp.c 2005-08-25 22:40:21.380093044 +0300
@@ -2699,10 +2699,22 @@
     end_ch = ((end_elem->type == SB_CHAR) ? end_elem->opr.ch
               : ((end_elem->type == COLL_SYM) ? end_elem->opr.name[0]
                  : 0));
+#ifdef GAWK
+    /*
+     * Fedora Core 2, maybe others, have broken `btowc' that returns -1
+     * for any value > 127. Sigh. Note that `start_ch' and `end_ch' are
+     * unsigned, so we don't have sign extension problems.
+     */
+    start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
+                ? start_ch : start_elem->opr.wch);
+    end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
+              ? end_ch : end_elem->opr.wch);
+#else
     start_wc = ((start_elem->type == SB_CHAR || start_elem->type == COLL_SYM)
                 ? __btowc (start_ch) : start_elem->opr.wch);
     end_wc = ((end_elem->type == SB_CHAR || end_elem->type == COLL_SYM)
               ? __btowc (end_ch) : end_elem->opr.wch);
+#endif
     if (start_wc == WEOF || end_wc == WEOF)
       return REG_ECOLLATE;
     cmp_buf[0] = start_wc;
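The patch's core idea can be sketched on its own: take a single-byte range endpoint's value directly as its wide character instead of passing it through `btowc'. The names below are hypothetical and do not come from regcomp.c. The range test is also simplified. It compares values numerically, whereas regcomp.c goes on to fill `cmp_buf' for its own comparison. Like the patch, the sketch assumes a single byte's wide-character code equals its byte value. That is reasonable for the C locale and Latin-1, but not every locale guarantees it. Because the input is an `unsigned char', a byte such as 0xff widens to 255 rather than to a sign-extended negative value. That is the point of the patch's comment about `start_ch' and `end_ch'.

```c
#include <stdbool.h>
#include <wchar.h>

/*
 * Hypothetical illustration of the patched approach: widen a
 * single-byte endpoint by value, with no btowc() call.
 * The parameter is unsigned, so 0xff becomes 255, not a negative value.
 */
static wint_t byte_to_wc(unsigned char ch)
{
    return (wint_t) ch;
}

/*
 * Simplified range test comparing code values numerically.
 * (regcomp.c itself continues with cmp_buf; that step is not modeled.)
 */
static bool byte_in_range(unsigned char c, unsigned char start, unsigned char end)
{
    wint_t start_wc = byte_to_wc(start);
    wint_t end_wc = byte_to_wc(end);
    wint_t wc = byte_to_wc(c);

    /* Mirrors the patch's REG_ECOLLATE check; plain widening of a
       byte should not produce WEOF on common platforms. */
    if (start_wc == WEOF || end_wc == WEOF)
        return false;
    return start_wc <= wc && wc <= end_wc;
}
```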