bug-gnu-utils

m4 bug report and fix


From: Tom Erdevig
Subject: m4 bug report and fix
Date: Wed, 23 Oct 2002 13:14:58 -0400
User-agent: Mutt/1.2.5i

Hi folks,

There is a bug in the m4 regular expression matching code:
certain patterns fail to match correctly when they contain
characters in the 0xf8 - 0xff range.  Please find a patch
to fix it below.

Patterns that suffer from this bug are fairly unlikely in
the ASCII world.  I ran into it because I'm using m4 on an
EBCDIC platform (OS/390 Unix), where very common character
classes such as `[_a-zA-Z0-9]' do not match correctly
(because EBCDIC '8' = 0xf8 and '9' = 0xf9).  For example,
the pattern `[a9]*a' fails to match the string `a', when
it obviously should.  The index arithmetic used to check
the character against the charset-opcode bitset breaks when
the bitset is a full 32 bytes long: 32 * BYTEWIDTH is 256,
which the (unsigned char) cast truncates to 0, so the range
check `c < 0' can never succeed.  BTW the same problem was
fixed years ago at line 3817 in regex.c.


Patch:
----------------------------------------------------------
--- lib/regex.c-old     Wed Oct 23 12:24:15 2002
+++ lib/regex.c Wed Oct 23 12:26:33 2002
@@ -4272,7 +4272,7 @@
                  {
                    int not = (re_opcode_t) p1[3] == charset_not;
                     
-                   if (c < (unsigned char) (p1[4] * BYTEWIDTH)
+                   if (c < (unsigned) (p1[4] * BYTEWIDTH)
                        && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
                      not = !not;
----------------------------------------------------------

Best regards,
Tom



