m4 bug report and fix
From: Tom Erdevig
Subject: m4 bug report and fix
Date: Wed, 23 Oct 2002 13:14:58 -0400
User-agent: Mutt/1.2.5i
Hi folks,
There is a bug in the m4 regular expression matching code:
certain patterns fail to match correctly when they contain
characters in the 0xf8 - 0xff range. Please find a patch
to fix it below.
Patterns that suffer from this bug are fairly unlikely in
the ASCII world. I ran into it because I'm using m4 on an
EBCDIC platform (OS/390 Unix), where very common character
classes such as `[_a-zA-Z0-9]' do not match correctly
(because EBCDIC '8' = 0xf8 and '9' = 0xf9). For example
the pattern `[a9]*a' fails to match the string `a', when
it obviously should. The bounds check that guards the lookup
in the charset-opcode bitset breaks when the bitset is a full
32 bytes long: 32 * BYTEWIDTH is 256, which the (unsigned char)
cast truncates to 0, so the check fails for every character.
BTW the same problem was fixed years ago at line 3817 in regex.c.
Patch:
----------------------------------------------------------
--- lib/regex.c-old Wed Oct 23 12:24:15 2002
+++ lib/regex.c Wed Oct 23 12:26:33 2002
@@ -4272,7 +4272,7 @@
{
int not = (re_opcode_t) p1[3] == charset_not;
- if (c < (unsigned char) (p1[4] * BYTEWIDTH)
+ if (c < (unsigned) (p1[4] * BYTEWIDTH)
&& p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
not = !not;
----------------------------------------------------------
Best regards,
Tom