Re: case-independent character range match bug, flex 2.5.25
From: John Millaway
Date: Mon, 16 Dec 2002 19:07:14 -0500 (EST)
> > When given the -i option, flex not only treats upper- and
> > lower-case letters identically as individual characters,
> > it botches ranges if a specific alphabetic character is
> > used in specifying a range:
> [...]
> > dtext ([\001-\010\013\014\016-\037!-Z^-\177])
This has been fixed. The new behavior will flag the above expression
with warnings:
$ flex -i test.l
test.l:11: warning, the character range [!-Z] is ambiguous in a case-insensitive scanner
test.l:11: warning, the character range [^-] is ambiguous in a case-insensitive scanner
I'll explain the warnings. The range [!-Z] is ambiguous because you
might have meant [!-z], which is a much larger range. The range
[^-\177] is ambiguous because it spans [a-z] but not [A-Z]. In both
cases, flex accepts the range literally and does not squish A-Z to
a-z. Taking the range literally is what you want to happen here.
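
To make the two warnings concrete, here is a minimal sketch in C that counts
how many letters of each case fall inside a literal range. It assumes ASCII
and the default "C" locale, and count_letters is a name made up for this
illustration; it is not part of flex.

```c
#include <ctype.h>

/* Hypothetical helper (not part of flex): count the uppercase and
 * lowercase letters in the inclusive range [lo, hi].
 * Assumes ASCII and the default "C" locale.
 */
void count_letters(int lo, int hi, int *upper, int *lower)
{
    *upper = 0;
    *lower = 0;
    for (int c = lo; c <= hi; c++) {
        if (isupper(c))
            (*upper)++;
        else if (islower(c))
            (*lower)++;
    }
}
```

Under those assumptions, [!-Z] contains all 26 uppercase letters and no
lowercase ones, [!-z] contains both sets, and [^-\177] contains only the
lowercase set. That asymmetry is what the warnings point at.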
Below is the important part of the diff. Thanks to Bruce for
identifying this bug!
-John
Index: parse.y
===================================================================
RCS file: /usr/local/cvsroot/flex/parse.y,v
retrieving revision 2.38
retrieving revision 2.39
diff -r2.38 -r2.39
790,796c797,829
< if ( caseins )
< {
< if ( $2 >= 'A' && $2 <= 'Z' )
< $2 = clower( $2 );
< if ( $4 >= 'A' && $4 <= 'Z' )
< $4 = clower( $4 );
< }
---
>
> if (caseins)
> {
> /* Squish the character range to lowercase only if BOTH
> * ends of the range are uppercase.
> */
> if (isupper ($2) && isupper ($4))
> {
> $2 = tolower ($2);
> $4 = tolower ($4);
> }
>
> /* If one end of the range has case and the other
> * does not, or the cases are different, then we're not
> * sure what range the user is trying to express.
> * Examples: address@hidden or [S-t]
> */
> else if (has_case ($2) != has_case ($4)
> || (has_case ($2) && (b_islower ($2) != b_islower ($4))))
> format_warn3 (
> _("the character range [%c-%c] is ambiguous in a case-insensitive scanner"),
> $2, $4);
>
> /* If the range spans uppercase characters but not
> * lowercase (or vice-versa), then should we automatically
> * include lowercase characters in the range?
> * Example: address@hidden spans [a-z] but not [A-Z]
> */
> else if (!has_case ($2) && !has_case ($4) && !range_covers_case ($2, $4))
> format_warn3 (
> _("the character range [%c-%c] is ambiguous in a case-insensitive scanner"),
> $2, $4);
> }
>
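
For readers who want to try the three branches outside of the grammar, here
is a rough standalone sketch in C. The diff above does not show the bodies
of has_case or range_covers_case, so the versions below are stand-ins based
on my reading of their names (has_case: "is a letter"; range_covers_case:
"every letter in the range has its opposite-case partner in the range too").
classify_range and the enum are invented for this illustration, and ASCII
with the "C" locale is assumed.

```c
#include <ctype.h>

enum range_action { RANGE_OK, RANGE_SQUISHED, RANGE_AMBIGUOUS };

/* Stand-in for has_case(): 1 if c is a letter, else 0 (assumption). */
static int has_case_sketch(int c)
{
    return isupper(c) || islower(c);
}

/* Stand-in for range_covers_case() (assumption): 1 if every letter in
 * [lo, hi] also has its opposite-case counterpart inside [lo, hi].
 */
static int range_covers_case_sketch(int lo, int hi)
{
    for (int c = lo; c <= hi; c++) {
        if (has_case_sketch(c)) {
            int other = isupper(c) ? tolower(c) : toupper(c);
            if (other < lo || other > hi)
                return 0;
        }
    }
    return 1;
}

/* Mirrors the three branches of the patch for a case-insensitive scanner.
 * The endpoints are modified only when both are uppercase.
 */
enum range_action classify_range(int *lo, int *hi)
{
    int lo_lower = islower(*lo) != 0;
    int hi_lower = islower(*hi) != 0;

    /* Both ends uppercase: squish the range to lowercase. */
    if (isupper(*lo) && isupper(*hi)) {
        *lo = tolower(*lo);
        *hi = tolower(*hi);
        return RANGE_SQUISHED;
    }

    /* One end is a letter and the other is not, or the cases differ. */
    if (has_case_sketch(*lo) != has_case_sketch(*hi)
        || (has_case_sketch(*lo) && lo_lower != hi_lower))
        return RANGE_AMBIGUOUS;

    /* Neither end is a letter, but the range covers only one case. */
    if (!has_case_sketch(*lo) && !has_case_sketch(*hi)
        && !range_covers_case_sketch(*lo, *hi))
        return RANGE_AMBIGUOUS;

    return RANGE_OK;
}
```

With these stand-ins, [A-Z] is squished to [a-z], while [!-Z], [^-\177] and
[S-t] are reported as ambiguous and left untouched. Ranges such as [a-z] or
[0-9] pass through without a warning.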