Re: case-independent character range match bug, flex 2.5.25


From: John Millaway
Subject: Re: case-independent character range match bug, flex 2.5.25
Date: Mon, 16 Dec 2002 19:07:14 -0500 (EST)

> > When given the -i option, flex not only treats upper- and
> > lower-case letters identically as individual characters,
> > it botches ranges if a specific alphabetic character is
> > used in specifying a range:
> [...]
> > dtext   ([\001-\010\013\014\016-\037!-Z^-\177])


This has been fixed. The new behavior will flag the above expression
with warnings:

$ flex -i  test.l
test.l:11: warning, the character range [!-Z] is ambiguous in a case-insensitive scanner
test.l:11: warning, the character range [^-] is ambiguous in a case-insensitive scanner

I'll explain the warnings. The range [!-Z] is ambiguous because you
might have meant [!-z], which is a much larger range. The range
[^-\177] is ambiguous because it spans [a-z] but not [A-Z]. In both
cases, flex now accepts the range literally and does not squish A-Z to
a-z; literal acceptance is what you want to happen here.
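
To make the three cases concrete, here is a minimal standalone C sketch of
the decision the new code makes for a range's two endpoints. It is an
illustration only, not the flex source: has_case() and range_covers_case()
below are my own stand-ins for the helpers named in the diff, and
classify_range(), struct range_result and the RANGE_* names are hypothetical.

```c
#include <ctype.h>

enum range_status { RANGE_FOLDED, RANGE_AMBIGUOUS, RANGE_LITERAL };

struct range_result {
    enum range_status status;
    int lo, hi;                 /* endpoints after any folding */
};

/* Assumed stand-in for has_case(): 1 if c is a letter, else 0. */
static int has_case(int c)
{
    return isupper(c) || islower(c);
}

/* Assumed stand-in for range_covers_case(): 1 if every letter in
 * [lo-hi] also has its other-case partner inside [lo-hi]. */
static int range_covers_case(int lo, int hi)
{
    for (int c = lo; c <= hi; c++) {
        if (has_case(c)) {
            int other = isupper(c) ? tolower(c) : toupper(c);
            if (other < lo || other > hi)
                return 0;
        }
    }
    return 1;
}

/* Hypothetical helper mirroring the three branches of the patch. */
static struct range_result classify_range(int lo, int hi)
{
    struct range_result r = { RANGE_LITERAL, lo, hi };

    if (isupper(lo) && isupper(hi)) {
        /* Both ends uppercase: squish to lowercase. */
        r.status = RANGE_FOLDED;
        r.lo = tolower(lo);
        r.hi = tolower(hi);
    } else if (has_case(lo) != has_case(hi)
               || (has_case(lo) && !islower(lo) != !islower(hi))) {
        /* One end has case and the other does not, or cases differ. */
        r.status = RANGE_AMBIGUOUS;
    } else if (!has_case(lo) && !has_case(hi)
               && !range_covers_case(lo, hi)) {
        /* Caseless ends, but the range spans only one case. */
        r.status = RANGE_AMBIGUOUS;
    }
    return r;
}
```

Under these assumptions, classify_range('!', 'Z') and
classify_range('^', 0177) both come back RANGE_AMBIGUOUS with the endpoints
unchanged, while classify_range('A', 'Z') is folded to a-z.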

Below is the important part of the diff. Thanks to Bruce for
identifying this bug!

-John



Index: parse.y
===================================================================
RCS file: /usr/local/cvsroot/flex/parse.y,v
retrieving revision 2.38
retrieving revision 2.39
diff -r2.38 -r2.39
790,796c797,829
<   if ( caseins )
<                  {
<                  if ( $2 >= 'A' && $2 <= 'Z' )
<                                  $2 = clower( $2 );
<                  if ( $4 >= 'A' && $4 <= 'Z' )
<                                  $4 = clower( $4 );
<                  }
---
>
>   if (caseins)
>        {
>          /* Squish the character range to lowercase only if BOTH
>               * ends of the range are uppercase.
>               */
>          if (isupper ($2) && isupper ($4))
>                {
>                  $2 = tolower ($2);
>                  $4 = tolower ($4);
>                }
>
>          /* If one end of the range has case and the other
>               * does not, or the cases are different, then we're not
>               * sure what range the user is trying to express.
>               * Examples: address@hidden or [S-t]
>               */
>          else if (has_case ($2) != has_case ($4)
>                               || (has_case ($2) && (b_islower ($2) != b_islower ($4))))
>                format_warn3 (
>                _("the character range [%c-%c] is ambiguous in a case-insensitive scanner"),
>                                          $2, $4);
>
>          /* If the range spans uppercase characters but not
>               * lowercase (or vice-versa), then should we automatically
>               * include lowercase characters in the range?
>               * Example: address@hidden spans [a-z] but not [A-Z]
>               */
>          else if (!has_case ($2) && !has_case ($4) && !range_covers_case ($2, $4))
>                format_warn3 (
>                _("the character range [%c-%c] is ambiguous in a case-insensitive scanner"),
>                                          $2, $4);
>        }
>
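
For comparison, the removed lines lowercased each endpoint independently.
The sketch below is my reading of that old logic, not code from flex:
tolower() stands in for clower(), and old_fold(), in_range() and struct
char_range are hypothetical names. It shows why [!-Z] went wrong: only the
upper end was folded, so the range silently grew to [!-z].

```c
#include <ctype.h>

struct char_range { int lo, hi; };

/* Mirrors the removed lines: each end is lowercased on its own.
 * tolower() is an assumed stand-in for flex's clower(). */
static struct char_range old_fold(int lo, int hi)
{
    struct char_range r = { lo, hi };

    if (lo >= 'A' && lo <= 'Z')
        r.lo = tolower(lo);
    if (hi >= 'A' && hi <= 'Z')
        r.hi = tolower(hi);
    return r;
}

/* 1 if c lies inside the inclusive range r. */
static int in_range(struct char_range r, int c)
{
    return r.lo <= c && c <= r.hi;
}
```

With these definitions, old_fold('!', 'Z') yields the range '!' through 'z',
so in_range() then accepts '[', '\\' and ']', none of which lie in the
literal range [!-Z].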




