Re: grep 2.4.2 incorrect handling of '[x^]'


From: Paul Eggert
Subject: Re: grep 2.4.2 incorrect handling of '[x^]'
Date: Mon, 30 Jul 2001 14:27:19 -0700 (PDT)

> From: "Alain Magloire" <address@hidden>
> Date: Mon, 30 Jul 2001 14:31:04 -0400 (EDT)

> > The following patch works around the bug in grep which makes
> > 
> >     grep '[a-c]'
> > 
> > match "B".
> 
> I think it has been said before that bracket range expressions
> like this depend on the C locale and the ASCII character encoding.

Here's more detail about that problem.

In the current POSIX standard, [a-c] matches any collating element in
the collating sequence from 'a' through 'c' inclusive.  This is not the
same as a strcoll comparison -- so the current code is indeed
nonconforming.  Nor is it the same as the set of characters that [a-c]
matches -- so the proposed patch is also nonconforming, even for
unibyte locales.  (This is because a collating element can contain
more than one character, e.g. '[a-z]' can match 'ss' in a German
locale.)
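
To make the difference concrete, here is a rough sketch (in Python
rather than C, purely for brevity) of a range test done by strcoll
comparison next to one done by raw character code.  The helper names
are made up for illustration and are not taken from grep's sources.
Only the "C" locale is set here, since it is the only one guaranteed
to exist; in a locale such as en_US the strcoll variant may well
accept "B", which is the behavior reported above.  Neither variant
handles multi-character collating elements.

```python
import locale


def in_range_by_strcoll(ch, lo, hi):
    """Hypothetical helper: lo <= ch <= hi under the current LC_COLLATE."""
    return locale.strcoll(lo, ch) <= 0 and locale.strcoll(ch, hi) <= 0


def in_range_by_codepoint(ch, lo, hi):
    """Hypothetical helper: lo <= ch <= hi by raw character code."""
    return ord(lo) <= ord(ch) <= ord(hi)


# The "C" locale is always available; any other locale name would be an
# assumption about what is installed on the machine running this.
locale.setlocale(locale.LC_COLLATE, "C")

for ch in "abBc":
    print(ch,
          in_range_by_strcoll(ch, "a", "c"),
          in_range_by_codepoint(ch, "a", "c"))
```

In the "C" locale the two tests agree for these characters.  Changing
the setlocale call to a locale installed on your system is the
simplest way to see where they diverge.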

Hence the proposed patch does not fix the whole problem.

Also, I suspect that the proposed patch mishandles some cases,
e.g. range expressions involving '[' itself.  This really should get
fixed before such a patch is installed.  That is, the patch should fix
dfa.c's behavior to match regex.c's behavior.

While we're on the subject: POSIX 1003.1-200x draft 7 says that the
behavior of [a-c] is unspecified outside the POSIX locale.  Hence grep
2.4.2 already conforms to the latest draft of the next POSIX standard!
Any changes that we make in this area are for pragmatic reasons: they
do not affect the question of whether grep conforms to the next POSIX
standard.


