bug-glibc
Re: grep locale off by one?


From: Andreas Schwab
Subject: Re: grep locale off by one?
Date: Fri, 22 Aug 2003 11:15:30 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (gnu/linux)

address@hidden writes:

|> address@hidden wrote:
|> > What in the name of holy collating orders
|> > is going on below?
|> > $ echo "Z" | grep "[a-z]"
|> > $ echo "Y" | grep "[a-z]"
|> > Y
|> > $ echo "a" | grep "[A-Z]"

"[A-Z]" does not include "a" (which is collated before "A" in your locale).

|> > $ echo "b" | grep "[A-Z]"
|> > b
|> > I know LC_ALL=C "fixes it", and I understand the collating
|> > order being case insensitive, but why the inconsistency,
|> > on the first and last characters (a and Z
|> > in this case). Is it an off by one? Version info follows.
|> > $ rpm -q grep glibc pcre
|> > grep-2.5.1-7
|> > glibc-2.3.2-11.9
|> > pcre-3.9-10
|> > $ echo $LANG
|> > en_IE.UTF-8
|> 
|> Looks like it's UTF-8 specific.
|> Removing this makes it behave consistently.
|> 
|> $ echo "Y" | LANG=en_IE.UTF-8 grep "[a-z]"
|> Y
|> $ echo "Z" | LANG=en_IE.UTF-8 grep "[a-z]"

Similarly, "[a-z]" does not include "Z", which is collated after "z".
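
To make the reasoning concrete, here is a minimal sketch that models the collating order with a plain string. The sequence in ORDER (each lower case letter immediately before its upper case form) is an assumption for illustration only; the real locale data is more elaborate. The helpers pos and in_range are hypothetical and not part of grep or glibc.

```shell
# Assumed, simplified collating sequence: aAbB...zZ.
ORDER='aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'

# pos CHAR: print the 0-based position of CHAR within ORDER.
pos() {
    prefix=${ORDER%%"$1"*}   # everything before the first occurrence of CHAR
    echo "${#prefix}"
}

# in_range CHAR LOW HIGH: succeed if CHAR collates between LOW and HIGH,
# inclusive, which is how a range expression is interpreted in this model.
in_range() {
    [ "$(pos "$1")" -ge "$(pos "$2")" ] && [ "$(pos "$1")" -le "$(pos "$3")" ]
}

in_range Y a z && echo "Y is in [a-z]"
in_range Z a z || echo "Z is not in [a-z]"
in_range b A Z && echo "b is in [A-Z]"
in_range a A Z || echo "a is not in [A-Z]"
```

Under this assumed ordering, "a" is the first element and "Z" the last, so each range misses exactly one letter at one end. That matches the behaviour reported above and is not an off-by-one error.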

In general, you should avoid range expressions, as their meaning depends
on the collating order of the locale.  It is better to use something like
"[[:upper:]]" if you want to match _all_ upper case letters, or
"[ABCDEFGHIJKLMNOPQRSTUVWXYZ]" if you want to match _only_ the ASCII
upper case letters.
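
The two alternatives can be wrapped in small shell helpers, roughly as sketched below. The function names is_ascii_upper and is_upper are made up for this example; only grep and printf are assumed to be available.

```shell
# is_ascii_upper CHAR: succeed only for the 26 ASCII upper case letters.
# The explicit list does not depend on the collating order.
is_ascii_upper() {
    printf '%s\n' "$1" | grep -q '^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]$'
}

# is_upper CHAR: succeed for anything the current locale classifies as
# upper case, which may include non-ASCII letters.
is_upper() {
    printf '%s\n' "$1" | grep -q '^[[:upper:]]$'
}

is_ascii_upper Z && echo "Z is an ASCII upper case letter"
is_ascii_upper a || echo "a is not an ASCII upper case letter"
is_upper Z && echo "Z is upper case in this locale"
```

Neither helper uses a range expression, so the first and last letters of the alphabet are treated the same as all the others.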

Andreas.

-- 
Andreas Schwab, SuSE Labs, address@hidden
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



