bug-gnu-utils

Re: I think I may have found a regex bug in a recent version of grep?


From: Stepan Kasal
Subject: Re: I think I may have found a regex bug in a recent version of grep?
Date: Tue, 8 Jun 2004 10:53:06 +0200
User-agent: Mutt/1.4.1i

Hello,

On Mon, Jun 07, 2004 at 08:37:47PM -0700, Steve Ingram wrote:
> Let me know if you need any other info or let me know
> if you'd rather not hear from me again :)

I think you understand the difference perfectly.
As far as you could tell, you were reporting a possible bug,
and that's OK. So: thank you for your bug report.

> address@hidden > grep -v "\.[a-z]*" data.txt

First, I suggest using single quotes in this situation.
"\." happens to be the same as '\.', but backslash is a special char
inside double quotes, so to match one backslash you'd have to write
"\\\\" instead of the simpler '\\'.

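A minimal sketch of the quoting difference, assuming a POSIX shell and any
grep. The variable names (dq, sq, matched) and the sample input are made up
for illustration only:

```shell
# Capture what the shell hands to a command for each quoting style.
dq=$(printf '%s' "\\\\")   # double quotes: the shell turns each \\ into one \
sq=$(printf '%s' '\\')     # single quotes: passed through untouched
printf '%s\n' "$dq" "$sq"  # both lines show the same two backslashes

# As a regex, those two backslashes match one literal backslash in the input.
matched=$(printf '%s\n' 'a\b' 'ab' | grep '\\')
printf '%s\n' "$matched"   # only the line containing a backslash
```

Both spellings reach grep as the same regex; the single-quoted one is just
easier to read.
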
"\.[a-z]*"

Of course this is equivalent to "\." as "[a-z]*" means zero or more
occurrences and can thus always match the empty string.

Thus
        grep '\.'
matches all lines of data.txt, and with -v nothing is left to print.
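
The equivalence can be checked with a small sketch. The sample lines below are
hypothetical (the original data.txt is not shown in this thread), and the
variable names are only for illustration:

```shell
# Hypothetical sample input; not the data.txt from the report.
sample=$(printf '%s\n' 'foo.txt' 'bar.' 'no dot here' '.X')

# The trailing [a-z]* may match the empty string, so both patterns
# select exactly the lines that contain a literal dot.
with_star=$(printf '%s\n' "$sample" | grep '\.[a-z]*')
dot_only=$(printf '%s\n' "$sample" | grep '\.')

# With -v, only the lines without any dot remain.
inverted=$(printf '%s\n' "$sample" | grep -v '\.[a-z]*')

printf '%s\n' "$with_star"
printf '%s\n' "$inverted"
```

If every line of the input contains a dot, the -v form prints nothing at all.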

> address@hidden > grep -v "\.[a-z]" data.txt

This is a trick with locales.  Your locale is probably set to
"en_US.utf8" and it changes the order of characters from
        ABC...Z ... abc...z
to
        aAbB...yYzZ

This means that the interval a-z contains all capital letters except
capital Z.

Observe:
$ echo A | LC_ALL=en_US.utf8 grep '[a-z]'
A
$ echo A | LC_ALL=C grep '[a-z]'
$

Thus the fix is to use "export LC_ALL=C" at the beginning of all your bash
scripts (similarly for other shells).
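
A minimal sketch of such a script header, assuming a POSIX shell. The variable
names (upper, lower) are only for illustration:

```shell
# Pin the C locale for the whole script, so that [a-z] means exactly
# the 26 ASCII lowercase letters.
LC_ALL=C
export LC_ALL

# grep -c prints the number of matching lines; "|| true" keeps a zero-match
# exit status from aborting scripts that run under "set -e".
upper=$(echo A | grep -c '[a-z]' || true)
lower=$(echo a | grep -c '[a-z]' || true)

printf 'A matches [a-z]: %s line(s)\n' "$upper"
printf 'a matches [a-z]: %s line(s)\n' "$lower"
```

If only one command needs it, prefixing that command with LC_ALL=C (as in the
two-line demonstration above) limits the change to that single invocation.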

I believe this explains all your problems.

Sorry for the inconvenience,
        Stepan Kasal



