bug-gnu-utils

Re: match finds wrong space.


From: Hermann Peifer
Subject: Re: match finds wrong space.
Date: Thu, 08 Jul 2010 10:17:03 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5

On 07/07/2010 20:36, Davide Brini wrote:
> On Wed, 07 Jul 2010 21:34:25 +0300 Aharon Robbins<address@hidden> wrote:
>
>>> regards - Chris Willis in the UK
>>>
>>> # insjk.awk
>>>
>>> BEGIN {
>>>         s = "Mary Ann jane"
>>>         n = match( s, /\040[a-z]/ )
>>>         print n, s
>>>         }
>>
>> Hi. Current gawk is correct, and 3.0.3 is wrong. You'll note that
>> following the \040 for a space you have [a-z]. This matches *lower case
>> letters*; the "A" following the first space is an upper case letter.
>
> But it's matched in his example.
>
>> So, there's no bug.
>
> He is saying that
>
> match( s, /\040[a-z]/ )
>
> on the line
>
> "Mary Ann jane"
>
> gives 5 (meaning [a-z] matches the "A"), whereas it should give 9.
>
> I explained the reason for that in my post.


Davide,

Someone else already explained that this is expected behaviour. Unless you are in the C locale, the character range [a-z] can expand to just about anything. Simplified examples are:

aBbCc...XxYyZz  or  aAbBcC...xXyYz

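Your locale is probably similar to the latter example, which is why it matches an uppercase A: in that ordering, "A" sorts between "a" and "z". In non-C locales, use character classes like [:lower:] and [:upper:] (written inside a bracket expression, e.g. [[:lower:]]) instead of character ranges like [a-z] and [A-Z].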
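
A minimal sketch of the two workarounds, run from a POSIX shell. It assumes an awk that supports POSIX character classes (gawk does; some very old awk builds may not). The variable names are only for illustration, and a literal space stands in for \040:

```shell
# The string from the original report.
s='Mary Ann jane'

# Range expression with the C locale forced for this one command,
# so that [a-z] means exactly the ASCII lowercase letters.
range_c=$(LC_ALL=C awk -v s="$s" 'BEGIN { print match(s, / [a-z]/) }')

# Character class: matches lowercase letters only, independent of
# the collation order of whatever locale is currently set.
class_any=$(awk -v s="$s" 'BEGIN { print match(s, / [[:lower:]]/) }')

echo "range in C locale: $range_c"
echo "[[:lower:]] class: $class_any"
```

Both commands should report the position of the space before "jane" rather than the one before "Ann". Setting LC_ALL=C changes other locale-dependent behaviour as well, so the character class is usually the less intrusive fix.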

Hermann


