bug-gnu-utils

Re: [gawk] RE bug??


From: Bob Proulx
Subject: Re: [gawk] RE bug??
Date: Fri, 8 Jul 2005 00:48:14 -0600
User-agent: Mutt/1.5.9i

Stephen Davies wrote:
> Here is the output that I see here:
> 
> bash-3.00$ echo Silver | awk '/[[:upper:]][[:upper:][:digit:]]+/'
> bash-3.00$ echo Silver | awk '/[A-Z][A-Z0-9]+/'
> bash-3.00$ echo Silver | LC_COLLATE=en_US awk '/[A-Z][A-Z0-9]+/'
> bash-3.00$ echo Silver | LC_COLLATE=POSIX awk '/[A-Z][A-Z0-9]+/'
> bash-3.00$ awk --version
> GNU Awk 3.1.4

Hmm...

> All of the above is exactly what I would expect and believe to be 
> correct. The pattern says (I believe) that I want an uppercase character 
> followed by one or more uppercase or numeric characters. 
> The second and subsequent characters of Silver are neither, so it should 
> not match, as shown above.
> 
> I do not understand how you got your results.

There is just no trust in the world.  (chuckle.  :-)

Perhaps you do not have the particular locale I suggested configured
on your system?  If not, it would fall back to the C locale.  I
suggested en_US because of my location.  Since you have a .au address,
I will make an assumption and suggest that you try LANG=en_AU or
LANG=en_AU.UTF-8 and see whether those locales are configured on your
system.  If not, I would suggest configuring one, because it can be useful.
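One way to see whether a locale is configured is to look for it in the output of `locale -a`.  A minimal sketch, assuming glibc, which lists UTF-8 locales under a normalized codeset name (for example `en_AU.utf8` rather than `en_AU.UTF-8`); the helper names `norm_locale` and `has_locale` are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# norm_locale (hypothetical helper): lowercase each line and drop hyphens so
# that en_AU.UTF-8 and the glibc-normalized en_AU.utf8 compare equal.
norm_locale() { tr '[:upper:]' '[:lower:]' | sed 's/-//g'; }

# has_locale NAME (hypothetical helper): succeed if NAME appears in the list
# of locales read from stdin, e.g. the output of `locale -a`.
has_locale() {
  want=$(printf '%s\n' "$1" | norm_locale)
  norm_locale | grep -qxF -- "$want"
}
```

Usage: `locale -a | has_locale en_AU.UTF-8 && echo configured`.  If it prints nothing, setting LANG=en_AU.UTF-8 would just leave you in the C locale, as described above.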

This is system-dependent, but on my Debian machine it is most easily
configured with 'dpkg-reconfigure locales', which will update
/etc/locale.gen and run locale-gen for you automatically.  Your system
may use other methods.

It is somewhat humorous that one of the most common bug reports against
GNU coreutils is that some distro has turned on en_US for users without
their knowledge, enabling this non-standard collating sequence, and
here is the exact reverse case, where the behavior is wanted for the
test case and the locale is not enabled.  :-)
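To illustrate why a non-standard collating sequence changes the result: in such a locale a range expression like [A-Z] can be interpreted by collating order rather than by character code.  The sketch uses a deliberately simplified, hypothetical ordering in which each lowercase letter sorts just before its uppercase twin (aAbB...zZ).  Real locale collation tables, such as glibc's en_US, are multi-level and differ in detail, and the helper `in_range` is made up for illustration.

```shell
#!/bin/sh
# HYPOTHETICAL simplified collating sequence: lowercase just before uppercase.
# Not an actual locale table; it only models the kind of ordering involved.
ORDER='aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'

# in_range CHAR LO HI: succeed if CHAR lies between LO and HI in $ORDER,
# i.e. how a collation-based range expression [LO-HI] would treat CHAR.
# Characters absent from $ORDER (digits, punctuation) are never in range.
in_range() {
  awk -v c="$1" -v lo="$2" -v hi="$3" -v order="$ORDER" 'BEGIN {
    p = index(order, c)
    exit !(p > 0 && p >= index(order, lo) && p <= index(order, hi))
  }'
}
```

Under this ordering `in_range` succeeds for each of S, i, l, v, e and r, so /[A-Z][A-Z0-9]+/ matches all of "Silver".  In the C locale only S lies between A and Z, so there is no match, which is what Stephen saw.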

There is no requirement to use localization.  Without it your system
simply has the standard behavior.  However, enabling a UTF-8 locale lets
xterm and other programs use the UTF-8 character encoding and other
features, so that, along with a Unicode font, full international
character set support is possible.  This keeps people's names from
being mangled on the mailing lists.  But I still use LC_COLLATE=C
to get a standard sort order!
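A minimal sketch of getting byte-order ("C") collation on demand, whatever LANG is set to.  It uses LC_ALL=C rather than LC_COLLATE=C because LC_ALL, when set, overrides every other LC_* variable, so the result does not depend on the caller's environment.  The function names are hypothetical; the character-class variant is the form Stephen already used.

```shell
#!/bin/sh
# c_sort (hypothetical helper): sort stdin in plain byte order, so for
# ASCII text all uppercase letters come before all lowercase letters.
c_sort() { LC_ALL=C sort; }

# upper_run_c (hypothetical helper): the awk test from this thread with
# [A-Z] forced to mean the ASCII range 'A'..'Z'.
upper_run_c() { LC_ALL=C awk '/[A-Z][A-Z0-9]+/'; }

# upper_run_class (hypothetical helper): locale-safe alternative that uses
# character classes instead of ranges, so no collating sequence is involved.
upper_run_class() { awk '/[[:upper:]][[:upper:][:digit:]]+/'; }
```

For example, `printf 'b\nB\na\nA\n' | c_sort` prints A, B, a, b in that order, and `echo Silver | upper_run_c` prints nothing.  To make this the default while keeping a UTF-8 LANG, `export LC_COLLATE=C` in a shell startup file has the same effect on collation, provided LC_ALL is not set.  (Very old mawk releases lack POSIX character classes, so `upper_run_class` assumes an awk that supports them, such as gawk.)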

Bob
