Re: character ranges in regular expressions
From: Aharon Robbins
Subject: Re: character ranges in regular expressions
Date: Mon, 04 Oct 2010 22:43:57 +0200
User-agent: Heirloom mailx 12.4 7/29/08
Sorry for chiming in on this rather late...
> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake <address@hidden>
> To: Bruno Haible <address@hidden>
> Cc: Paolo Bonzini <address@hidden>, Paul Eggert <address@hidden>,
> address@hidden, Jim Meyering <address@hidden>
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible wrote:
> >
> > 1) Is there an agreement of what the result should be? Jim seems to
> > prefer to extrapolate the result of the "C" locale, i.e. 26.
>
> As do I.
>
> > For other people, the locale
> > dependent behaviour is useful, that is, 51 is desired.
>
> Which is why my proposal is that glibc consider:
>
> [A-Z] => match C locale; 26 letters, regardless of locale
> [[.A.]-[.Z.]] => use collation rules, since we explicitly spelled things
> with collation symbols (26 letters in POSIX locale, 51 or even more in
> other locales, since accented characters might be included in the
> collation range), so that we aren't completely losing CEO behavior (if
> someone seriously has a reason to use it)
> [[:upper:]] => per POSIX rules in all locales
This would be great. In what must be close to (or more than) 10 years
since gawk started supporting locales, I have yet to meet anyone who
thinks that [a-z] matching [A-Y] is a feature!
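
For readers wondering where the 26 and 51 come from, here is a rough
sketch in Python. It does not call any regex library. It only models a
bracket range two ways: by code point, as in the C locale, and by a
hypothetical collation sequence aAbBcC...zZ, which is an assumption
standing in for an en_US-style locale (real locales also interleave
accented letters). The names COLLATION, range_by_codepoint and
range_by_collation are made up for illustration.

```python
import string

# Hypothetical collation sequence: aAbBcC...zZ.
# Assumption for illustration only; real locale data is richer.
COLLATION = [
    ch
    for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
    for ch in pair
]


def range_by_codepoint(lo, hi):
    """Characters from lo to hi in code point order (C locale style)."""
    return [chr(cp) for cp in range(ord(lo), ord(hi) + 1)]


def range_by_collation(lo, hi, order=COLLATION):
    """Characters from lo to hi in the given collation sequence."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]


upper_c = range_by_codepoint("A", "Z")      # models [A-Z] in the C locale
upper_coll = range_by_collation("A", "Z")   # models [A-Z] by collation
lower_coll = range_by_collation("a", "z")   # models [a-z] by collation

# Uppercase letters that the collation-based [a-z] picks up.
stray_upper = [ch for ch in lower_coll if ch.isupper()]

print(len(upper_c), len(upper_coll))        # 26 51
print(stray_upper[0], stray_upper[-1])      # A Y
```

Under this assumed ordering, the collation-based [A-Z] covers every
letter except "a", and the collation-based [a-z] covers every letter
except "Z", which is exactly the [a-z] matching [A-Y] surprise.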
Thanks,
Arnold