grep-devel

Re: Locale aware range expressions?


From: arnold
Subject: Re: Locale aware range expressions?
Date: Sun, 28 Jan 2024 08:17:17 -0700
User-agent: Heirloom mailx 12.5 7/5/10

I think this is a bug in the documentation; the regex and dfa
libraries these days use Rational Range Interpretation(tm).

Paul, do you agree?

Arnold

"Ronan Pigott" <ronan@rjp.ie> wrote:

> Hi grep,
>
> The grep manual, in the section titled "Character Classes and Bracket
> Expressions" is careful to point out the effect of the user's locale and
> collation order on the meaning of range expressions. In particular, it
> highlights that [a-d] is equivalent to [abcd] in the C locale, but may be
> equivalent to [aAbBcCdD] in the user's locale because:
>
>   "It matches any single character that sorts between the two characters,
>   inclusive, using the locale's collating sequence and character set."
>
> However, in my experience this is not true.
>
>   $ grep ^NAME /etc/os-release; pacman -Q grep
>   NAME="Arch Linux"
>   grep 3.11-1
>   
>   $ locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)'
>   LANG=en_US.UTF-8
>   LC_COLLATE="en_US.UTF-8"
>   LC_ALL=
>   
>   # locale aware collation, exactly as described in grep(1)
>   $ print -l {a..d} {A..D} | sort
>   a
>   A
>   b
>   B
>   c
>   C
>   d
>   D
>   
>   # only lowercase matches, despite A/B/C all sorting within the range
>   $ print -l {a..d} {A..D} | grep '[a-d]'
>   a
>   b
>   c
>   d
>
> As far as I can tell, this contradicts the grep manual. Is this a bug in grep
> or in the documentation? Or is it user error?
>
> Thanks,
>
> Ronan
>
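
For anyone who wants to repeat the comparison above, here is a minimal sketch.
It swaps zsh's `print -l` for POSIX `printf` so it runs in any sh, and the
variable names (`letters`, `c_matches`, `cur_matches`, `both`) are only
illustrative. Only the C-locale result is guaranteed. What the current locale
prints depends on the grep and libc in use, though if the Rational Range
Interpretation reading is right it should match the C-locale line.

```shell
# Input: the same eight letters used in the report, one per line.
letters=$(printf '%s\n' a b c d A B C D)

# C locale: [a-d] is equivalent to [abcd], so only lowercase letters match.
c_matches=$(printf '%s\n' "$letters" | LC_ALL=C grep '[a-d]' | paste -sd' ' -)
echo "C locale:       $c_matches"

# Current locale: the result is implementation-dependent, so no expected
# output is claimed here. Compare it against the C-locale line above.
cur_matches=$(printf '%s\n' "$letters" | grep '[a-d]' | paste -sd' ' -)
echo "current locale: $cur_matches"

# To match both cases regardless of collation, spell out both ranges
# (shown here in the C locale, where the result is well defined).
both=$(printf '%s\n' "$letters" | LC_ALL=C grep '[a-dA-D]' | paste -sd' ' -)
echo "both cases:     $both"
```

Spelling out `[a-dA-D]` (or using `grep -i`) avoids relying on how a given
locale interprets the range.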


