bug#60690: -P '\d' in GNU and git grep

bug-grep

From:	Paul Eggert
Subject:	bug#60690: -P '\d' in GNU and git grep
Date:	Fri, 7 Apr 2023 09:48:40 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0

On 2023-04-06 08:45, demerphq wrote:

>> Although this causes pcre2grep to mishandle Unicode characters:
>>
>>     $ echo 'Ævar' | pcre2grep '[Ssß]'
>>     Ævar
>>
>> it mimics Perl 5.36:
>>
>>     $ echo 'Ævar' | perl -ne 'print $_ if /[Ssß]/'
>>     Ævar
>>
>> so this seems to be what Perl users expect, despite its infelicities.
>
> Actually no, I think you have misunderstood what is happening at the
> different layers involved here.

No, I understood what was going on. My point was that Perl users seem to have accepted this behavior, even though it does not match what people would ordinarily expect.

> What you should have done is something like this:
>
> $ echo 'Ævar' | perl -ne 'utf8::decode($_); print $_ if /[Ss\x{DF}]/u'
> $ echo 'baß' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{DF}]/u'
> baß
> $ echo 'Ævar' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{C6}]/u'
> Ævar
> $ echo 'Ævar' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{e6}]/ui'
> Ævar

No, for two reasons. First, I'm no Perl expert and so I don't know (and don't particularly want to learn) its complicated Unicode options and calls. Second, /[Ss\x{DF}]/u is hard to read. If I want the S letters of traditional German, I'll write them in the obvious way, as [Ssß]. No doubt Perl will let me do this somehow - but it is telling that none of your examples do it in such a straightforward way.
