From: Paul Eggert
Subject: bug#60690: -P '\d' in GNU and git grep
Date: Fri, 7 Apr 2023 09:48:40 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0

On 2023-04-06 08:45, demerphq wrote:
Although this causes pcre2grep to mishandle Unicode characters:

  $ echo 'Ævar' | pcre2grep '[Ssß]'
  Ævar

it mimics Perl 5.36:

  $ echo 'Ævar' | perl -ne 'print $_ if /[Ssß]/'
  Ævar

so this seems to be what Perl users expect, despite its infelicities.

Actually no, I think you have misunderstood what is happening at the different layers involved here.

No, I understood what was going on. My point was that Perl users seem to have accepted this behavior, even though it does not match what people would ordinarily expect.

What you should have done is something like this:

  $ echo 'Ævar' | perl -ne 'utf8::decode($_); print $_ if /[Ss\x{DF}]/u'
  $ echo 'baß' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{DF}]/u'
  baß
  $ echo 'Ævar' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{C6}]/u'
  Ævar
  $ echo 'Ævar' | perl -MEncode -ne 'utf8::decode($_); print encode_utf8($_) if /[Ss\x{e6}]/ui'
  Ævar

No, for two reasons. First, I'm no Perl expert and so I don't know (and don't particularly want to learn) its complicated Unicode options and calls. Second, /[Ss\x{DF}]/u is hard to read. If I want the S letters of traditional German, I'll write them in the obvious way, as [Ssß]. No doubt Perl will let me do this somehow - but it is telling that none of your examples do it in such a straightforward way.
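
The surprising match of 'Ævar' against [Ssß] comes down to whether the regex engine sees UTF-8 bytes or decoded characters. A minimal sketch of that distinction follows, written in Python rather than Perl and assuming UTF-8 input. Python's `re` module only stands in for the engine here, and the variable names are illustrative, not taken from the thread:

```python
import re

# The pattern and the input as raw UTF-8 bytes, roughly what a regex
# engine sees when nothing has been decoded into characters.
pattern_bytes = "[Ssß]".encode("utf-8")   # b'[Ss\xc3\x9f]'
text_bytes = "Ævar".encode("utf-8")       # b'\xc3\x86var'

# Byte semantics: the class now contains the byte 0xC3, which is also
# the first byte of 'Æ', so the search succeeds on that byte.
byte_match = re.search(pattern_bytes, text_bytes)

# Character semantics: 'Æ' is a single code point that is not in the class.
char_match = re.search("[Ssß]", "Ævar")
char_match_eszett = re.search("[Ssß]", "baß")

print(byte_match is not None)          # True
print(char_match is not None)          # False
print(char_match_eszett is not None)   # True
```

In UTF-8, 'Æ' is the byte pair C3 86 and 'ß' is C3 9F, so a byte-oriented character class built from 'ß' also accepts the lead byte of 'Æ'. Once both pattern and input are treated as characters, [Ssß] can be written in the obvious way and behaves as expected.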