bug#60690: -P '\d' in GNU and git grep
From: Carlo Arenas
Subject: bug#60690: -P '\d' in GNU and git grep
Date: Fri, 7 Apr 2023 22:01:14 -0700

On Fri, Apr 7, 2023 at 12:00 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
>
> On 2023-04-06 06:39, demerphq wrote:
>
> > Unicode specifies that \d match any digit
> > in any script that it supports.
>
> "Specifies" is too strong. The Unicode Regular Expressions technical
> standard (UTS#18) mentions \d only in Annex C[1], next to the word
> "digit" in a column labeled "Property" (even though \d is really syntax
> not a property). This is at best an informal recommendation, not a
> requirement, as UTS#18 0.2[2] says that UTS#18's syntax is only for
> illustration and that although it's similar to Perl's, the two syntax
> forms may not be exactly the same. So we can't look to UTS#18 for a
> definitive way out of the \d mess, as the Unicode folks specifically
> delegated matters to us.
>
> Even ignoring the \d issue the digit situation is messy. UTS#18 Annex C
> says "\p{gc=Decimal_Number}" is the standard recommended syntax
> assignment for digits. However, PCRE2 does not support this syntax; it
> supports another variant \p{Nd} that UTS#18 also recommends. So it
> appears that PCRE2 already does not implement every recommended aspect
> of UTS#18 syntax. PCRE2 also doesn't match Perl, which does support
> "\p{gc=Decimal_Number}".

Not sure I follow the whole logic here, but PCRE2[3] (search for
"general category", which is what the "gc" above stands for) only
supports the abbreviated form of the Unicode general category names,
and `Nd` is indeed the one that corresponds to `Decimal_Number`.

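To make the `Nd`/`Decimal_Number` correspondence concrete, here is a minimal sketch in Python. It is an illustration only, not PCRE2 or grep code, and it assumes Python 3.9+. Python's `re` module does not accept `\p{...}` escapes, so the general category is checked with `unicodedata.category()` instead. For str patterns, `re` makes `\d` match any `Nd` character unless the `re.ASCII` flag restricts it to `[0-9]`. The sample string and the helper names are hypothetical.

```python
import re
import unicodedata

# Hypothetical sample mixing an ASCII digit with digits from other scripts:
# U+0661 ARABIC-INDIC DIGIT ONE, U+0967 DEVANAGARI DIGIT ONE,
# U+FF11 FULLWIDTH DIGIT ONE.
SAMPLE = "1 \u0661 \u0967 \uff11"


def unicode_digits(text: str) -> list[str]:
    """Characters matched by \\d in Python's default (Unicode) mode."""
    return re.findall(r"\d", text)


def ascii_digits(text: str) -> list[str]:
    """Characters matched by \\d when restricted to ASCII, i.e. [0-9]."""
    return re.findall(r"\d", text, flags=re.ASCII)


def nd_chars(text: str) -> list[str]:
    """Characters whose general category (gc) is Nd, short for Decimal_Number."""
    return [ch for ch in text if unicodedata.category(ch) == "Nd"]


if __name__ == "__main__":
    for label, fn in [("\\d (Unicode)", unicode_digits),
                      ("\\d (re.ASCII)", ascii_digits),
                      ("gc=Nd", nd_chars)]:
        # Print code points rather than glyphs so the output is unambiguous.
        print(label, [f"U+{ord(c):04X}" for c in fn(SAMPLE)])
```

Running it lists all four code points for the Unicode-mode `\d` and for the `Nd` check, but only U+0031 for the ASCII-restricted `\d`, which is the same split being debated for `grep -P '\d'`.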
Carlo
[1]: https://unicode.org/reports/tr18/#Compatibility_Properties
[2]: https://unicode.org/reports/tr18/#Conformance
[3]: https://pcre2project.github.io/pcre2/doc/html/pcre2pattern.html