From: Paul Eggert
Subject: bug#62983: workaround PCRE2 bug affecting at least \D and \W
Date: Fri, 21 Apr 2023 11:42:50 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0
On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote:
> All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on its JIT implementation that results in failure to match for the negative perl classes, and seems to be easier to replicate when the matching character is a multibyte one.
Unfortunately that is a little vague. I expect the issue is not limited to \D and \W, as there are other ways to specify negative Perl classes. And if the bug merely seems to be easier to replicate with multibyte characters, it sounds like we may have issues even when matching ASCII characters in a UTF-8 locale.
Furthermore, I'm leery of optimizing for PCRE2 10.42 and earlier. We should focus our optimization efforts on future PCRE2 versions and not worry about optimizing for earlier ones. For those, optimizations complicate maintenance for a declining benefit, and they are likely to provoke bugs that will become harder to debug as time passes.
> Alternatively JIT could be disabled instead, but the option selected has less of an impact on performance.
Disabling JIT sounds better, as correctness trumps performance. Until the bug is fixed (or at least better-understood so that we have a workaround we can trust), how about the attached patch instead?
Attachment: 0001-grep-use-PCRE2-JIT-only-in-unibyte-locales.patch (text)
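The patch body is not reproduced in this message. As a rough illustration only, assuming it gates JIT compilation on whether the current locale is multibyte, the policy named in the patch title might look like the C sketch below. The helpers `locale_is_unibyte` and `want_pcre2_jit` are hypothetical names, not grep's actual code; `MB_CUR_MAX` and `setlocale` are standard C, and `pcre2_jit_compile` with `PCRE2_JIT_COMPLETE` is the real PCRE2 API for requesting JIT compilation.

```c
#include <locale.h>   /* setlocale: callers select the locale with it */
#include <stdbool.h>
#include <stdlib.h>   /* MB_CUR_MAX */

/* Hypothetical helper (not grep's code): true if the current LC_CTYPE
   locale is unibyte, i.e. every character occupies exactly one byte.  */
static bool
locale_is_unibyte (void)
{
  return MB_CUR_MAX == 1;
}

/* Hypothetical policy: request JIT only in unibyte locales.  In a
   multibyte (e.g. UTF-8) locale the pattern is assumed to be compiled
   with PCRE2_UTF | PCRE2_MATCH_INVALID_UTF, the combination the
   reported JIT bug affects, so JIT is skipped there.  */
static bool
want_pcre2_jit (void)
{
  return locale_is_unibyte ();
}

/* Sketch of the call site, assuming an already compiled
   pcre2_code *code:

     if (want_pcre2_jit ())
       pcre2_jit_compile (code, PCRE2_JIT_COMPLETE);

   If JIT compilation is skipped, pcre2_match uses PCRE2's
   interpretive matcher instead: slower, but not the buggy JIT path.  */
```

The design choice mirrors the argument above: in multibyte locales the sketch gives up JIT speed entirely rather than trying to work around specific negated classes, since correctness trumps performance.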