bug#62267: grep-3.9 bug: \d matches multibyte digits
From: Paul Eggert
Subject: bug#62267: grep-3.9 bug: \d matches multibyte digits
Date: Sun, 19 Mar 2023 01:28:38 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0
On 2023-03-18 23:33, Jim Meyering wrote:
> By the way, have you ever used \D? I think I have not.
No, I'm not much of a Perl user these days (last seriously used it in
the 1990s...).
> -  char *new_keys = xnmalloc (len / 2 + 1, 5);
> +  char *new_keys = xnmalloc (len / 2 + 1, 6);
This could be xnmalloc (len + 1, 3).
Or if you want to show the work, you can replace it with something like:
    int origlen = sizeof "\\D" - 1;
    int repllen = sizeof "[^0-9]" - 1;
    int expansion = repllen / origlen + (repllen % origlen != 0);
    char *new_keys = xnmalloc (len + 1, expansion);
(Isn't memory allocation fun? :-)
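
To make the bound concrete, here is a minimal, self-contained sketch of the kind of rewrite under discussion. It is not grep's actual code: expand_digit_escapes is a hypothetical helper, it uses plain malloc with a hand-written overflow check where grep would use xnmalloc, and it ignores the case of \d appearing inside a bracket expression. The point is only that a 2-byte escape grows to at most 6 bytes, so 3 output bytes per input byte (plus room for the terminator) is always enough.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's actual code: copy KEYS (LEN bytes)
   into a freshly allocated, NUL-terminated buffer, rewriting \d as
   [0-9] and \D as [^0-9].  Other backslash escapes are copied
   unchanged.  Return NULL on allocation failure or size overflow.  */
static char *
expand_digit_escapes (char const *keys, size_t len)
{
  /* Worst case: a 2-byte escape becomes a 6-byte bracket expression,
     so each input byte needs at most ceil (6 / 2) = 3 output bytes.  */
  size_t origlen = sizeof "\\D" - 1;
  size_t repllen = sizeof "[^0-9]" - 1;
  size_t expansion = repllen / origlen + (repllen % origlen != 0);

  /* Conservative overflow check for (len + 1) * expansion.  */
  if (len >= SIZE_MAX / expansion)
    return NULL;
  char *out = malloc ((len + 1) * expansion);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      if (keys[i] == '\\' && i + 1 < len)
        {
          char c = keys[++i];
          char const *repl = (c == 'd' ? "[0-9]"
                              : c == 'D' ? "[^0-9]"
                              : NULL);
          if (repl)
            {
              size_t n = strlen (repl);
              memcpy (p, repl, n);
              p += n;
            }
          else
            {
              /* Some other escape: copy both bytes as they are.  */
              *p++ = '\\';
              *p++ = c;
            }
        }
      else
        *p++ = keys[i];
    }

  /* The terminator must still fit inside the allocated bound.  */
  assert ((size_t) (p - out) < (len + 1) * expansion);
  *p = '\0';
  return out;
}
```

Under these assumptions, calling expand_digit_escapes ("a\\d\\D", 5) yields a[0-9][^0-9], while an escaped backslash followed by d is left alone.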
> Doesn't Perl have the same issue?
Oh, you're right. Not being a Perl expert, all I did was run this:
    echo '٠١٢٣٤٥٦٧٨٩' | perl -ne 'print if /\d/'
and I observed no output. However, I now see that I need to use perl's
-C option too, to get the kind of regular-expression behavior that plain
grep has.
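
A small sketch of why the first experiment printed nothing, assuming (as I understand it) that perl without -C treats the input as undecoded bytes. The Arabic-Indic digits U+0660 through U+0669 are each encoded in UTF-8 as the byte 0xD9 followed by 0xA0 through 0xA9, and none of those bytes is an ASCII digit. The function name count_ascii_digit_bytes and the array name are made up for this illustration.

```c
#include <stddef.h>

/* Count the bytes among the first N bytes of S that are ASCII
   digits, looking at raw bytes without any UTF-8 decoding.  */
static size_t
count_ascii_digit_bytes (char const *s, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if ('0' <= s[i] && s[i] <= '9')
      count++;
  return count;
}

/* U+0660..U+0669 spelled out as UTF-8 bytes, so that the example
   does not depend on the encoding of this source file.  */
static char const arabic_indic_digits[] =
  "\xD9\xA0\xD9\xA1\xD9\xA2\xD9\xA3\xD9\xA4"
  "\xD9\xA5\xD9\xA6\xD9\xA7\xD9\xA8\xD9\xA9";
```

A byte-oriented matcher therefore finds no digit in those 20 bytes; only once the input is decoded as UTF-8 can a Unicode-aware \d see ten digit characters.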
Looking at the source code again, how about if we move the PCRE-specific
changes from src/grep.c to src/pcresearch.c, which is where they really
belong, and more importantly use the bleeding-edge
PCRE2_EXTRA_ASCII_BSD macro if available?
Something like the attached patch, say. This patch doesn't take your \D
fixes (or the above suggestions) into account.
Attachment: 0001-grep-forward-port-to-PCRE2-10.43.patch