From: Qianqian Fang
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Mon, 19 Jun 2023 14:43:35 -0400 (EDT)
Follow-up Comment #46, bug #57107 (project octave):
> That's the behavior of PCRE(2). Octave only implemented a manual check
> because PCRE's own check was slow (at least at some point).
I have no idea whether Perl's own regex engine has anything to do with PCRE2,
but it has no issue guzzling anything I throw at it, regardless of whether it
is valid UTF-8 or not.
Here is the output from Perl 5.34.0 on Xubuntu 22.04 under the en_US.UTF-8 locale:
$ perl -e 'print join(" ", split(/(..)/, (unpack "H*",
"aäiöü\xFF\x00a中")))."\n"'
61 c3 a4 69 c3 b6 c3 bc ff 00 61 e4 b8 ad
$ perl -e 'print $-[0]."\n" while("aäiöü\xFF\x00a中" =~ /(中|ä)/g);'
1
11
$ perl -e 'print $-[0]."\n" while("aäiöü\xFF\x00a中" =~ /[ai]/g);'
0
3
10
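As a cross-check, the same byte-oriented matching can be sketched in Python. This is only an illustration, not part of the original report: it assumes the Perl literal is treated as raw bytes (no `use utf8`), so the reported offsets are byte offsets, and the variable names are made up for the example.

```python
import re

# The same 14 bytes as the Perl string literal: UTF-8 text plus raw 0xFF and 0x00.
data = "aäiöü".encode("utf-8") + b"\xff\x00a" + "中".encode("utf-8")

# Hex dump, one byte per group, comparable to the unpack "H*" output.
hex_dump = " ".join(f"{b:02x}" for b in data)
print(hex_dump)

# Alternation of two multi-byte sequences, matched on bytes.
alt = re.compile("(中|ä)".encode("utf-8"))
alt_offsets = [m.start() for m in alt.finditer(data)]
print(alt_offsets)

# Character class of single-byte ASCII characters.
cls = re.compile(rb"[ai]")
cls_offsets = [m.start() for m in cls.finditer(data)]
print(cls_offsets)
```

The invalid byte 0xFF and the NUL byte do not stop a byte-oriented matcher here; the offsets count bytes, not characters.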
Also, to match a multi-byte character, I rarely see anyone put it inside
`[]`; an alternation such as `/(中|ä)/` is more appropriate and avoids the
issue you are seeing.
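One possible reason a bracket expression misbehaves with multi-byte characters, sketched in Python under the assumption that matching happens byte by byte (this may or may not be the exact issue discussed earlier in the thread): inside `[]`, each byte of the UTF-8 sequence becomes its own class member, whereas an alternation keeps each sequence intact.

```python
import re

# Same test bytes as in the Perl one-liners above.
data = "aäiöü".encode("utf-8") + b"\xff\x00a" + "中".encode("utf-8")

# Byte-wise class: effectively the set of single bytes {e4, b8, ad, c3, a4}.
cls = re.compile(b"[" + "中ä".encode("utf-8") + b"]")
cls_offsets = [m.start() for m in cls.finditer(data)]
print(cls_offsets)

# Alternation: each alternative is a whole multi-byte sequence.
alt = re.compile("(中|ä)".encode("utf-8"))
alt_offsets = [m.start() for m in alt.finditer(data)]
print(alt_offsets)
```

In this sketch the class also hits offsets 4 and 6, the 0xC3 lead bytes of `ö` and `ü`, and reports every byte of `ä` and `中` separately, while the alternation reports only offsets 1 and 11.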
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?57107>