octave-bug-tracker


From: Qianqian Fang
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Mon, 19 Jun 2023 14:43:35 -0400 (EDT)

Follow-up Comment #46, bug #57107 (project octave):


> That's the behavior of PCRE(2). Octave only implemented a manual check
> because PCRE's own check was slow (at least at some point).

I don't know whether Perl's own regex engine has anything to do with PCRE2,
but it has no trouble consuming anything I throw at it, regardless of whether
it is valid UTF-8 or not.

Here is the output from Perl 5.34.0 on Xubuntu 22.04 under the en_US.UTF-8 locale:


$ perl -e 'print join(" ", split(/(..)/, (unpack "H*",
"aäiöü\xFF\x00a中")))."\n"'
 61  c3  a4  69  c3  b6  c3  bc  ff  00  61  e4  b8  ad

$ perl -e 'print $-[0]."\n" while("aäiöü\xFF\x00a中" =~ /(中|ä)/g);'
1
11

$ perl -e 'print $-[0]."\n" while("aäiöü\xFF\x00a中" =~ /[ai]/g);'
0
3
10


Also, to match a multi-byte character, I rarely see anyone put it inside
`[]`; alternation such as `/(中|ä)/` is more appropriate and avoids the issue
you are seeing.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/