[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 in
From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Mon, 19 Jun 2023 02:46:39 -0400 (EDT)
Follow-up Comment #40, bug #57107 (project octave):
> if the pattern to be matched is ASCII only, does regexp really care if the
> input must be a valid string? can you give me a counter example when it
> matters?
See a slightly modified example from comment #36:
regexp('aäiöü', '[aö]')
In Octave 4.4.1:
>> test_regexp_utf8
ans =
1 2 5 6 7
In Octave 6.4.0 (I don't recall the exact version where this was fixed):
>> test_regexp_utf8
ans =
1 5
The result before the related change was clearly wrong.
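The difference between the two results can be reproduced outside Octave. The following is a minimal Python sketch, not Octave's implementation: it assumes the old behavior amounts to matching the UTF-8 bytes one at a time and the new behavior amounts to matching whole characters and reporting their byte offsets. The helper names bytewise_match_starts and charwise_match_starts are made up for this illustration.

```python
import re

def bytewise_match_starts(text, class_chars):
    """1-based byte offsets where any single UTF-8 byte of class_chars matches."""
    data = text.encode("utf-8")
    # The character class is built from raw bytes, so a two-byte character
    # such as U+00F6 contributes both of its bytes (0xC3 and 0xB6).
    pattern = b"[" + re.escape(class_chars.encode("utf-8")) + b"]"
    return [m.start() + 1 for m in re.finditer(pattern, data)]

def charwise_match_starts(text, class_chars):
    """1-based byte offsets of whole characters that are in class_chars."""
    pattern = "[" + re.escape(class_chars) + "]"
    # Match on characters, then convert each character index to a byte offset.
    return [len(text[:m.start()].encode("utf-8")) + 1
            for m in re.finditer(pattern, text)]

# 'a', a-umlaut, 'i', o-umlaut, u-umlaut, written with escapes so the
# source file encoding does not matter.
text = "a\u00e4i\u00f6\u00fc"
klass = "a\u00f6"

print(bytewise_match_starts(text, klass))   # [1, 2, 5, 6, 7]
print(charwise_match_starts(text, klass))   # [1, 5]
```

In the byte-wise variant, the lead byte 0xC3 is shared by all three umlauts, so it matches at offsets 2, 5 and 7 even though only one of those characters is in the class.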
> is matching MATLAB's function behavior no longer a priority for octave
> development?
The *function* itself is Matlab compatible. The relevant difference is that
char-arrays in Matlab are UTF-16 encoded. In Octave, they are UTF-8 encoded
byte arrays.
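As a rough illustration of why the encoding alone changes the reported indices, the Python sketch below counts code units for the example string in both encodings. It only models the arithmetic and makes no claim about either program's internals; the helper name unit_index is hypothetical.

```python
def unit_index(text, char, encoding, unit_size):
    """1-based index of the first code unit of char in the given encoding."""
    prefix = text[:text.index(char)]
    return len(prefix.encode(encoding)) // unit_size + 1

text = "a\u00e4i\u00f6\u00fc"  # 'a', a-umlaut, 'i', o-umlaut, u-umlaut

utf8_units = len(text.encode("utf-8"))            # one unit per byte
utf16_units = len(text.encode("utf-16-le")) // 2  # one unit per 2 bytes

print(utf8_units, utf16_units)                      # 8 5
print(unit_index(text, "\u00f6", "utf-8", 1))      # 5
print(unit_index(text, "\u00f6", "utf-16-le", 2))  # 4
```

Under this model, the same character sits at index 5 in a UTF-8 byte array and at index 4 in a UTF-16 array, so identical calls can return different positions even when the matching itself agrees.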
To get exactly the same behavior in both programs, the internal type for
char-arrays would need to change in Octave. That would be a major change that
might have unintended effects on a lot of existing code. It might be better to
discuss this possible change on Discourse to reach a wider audience of
developers and users:
https://octave.discourse.group/
Please open a thread there if you think transitioning to UTF-16 encoded
char-arrays is worth the possible risk of breaking existing code that was
written with the current representation of char arrays in mind.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?57107>