From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Mon, 19 Jun 2023 02:46:39 -0400 (EDT)

Follow-up Comment #40, bug #57107 (project octave):


> if the pattern to be matched is ASCII only, does regexp really care if the
> input must be a valid string? can you give me a counter example when it
> matters?

See a slightly modified example from comment #36:

regexp('aäiöü', '[aö]')


In Octave 4.4.1:

>> test_regexp_utf8
ans =

   1   2   5   6   7


In Octave 6.4.0 (I don't recall the exact version where this was fixed):

>> test_regexp_utf8

ans =

   1   5


The result before the related change was clearly wrong: it reported matches
on individual bytes of the multi-byte characters ä, ö and ü.
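
To see where the two index lists come from, here is a minimal sketch in
Python (not Octave) that mimics both behaviors. It assumes the input is
UTF-8 encoded and that the old behavior amounted to matching byte by byte;
the variable names and the helper byte_index are made up for this
illustration and are not part of Octave.

```python
import re

# 'aäiöü' and '[aö]' written with escapes so the file encoding does not matter.
text = "a\u00e4i\u00f6\u00fc"
pattern = "[a\u00f6]"

# Byte-wise matching (assumed model of the old behavior): each byte of the
# UTF-8 encoded pattern becomes a separate member of the character class.
raw = text.encode("utf-8")                      # 8 bytes for 5 characters
byte_class = b"[" + pattern[1:-1].encode("utf-8") + b"]"
bytewise = [m.start() + 1 for m in re.finditer(byte_class, raw)]


def byte_index(s, char_pos):
    """1-based index of the first UTF-8 byte of the character at char_pos."""
    return len(s[:char_pos].encode("utf-8")) + 1


# Character-aware matching, reported as 1-based byte indices, which is how a
# UTF-8 encoded char array is indexed.
utf8_aware = [byte_index(text, m.start()) for m in re.finditer(pattern, text)]

print(bytewise)    # [1, 2, 5, 6, 7]
print(utf8_aware)  # [1, 5]
```

The lead byte 0xC3 is shared by ä, ö and ü, which is why the byte-wise
variant also "matches" at positions 2 and 7.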

> is matching MATLAB's function behavior no longer a priority for octave
> development?

The *function* itself is Matlab compatible. The relevant difference is that
char arrays in Matlab are UTF-16 encoded, whereas in Octave they are UTF-8
encoded byte arrays.
To get exactly the same behavior in both programs, the internal type for char
arrays would need to change in Octave. That would be a major change that might
have an unintended impact on a lot of existing code. It might be better to
discuss this possible change on Discourse to reach a wider audience of
developers and users:
https://octave.discourse.group/

Please open a thread there if you think transitioning to UTF-16 encoded
char-arrays is worth the possible risk of breaking existing code that was
written with the current representation of char arrays in mind.
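
As an illustration of the encoding difference mentioned above, the following
Python sketch (not Octave or Matlab code) counts code units for the example
string in both encodings. It only assumes that indices count UTF-8 bytes in
one case and UTF-16 code units in the other; the variable names are
hypothetical.

```python
# 'aäiöü' written with escapes so the file encoding does not matter.
text = "a\u00e4i\u00f6\u00fc"

# Number of elements the string would occupy in each representation.
utf8_units = len(text.encode("utf-8"))            # one element per byte
utf16_units = len(text.encode("utf-16-le")) // 2  # one element per 16-bit unit

# 1-based position of 'ö' (the 4th character) in each representation.
prefix = text[:3]                                 # 'aäi'
pos_utf8 = len(prefix.encode("utf-8")) + 1
pos_utf16 = len(prefix.encode("utf-16-le")) // 2 + 1

print(utf8_units, utf16_units)  # 8 5
print(pos_utf8, pos_utf16)      # 5 4
```

Under these assumptions the same match on 'ö' would be expected at index 5
with UTF-8 indexing and at index 4 with UTF-16 indexing, even though the
regexp function itself behaves the same way.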


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/