

From: Qianqian Fang
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Mon, 19 Jun 2023 12:27:05 -0400 (EDT)

Follow-up Comment #42, bug #57107 (project octave):

Just to use your example: if the pattern is `[ai]` instead of `[aö]`, I get
consistent answers in both Octave 4.2.2 and 6.4:


octave:1> regexp('aäiöü', '[ai]')
ans =

   1   4
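For `i`, the third character, the result is `4`. That is consistent with byte
offsets into UTF-8 data, where `ä` takes two bytes. The Python sketch below
reproduces this with the standard `re` module on `bytes`, used here as a
stand-in for byte-oriented matching; it is not Octave's implementation. It
also shows the ISO-8859-1 case.

```python
import re
from typing import List


def byte_starts(pattern: bytes, data: bytes) -> List[int]:
    """1-based byte offsets of each match start (like regexp's start indices)."""
    return [m.start() + 1 for m in re.finditer(pattern, data)]


text = "aäiöü"
utf8_bytes = text.encode("utf-8")      # b'a\xc3\xa4i\xc3\xb6\xc3\xbc'
latin1_bytes = text.encode("latin-1")  # b'a\xe4i\xf6\xfc'

# A class of ASCII literals such as [ai] never matches part of a multi-byte
# UTF-8 sequence, so both encodings can be searched; only the offsets differ.
print(byte_starts(rb"[ai]", utf8_bytes))    # [1, 4]
print(byte_starts(rb"[ai]", latin1_bytes))  # [1, 3]
```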


I understand that regexp is trying to handle multi-byte characters
consistently, but throwing an error is an overly aggressive response to
legacy input. In the past, the caller was responsible for examining the input
encoding and interpreting the output accordingly. Giving programmers back
this ability to handle encodings manually makes regexp a lifesaver for many
complex tasks beyond simple string pattern matching. Coming from a Perl
background, I use regex as a core part of application development.
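Here is one way a caller can interpret the output when it knows the input
encoding. A byte offset from a byte-oriented match can be mapped back to a
character index. This is a Python sketch, and `byte_to_char_index` is a
hypothetical helper, not an Octave function.

```python
def byte_to_char_index(data: bytes, byte_pos: int, encoding: str) -> int:
    """Map a 1-based byte offset to a 1-based character index.

    Assumes the caller knows the encoding of `data`. For multi-byte
    encodings such as UTF-8, an offset that falls inside a character makes
    decode() raise UnicodeDecodeError.
    """
    prefix = data[: byte_pos - 1]  # the bytes before the match
    return len(prefix.decode(encoding)) + 1


text = "aäiöü"
# 'i' starts at byte 4 in UTF-8 and at byte 3 in ISO-8859-1,
# but it is the 3rd character in both cases.
print(byte_to_char_index(text.encode("utf-8"), 4, "utf-8"))      # 3
print(byte_to_char_index(text.encode("latin-1"), 3, "latin-1"))  # 3
```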


If restoring the old behavior is not possible, could the following options at
least be considered?

1. Instead of throwing an error, could regexp issue a warning and still
proceed with the old behavior?

2. If 1 is not possible, could regexp check whether the pattern contains no
multi-byte characters and, if so, skip the UTF-8 restriction on the input?

3. Add an Octave-specific option to regexp/regexpi/regexprep that allows
manual encoding handling (and skips the UTF-8 input restriction)?
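Options 1 and 2 could be combined roughly as in the Python sketch below. It
covers only the decision logic; `lenient_regexp` and `is_valid_utf8` are
made-up names, not Octave functions. Valid UTF-8 input is matched
character-wise. Invalid input with an ASCII-only pattern is matched as raw
bytes without complaint (option 2). Invalid input with any other pattern
triggers a warning and then falls back to byte matching (option 1).

```python
import re
import warnings
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def lenient_regexp(pattern: bytes, data: bytes) -> List[int]:
    """Hypothetical lenient matcher; returns 1-based byte offsets of matches."""
    if is_valid_utf8(data) and is_valid_utf8(pattern):
        # Normal path: character-aware matching on decoded text, with match
        # starts reported as byte offsets into the UTF-8 data.
        text = data.decode("utf-8")
        return [len(text[: m.start()].encode("utf-8")) + 1
                for m in re.finditer(pattern.decode("utf-8"), text)]
    if not pattern.isascii():
        # Option 1: warn instead of raising an error.
        warnings.warn("input is not valid UTF-8; matching raw bytes")
    # Legacy byte path (silent for ASCII-only patterns, i.e. option 2).
    # Here '.' or '[^a]' consume a single byte; the caller interprets offsets.
    return [m.start() + 1 for m in re.finditer(pattern, data)]
```

Byte-level semantics still apply on the fallback path. The ASCII check only
guarantees that literal ASCII characters are never split across bytes.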


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



