From: Qianqian Fang
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Thu, 15 Jun 2023 17:10:24 -0400 (EDT)
Follow-up Comment #37, bug #57107 (project octave):
@mmuetzel, the example you gave is not the kind of use case I need regexp for,
which is matching ASCII-based patterns in an arbitrary (including non-UTF-8)
char array.
What I want to emphasize is that this use case is still supported by MATLAB (as
well as by Python, whose re module can match with or without the re.UNICODE
flag), but it has been eliminated in newer versions of Octave. This creates a
behavioral discrepancy between the two and can prevent MATLAB toolbox authors
from porting their software to Octave. To me, it is a big loss of flexibility
for regexp.
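To illustrate the Python behavior mentioned above, here is a minimal sketch assuming Python 3, where the practical switch is whether the pattern and the subject are `str` or `bytes` (a `bytes` pattern is matched directly against raw bytes, with no decoding step). The buffer contents below are a made-up example: an ISO-8859-1 string whose byte 0xE9 is not valid UTF-8, yet an ASCII-only pattern still finds the URL scheme in it.

```python
import re

# Made-up example buffer: "café" in ISO-8859-1 stores é as the single byte
# 0xE9, which is not valid UTF-8 when followed by an ASCII character.
latin1_buffer = "café: http://example.org/".encode("latin-1")

# Decoding as UTF-8 fails, so a text (str) regex could not be applied
# without first picking some other decoding.
try:
    latin1_buffer.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# A bytes pattern works on the raw bytes directly; the ASCII-only pattern
# simply skips over the 0xE9 byte.
match = re.search(rb"(http|https|ftp|file)://", latin1_buffer)

print(is_valid_utf8)    # False
print(match.group(0))   # b'http://'
```

Note that Python 3 rejects the `re.UNICODE` flag on a `bytes` pattern, so the "without Unicode" mode is selected by the types rather than by the flag.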
For example, if I have an arbitrary char array and want to tell whether it
starts with a URL, I could use
regexpi(buffer,'^\s*(http|https|ftp|file)://')
to efficiently match many possible protocols, which older versions of regexp
made possible. Another example: I read the first 256 bytes of a binary file
and want to test its magic header
(https://en.wikipedia.org/wiki/List_of_file_signatures). With this feature
removed, I really don't see how to achieve goals like these in a compact,
extensible and versatile fashion.
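Both examples can be sketched in Python along the following lines, assuming the buffer is held as `bytes` read in binary mode. The helper names (`starts_with_url`, `detect_magic`, `read_header`) and the short signature table are hypothetical illustrations, not part of any Octave or MATLAB API; only a few well-known signatures (PNG, PDF, ZIP, GIF) are included, and a real list would come from the file-signature table linked above. With a `bytes` pattern, `re.IGNORECASE` folds only ASCII letters, which is what a protocol-name match needs.

```python
import re
from typing import Optional

# Python counterpart of regexpi(buffer,'^\s*(http|https|ftp|file)://'),
# applied to raw bytes so the buffer's encoding does not matter.
URL_PREFIX = re.compile(rb"^\s*(http|https|ftp|file)://", re.IGNORECASE)

# Hypothetical, deliberately short signature table: one named alternative
# per format, so adding a format means adding one more alternative.
MAGIC = re.compile(
    rb"^(?:(?P<png>\x89PNG\r\n\x1a\n)"
    rb"|(?P<pdf>%PDF-)"
    rb"|(?P<zip>PK\x03\x04)"
    rb"|(?P<gif>GIF8[79]a))"
)


def starts_with_url(buffer: bytes) -> bool:
    """Return True if the buffer starts (after whitespace) with a URL scheme."""
    return URL_PREFIX.match(buffer) is not None


def detect_magic(header: bytes) -> Optional[str]:
    """Return the name of the matched signature, or None if none matches."""
    m = MAGIC.match(header)
    # Only one named group can take part in a match, so lastgroup names it.
    return m.lastgroup if m else None


def read_header(path: str, n: int = 256) -> bytes:
    """Read the first n bytes of a file in binary mode (no decoding)."""
    with open(path, "rb") as f:
        return f.read(n)
```

For instance, `detect_magic(read_header("image.png"))` would return `"png"` for a PNG file, and `starts_with_url(b"  HTTPS://example.org")` returns `True`.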
Reply to this item at:
<https://savannah.gnu.org/bugs/?57107>