From: Qianqian Fang
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859-1 input
Date: Thu, 15 Jun 2023 17:10:24 -0400 (EDT)

Follow-up Comment #37, bug #57107 (project octave):

@mmuetzel, the example you gave is not the type of use case I need regexp
for, which is matching ASCII-based patterns in an arbitrary (possibly
non-UTF-8) char array.

What I want to emphasize is that this use case is still supported by MATLAB
(as well as by Python, whose re module can match with or without the
re.UNICODE flag), but it has been removed in newer versions of Octave. This
creates a behavioral discrepancy between the two and may keep MATLAB toolbox
authors from porting their software to Octave. To me, it is a significant loss
of flexibility in regexp.

For example, if I have an arbitrary char array and want to tell whether it
starts with a URL, I could use

regexpi(buffer,'^\s*(http|https|ftp|file)://')

to efficiently match many possible protocols, which older versions of regexp
allowed. Another example: I read the first 256 bytes of a binary file and
want to test its magic header
(https://en.wikipedia.org/wiki/List_of_file_signatures). With this feature
removed, I really don't see how to achieve goals like these in a compact,
extensible and versatile way.
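
For comparison, here is a minimal Python sketch of the two use cases above.
It uses the standard re module with bytes patterns, which match byte by byte,
so the input may contain arbitrary non-UTF-8 data such as ISO-8859-1 text or
binary headers. The helper names and the signature table are hypothetical and
for illustration only. The PNG, PDF and ZIP signatures are the well-known
values from the linked list. This shows Python behavior, not Octave's.

```python
import re

# A bytes pattern matches raw bytes, so the buffer need not be valid UTF-8.
# With a bytes pattern, re.IGNORECASE folds only ASCII letters.
URL_PREFIX = re.compile(rb'^\s*(http|https|ftp|file)://', re.IGNORECASE)

# Hypothetical table: a small subset of well-known file signatures.
MAGIC_SIGNATURES = {
    'png': re.compile(rb'\A\x89PNG\r\n\x1a\n'),
    'pdf': re.compile(rb'\A%PDF-'),
    'zip': re.compile(rb'\APK\x03\x04'),
}


def starts_with_url(buffer):
    """Return True if the bytes start (after optional whitespace) with a URL scheme."""
    return URL_PREFIX.match(buffer) is not None


def detect_magic(header):
    """Return the name of the first matching signature, or None."""
    for name, pattern in MAGIC_SIGNATURES.items():
        if pattern.match(header):
            return name
    return None


if __name__ == '__main__':
    # Latin-1 text: the final byte 0xE9 ('é') is not valid UTF-8 on its own.
    latin1 = '  HTTP://example.org/caf\xe9'.encode('latin-1')
    print(starts_with_url(latin1))                      # True
    print(detect_magic(b'\x89PNG\r\n\x1a\n\x00\xff'))   # png
    print(detect_magic(b'\xff\xfe\x00garbage'))         # None
```

Adding another protocol or file type only requires one more alternative or
one more table entry. That is the kind of compact, extensible matching the
comment asks for.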


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
