octave-bug-tracker

[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 10:17:45 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0

Follow-up Comment #10, bug #57107 (project octave):

After a little research, I don't think that we should sniff the encoding.
Instead, we might want to select one of the fallback options for decoding
invalid UTF-8 byte sequences [1].

I'd personally vote for the option:
"The Unicode code points U+0080–U+00FF with the same value as the byte, thus
interpreting the bytes according to ISO-8859-1."

That also most closely matches what Matlab seems to be doing. And it would
also solve the OR.
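
To illustrate the option quoted above, here is a minimal sketch in Python (not
Octave code, and not a proposal for the actual implementation). It decodes
input as UTF-8 and maps every byte of an invalid sequence to the code point
with the same value, i.e. it reads those bytes as ISO-8859-1. The names
`latin1_fallback` and `decode_with_fallback` are made up for this example.

```python
import codecs


def latin1_fallback(err):
    """Error handler: reinterpret undecodable bytes as ISO-8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        # This sketch only covers decoding, not encoding.
        raise err
    bad_bytes = err.object[err.start:err.end]
    # Each byte 0x80..0xFF becomes the code point U+0080..U+00FF.
    return bad_bytes.decode("latin-1"), err.end


# Hypothetical handler name, registered for use with bytes.decode().
codecs.register_error("latin1_fallback", latin1_fallback)


def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8; invalid bytes fall back to ISO-8859-1."""
    return data.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    # ISO-8859-1 encoded input: 0xFC and 0xDF are not valid UTF-8 here.
    print(decode_with_fallback(b"gr\xfc\xdfe"))
    # Valid UTF-8 input is decoded unchanged.
    print(decode_with_fallback("grüße".encode("utf-8")))
```

One caveat of this kind of fallback: ISO-8859-1 input whose bytes happen to
form a valid UTF-8 sequence is still decoded as UTF-8, so it is a recovery
strategy for invalid input rather than encoding detection.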

[1]: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



