[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input

From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 10:17:45 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0

Follow-up Comment #10, bug #57107 (project octave):

After a little research, I don't think that we should sniff the encoding.
Instead we might want to select one of the fallback options for decoding
invalid UTF-8 byte sequences [1].

I'd personally vote for the option:
"The Unicode code points U+0080–U+00FF with the same value as the byte, thus
interpreting the bytes according to ISO-8859-1."

That also most closely matches what Matlab seems to be doing, and it would
also resolve the original report.
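
To illustrate the idea, here is a minimal sketch of that fallback in Python
(not Octave code, and not a proposal for the actual implementation). The
handler name "latin1_fallback" and the helper decode_with_fallback are made
up for this example. Valid UTF-8 is decoded normally; any byte that is not
part of a valid sequence is mapped to the code point with the same value.

```python
import codecs


def latin1_fallback(err):
    """Error handler: map each undecodable byte to the code point of equal value."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    # err.object[err.start:err.end] holds the bytes the UTF-8 decoder rejected.
    bad_bytes = err.object[err.start:err.end]
    # Latin-1 maps byte value N to code point U+00NN, which is the fallback
    # described above. Decoding resumes at err.end.
    return bad_bytes.decode("latin-1"), err.end


# Hypothetical handler name, registered only for this example.
codecs.register_error("latin1_fallback", latin1_fallback)


def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8, treating invalid bytes as ISO-8859-1."""
    return data.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    # ISO-8859-1 input: 0xFC is not valid UTF-8 here, so it becomes U+00FC.
    print(decode_with_fallback(b"M\xfctzel"))
    # Valid UTF-8 input passes through unchanged.
    print(decode_with_fallback("M\u00fctzel".encode("utf-8")))
```

Both calls print the same string, so the same pattern would match either
input without the encoding having to be sniffed first.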

[1]: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
