From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 10:17:45 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0
Follow-up Comment #10, bug #57107 (project octave):

After a little research, I don't think that we should sniff the encoding. Instead we might want to select one of the fallback options for decoding invalid UTF-8 byte sequences [1].
I'd personally vote for the option:
"The Unicode code points U+0080–U+00FF with the same value as the byte, thus interpreting the bytes according to ISO-8859-1."
That also most closely matches what Matlab seems to be doing. And it would also solve the OR.

[1]: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
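
For illustration only, the fallback option quoted in the comment above (mapping each invalid byte to the code point U+0080–U+00FF of the same value) might look roughly like the following Python sketch. This is not Octave code and not a proposed patch; the handler name `latin1_fallback` and the helper `decode_with_fallback` are hypothetical names chosen for this example.

```python
import codecs


def latin1_fallback(err):
    """Codec error handler (hypothetical name, for illustration).

    Maps every byte of an invalid UTF-8 sequence to the Unicode code
    point with the same numeric value, i.e. reads it as ISO-8859-1.
    """
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad_bytes = err.object[err.start:err.end]
    # ISO-8859-1 maps byte value N directly to code point U+00NN.
    return bad_bytes.decode("latin-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)


def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for invalid bytes only."""
    return data.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    # Valid UTF-8 is decoded normally; a stray 0xE9 byte becomes U+00E9.
    print(decode_with_fallback("é".encode("utf-8") + b"t\xe9"))
```

Valid UTF-8 input passes through unchanged, so no encoding sniffing is needed; only bytes that cannot be part of a valid sequence take the ISO-8859-1 interpretation.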