[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 14:26:35 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0
Follow-up Comment #12, bug #57107 (project octave):
That sounds like a pretty reasonable approach. I think it would provide "do
what I want" behavior for most users without getting too fancy, and would
provide decent Matlab compatibility.
Maybe we'd want to do a two-step fallback:
1. Default to UTF-8.
2. If encountering non-UTF-8 byte sequences,
a) If the user's locale encoding is a non-Unicode encoding, fall back
to it,
b) Else fall back to ISO-8859-1 like this.
I don't know if that's actually viable for all multibyte encodings, though
(e.g. Shift-JIS). And I'm pretty sure it's not what Matlab does. But it
might be a better behavior for e.g. Eastern European, Arabic, or Thai users.
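To make the two-step fallback concrete, here is a rough sketch in Python rather than Octave's actual C++ code. The function name `decode_with_fallback` and its `locale_encoding` parameter are made up for illustration, and the "non-Unicode" test (codec name not starting with "utf") is just an assumption about how that check might be done.

```python
import codecs
import locale
from typing import Optional


def decode_with_fallback(data: bytes, locale_encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode bytes using the two-step fallback."""
    # Step 1: default to UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2a: if the locale encoding is non-Unicode, fall back to it.
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    name = codecs.lookup(locale_encoding).name  # normalized codec name
    if not name.startswith("utf"):
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            pass

    # Step 2b: ISO-8859-1 maps every byte to a code point, so this never fails.
    return data.decode("iso8859-1")


if __name__ == "__main__":
    polish = "Łódź".encode("iso-8859-2")
    print(decode_with_fallback(polish, "iso-8859-2"))
    print(decode_with_fallback(b"M\xfctzel", "utf-8"))
```

The multibyte caveat still applies: input that is invalid in the locale encoding silently drops through to ISO-8859-1, and legacy-encoded bytes that happen to form valid UTF-8 are never reinterpreted.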
And we're only talking about what the default behavior should be when a file
handle is opened without an encoding specified, right? I would expect that
when using an explicitly requested encoding, invalid input would just raise an
error. (Unless the user explicitly asked for a fallback behavior somehow.)
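As a rough illustration of that distinction (again in Python, not Octave's implementation), an explicitly requested encoding would decode strictly and only consult a fallback the caller opted into. The name `decode_explicit` and the `fallback` parameter are hypothetical.

```python
from typing import Optional


def decode_explicit(data: bytes, encoding: str, fallback: Optional[str] = None) -> str:
    """Hypothetical helper: strict decoding for an explicitly requested encoding."""
    try:
        # Invalid input for the requested encoding is an error by default.
        return data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        if fallback is None:
            raise
        # The caller explicitly asked for a fallback encoding.
        return data.decode(fallback, errors="strict")
```

With this shape, `decode_explicit(b"\xfc", "utf-8")` raises `UnicodeDecodeError`, while passing `fallback="iso-8859-1"` decodes the byte instead.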
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?57107>