octave-bug-tracker

[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input


From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 14:26:35 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0

Follow-up Comment #12, bug #57107 (project octave):

That sounds like a pretty reasonable approach. I think it would provide "do
what I want" behavior for most users without getting too fancy, and would
provide decent Matlab compatibility.

Maybe we'd want to do a two-step fallback:

1. Default to UTF-8.
2. If non-UTF-8 byte sequences are encountered,
  a) If the user's locale encoding is a non-Unicode encoding, fall back
to it,
  b) Else fall back to ISO-8859-1 like this.
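
To make the order of the steps concrete, here is a rough sketch in Python (not Octave code, and not a proposal for the actual implementation). The function name `decode_with_fallback` and its `locale_encoding` parameter are made up for illustration. It also assumes that if the locale encoding itself cannot decode the bytes, we drop through to ISO-8859-1; the steps above don't say what should happen in that case.

```python
import codecs
import locale


def decode_with_fallback(data, locale_encoding=None):
    """Decode bytes using the two-step fallback (hypothetical helper)."""
    # Step 1: default to UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Step 2a: try the user's locale encoding, but only if it is
    # a non-Unicode encoding.
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    try:
        info = codecs.lookup(locale_encoding)
    except LookupError:
        info = None  # Python does not know this encoding name
    if info is not None and not info.name.startswith("utf"):
        try:
            return data.decode(info.name)
        except UnicodeDecodeError:
            pass  # assumption: fall through to ISO-8859-1 on failure

    # Step 2b: ISO-8859-1 maps every byte value, so this cannot fail.
    return data.decode("iso-8859-1")
```

With an ISO-8859-2 locale, a lone byte 0xE8 would be read through step 2a; with a UTF-8 locale the same byte would end up in step 2b and be read as ISO-8859-1.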

I don't know if that's actually viable for all multibyte encodings, though
(e.g. Shift-JIS). And I'm pretty sure it's not what Matlab does. But it
might be better behavior for, e.g., Eastern European, Arabic, or Thai users.

And we're only talking about what the default behavior should be when a file
handle is opened without an encoding specified, right? I would expect that
when using an explicitly requested encoding, invalid input would just raise an
error. (Unless the user explicitly asked for a fallback behavior somehow.)
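
A small sketch of the distinction I mean, again in Python just for illustration. The function `decode_input` and its `encoding` and `on_invalid` parameters are hypothetical names, not an existing Octave or Matlab interface, and the default path is simplified to "UTF-8, then ISO-8859-1".

```python
def decode_input(data, encoding=None, on_invalid="error"):
    """Lenient when no encoding is given, strict when one is requested."""
    if encoding is None:
        # No encoding specified: apply the lenient default behavior
        # (simplified here to UTF-8 with an ISO-8859-1 fallback).
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode("iso-8859-1")

    if on_invalid == "error":
        # Explicit encoding: invalid input raises UnicodeDecodeError.
        return data.decode(encoding)
    if on_invalid == "replace":
        # The user explicitly asked for a fallback behavior:
        # substitute U+FFFD for each invalid sequence.
        return data.decode(encoding, errors="replace")
    raise ValueError("on_invalid must be 'error' or 'replace'")
```

So `decode_input(b"\xe9")` would quietly succeed, while `decode_input(b"\xe9", encoding="utf-8")` would raise unless the caller opted in with `on_invalid="replace"`.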

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



