
[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input


From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 08:28:24 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0

Follow-up Comment #8, bug #57107 (project octave):

> So there are no "non-UTF-8 byte values" and byte values between 128-255 can
> directly be mapped to UTF-16 (in an effective no-op).

There are non-UTF-8 byte values; it's just that UTF-8 isn't involved, and the
bytes happen to read as correct UCS-2 if you just widen them to 16 bits as
unsigned ints (yeah, in an effective no-op).
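
To make the widening point concrete, here is a minimal sketch in Python (used
only for illustration; it is not Octave's or Matlab's implementation). The
helper names widen_latin1 and is_valid_utf8 are hypothetical. It checks that
an ISO-8859-1 byte string containing a byte above 127 is not valid UTF-8, yet
zero-extending each byte to 16 bits gives the same code units as a proper
ISO-8859-1 to UTF-16 conversion.

```python
import struct


def widen_latin1(raw: bytes) -> bytes:
    """Zero-extend each byte to an unsigned 16-bit little-endian code unit."""
    return struct.pack("<%dH" % len(raw), *raw)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "cafe" with an e-acute (0xE9), encoded as ISO-8859-1
raw = bytes([0x63, 0x61, 0x66, 0xE9])

# The trailing 0xE9 is a truncated multi-byte lead in UTF-8, so this is False
print(is_valid_utf8(raw))

# Widening matches a real ISO-8859-1 -> UTF-16LE conversion
print(widen_latin1(raw) == raw.decode("iso-8859-1").encode("utf-16-le"))

# The same holds for every possible byte value 0..255
all_bytes = bytes(range(256))
print(widen_latin1(all_bytes) == all_bytes.decode("iso-8859-1").encode("utf-16-le"))
```

This works because ISO-8859-1 assigns byte value N to Unicode code point N for
all of 0-255, and none of those code points needs a surrogate pair.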

> I was assuming from your comment #2 that the default encoding used by Matlab
> on non-Windows systems was UTF-8. But if I follow you correctly in your
> comment #4, it is ISO-8859-1?

Yes, I believe it's possible. It's not in the doco, so someone would need to
actually test on Matlab to verify, which I'm unwilling to do for licensing
reasons.

And if this is the case, we need to decide whether Octave should do the same
thing for Matlab compatibility, or do something different, because IMHO that's
a _really bad_ default behavior. For example, if we did it that way (default
ISO-8859-1 everywhere), it would probably break @mleitner's basic use case
that he's concerned about here. I think it would be better for almost all
users and scenarios if Octave would act like a normal Unix program and take
the default encoding from the process's locale.
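
As a rough illustration of the two defaults being compared, here is a small
Python sketch (again only illustrative, not a proposal for Octave's internals;
locale_encoding and decode_input are hypothetical names). It takes the default
encoding from the process's locale, and shows what a hard-coded ISO-8859-1
default does to UTF-8 input.

```python
import locale


def locale_encoding() -> str:
    """Return the encoding implied by the process's locale settings."""
    try:
        # Adopt the user's environment (LANG, LC_ALL, LC_CTYPE)
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale this system lacks; keep the current one
        pass
    return locale.getpreferredencoding(False)


def decode_input(raw: bytes, encoding=None) -> str:
    """Decode bytes with an explicit encoding, else the locale's encoding."""
    return raw.decode(encoding or locale_encoding())


utf8_bytes = "\u00e9".encode("utf-8")  # e-acute as UTF-8: bytes C3 A9

# Decoded with the matching encoding, the single character round-trips
print(decode_input(utf8_bytes, "utf-8") == "\u00e9")

# Decoded as ISO-8859-1, the same bytes become two wrong characters
print(decode_input(utf8_bytes, "iso-8859-1") == "\u00c3\u00a9")

# What a locale-following default would pick on this machine
print(locale_encoding())
```

On a system with a UTF-8 locale, the locale-derived default would decode such
input correctly, whereas a fixed ISO-8859-1 default would silently produce the
two-character result shown in the second check.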

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



