
[Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Thu, 24 Oct 2019 08:15:15 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0

Follow-up Comment #6, bug #57107 (project octave):

I also don't really like the idea of sniffing the file.

The reasoning behind all of this:
Let's assume a user is reading strings from a file. This in itself doesn't
require any knowledge of the encoding used. But if the user wants to use these
strings to open a file or folder on the file system, or to place a
legend or annotation in a graph, the encoding matters.
To take all of that conversion hassle away from the user, we are trying
to keep all character arrays in Octave consistently encoded and to convert only
at the interfaces.
Some time ago it was decided that this consistent encoding should be UTF-8
(which differs from Matlab).
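
To illustrate the "convert only at the interfaces" idea, here is a minimal
sketch in Python (not Octave code, and not how Octave implements it). The
helper names to_internal_utf8 and is_valid_utf8 are made up for this example.
It shows that the same text is a valid byte sequence in ISO-8859-1 but not in
UTF-8, so bytes read from a file need one conversion before they can be
treated as internal UTF-8 strings.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte sequence decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_internal_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical interface helper: re-encode external bytes as UTF-8.

    The caller must know (or be told) the source encoding; it cannot be
    derived reliably from the bytes alone.
    """
    return raw.decode(source_encoding).encode("utf-8")


# "Mützel" as it would be stored in an ISO-8859-1 file (u-umlaut is byte 0xFC).
latin1_bytes = "M\u00fctzel".encode("iso-8859-1")

# The raw bytes are not valid UTF-8, because 0xFC cannot start a UTF-8 sequence.
raw_is_utf8 = is_valid_utf8(latin1_bytes)

# After conversion at the interface, the u-umlaut becomes the two bytes 0xC3 0xBC.
internal = to_internal_utf8(latin1_bytes, "iso-8859-1")
internal_is_utf8 = is_valid_utf8(internal)
```

Once the conversion has happened at the point where the data enters, every
later consumer (file system calls, graphics labels, regular expressions) can
assume a single encoding.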

@Andrew:
I think we agree, but I didn't explain myself well enough. I was assuming from
your comment #2 that the default encoding used by Matlab on non-Windows
systems was UTF-8. But if I follow your comment #4 correctly, it is
ISO-8859-1? In that case there are no "non-UTF-8 byte values", and byte values
in the range 128-255 can be mapped directly to UTF-16 (effectively a no-op).
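
The "effective no-op" can be checked with a short Python sketch. It relies
only on the fact that ISO-8859-1 covers the first 256 Unicode code points; it
makes no claim about what Matlab actually does internally. The function name
latin1_maps_directly_to_utf16 is made up for this example.

```python
def latin1_maps_directly_to_utf16() -> bool:
    """Check that every ISO-8859-1 byte equals its UTF-16 code unit value."""
    for b in range(256):
        ch = bytes([b]).decode("iso-8859-1")
        # The Unicode code point has the same numeric value as the byte.
        if ord(ch) != b:
            return False
        # In UTF-16 (little endian) this is one 16-bit unit: low byte b, high byte 0.
        if ch.encode("utf-16-le") != bytes([b, 0]):
            return False
    return True


direct = latin1_maps_directly_to_utf16()

# Example: byte 0xFC (u-umlaut in ISO-8859-1) widens to the UTF-16 unit 0x00FC.
example_unit = bytes([0xFC]).decode("iso-8859-1").encode("utf-16-le")
```

The same bytes interpreted as UTF-8 would not have this property, because
values 128-255 are only meaningful there as parts of multi-byte sequences.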

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



