locale encoding and core functions
From: Markus Mützel
Subject: locale encoding and core functions
Date: Sat, 23 Feb 2019 10:12:56 +0100
TL;DR: Is there a way to find out whether an .m file belongs to Octave core or
is a user function?
Some background:
With the upcoming Octave 5 it will be possible to set the mfile_encoding that
will be used to read .m files. This is important because Octave has to know
which encoding is used in the .m file to correctly display non-ASCII characters
in strings (e.g. in the "workspace" view or in plots). This is done by
converting from whatever encoding the user set up to UTF-8 on input, and by
converting from UTF-8 to whatever encoding is necessary at each interface.
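As an illustration of that pipeline, here is a minimal Python sketch (not
Octave code). The variable names and the choice of UTF-16 as the interface
encoding are assumptions made only for this example.

```python
# Assumed user setting; in Octave this would be the mfile_encoding.
mfile_encoding = "iso-8859-1"

# Raw bytes of a user .m file saved in Latin-1: "ä" is the single byte e4.
raw = b"x = '\xe4';"

# Step 1: convert from the user's encoding to UTF-8 for internal use.
internal = raw.decode(mfile_encoding).encode("utf-8")

# Step 2: at an interface, convert from UTF-8 to whatever that interface
# needs. UTF-16 is a hypothetical choice for this example.
for_interface = internal.decode("utf-8").encode("utf-16-le")
```

When the configured encoding matches the file, "ä" survives both steps
unchanged.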
However, there is a problem when we read core .m files, which are always encoded
in UTF-8 (and not in the encoding the user set up). When these files are
converted to UTF-8 as if they were in the locale encoding, non-ASCII characters
end up as garbled text.
E.g. the German character "ä" encoded in UTF-8 is represented by two bytes: c3
a4. Assume that users would set the mfile_encoding to "ISO 8859-1" (Latin1).
Then these two bytes are interpreted as representing the two letters "ä". This
means that a string from a core .m file that contained the letter "ä" would
display as "ä" for those users.
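The effect can be reproduced outside Octave. The following is a small Python
sketch of the misinterpretation described above. It only mimics the byte-level
conversion and is not how Octave implements it.

```python
# "ä" as it is stored in a UTF-8 encoded core .m file.
utf8_bytes = "ä".encode("utf-8")            # the two bytes c3 a4

# Reading those bytes as if they were ISO 8859-1 (Latin1) yields two letters.
garbled = utf8_bytes.decode("iso-8859-1")   # "Ã¤"

# Converting that text to UTF-8 for internal use bakes the error in.
internal = garbled.encode("utf-8")          # now four bytes: c3 83 c2 a4
```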
None of the core .m files contain any non-ASCII characters at the moment.
However, there are a few help texts in some Octave Forge packages that do. See
also bug #55195 [1].
The conversion to UTF-8 is done in "file_reader::get_input" in the file
"input.cc".
If we knew in that function that the file we read from was from the core (or an
Octave Forge package), we could skip the conversion from the locale encoding to
mitigate the problem.
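A rough sketch of that idea, again in Python rather than in the C++ of
"input.cc". The function names, the root directory and the path-prefix test
are all hypothetical. Whether the file's origin can be determined this way
inside Octave is exactly the open question.

```python
from pathlib import PurePosixPath

# Hypothetical location of files known to be UTF-8 (core or packages).
# Real paths depend on the installation.
UTF8_ROOTS = [PurePosixPath("/usr/share/octave")]

def is_utf8_tree(path, roots=UTF8_ROOTS):
    """Return True if path lies below one of the known UTF-8 directories."""
    p = PurePosixPath(path)
    return any(root in p.parents for root in roots)

def to_internal_utf8(raw, path, mfile_encoding):
    """Convert file bytes to UTF-8, skipping the locale conversion for core files."""
    source_encoding = "utf-8" if is_utf8_tree(path) else mfile_encoding
    return raw.decode(source_encoding).encode("utf-8")

# A core file containing "ä" in UTF-8 passes through unchanged ...
core = to_internal_utf8(b"\xc3\xa4", "/usr/share/octave/m/foo.m", "iso-8859-1")
# ... while a user file containing "ä" in Latin1 is still converted.
user = to_internal_utf8(b"\xe4", "/home/user/foo.m", "iso-8859-1")
```

Both calls produce the same UTF-8 bytes for "ä", which is the behaviour we
would want.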
So back to the initial question: Is there a way to pass this information down
to that function?
Markus
PS: This problem mostly affects Windows users, for whom the default
mfile_encoding depends on the locale of Windows (see also bug #49685 [2]). But
in general, any user who prefers to use an encoding other than UTF-8 in their
.m file code would be affected by this bug.
[1]: https://savannah.gnu.org/bugs/index.php?55195
[2]: https://savannah.gnu.org/bugs/index.php?49685