From: | Andrew Janke |
Subject: | Unicode support in io Forge package |
Date: | Fri, 18 Oct 2019 22:04:46 -0700 |
User-agent: | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 |
Hi, Octave and io maintainers,
I'm confused by the Unicode support in the io package. In particular, the functions unicode2utf8 and utf82unicode, and the "encode_utf" options in some of the ods/xls read/write functions.
What is the encoding that utf82unicode/unicode2utf8 are calling "unicode" here? It looks like it's doing a single-byte encoding, treating each byte as an unsigned int 0-255, and treating those 0-255 values directly as Unicode code point values. That's not any of the standard Unicode encodings. (But I think it is exactly the same as Latin-1/ISO 8859-1.)
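To make the Latin-1 observation concrete, here is a minimal Python sketch of the mapping described above. It is not the io package's code: the function name bytes_as_code_points is made up for illustration, and it only models the "one byte = one code point" behaviour as I understand it.

```python
def bytes_as_code_points(data: bytes) -> str:
    """Treat each byte value (0-255) directly as a Unicode code point.

    Hypothetical helper: models the single-byte "unicode" interpretation
    described in the message, not the actual io package implementation.
    """
    return "".join(chr(b) for b in data)


# "café" stored one byte per character; 0xE9 is 'é' in Latin-1.
raw = bytes([0x63, 0x61, 0x66, 0xE9])
text = bytes_as_code_points(raw)

# The byte-as-code-point mapping agrees with Latin-1 for every byte value.
all_bytes = bytes(range(256))
assert bytes_as_code_points(all_bytes) == all_bytes.decode("latin-1")

# Re-encoding as UTF-8 turns the single byte 0xE9 into two bytes.
utf8 = text.encode("utf-8")
assert utf8 == b"caf\xc3\xa9"
```

If that model is right, the "unicode to UTF-8" direction is just a Latin-1 to UTF-8 transcode, and it can only represent code points up to U+00FF.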
As I understand it, since about Octave 4.4, Octave's internal encoding (that is, how it interprets Octave char arrays) is either UTF-8 or an opaque array of bytes; it's never in the "system code page" or some other locale-specific encoding.
Is this UTF-8 support in io still relevant/correct? Maybe it should be deprecated or renamed/removed? Since Octave now supports UTF-8, I think you'd want to just leave UTF-8 text as is in all cases.
Cheers, Andrew