From: | Markus Mützel |
Subject: | [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input |
Date: | Wed, 23 Oct 2019 16:43:44 -0400 (EDT) |
User-agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0 |
Follow-up Comment #3, bug #57107 (project octave):

There is at least one modern OS that still uses 8-bit encodings by default: Windows 10 and its predecessors. In a Western locale the default encoding might well be ISO-8859-1 (or ANSI/CP1252).

But I now see that this bug is marked as affecting GNU/Linux. So it will most probably be necessary to specify the encoding when calling fopen on a file for reading strings.

Matlab's internal encoding is 16 bits wide (maybe UCS-2). Maybe it reads the non-UTF-8 bytes as is, and they "happen" to map to the matching Unicode code points (for a file in a Western encoding).

I am not sure whether we should do something similar and transcode from a default 8-bit encoding if we detect that a source contains invalid UTF-8.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>
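
To make the transcoding idea above concrete, here is a minimal sketch in Python (not Octave code, and not a proposal for the actual implementation). The function name `decode_with_fallback` and the choice of ISO-8859-1 as the default fallback are assumptions made for illustration only. It tries UTF-8 first and transcodes from an 8-bit encoding only if the input is not valid UTF-8.

```python
def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw bytes as UTF-8; on invalid UTF-8, decode with an 8-bit fallback.

    Hypothetical helper: the name and the default fallback are assumptions.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The input is not valid UTF-8, so treat it as the fallback encoding.
        return raw.decode(fallback)


# The same text stored in two different encodings.
latin1_bytes = "Mützel".encode("iso-8859-1")  # 0xFC is not valid UTF-8
utf8_bytes = "Mützel".encode("utf-8")

from_latin1 = decode_with_fallback(latin1_bytes)
from_utf8 = decode_with_fallback(utf8_bytes)

# In ISO-8859-1 every byte value equals the Unicode code point it encodes,
# which is why reading such bytes "as is" into a wider type can appear to work.
bytes_equal_code_points = all(
    ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256)
)
```

Note that detection by UTF-8 validity is only a heuristic: some ISO-8859-1 byte sequences are also valid UTF-8 and would be decoded as UTF-8 here, so an explicitly specified encoding remains the more reliable option.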