


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #57107] regexp functions fail on ISO-8859 input
Date: Wed, 23 Oct 2019 16:43:44 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0

Follow-up Comment #3, bug #57107 (project octave):

There is at least one modern OS that still uses 8-bit encodings by default:
Windows 10 and its predecessors.
On a Western locale the default encoding might well be ISO-8859-1 (or
ANSI/CP1252).
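ISO-8859-1 and CP1252 are close but not interchangeable. They agree on bytes 0x00 to 0x7F and 0xA0 to 0xFF, but CP1252 puts printable characters (such as the euro sign at 0x80) where ISO-8859-1 has C1 control codes. The following minimal Python sketch (Python is used only for illustration, not Octave) decodes the same three bytes both ways:

```python
# Bytes 0x80-0x9F are where CP1252 ("ANSI" on Western Windows) and ISO-8859-1 differ.
sample = bytes([0x41, 0xE9, 0x80])  # 'A', 'é' in both encodings, then 0x80

as_latin1 = sample.decode("iso-8859-1")  # 0x80 becomes the C1 control U+0080
as_cp1252 = sample.decode("cp1252")      # 0x80 becomes the euro sign U+20AC

print(repr(as_latin1))  # 'Aé\x80'
print(repr(as_cp1252))  # 'Aé€'
```

So a fallback that assumes ISO-8859-1 will read most Western text correctly, but will mangle the few CP1252-specific characters.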

But I now see that this bug is marked as affecting GNU/Linux. So it will most
probably be necessary to specify the encoding when fopen'ing a file for
reading strings.

Matlab's internal encoding is 16 bits wide (maybe UCS-2). Maybe it reads the
non-UTF-8 bytes as is, and they "happen" to map to the matching Unicode code
points (for a file in a Western encoding).
I am not sure whether we should do something similar and transcode from a
default 8-bit encoding if we detect that a source contains invalid UTF-8.
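A minimal Python sketch of the "try UTF-8, otherwise transcode from a default 8-bit encoding" idea follows. `decode_with_fallback` is a hypothetical helper, not an Octave or Matlab function, and the ISO-8859-1 default is an assumption (on Windows the locale code page, e.g. CP1252, might be the better choice). It also checks why reading Western bytes "as is" can work: ISO-8859-1 maps every byte value directly to the code point with the same number.

```python
def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode as UTF-8; if that fails, transcode from an assumed 8-bit default."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the input is in the legacy 8-bit encoding.
        return raw.decode(fallback)


utf8_bytes = "Müller".encode("utf-8")         # b'M\xc3\xbcller'
latin1_bytes = "Müller".encode("iso-8859-1")  # b'M\xfcller'

# ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same value,
# which is what widening the raw bytes into 16-bit units "as is" would give.
assert latin1_bytes.decode("iso-8859-1") == "".join(chr(b) for b in latin1_bytes)

print(decode_with_fallback(utf8_bytes))    # Müller
print(decode_with_fallback(latin1_bytes))  # Müller
```

The detection is only a heuristic: 8-bit text that happens to be valid UTF-8 is silently misread. For example, "Ã©" stored as ISO-8859-1 is the byte pair 0xC3 0xA9, which decodes as UTF-8 to "é".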

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57107>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



