octave-bug-tracker

From: Charles Praplan
Subject: [Octave-bug-tracker] [bug #65524] Editing text in the command window is unstable when non-ASCII characters are used
Date: Wed, 27 Mar 2024 13:35:23 -0400 (EDT)

Follow-up Comment #2, bug #65524 (group octave):

Great! I'm looking forward to this improvement.

As you are working on the UTF-8 implementation, I'll mention another small
thing I noticed concerning indexing in strings.
I'm curious to hear your opinion about this.

This problem is independent of the encoding!
Consider the following code:

str1='12345.03 cm, 12345.03 cm';
str2='12345.04 µm, 12345.03 µm';

length(str1)
length(str2)

ans = 24
ans = 26

Although the two strings have the same length in terms of characters, Octave
returns the length in terms of bytes.

Now if I want to get the 2nd occurrence of the unit (cm or µm) contained in
the strings, I have to select not only a different start position but also a
different size, i.e. the size depends on the content, and it is likely I have
to decode UTF-8 myself to be able to isolate a particular character correctly.

str1(23 : 23 + 1)
str2(24 : 24 + 1)
str2(24 : 24 + 2)

ans = cm
ans = µ
ans = µm
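
The manual decoding mentioned above can be sketched as follows. This is only
an illustration, assuming the string holds valid UTF-8 and that Octave stores
one byte per char element; the variable names (mu, is_start, char_idx) are
made up for this example. It marks every byte that starts a character and
then indexes by character number instead of by byte position.

```octave
% Sketch: index a UTF-8 byte string by character instead of by byte.
% "µ" (U+00B5) is written out as its two UTF-8 bytes so that the example
% does not depend on the encoding of the source file.
mu   = char([194 181]);
str2 = ['12345.04 ', mu, 'm, 12345.03 ', mu, 'm'];

bytes = uint8(str2);
% A byte of the form 10xxxxxx continues a character; any other byte starts one.
is_start = bitand(bytes, uint8(192)) ~= 128;
char_idx = cumsum(is_start);   % character number each byte belongs to

n_bytes = numel(str2)          % 26, same as length(str2)
n_chars = sum(is_start)        % 24, same as length(str1)

% Characters 23 and 24 (the second unit), selected with the same
% indices that work for str1
unit2 = str2(char_idx >= 23 & char_idx <= 24)
```

With this mapping, the character range 23:24 selects the three bytes of the
second "µm", so the same indices can be used for both strings.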

Note that the same behaviour occurs in Matlab. However, due to the UTF-16
encoding, this case probably occurs very rarely (it is not a problem for the
above example).
An example of a Matlab string with a character coded with two 16-bit code
units is given hereafter.

   test_str_for_ML = ['1234', char(hex2dec('D834')), ...
                      char(hex2dec('DD1E')), ',7890']
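
For reference, the two values D834 and DD1E form the UTF-16 surrogate pair of
the code point U+1D11E (musical symbol G clef). The arithmetic is sketched
below; the variable names are only for illustration.

```octave
% Sketch: derive the UTF-16 surrogate pair for a code point above U+FFFF.
cp = hex2dec('1D11E');                   % code point U+1D11E
v  = cp - hex2dec('10000');              % 20-bit offset into the supplementary planes
hi = hex2dec('D800') + floor(v / 1024);  % high surrogate carries the top 10 bits
lo = hex2dec('DC00') + mod(v, 1024);     % low surrogate carries the bottom 10 bits
fprintf('%s %s\n', dec2hex(hi), dec2hex(lo));   % prints "D834 DD1E"
```

Any character outside the Basic Multilingual Plane takes two 16-bit elements
in this way, so length() and indexing in Matlab count it twice.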




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65524>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



