[Octave-bug-tracker] [bug #65524] Editing text in the command window is
From: Charles Praplan
Subject: [Octave-bug-tracker] [bug #65524] Editing text in the command window is unstable when non-ASCII characters are used
Date: Wed, 27 Mar 2024 13:35:23 -0400 (EDT)
Follow-up Comment #2, bug #65524 (group octave):
Great! I'm looking forward to this improvement.
Since you are working on the UTF-8 implementation, I'll mention another small
thing I noticed concerning indexing into strings.
I'm curious to hear your opinion about this.
This problem is independent of the encoding!
Let's consider and execute the following code:
str1='12345.03 cm, 12345.03 cm';
str2='12345.04 µm, 12345.03 µm';
length(str1)
length(str2)
ans = 24
ans = 26
Although the two strings have the same length in terms of characters (24),
Octave returns the length in terms of bytes: each µ occupies two bytes in
UTF-8, hence 26 for str2.
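
A minimal sketch of counting characters instead of bytes, assuming the string is stored as UTF-8 (as the lengths above suggest): every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new character. The variable names are only illustrative.

```octave
str2 = '12345.04 µm, 12345.03 µm';       % assumed to be stored as UTF-8
bytes = uint8 (str2);                    % raw bytes of the string
is_lead = bitand (bytes, 192) ~= 128;    % false only for continuation bytes 10xxxxxx
n_bytes = numel (bytes)                  % 26, same as length (str2)
n_chars = sum (is_lead)                  % 24, number of characters
```

This only counts code points; it does not validate the UTF-8 sequence or handle combining characters.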
Now if I want to extract the 2nd occurrence of the unit (cm or µm) contained in
the strings, I have to select not only a different start position but also a
different size, i.e. the size depends on the content, and I would probably have
to decode UTF-8 myself to isolate a particular character correctly.
str1(23 : 23 + 1)
str2(24 : 24 + 1)
str2(24 : 24 + 2)
ans = cm
ans = µ
ans = µm
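
One possible way to avoid hand-decoding is to map every byte to the index of the character it belongs to and then select by character index. This is only a sketch under the assumption that the string is valid UTF-8; `char_of_byte` is a name introduced here for illustration.

```octave
str2 = '12345.04 µm, 12345.03 µm';       % assumed to be stored as UTF-8
bytes = uint8 (str2);
% A byte starts a new character unless it is a continuation byte 10xxxxxx,
% so the running count of lead bytes gives each byte's character index.
char_of_byte = cumsum (bitand (bytes, 192) ~= 128);
% Select characters 23 and 24, whatever number of bytes they occupy.
unit = str2(char_of_byte >= 23 & char_of_byte <= 24)   % µm
```

With this mapping the same character range (23 to 24) works for both str1 and str2, since the byte positions are derived from the content.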
Note that the same behaviour occurs in Matlab. However, because of the UTF-16
encoding, this case probably occurs very rarely (it is not a problem for the
above example).
An example of a Matlab string containing a character encoded as two 16-bit
code units is given below.
test_str_for_ML=['1234', char(hex2dec('D834')), ...
char(hex2dec('DD1E')),',7890']
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?65524>