help-gnu-emacs

RE: [Solved] RE: Differences between identical strings in Emacs lisp


From: Jürgen Hartmann
Subject: RE: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Tue, 7 Apr 2015 19:02:38 +0200

Thank you for your comments and your kind advice, Eli Zaretskii:

> May I ask why you need to mess with unibyte strings?  (Your original
> message doesn't seem to present a real problem, just something that
> puzzled you.)

That's right: I was trying to learn something about the basic Lisp data types
and their constants and, as a side effect, trying to understand some of these
"cryptic" read and write sequences that one sees in Emacs from time to time.
In doing so, it was "\xBA" that imperceptibly lured me into the land of
unibyte strings. And once I was there, as you warn below, the confusion started.

At first I thought that some hidden decoding based on charsets or coding
systems was taking place. But now--thanks to Pascal Bourguignon and you--I know
the enemy, or at least its name.

>> It seems that one can use "\u00BA" to achieve this in a string constant; it
>> evaluates to a multibyte string containing the integer 186:
>>
>>    "\u00BA"
>>    --> "º"
>
> Why can't you simply use the º character? why do you need to use its
> codepoint?

Of course this would be possible. As said above, the focus here lies on the
rather abstract Lisp topic, namely the conversion of a hex code point to a
string.
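
For illustration, here is a minimal sketch of the two ways of writing the
integer 186 in a string constant, assuming a reasonably recent Emacs. Both
literals hold a single element with value 186; only the "\u" form forces a
multibyte string:

```elisp
;; Evaluate each form with C-x C-e, or type it into M-x ielm.
(multibyte-string-p "\u00BA")   ; => t    (multibyte string, character º)
(multibyte-string-p "\xBA")     ; => nil  (unibyte string, raw byte)
(aref "\u00BA" 0)               ; => 186
(aref "\xBA" 0)                 ; => 186
```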

>> ... For example the constant "\x3FFFBA" is a unibyte string
>> containing the integer 186:
>>
>>    "\x3FFFBA"
>>    --> "\272"
>
> "Contains" is incorrect here.  That constant _represents_ a raw byte
> whose value is 186.  Emacs goes out of its way under the hood to show
> you 186 when the buffer or string contains 0x3FFFBA.

What is the correct parlance here: Is it correct to say that the constant
"\x3FFFBA\x3FFFBB\x3FFFBC" is not a string because it does not contain (?)
any characters; rather it is just a sequence of raw bytes?
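
As a small check, and assuming the reader handles these escapes the way it
was reported above for "\x3FFFBA", the constant is in any case a string object
with three elements; the open question is only what those elements stand for:

```elisp
(stringp "\x3FFFBA\x3FFFBB\x3FFFBC")             ; => t
(length "\x3FFFBA\x3FFFBB\x3FFFBC")              ; => 3
;; Unibyte, like the single-element constant "\x3FFFBA" quoted above.
(multibyte-string-p "\x3FFFBA\x3FFFBB\x3FFFBC")  ; => nil
```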

>> ...
>> This seems to be an undocumented feature.
>
> It's barely documented in the node "Text Representations" in the ELisp
> manual.

I knew that, and I learned from section "32.3 Converting Text Representations"
that the code-point range [#x3FFF80..#x3FFFFF] is used for the multibyte
representation of raw bytes. My surprise about the behavior of "\x3FFFBA"
concerns the fact that it is a unibyte string: from the sentence
"But beware:..." in section "2.3.8.2 Non-ASCII Characters in Strings" of the
ELisp manual I had expected it to be different. (But this was just my faulty
interpretation.)
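
The mapping between a raw byte and its code point in that range can be
sketched with the standard conversion functions; the byte #xBA is just the
example value used in this thread:

```elisp
;; A raw byte B in #x80..#xFF corresponds to the code point #x3FFF00 + B.
(unibyte-char-to-multibyte #xBA)        ; => 4194234, i.e. #x3FFFBA
(multibyte-char-to-unibyte #x3FFFBA)    ; => 186, i.e. #xBA
(= #x3FFFBA (+ #x3FFF00 #xBA))          ; => t

;; Converting the unibyte string keeps the byte, now in its multibyte form.
(aref (string-to-multibyte "\xBA") 0)   ; => 4194234
```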

> This is a tricky issue, so you are well advised to stay away from
> unibyte strings as much as you can, for your sanity's sake.

It was not my fault--"\xBA" is the bad guy.

>> ...
>
> Don't try to learn about unibyte/multibyte strings using ASCII
> characters as examples, because ASCII is treated specially for obvious
> reasons.

Okay.

> ...
>
> Yes, and therefore you don't need to consider the multibyte property.
>
>> ...
>
> As they should: you are comparing a character with a raw byte.
>
>> ... definition of the term character according to which a character actually
>> _is_ that integer (cf. Lisp manual, section "2.3.3 Character Type").
>
> It is an integer, but note that no one told you anywhere that a raw
> byte is a character.  It's a raw byte.

Ah, that seems to be the key: raw bytes are not characters. (Up to now I
thought that raw bytes were a special set of characters that had different
representations in unibyte and multibyte contexts.) This distinction removes
all the apparent ambiguities.

In spite of my previous promise not to try to learn something about the
unibyte/multibyte topic from ASCII, I shyly dare to ask another question in
this context (don't beat me): Does the A in the unibyte string "A" represent
a character or a raw byte? Or both? In the latter case, is this that special
treatment of ASCII you talked about before?

> I'd still suggest that you try as much as you can not to use unibyte
> strings in your Lisp applications.  That way lies madness.

I will try to follow that advice--and I hope that it is not too late...

So, thank you very much for your enlightening answers.

Juergen

                                          

