From: Stefan Monnier
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Thu, 09 Apr 2015 08:32:09 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)

> I could imagine that the step from the equivalence char=byte to
> char=unicode code point (long(er) integer) is not so difficult. But we have
> in addition the UTF-8 representation. To what of the two latter--unicode code
> point (integer, several bytes long) or its UTF-8 representation (sequence of
> several bytes) does the term "multibyte" refer?

multibyte refers to "string of characters". These have been represented
internally using an iso-2022 encoding until Emacs-22 and since Emacs-23
they're represented internally with a utf-8 encoding. The name comes
from the fact that each element can use up more than one byte. But
that's just an internal detail that is mostly hidden from Elisp.

To turn such a string of characters into a string of bytes you need to
use things like encode-coding-(string|buffer), at which point you have
to specify which encoding you want to use (e.g. utf-8).


        Stefan
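
The distinction above can be tried out with a small sketch along these lines. The variable names `demo-chars` and `demo-bytes` are made up for illustration and are not from the original message; the functions used (`multibyte-string-p`, `encode-coding-string`, `decode-coding-string`) are standard Elisp. The sketch assumes Emacs 23 or later.

```elisp
;; Hypothetical names, for illustration only.
;; The \u escape forces the literal to be a multibyte string
;; (a string of characters), independent of the file's encoding.
(defvar demo-chars "na\u00efve")

;; Encoding yields a unibyte string (a string of bytes); the coding
;; system has to be named explicitly, here utf-8.
(defvar demo-bytes (encode-coding-string demo-chars 'utf-8))

(multibyte-string-p demo-chars)   ; => t
(length demo-chars)               ; => 5 (characters)
(multibyte-string-p demo-bytes)   ; => nil
(length demo-bytes)               ; => 6 (U+00EF takes two bytes in utf-8)

;; A different encoding gives a different byte string for the same characters.
(length (encode-coding-string demo-chars 'latin-1))   ; => 5

;; Decoding with the matching coding system gets the characters back.
(string= (decode-coding-string demo-bytes 'utf-8) demo-chars)   ; => t
```

Note that `length` counts elements, so it reports characters for the multibyte string and bytes for the encoded one; how many bytes Emacs uses internally for each character does not show up here.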