gcl-devel

Re: [Gcl-devel] utf8 and emacs text/string multibyte representation


From: Camm Maguire
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Sat, 01 Nov 2014 10:50:48 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Greetings!

Carl Shapiro <address@hidden> writes:

> On Fri, Oct 31, 2014 at 11:20 AM, Camm Maguire <address@hidden> wrote:
>
>     It really appears that unicode refers more to a glyph than anything
>     else.  If we follow your suggestions, and leave characters 8-bit, aref
>     random O(1) access, is there any utility to providing unicode functions
>     #'glyph-length or some such in a common lisp implementation?
>
> Yes, a Common Lisp character is a UTF-8 code unit.  As such, (length "א") 
> would return 2 in GCL whereas it returns 1 in CMUCL.
>
> For iterating across strings in ways other than by UTF-8 code unit, you will 
> want to provide iterators for iterating by code point, by glyph,
> and so forth.
>
> In theory, something like CL-UNICODE would provide that, but I think it's 
> really lacking in a number of important ways.  GCL being what it is, you
> could link against ICU and use its functions to start with.
>

Thanks so much for these tips.  They certainly seem to illuminate the
path forward.  I can't see how we could do better than ICU.
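
To make the code-unit versus code-point distinction quoted above
concrete, here is a minimal C sketch.  It is not GCL or ICU code: the
function names utf8_code_units and utf8_code_points are hypothetical,
and the input is assumed to be well-formed, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Number of UTF-8 code units (bytes) in a NUL-terminated string.
   This is the length one would see if each character is one code unit. */
size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Number of code points, ASSUMING s is well-formed UTF-8.
   Every code point starts with exactly one byte that is not a
   continuation byte (bit pattern 10xxxxxx), so count those. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "א" (U+05D0, bytes 0xD7 0x90) the first function returns 2 and the
second returns 1, matching the two lengths quoted above.  Counting by
glyph needs Unicode segmentation data that a byte scan like this cannot
supply, which is where a library such as ICU would come in.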

To your knowledge, is there any objection to defining alpha-char-p as
true for characters whose char-code is >= 128?

Take care,
-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah


