Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
From: Camm Maguire
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Sat, 01 Nov 2014 10:50:48 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
Greetings!
Carl Shapiro <address@hidden> writes:
> On Fri, Oct 31, 2014 at 11:20 AM, Camm Maguire <address@hidden> wrote:
>
> > It really appears that unicode refers more to a glyph than anything
> > else. If we follow your suggestions, and leave characters 8-bit, aref
> > random O(1) access, is there any utility to providing unicode functions
> > #'glyph-length or some such in a common lisp implementation?
>
> Yes, a Common Lisp character is a UTF-8 code unit. As such, (length "א")
> would return 2 in GCL whereas it returns 1 in CMUCL.
>
> For iterating across strings in ways other than by UTF-8 code unit, you will
> want to provide iterators for iterating by code point, by glyph,
> and so forth.
>
> In theory, something like CL-UNICODE would provide that but I think it's
> really lacking in a number of important ways. GCL being what it is, you
> could link against ICU and use their functions to start with.
>
Thanks so much for these tips. They certainly seem to illuminate the
path forward. I can't see how we could do better than ICU.
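A minimal sketch of the code unit versus code point distinction discussed
above, in C since the proposal keeps characters 8-bit. The function names
(utf8_code_units, utf8_code_points, utf8_next) are hypothetical, not GCL or
ICU API, and the code assumes well-formed, NUL-terminated UTF-8 with no
validation. A real implementation would more likely call into ICU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length in UTF-8 code units (bytes): what LENGTH would report
   if every character is one 8-bit code unit. */
size_t utf8_code_units(const char *s)
{
    return strlen(s);
}

/* Length in code points: count every byte that is not a
   continuation byte (10xxxxxx). Assumes well-formed UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Code point iterator step: decode the code point that starts at s
   into *cp and return the number of bytes consumed (1 to 4).
   Assumes well-formed UTF-8; performs no validation. */
size_t utf8_next(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(s != NULL && cp != NULL);
    if (p[0] < 0x80) {                      /* 0xxxxxxx */
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {            /* 110xxxxx 10xxxxxx */
        *cp = ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0) {            /* 1110xxxx + 2 continuation bytes */
        *cp = ((uint32_t)(p[0] & 0x0F) << 12)
            | ((uint32_t)(p[1] & 0x3F) << 6)
            | (uint32_t)(p[2] & 0x3F);
        return 3;
    }
    /* 11110xxx + 3 continuation bytes */
    *cp = ((uint32_t)(p[0] & 0x07) << 18)
        | ((uint32_t)(p[1] & 0x3F) << 12)
        | ((uint32_t)(p[2] & 0x3F) << 6)
        | (uint32_t)(p[3] & 0x3F);
    return 4;
}
```

For the two-byte encoding of א (0xD7 0x90), utf8_code_units returns 2 and
utf8_code_points returns 1, matching the GCL and CMUCL results quoted above.
Indexing by byte stays O(1), while stepping by code point is a linear walk
with utf8_next.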
To your knowledge, is there any objection to defining alpha-char-p as
including every character whose char-code is >= 128?
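To make the question concrete, here is a rough sketch in C of such a
predicate over 8-bit characters. The name alpha_char_p_8bit is hypothetical
and this is not GCL's actual definition. It accepts the ASCII letters plus
every byte value >= 128, so each code unit of a multibyte UTF-8 sequence
counts as alphabetic.

```c
#include <stdbool.h>

/* Hypothetical predicate: true for ASCII letters and for any byte
   with the high bit set (UTF-8 lead and continuation bytes alike). */
bool alpha_char_p_8bit(unsigned char c)
{
    return c >= 128
        || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z');
}
```

One consequence of this definition is that the bytes of non-letter code
points are also accepted, for example the two bytes 0xC3 0x97 that encode
U+00D7 MULTIPLICATION SIGN.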
Take care,
--
Camm Maguire address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah