Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
From: Raymond Toy
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Sat, 01 Nov 2014 13:42:49 -0700
User-agent: Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b34 (darwin)
>>>>> "Camm" == Camm Maguire <address@hidden> writes:
Camm> Greetings, and thanks so much! I think we are converging...
Camm> 1) The proposal under consideration is due to Carl, that gcl's lisp
Camm> character still be governed by char-code-limit==256, i.e. equivalent to
Camm> an uint8_t. aref/aset work the same for all types of arrays. This lisp
Camm> character has no correspondence to a unicode character other than the
Camm> overlap in the ascii range. In some fashion, gcl would then provide on
Camm> top of these primitives (unichar s i), etc. to get unicodes from utf8
Camm> encoded strings. These are not random access, but can be cached. So
Camm> (code-char #xa0) != no-break-space.
Have you considered the cost of making gcl really rather incompatible
with other CLs?
Having (code-char #xa0) not be no-break-space is going to have to be
explained to users. I suspect malformed strings will be somewhat
common when someone accidentally stores a code unit > 127 into a
string.
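To make that concrete, here is a minimal sketch in Python (an illustration only, not gcl code). It models a string as raw 8-bit code units, which is my assumption about how the proposal would behave; the helper name is_well_formed_utf8 is made up for this example.

```python
# A string under the proposal is modelled here as raw bytes:
# each "character" is one 8-bit code unit (assumption for illustration).

# NO-BREAK SPACE is code point #xA0, but its UTF-8 encoding is two code units.
nbsp_utf8 = "\u00a0".encode("utf-8")   # the bytes 0xC2 0xA0


def is_well_formed_utf8(data):
    """Return True if the bytes decode as UTF-8 without error (hypothetical helper)."""
    try:
        bytes(data).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Storing the single code unit 0xA0 into an ASCII string, as a user
# expecting (code-char #xa0) to be no-break-space might do.
broken = bytearray(b"a b")
broken[1] = 0xA0

# The same text with the proper two-unit encoding spliced in.
correct = b"a" + nbsp_utf8 + b"b"
```

Here broken is no longer valid UTF-8, because 0xA0 on its own is a continuation byte with no lead byte, while correct is one code unit longer than the user probably expected.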
And why complicate things with a cache? What was fairly simple now
depends on having a fast, bug-free cache implementation.
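For illustration, a rough Python sketch of what a non-random-access (unichar s i) and its cache might look like. This is only my guess at the shape of the thing, not Carl's or Camm's design; seq_len and build_offsets are hypothetical names, and the code assumes the input is well-formed UTF-8.

```python
def seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte: %#x" % lead)


def unichar(s, i):
    """Code point of the i-th character of UTF-8 bytes s, found by a linear scan."""
    pos = 0
    for _ in range(i):              # skip over the first i characters: O(i)
        pos += seq_len(s[pos])
    n = seq_len(s[pos])
    return ord(bytes(s[pos:pos + n]).decode("utf-8"))


def build_offsets(s):
    """The 'cache': byte offset of every character, computed in one pass."""
    offsets = []
    pos = 0
    while pos < len(s):
        offsets.append(pos)
        pos += seq_len(s[pos])
    return offsets


s = "a\u00a0b\u20ac".encode("utf-8")   # 'a', NO-BREAK SPACE, 'b', EURO SIGN
offsets = build_offsets(s)
```

With the offsets table, finding character i is a single lookup, but the table is only valid as long as nobody changes the underlying bytes. Any aset that alters a lead byte or the length of a sequence leaves the table stale, and that invalidation logic is where I would expect the bugs.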
--
Ray