
Re: [Gcl-devel] utf8 and emacs text/string multibyte representation


From: Camm Maguire
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Thu, 30 Oct 2014 10:16:15 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Greetings!  Don't worry -- I'm not committed to this idea yet, just
exploring!

Do these other lisps allocate a fresh character on each aref?  Do they
maintain some ~2^21 sized table in core?  (And isn't emacs a "lisp"
:-)).

Take care,

Raymond Toy <address@hidden> writes:

>>>>>> "Camm" == Camm Maguire <address@hidden> writes:
>
>     Camm> Greetings!  I've recently been considering supporting unicode in gcl by
>     Camm> representing strings internally in utf8.  It appears that emacs does the
>     Camm> same or similar.  Apart from the obvious memory footprint benefits, I'd
>     Camm> like to ask what other advantages/disadvantages have been discovered.
>     Camm> Much of the utf8 literature emphasizes that most algorithms can proceed
>     Camm> conventionally in byte-wise fashion, including lexicographical ordering
>     Camm> comparisons, given that almost all jobs are sequential, at least
>     Camm> initially.  A cached internal pointer storing the last referenced
>     Camm> codepoint offset makes access essentially O(1).  Yet setting string
>     Camm> elements can trigger reallocations/memmove operations.  While these can
>     Camm> be aggregated over the setting of multiple elements, operations like
>     Camm> nreverse look ridiculous if left in terms of calls to aref and aset.
>
>     Camm> Thoughts, advice and experiences most appreciated.
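The scheme described in the quoted message can be sketched as a small model. This is hypothetical Python, not GCL code: `Utf8String`, `aref`, `aset` and `utf8_less` are illustrative names. A cached (character index, byte offset) pair lets aref walk from the last position instead of rescanning from the start, aset overwrites a character's bytes and shifts the tail of the buffer whenever the new encoding has a different length, and byte-wise comparison of UTF-8 encodings agrees with code-point order.

```python
class Utf8String:
    """Hypothetical UTF-8 backed string with a cached cursor."""

    def __init__(self, text: str):
        self.buf = bytearray(text.encode("utf-8"))
        self.length = len(text)  # code points; aset never changes it
        self._ci = 0             # cached character index
        self._bi = 0             # byte offset where character _ci starts

    @staticmethod
    def _width(lead: int) -> int:
        """Byte length of a UTF-8 sequence, given its lead byte."""
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def _seek(self, index: int) -> int:
        """Walk from the cached cursor to character `index`; return its byte offset."""
        ci, bi = self._ci, self._bi
        while ci < index:                     # forward: skip whole sequences
            bi += self._width(self.buf[bi])
            ci += 1
        while ci > index:                     # backward: skip continuation bytes
            bi -= 1
            while (self.buf[bi] & 0xC0) == 0x80:
                bi -= 1
            ci -= 1
        self._ci, self._bi = ci, bi           # remember where we stopped
        return bi

    def _check(self, index: int) -> None:
        if not 0 <= index < self.length:
            raise IndexError(index)

    def aref(self, index: int) -> str:
        self._check(index)
        bi = self._seek(index)
        return self.buf[bi:bi + self._width(self.buf[bi])].decode("utf-8")

    def aset(self, index: int, ch: str) -> None:
        """Replace one character; a width change shifts the tail (the memmove)."""
        self._check(index)
        if len(ch) != 1:
            raise ValueError("expected a single character")
        bi = self._seek(index)
        # Slice assignment resizes the bytearray and moves the bytes after it.
        self.buf[bi:bi + self._width(self.buf[bi])] = ch.encode("utf-8")

    def __str__(self) -> str:
        return self.buf.decode("utf-8")


def utf8_less(a: str, b: str) -> bool:
    """Byte-wise comparison of UTF-8 encodings; agrees with code-point order."""
    return a.encode("utf-8") < b.encode("utf-8")
```

In this model, aref costs time proportional to the distance from the cached cursor, so the O(1) figure holds for sequential or nearby access, not for arbitrary random access.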
>
> Have you looked at what other Lisp implementations do? AFAIK, none use
> utf-8. CCL and clisp use utf-32, cmucl and allegro use utf-16, sbcl
> and ecl(?) have two string types: 8-bit base-string and 32-bit
> strings.
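The trade-off between the representations listed above can be made concrete with a short sketch: it computes the storage each encoding needs for a string and shows the UTF-16 surrogate-pair case, where a code point above U+FFFF takes two code units. `choose_representation` is a hypothetical model of a two-string-type design in the spirit of the sbcl/ecl approach; its 8-bit cut-off is an assumption, not either implementation's actual rule.

```python
def utf16_units(cp: int) -> list:
    """UTF-16 code units for one code point (a surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def footprint(text: str) -> dict:
    """Bytes needed to store `text` in each encoding."""
    utf16 = [u for c in text for u in utf16_units(ord(c))]
    return {
        "utf-8": len(text.encode("utf-8")),  # variable width: 1-4 bytes per char
        "utf-16": 2 * len(utf16),            # 2 or 4 bytes per char
        "utf-32": 4 * len(text),             # fixed 4 bytes: aref is plain indexing
    }


def choose_representation(text: str, narrow_limit: int = 0x100):
    """Hypothetical two-type scheme: 8-bit storage when every code point
    fits below `narrow_limit` (an assumed cut-off), else 32-bit code points."""
    if all(ord(c) < narrow_limit for c in text):
        return "base-string", bytes(ord(c) for c in text)
    return "string", [ord(c) for c in text]
```

With fixed-width 32-bit storage, aref and aset are O(1) and never move other elements. A UTF-16 design keeps O(1) indexing only if each code unit, rather than each code point, counts as one string element.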
>
> As a one-man operation (unfortunately), I'd go with the easiest one to
> get right and follow either ccl or cmucl.  The rest of the support for
> unicode can be added with libraries like cl-unicode and/or babel, if
> need be.
>
> --
> Ray

-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
