[Gcl-devel] utf8 and emacs text/string multibyte representation

From:	Camm Maguire
Subject:	[Gcl-devel] utf8 and emacs text/string multibyte representation
Date:	Wed, 29 Oct 2014 10:04:58 -0400
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Greetings!  I've recently been considering supporting Unicode in GCL by
representing strings internally in UTF-8.  It appears that Emacs does the
same or something similar.  Apart from the obvious memory-footprint
benefits, I'd like to ask what other advantages and disadvantages have
been discovered.  Much of the UTF-8 literature emphasizes that most
algorithms can proceed conventionally in byte-wise fashion, including
lexicographic comparisons (byte order coincides with code-point order),
given that almost all jobs are sequential, at least initially.  A cached
internal pointer storing the character index and byte offset of the last
referenced code point makes sequential access essentially O(1).  Yet
setting a string element can trigger a reallocation or memmove whenever
the new character's encoding differs in length from the old one's.  While
these costs can be aggregated over the setting of multiple elements,
operations like nreverse would be absurdly inefficient if written in
terms of calls to aref and aset.
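The claim that byte-wise comparison is safe can be checked with a small C sketch.  It assumes well-formed UTF-8 and orders strings by code point, which is not the same as locale-aware collation.  The helper names (utf8_decode, cmp_codepoints, cmp_bytes) are hypothetical and are not part of GCL or Emacs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp; return bytes consumed.
   Assumes well-formed input (no overlong/surrogate checks). */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    /* a lead byte is never a continuation byte (10xxxxxx) nor >= 0xF8 */
    assert((s[0] & 0xC0) != 0x80 && s[0] < 0xF8);
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
          ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Compare NUL-terminated UTF-8 strings by decoded code point: -1, 0, 1. */
static int cmp_codepoints(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *q) {
        uint32_t ca, cb;
        p += utf8_decode(p, &ca);
        q += utf8_decode(q, &cb);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    /* a proper prefix sorts first */
    return (*p != 0) - (*q != 0);
}

/* Plain byte-wise comparison: strcmp compares bytes as unsigned char. */
static int cmp_bytes(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

For any pair of valid UTF-8 strings the two functions agree, for example "z" (U+007A) sorts before "\xC3\xA9" (U+00E9) under both, so an existing 8-bit comparison routine can be reused unchanged.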
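A minimal sketch of the cached-position idea follows.  It assumes a hypothetical u8str struct that remembers the last (character index, byte offset) pair.  Neither the struct nor u8_offset is GCL's or Emacs's actual code.  Each lookup walks from the cached position, so stepping from index i to i+1 costs one character.  A distant jump still costs time proportional to the distance, which is why the O(1) claim applies to sequential access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-8 string with a one-entry position cache. */
typedef struct {
    const unsigned char *bytes;  /* well-formed UTF-8 data */
    size_t nbytes;               /* length in bytes */
    size_t cache_char;           /* character index of the last access */
    size_t cache_byte;           /* byte offset of that character */
} u8str;

/* Length of the sequence whose lead byte is b (b must be a lead byte). */
static size_t seq_len(unsigned char b)
{
    return b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
}

/* True for continuation bytes of the form 10xxxxxx. */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Return the byte offset of character `index`, walking from the cached
   position in whichever direction is needed, then update the cache. */
static size_t u8_offset(u8str *s, size_t index)
{
    size_t ci = s->cache_char, bi = s->cache_byte;
    while (ci < index) {              /* walk forward one character */
        assert(bi < s->nbytes);
        bi += seq_len(s->bytes[bi]);
        ci++;
    }
    while (ci > index) {              /* walk back over continuation bytes */
        do { bi--; } while (is_cont(s->bytes[bi]));
        ci--;
    }
    s->cache_char = ci;
    s->cache_byte = bi;
    return bi;
}
```

For a string holding "a", "é", "€", an emoji and "b", successive calls with indices 0 through 4 each advance by exactly one character from the cache.  A real implementation might also restart from offset 0 or from the end when that is closer than the cached position.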
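The following sketch shows why setting an element may force a memmove or a reallocation.  It implements an aset-like operation on a hypothetical growable buffer; u8buf and u8_set_at are illustrative names, not GCL's string layout.  The function takes a byte offset rather than a character index, so the index lookup is left to a cursor like the one described in the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable UTF-8 buffer. */
typedef struct {
    unsigned char *bytes;
    size_t nbytes;   /* bytes in use */
    size_t cap;      /* bytes allocated */
} u8buf;

/* Encode cp (<= 0x10FFFF, not a surrogate) into out; return its length. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Length of the sequence whose lead byte is b (b must be a lead byte). */
static size_t seq_len(unsigned char b)
{
    return b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
}

/* Overwrite the character starting at byte offset off with cp.  When the
   encoded lengths differ, the tail is shifted with memmove and the buffer
   may grow via realloc.  Returns 0 on success, -1 if realloc fails. */
static int u8_set_at(u8buf *s, size_t off, uint32_t cp)
{
    unsigned char enc[4];
    size_t newlen = utf8_encode(cp, enc);
    assert(off < s->nbytes);
    size_t oldlen = seq_len(s->bytes[off]);
    if (newlen != oldlen) {
        size_t need = s->nbytes - oldlen + newlen;
        if (need > s->cap) {
            unsigned char *p = realloc(s->bytes, need);
            if (!p) return -1;
            s->bytes = p;
            s->cap = need;
        }
        /* shift everything after the old character to its new position */
        memmove(s->bytes + off + newlen, s->bytes + off + oldlen,
                s->nbytes - off - oldlen);
        s->nbytes = need;
    }
    memcpy(s->bytes + off, enc, newlen);
    return 0;
}
```

Replacing "b" in "abc" with U+20AC grows the buffer from 3 to 5 bytes and moves "c".  Setting it back to ASCII moves "c" again.  A batch of such updates could instead compute the final size once and do a single pass, which is the aggregation mentioned above.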
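One way to avoid expressing nreverse as aref/aset calls is to operate on the bytes directly.  The sketch below uses a two-pass technique.  First it reverses the whole buffer, which leaves each multi-byte character stored backwards (continuation bytes first, lead byte last).  Then it scans once and restores each such run.  It assumes well-formed UTF-8, runs in O(n), and allocates nothing.  u8_nreverse is a hypothetical name, not an existing GCL function.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse b[lo..hi] inclusive in place. */
static void reverse_bytes(unsigned char *b, size_t lo, size_t hi)
{
    while (lo < hi) {
        unsigned char t = b[lo];
        b[lo++] = b[hi];
        b[hi--] = t;
    }
}

/* Reverse a well-formed UTF-8 buffer of n bytes by characters, in place. */
static void u8_nreverse(unsigned char *b, size_t n)
{
    assert(b != NULL || n == 0);
    if (n == 0) return;
    reverse_bytes(b, 0, n - 1);          /* pass 1: reverse all bytes */
    size_t i = 0;
    while (i < n) {                      /* pass 2: fix each character */
        size_t j = i;
        /* skip continuation bytes (10xxxxxx) up to the lead byte */
        while (j + 1 < n && (b[j] & 0xC0) == 0x80) j++;
        reverse_bytes(b, i, j);          /* lead byte back in front */
        i = j + 1;
    }
}
```

Since every character keeps its byte length, the total size is unchanged and no memmove or reallocation is needed, unlike a loop of element-wise sets.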

Thoughts, advice and experiences most appreciated.

Take care,
-- 
Camm Maguire                                        address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
