Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
From: Camm Maguire
Subject: Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date: Wed, 29 Oct 2014 11:55:13 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
Greetings, and thanks so much for the feedback!
Eli Zaretskii <address@hidden> writes:
>> From: Camm Maguire <address@hidden>
>> Date: Wed, 29 Oct 2014 10:04:58 -0400
>>
> You have basically said it yourself: memory footprint vs
> addressability. If you want to discuss this in more detail, I suggest
> to ask more specific questions about specific aspects that bother you.
>
I thought there would be a little more on the upside, say some benefit
from having the internal representation match the encoding used by many
external representations, at least on Linux, and perhaps some algorithm
coalescing with straightforward byte-wise operations. Does every string
access in Emacs go through the UTF-8 decoder?
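To make the "byte-wise operations" point concrete, here is a minimal sketch
in Python (not Emacs or GCL code; the function name is hypothetical). It
assumes both strings are valid UTF-8. Because UTF-8 is self-synchronizing,
a plain byte search can only match at character boundaries, so the decoder
is not needed for the search itself, only for turning the byte offset back
into a character index.

```python
def utf8_find(haystack: str, needle: str) -> int:
    """Return the character index of needle in haystack, or -1.

    Hypothetical helper: the search runs on raw UTF-8 bytes with no decoding.
    """
    h = haystack.encode("utf-8")
    n = needle.encode("utf-8")
    pos = h.find(n)  # plain byte-wise search
    if pos < 0:
        return -1
    # Convert the byte offset to a character index by counting the bytes
    # that are not continuation bytes (continuation bytes look like 10xxxxxx).
    return sum(1 for b in h[:pos] if (b & 0xC0) != 0x80)
```

The final conversion step is the O(n) part that a position cache is meant
to avoid.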
>> A cached internal pointer storing the last referenced codepoint
>> offset makes access essentially O(1).
>
> We indeed maintain a cache for byte-to-character and character-to-byte
> conversions.
How big is this cache?
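For reference, the smallest version of such a cache is a single remembered
(character index, byte offset) pair, as in the sketch below. This is an
illustration in Python under my own assumptions, not a description of what
Emacs does; how Emacs sizes or organizes its cache is not stated in this
thread. The class and method names are hypothetical, and the input is
assumed to be valid UTF-8.

```python
class CachedUtf8:
    """UTF-8 byte string with a one-entry character-position cache."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        self.cache = (0, 0)  # (character index, byte offset)

    def _byte_offset(self, index: int) -> int:
        if index < 0:
            raise IndexError(index)
        ci, bo = self.cache
        data = self.data
        # Walk forward from the cached position, one character at a time.
        while ci < index:
            if bo >= len(data):
                raise IndexError(index)
            bo += 1
            while bo < len(data) and (data[bo] & 0xC0) == 0x80:
                bo += 1  # skip continuation bytes
            ci += 1
        # Or walk backward to the previous lead byte.
        while ci > index:
            bo -= 1
            while (data[bo] & 0xC0) == 0x80:
                bo -= 1
            ci -= 1
        self.cache = (ci, bo)
        return bo

    def char_at(self, index: int) -> str:
        start = self._byte_offset(index)
        if start >= len(self.data):
            raise IndexError(index)
        end = start + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")
```

Sequential access costs one character step per call; a random access far
from the cached position still costs a scan proportional to the distance.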
>
>> Yet setting string elements can trigger reallocations/memmove
>> operations.
>
> Emacs, as every editor, needs to handle this efficiently anyway,
> because editing operations rarely leave the buffer size unchanged. So
> Emacs uses a gap to minimize reallocations.
>
But there is no gap in strings, right (i.e., only in buffers)?
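For readers who have not met the technique, a gap buffer can be sketched as
follows. This is a generic illustration in Python, not Emacs's buffer code;
the class, its method names, and the growth policy are all my assumptions.
The idea is that the unused space sits at the edit point, so repeated
insertions or deletions at one spot only adjust the gap boundaries.

```python
class GapBuffer:
    """Byte buffer with a movable gap (hypothetical minimal version)."""

    def __init__(self, capacity: int = 16):
        self.buf = bytearray(capacity)
        self.gap_start = 0         # first byte of the gap
        self.gap_end = capacity    # first byte after the gap

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos: int) -> None:
        if pos < self.gap_start:
            # Shift the text between pos and the gap to the far side.
            n = self.gap_start - pos
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start -= n
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            self.buf[self.gap_start:self.gap_start + n] = \
                self.buf[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def _grow(self, needed: int) -> None:
        gap = self.gap_end - self.gap_start
        if gap < needed:
            # Assumed policy: at least double the storage when the gap runs out.
            extra = max(needed - gap, len(self.buf))
            self.buf[self.gap_end:self.gap_end] = bytes(extra)
            self.gap_end += extra

    def insert(self, pos: int, data: bytes) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self._grow(len(data))
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def delete(self, pos: int, count: int) -> None:
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self.gap_end += count  # deleted bytes simply join the gap

    def contents(self) -> bytes:
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Only moving the gap or exhausting it copies data; an edit at the current
gap position is just index arithmetic plus the write itself.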
>> While these can be aggregated over the setting of multiple elements,
>> operations like nreverse look ridiculous if left in terms of calls
>> to aref and aset.
>
> nreverse applied to a string is a rarity, IME.
>
This is the stuff I really need to get a handle on -- what the dominant
string operations are.
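As an aside on the nreverse case: it can be done in place on the raw bytes
without any aref/aset-style element access. A minimal sketch in Python,
assuming valid UTF-8 input (the function names are hypothetical, and this
reverses code points, not grapheme clusters): first reverse the bytes
inside each character, then reverse the whole buffer, which restores each
character's byte order while reversing the character sequence.

```python
def _reverse_range(buf: bytearray, lo: int, hi: int) -> None:
    """Reverse buf[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def nreverse_utf8(buf: bytearray) -> bytearray:
    """Reverse the characters of a UTF-8 byte buffer in place."""
    n = len(buf)
    i = 0
    while i < n:
        # Find the end of the character starting at i by skipping
        # continuation bytes (10xxxxxx).
        j = i + 1
        while j < n and (buf[j] & 0xC0) == 0x80:
            j += 1
        _reverse_range(buf, i, j)  # scramble this character's bytes
        i = j
    _reverse_range(buf, 0, n)      # whole-buffer reverse unscrambles them
    return buf
```

Both passes are linear in the byte length and need no extra storage and no
reallocation, since the total byte count does not change.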
Take care,
--
Camm Maguire address@hidden
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah