
[Pika-dev] so... string work


From: Tom Lord
Subject: [Pika-dev] so... string work
Date: Thu, 22 Jan 2004 09:58:50 -0800 (PST)


So, to sum up the implications for Pika of that long thing:

Strings will be "vtable objects" containing a pointer to a `t_udstr'
(see src/hackerlab/strings).

`t_udstr's hold strings with an explicitly recorded length (measured
in encoding values, e.g., bytes for utf-8, uint16s for utf-16) and an
explicitly recorded encoding form (e.g. "this is a utf-8 string").
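To make the length-plus-encoding idea concrete, here is a minimal
sketch of the kind of record described above. It is NOT the real
libhackerlab definition of `t_udstr': the type, field, and enum names
(`sketch_udstr', `sketch_utf8', etc.) are hypothetical stand-ins. It
shows that because the length is counted in code units rather than
bytes, the byte size of the data follows from the encoding's unit width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags; the real libhackerlab names and values
   may differ. */
enum sketch_encoding
{
  sketch_utf8,
  sketch_utf16,
  sketch_utf32
};

/* Hypothetical layout, NOT the actual t_udstr definition. */
struct sketch_udstr
{
  enum sketch_encoding encoding; /* which encoding form DATA uses */
  size_t length;                 /* length in code units, not bytes */
  void * data;                   /* LENGTH code units of storage */
};

/* Width in bytes of one code unit of the given encoding form. */
static size_t
sketch_unit_size (enum sketch_encoding enc)
{
  switch (enc)
    {
    case sketch_utf8:  return 1;
    case sketch_utf16: return 2;
    case sketch_utf32: return 4;
    }
  assert (0 && "unknown encoding");
  return 0;
}

/* Number of bytes occupied by the string's code units. */
static size_t
sketch_byte_size (const struct sketch_udstr * s)
{
  return s->length * sketch_unit_size (s->encoding);
}
```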

I want Pika to use these encodings in such a way that Scheme strings
(mostly) use the narrowest representation they can without needing to
encode any character as more than one encoding unit. So a 7-bit ASCII
string will usually be in UTF-8 (one byte per character), most other
strings will be in UTF-16 (one uint16 per character), and some strings
(e.g., those containing characters with bucky bits set, or those
containing Unicode characters outside the "basic multilingual plane")
will be stored in UTF-32.
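The selection rule above can be sketched as a single scan over the
characters. This is a minimal illustration, not Pika code: the names
are hypothetical, characters are assumed to arrive as uint32_t Unicode
scalar values, and bucky bits are assumed to sit above bit 20 so that
such characters compare greater than 0xFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags, ordered narrowest to widest. */
enum sketch_encoding { sketch_utf8, sketch_utf16, sketch_utf32 };

/* Return the narrowest encoding in which every character of CHARS
   fits in a single code unit.

   Assumptions: characters are Unicode scalar values (so no
   surrogates, and anything up to 0xFFFF is one UTF-16 unit), and any
   bucky bits live above bit 20 of the uint32_t.  */
static enum sketch_encoding
sketch_narrowest_encoding (const uint32_t * chars, size_t n)
{
  enum sketch_encoding best = sketch_utf8;
  size_t i;

  assert (chars != NULL || n == 0);
  for (i = 0; i < n; ++i)
    {
      if (chars[i] > 0xFFFF)
        return sketch_utf32;    /* outside the BMP, or bucky bits set */
      if (chars[i] > 0x7F)
        best = sketch_utf16;    /* not 7-bit ASCII, but fits in 16 bits */
    }
  return best;
}
```

Note that a character in the range 0x80 to 0xFF already forces UTF-16
here, since UTF-8 would need two bytes for it.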

So what needs to be done in libhackerlab is:

~ add uni_utf32 to the list of encoding forms
~ extend t_udstr for utf32
~ write test code for t_udstr
~ make sure there's enough in libhackerlab to implement the standard
  Scheme string procedures
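One consequence of the one-unit-per-character rule is that `string-ref'
can index straight into the code units in constant time, whichever of
the three encodings a string uses. A minimal sketch, assuming a
hypothetical `sketch_str' record (not a libhackerlab type) and that
UTF-8 strings hold only 7-bit ASCII:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding tags and string record, for illustration. */
enum sketch_encoding { sketch_utf8, sketch_utf16, sketch_utf32 };

struct sketch_str
{
  enum sketch_encoding encoding;
  size_t length;      /* code units, which here equals characters */
  void * data;
};

/* Constant-time string-ref: since every character occupies exactly
   one code unit, the character index K is also the code-unit index. */
static uint32_t
sketch_string_ref (const struct sketch_str * s, size_t k)
{
  assert (k < s->length);
  switch (s->encoding)
    {
    case sketch_utf8:  return ((const uint8_t *) s->data)[k];
    case sketch_utf16: return ((const uint16_t *) s->data)[k];
    case sketch_utf32: return ((const uint32_t *) s->data)[k];
    }
  assert (0 && "unknown encoding");
  return 0;
}
```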

and in Pika:

~ implement the representation for strings
~ implement the string primitives
~ write some tests
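On the Pika side, a mutating primitive such as `string-set!' has to cope
with storing a character that doesn't fit the string's current
encoding. One plausible approach, assumed here rather than taken from
Pika, is to widen the whole string to the narrowest encoding that fits
the new character and then store it. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tags; the code relies on utf8 < utf16 < utf32. */
enum sketch_encoding { sketch_utf8, sketch_utf16, sketch_utf32 };

struct sketch_str
{
  enum sketch_encoding encoding;
  size_t length;      /* characters == code units */
  void * data;        /* malloc'd storage for LENGTH code units */
};

/* Narrowest encoding holding C in one unit (bucky bits assumed to
   sit above 0xFFFF, so they land in UTF-32). */
static enum sketch_encoding
sketch_needed (uint32_t c)
{
  return (c <= 0x7F) ? sketch_utf8 : (c <= 0xFFFF) ? sketch_utf16 : sketch_utf32;
}

static uint32_t
sketch_get (const struct sketch_str * s, size_t k)
{
  switch (s->encoding)
    {
    case sketch_utf8:  return ((const uint8_t *) s->data)[k];
    case sketch_utf16: return ((const uint16_t *) s->data)[k];
    default:           return ((const uint32_t *) s->data)[k];
    }
}

static void
sketch_put (struct sketch_str * s, size_t k, uint32_t c)
{
  switch (s->encoding)
    {
    case sketch_utf8:  ((uint8_t *) s->data)[k] = (uint8_t) c; break;
    case sketch_utf16: ((uint16_t *) s->data)[k] = (uint16_t) c; break;
    default:           ((uint32_t *) s->data)[k] = c; break;
    }
}

/* string-set!: store C at index K, first widening the whole string
   if C does not fit in one code unit of the current encoding.
   Returns 0 on success, -1 if allocation fails (S is then unchanged). */
static int
sketch_string_set (struct sketch_str * s, size_t k, uint32_t c)
{
  enum sketch_encoding need = sketch_needed (c);

  assert (k < s->length);
  if (need > s->encoding)
    {
      size_t unit = (need == sketch_utf16) ? 2 : 4;
      struct sketch_str wide = { need, s->length, malloc (s->length * unit) };
      size_t i;

      if (!wide.data)
        return -1;
      for (i = 0; i < s->length; ++i)   /* copy, widening each unit */
        sketch_put (&wide, i, sketch_get (s, i));
      free (s->data);
      *s = wide;
    }
  sketch_put (s, k, c);
  return 0;
}
```

This sketch only ever widens; it never narrows a string back after a
wide character is overwritten.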


Any of that grab you as something you'd like to work on?

I should warn that the C macrology and inlining foo for adding UTF-32
is a little bit twisted. I find that when I change it, it takes me a
while to figure out where I put everything :-). But it's not as bad
as it might look at first.

-t




