
[MIT-Scheme-devel] symbol names


From: Taylor R Campbell
Subject: [MIT-Scheme-devel] symbol names
Date: Sat, 25 Jun 2011 16:33:13 +0000
User-agent: IMAIL/1.21; Edwin/3.116; MIT-Scheme/9.1

Currently a symbol's name is a Unicode string, represented internally
as a vector-8b of UTF-8 bytes.  STRING->SYMBOL and SYMBOL->STRING implement
a bijection between the set of vector-8bs of ISO-8859-1 and a subset
of the set of symbols.  STRING->SYMBOL is injective, but it is not a
surjection onto the set of (interned) symbols, and SYMBOL->STRING is
not defined on the whole set of symbols.
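The sketch below makes the mapping concrete. It is a small Python model, not MIT Scheme code, and rests on a hypothetical simplification: a symbol is identified by its name stored as UTF-8 bytes, and a "string" is an ISO-8859-1 byte sequence. The names string_to_symbol and symbol_to_string are illustrative stand-ins for STRING->SYMBOL and SYMBOL->STRING, not their actual implementations.

```python
# Hypothetical model: symbol names are UTF-8 bytes, strings are ISO-8859-1
# bytes (one byte per character, code points U+0000..U+00FF).

def string_to_symbol(latin1: bytes) -> bytes:
    """Stand-in for STRING->SYMBOL: ISO-8859-1 bytes -> UTF-8 symbol name."""
    return latin1.decode("latin-1").encode("utf-8")

def symbol_to_string(utf8_name: bytes) -> bytes:
    """Stand-in for SYMBOL->STRING: UTF-8 symbol name -> ISO-8859-1 bytes.

    Raises UnicodeEncodeError if the name has a code point above U+00FF,
    so this function is only partially defined on symbols.
    """
    return utf8_name.decode("utf-8").encode("latin-1")

# Injective: every ISO-8859-1 string round-trips through a distinct name.
for s in [b"foo", b"caf\xe9", b"\xff"]:
    assert symbol_to_string(string_to_symbol(s)) == s

# Not surjective: the symbol named "λ" (U+03BB) has no ISO-8859-1 preimage.
lambda_name = "λ".encode("utf-8")
try:
    symbol_to_string(lambda_name)
except UnicodeEncodeError:
    print("no ISO-8859-1 string names this symbol")
```

In this model the image of string_to_symbol is exactly the set of names whose code points all lie below U+0100. That set is the subset of symbols on which the two functions are mutually inverse.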

A lot of the system expects SYMBOL->STRING to work on all symbols,
however, and interprets the vector-8b from SYMBOL-NAME as ISO-8859-1
data when it is meant to be interpreted as UTF-8 data.  This is kinda
frustrating.  For example, KEYWORD? explodes if you pass it a symbol
whose name has any code points outside the ISO-8859-1 range; and as a
consequence, the printer explodes too if you try to print one.  This
particular problem is easy to fix, but I don't know what other problems
of this kind are lurking.
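The sketch below shows both failure modes in the same kind of simplified Python model, not MIT Scheme code. Reading UTF-8 name bytes as ISO-8859-1 silently produces wrong characters, and code that routes through a strict SYMBOL->STRING raises an error on a non-ISO-8859-1 name. The helper names are hypothetical, and the trailing-colon keyword test is an assumed style chosen only for illustration. It is not a description of how MIT Scheme's KEYWORD? is written.

```python
# Hypothetical model: a symbol's name is stored as UTF-8 bytes (as with
# SYMBOL-NAME); SYMBOL->STRING only succeeds for ISO-8859-1 names.

def symbol_to_string(utf8_name: bytes) -> str:
    text = utf8_name.decode("utf-8")
    text.encode("latin-1")  # raises UnicodeEncodeError above U+00FF
    return text

def misread_name(utf8_name: bytes) -> str:
    # The bug pattern: reading the UTF-8 bytes as if they were ISO-8859-1.
    return utf8_name.decode("latin-1")

def correct_name(utf8_name: bytes) -> str:
    # Reading the bytes as the UTF-8 they actually are.
    return utf8_name.decode("utf-8")

def keyword_p_fragile(utf8_name: bytes) -> bool:
    # Hypothetical keyword test (trailing-colon style assumed) that goes
    # through symbol_to_string, so it fails on non-ISO-8859-1 names.
    return symbol_to_string(utf8_name).endswith(":")

def keyword_p_robust(utf8_name: bytes) -> bool:
    # Same test, decoding the name as UTF-8 with no ISO-8859-1 round trip.
    return correct_name(utf8_name).endswith(":")

cafe = "café".encode("utf-8")
print(misread_name(cafe))   # cafÃ©  (the two UTF-8 bytes of é read as two chars)
print(correct_name(cafe))   # café

lam = "λ:".encode("utf-8")
print(keyword_p_robust(lam))  # True
try:
    keyword_p_fragile(lam)
except UnicodeEncodeError:
    print("fragile keyword test exploded on a non-ISO-8859-1 name")
```

The robust variant avoids the error by never asking for an ISO-8859-1 view of the name. That is the kind of local fix the paragraph calls easy. The open question is how many other callers assume the ISO-8859-1 view.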

It seems that the only users of non-ISO-8859-1 symbols are XML names.  I
suppose that's convenient for using the XML library, but the
convenience is limited to symbols whose names lie in ISO-8859-1, because
we don't support source code in any encoding other than ISO-8859-1.

I don't have a particular suggestion for what to do, but this seems
wrong.


