[MIT-Scheme-devel] symbol names
From: Taylor R Campbell
Subject: [MIT-Scheme-devel] symbol names
Date: Sat, 25 Jun 2011 16:33:13 +0000
User-agent: IMAIL/1.21; Edwin/3.116; MIT-Scheme/9.1

Currently a symbol's name is a Unicode string, represented internally
by a vector-8b of UTF-8. STRING->SYMBOL and SYMBOL->STRING implement
a bijection between the set of vector-8bs of ISO-8859-1 and a subset
of the set of symbols. STRING->SYMBOL is injective, but it is not a
surjection onto the set of (interned) symbols, and SYMBOL->STRING is
not defined on the whole set of symbols.
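
To make the relationship above concrete, here is a toy model in Python, not MIT Scheme code. It assumes only what the paragraph states: strings are ISO-8859-1 bytes and symbol names are UTF-8 bytes. The function names are hypothetical stand-ins for STRING->SYMBOL and SYMBOL->STRING, and the real procedures may be implemented quite differently.

```python
# Toy model only. Strings are modelled as ISO-8859-1 bytes and symbol
# names as UTF-8 bytes; both function names are hypothetical.

def string_to_symbol_name(latin1_bytes: bytes) -> bytes:
    """Model of STRING->SYMBOL: ISO-8859-1 string to UTF-8 symbol name."""
    return latin1_bytes.decode("latin-1").encode("utf-8")


def symbol_name_to_string(utf8_name: bytes) -> bytes:
    """Model of SYMBOL->STRING: raises if the name leaves ISO-8859-1."""
    return utf8_name.decode("utf-8").encode("latin-1")


# Every ISO-8859-1 string survives the round trip, so the mapping is
# injective on strings.
cafe = bytes([0x63, 0x61, 0x66, 0xE9])  # "cafe" with e-acute, ISO-8859-1
assert symbol_name_to_string(string_to_symbol_name(cafe)) == cafe

# A symbol name containing U+03BB (Greek small lambda) is not the image
# of any ISO-8859-1 string, so the inverse direction is undefined on it.
lambda_name = "\u03bb".encode("utf-8")
try:
    symbol_name_to_string(lambda_name)
    lambda_is_representable = True
except UnicodeEncodeError:
    lambda_is_representable = False
```

In this model the lambda symbol is exactly the kind of symbol that lies outside the image of STRING->SYMBOL.
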
A lot of the system expects SYMBOL->STRING to work on all symbols,
however, and interprets the vector-8b from SYMBOL-NAME as ISO-8859-1
data when it is meant to be interpreted as UTF-8 data. This is kinda
frustrating. For example, KEYWORD? explodes if you pass it a symbol
whose name has any code points outside the ISO-8859-1 range; and as a
consequence, the printer explodes too if you try to print one. This
particular problem is easy to fix, but I don't know what others are
lurking.
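
The misinterpretation described above can be sketched generically in Python. This is only an illustration of reading UTF-8 bytes as ISO-8859-1, under the assumption that the consumer treats each byte as one character. It does not reproduce what KEYWORD? or the printer actually do.

```python
# Generic illustration, not MIT Scheme code: the same bytes read under
# two different encodings.

name = "\u03bb"                     # one code point, Greek small lambda
stored = name.encode("utf-8")       # what the symbol name would hold

correct = stored.decode("utf-8")    # intended reading: one character
misread = stored.decode("latin-1")  # ISO-8859-1 reading: one char per byte

assert stored == b"\xce\xbb"
assert len(correct) == 1
assert len(misread) == 2            # U+00CE followed by U+00BB

# The same confusion affects names that do fit in ISO-8859-1 but are
# not pure ASCII, such as e-acute (U+00E9).
e_acute = "\u00e9".encode("utf-8")
assert e_acute.decode("latin-1") == "\u00c3\u00a9"
```

Only pure-ASCII names read the same under both interpretations, which may be why code that assumes ISO-8859-1 goes unnoticed for so long.
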
It seems that the only user of non-ISO-8859-1 symbols is XML names. I
suppose that's convenient for using the XML library, but the
convenience is limited to symbols whose names lie in ISO-8859-1 because
we don't support source code in any encoding other than ISO-8859-1.
I don't have a particular suggestion for what to do, but this seems
wrong.