[MIT-Scheme-devel] symbol names

From:	Taylor R Campbell
Subject:	[MIT-Scheme-devel] symbol names
Date:	Sat, 25 Jun 2011 16:33:13 +0000
User-agent:	IMAIL/1.21; Edwin/3.116; MIT-Scheme/9.1

Currently a symbol's name is a Unicode string, represented internally
by a vector-8b of UTF-8.  STRING->SYMBOL and SYMBOL->STRING implement
a bijection between the set of vector-8bs of ISO-8859-1 and a subset
of the set of symbols.  STRING->SYMBOL is injective, but it is not a
surjection onto the set of (interned) symbols, and SYMBOL->STRING is
not defined on the whole set of symbols.
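
As a rough illustration of the mapping described above, here is a toy
model in Python rather than MIT Scheme.  The function names are
hypothetical and only mimic the described behavior; they are not the
actual implementation.  It assumes symbol names are stored as UTF-8
bytes and that strings are ISO-8859-1 bytes.

```python
# Toy model only: these helpers are hypothetical, not MIT Scheme internals.

def string_to_symbol_name(latin1_bytes: bytes) -> bytes:
    """Model STRING->SYMBOL: ISO-8859-1 bytes in, UTF-8 symbol name out."""
    return latin1_bytes.decode("iso-8859-1").encode("utf-8")

def symbol_name_to_string(utf8_name: bytes) -> bytes:
    """Model SYMBOL->STRING: only defined when every code point
    of the name fits in ISO-8859-1; raises UnicodeEncodeError otherwise."""
    return utf8_name.decode("utf-8").encode("iso-8859-1")

# A name inside ISO-8859-1 round-trips (the bijection onto a subset).
cafe = b"caf\xe9"                      # "cafe" with e-acute, ISO-8859-1
cafe_name = string_to_symbol_name(cafe)
cafe_back = symbol_name_to_string(cafe_name)

# A name outside ISO-8859-1 has no string image (not surjective).
lambda_name = "\u03bb".encode("utf-8")  # Greek small letter lambda
try:
    symbol_name_to_string(lambda_name)
    lambda_has_string = True
except UnicodeEncodeError:
    lambda_has_string = False
```

In this model every ISO-8859-1 string maps to a distinct name, but
the lambda name is reachable only by constructing the UTF-8 bytes
directly, which is the gap the paragraph above describes.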

A lot of the system expects SYMBOL->STRING to work on all symbols,
however, and interprets the vector-8b from SYMBOL-NAME as ISO-8859-1
data when it is meant to be interpreted as UTF-8 data.  This is kinda
frustrating.  For example, KEYWORD? explodes if you pass it a symbol
whose name has any code points outside the ISO-8859-1 range; and as a
consequence, the printer explodes too if you try to print one.  This
particular problem is easy to fix, but I don't know what others are
lurking.
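
The misinterpretation itself is easy to show outside Scheme.  The
following Python sketch is an analogy, not MIT Scheme code: it takes
the UTF-8 bytes of a one-character name and reads them back as
ISO-8859-1, which is the mistake described above.  It only shows the
silent misreading; it does not reproduce the KEYWORD? failure.

```python
# Analogy only: plain Python codecs standing in for the vector-8b
# of a symbol name.  Nothing here is MIT Scheme API.

name_bytes = "\u03bb".encode("utf-8")         # one code point, two bytes

intended = name_bytes.decode("utf-8")         # what the bytes mean
misread = name_bytes.decode("iso-8859-1")     # what naive code sees

# Pure ASCII names are identical under both readings, which is
# presumably why the bug stays hidden for most symbols.
ascii_bytes = b"foo"
ascii_same = ascii_bytes.decode("utf-8") == ascii_bytes.decode("iso-8859-1")
```

Here `intended` is a single character while `misread` is two
unrelated ISO-8859-1 characters, so any code that counts, compares,
or prints characters from the misread form gets the wrong answer.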

It seems that the only use of non-ISO-8859-1 symbols is for XML names.
I suppose that's convenient for using the XML library, but the
convenience is limited to symbols whose names lie in ISO-8859-1, because
we don't support source code in any encoding other than ISO-8859-1.

I don't have a particular suggestion for what to do, but this seems
wrong.
