[bug-idutils] JavaScript support and non-ASCII identifiers


From: Jim Blandy
Subject: [bug-idutils] JavaScript support and non-ASCII identifiers
Date: Thu, 13 Sep 2012 14:44:31 -0700

[apologies if this isn't the right list; please redirect if that's the case]

I've started toying with adding JavaScript support to idutils. The
JavaScript grammar is defined in terms of a stream of UTF-16 code
units (not, unfortunately, in terms of Unicode code points), and JS
identifiers can contain non-ASCII characters. What kind of 'struct
token' should I return for that? Is there a defined encoding for
non-ASCII characters in the ID database?

If we elect to use UTF-8 in ID databases, then we'll need to depend on
something like iconv to convert to and from the locale's current
encoding, assuming the files being read use that encoding.

If we elect to use the locale's coded character set in ID databases,
then interpreting a database's contents correctly will depend on the
coded character set being the same as it was when the database was
created, which seems unfortunate. The JavaScript scanner would still
need to use iconv to get the UTF-16 stream it needs, so this approach
won't avoid introducing a dependency on iconv.

For now, I'm going to punt on non-ASCII characters, treating them all
as identifier components.
