I think we have a plan, suggested by Martin, for using UTF-8 in our
patterns. All we need to do is allow chars > 0x7f in an "identifier"
(for example a, b, c, and d below).
We have a UTF-8 validator and a converter to UTF-16 that we can have
flex call (or have bison call... I'm a newbie on this). We just need
the raw bytes from flex for an identifier token.
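As a rough sketch of the validate-and-convert step, assuming the raw
identifier bytes arrive as a pointer and a length (for instance flex's
yytext and yyleng inside the identifier rule's action): the function name
utf8_to_utf16 and its signature are hypothetical, made up for
illustration, and this is not the validator or converter we actually have.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are illustrative only).
 * Validates the UTF-8 bytes src[0..len) and writes them as UTF-16 code
 * units into out[0..cap). Returns the number of code units written, or
 * -1 if the input is not valid UTF-8 or the output buffer is too small. */
long utf8_to_utf16(const unsigned char *src, size_t len,
                   uint16_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp, min;
        size_t extra;

        /* The lead byte gives the sequence length and the smallest code
         * point that may legally use that length. */
        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return -1;              /* stray continuation or invalid lead byte */

        if (len - i - 1 < extra) return -1;          /* truncated sequence */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80) return -1;       /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return -1;                     /* overlong encoding */
        if (cp > 0x10FFFF) return -1;                /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return -1; /* surrogate code point */

        i += extra + 1;

        if (cp < 0x10000) {
            if (n + 1 > cap) return -1;
            out[n++] = (uint16_t)cp;
        } else {
            /* Code points above U+FFFF become a surrogate pair. */
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)n;
}
```

In an identifier action this might be called as
utf8_to_utf16((const unsigned char *)yytext, yyleng, buf, cap), with a
negative return treated as a lexical error.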
Chuck
----- Forwarded by Chuck Carmack/Rochester/IBM on 03/30/2004 10:58 AM -----
Hans Aberg <address@hidden>
03/30/2004 10:38 AM
To: Chuck Carmack/Rochester/address@hidden
cc: address@hidden
Subject: Re: Fw: Does flex support UTF-8
At 16:48 -0600 2004/03/29, Chuck Carmack wrote:
>
>Hi Martin:
>
>We'd like to create a parser to parse SQL-like statements. For example,
>
>SELECT a FROM b WHERE c = d
>
>where a, b, c, and d can be in Unicode. Our idea was to allow UTF-8 for
>those, but UCS-2 or UTF-16 would be ok.
This has been discussed before on this list:
In the days when Unicode would fit into 16 bits, somebody made a 16-bit
patch of Flex. Perhaps the following address still works:
Unicode Flex: ftp://ftp.lauton.com/pub/flex-2.5.4-unicode-patch.tar.gz
But this is not UTF-16. And the 16-bit tables become very large, even
impossibly large if one should admit the full Unicode code point range
U+0000...U+10FFFF. Therefore, recall, I suggested a UTF-8 implementation,
as UTF-16 does not seem to gain anything over UTF-8. But there has been no
progress report on this list since that occasion.
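To put rough numbers on the table-size concern, here is a small sketch
that counts the entries in an uncompressed transition table, with one
column per input symbol for every scanner state. The function name
table_entries and the figure of 1000 states are assumptions chosen for
illustration, not measurements of any real scanner.

```c
/* Hypothetical helper: number of entries in an uncompressed DFA
 * transition table with one row per state and one column per possible
 * input symbol. */
unsigned long long table_entries(unsigned long long states,
                                 unsigned long long alphabet_size)
{
    return states * alphabet_size;
}
```

For an assumed scanner with 1000 states, table_entries(1000, 256) gives
256,000 entries for 8-bit input, table_entries(1000, 65536) gives
65,536,000 for 16-bit input, and table_entries(1000, 0x110000) gives
1,114,112,000 for the whole code point range. A UTF-8 scanner stays in the
first case because it only ever sees bytes.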
Another alternative would be to fit Unicode into at least 24-bit (in effect
32-bit) words, but then one has to get into the question of table
compression techniques.