
Fw: Does flex support UTF-8


From: Chuck Carmack
Subject: Fw: Does flex support UTF-8
Date: Tue, 30 Mar 2004 11:06:04 -0600


Thanks Hans...

I think we have a plan, suggested by Martin, for using UTF-8 in our patterns.  All we need to do is allow chars > 0x7f in an "identifier" (for example a, b, c, and d below).  We have a UTF-8 validator and a converter to UTF-16 that we can have flex call (or have bison call; I'm a newbie on this).  We just need the raw bytes from flex for an identifier token.
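
To make the plan concrete, here is a minimal sketch of the validate-and-convert step, under a few assumptions: the scanner is 8-bit, the identifier rule is something like `[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*`, and the rule's action hands the raw bytes (flex's `yytext` and `yyleng`) to a helper. The names `utf8_decode` and `utf8_to_utf16` are hypothetical stand-ins, not our existing validator or converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s[0] (n bytes available).
   Stores the code point in *cp and returns the sequence length (1..4),
   or 0 if the bytes are not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len]) return 0;              /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;   /* surrogate code point */
    if (c > 0x10FFFF) return 0;                 /* beyond the Unicode range */
    *cp = c;
    return len;
}

/* Convert n bytes of UTF-8 to UTF-16 code units in out[0..cap-1].
   Returns the number of units written, or -1 on invalid input
   or if the output buffer is too small. */
long utf8_to_utf16(const unsigned char *s, size_t n, uint16_t *out, size_t cap)
{
    size_t i = 0, k = 0;

    assert(s != NULL && out != NULL);
    while (i < n) {
        uint32_t cp;
        size_t len = utf8_decode(s + i, n - i, &cp);

        if (len == 0) return -1;
        if (cp < 0x10000) {
            if (k + 1 > cap) return -1;
            out[k++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (k + 2 > cap) return -1;
            cp -= 0x10000;
            out[k++] = (uint16_t)(0xD800 | (cp >> 10));
            out[k++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        i += len;
    }
    return (long)k;
}
```

In a rule action this might be called as `utf8_to_utf16((const unsigned char *)yytext, (size_t)yyleng, buf, cap)`, with a negative result treated as a lexical error. Since the pattern itself accepts any byte above 0x7f, all the real checking happens in this step.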

Chuck

----- Forwarded by Chuck Carmack/Rochester/IBM on 03/30/2004 10:58 AM -----
Hans Aberg <address@hidden>

03/30/2004 10:38 AM

To
Chuck Carmack/Rochester/address@hidden
cc
address@hidden
Subject
Re: Fw: Does flex support UTF-8





At 16:48 -0600 2004/03/29, Chuck Carmack wrote:
>
>Hi Martin:
>
>We'd like to create a parser to parse SQL-like statements.  For example,
>
>SELECT a FROM b WHERE c = d
>
>where a, b, c, and d can be in Unicode.  Our idea was to allow UTF-8 for
>those, but UCS-2 or UTF-16 would be ok.

This has been discussed before in this list:

In the days when Unicode would fit into 16 bits, somebody made a 16-bit patch
of Flex. Perhaps the following address still works:
 Unicode Flex: ftp://ftp.lauton.com/pub/flex-2.5.4-unicode-patch.tar.gz

But this is not UTF-16. And the 16-bit tables become very large, even
impossibly large if one should admit the full Unicode code point range
U+0000...U+10FFFF. Therefore, as you may recall, I suggested a UTF-8
implementation, as UTF-16 does not seem to gain anything over UTF-8. But
there has been no progress report on this list since that occasion.
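
To put rough numbers on the table-size concern, the sketch below computes the size of a dense (uncompressed) transition table, with one entry per state and input symbol. This is only an illustration: the function name `dense_table_bytes`, the 1000-state scanner, and the 2-byte entries are made-up assumptions, and flex's default tables use equivalence classes and compression, so real figures will differ.

```c
#include <stdint.h>

/* Bytes needed for a dense DFA transition table:
   one entry of entry_bytes for every (state, input symbol) pair. */
uint64_t dense_table_bytes(uint64_t states, uint64_t alphabet, uint64_t entry_bytes)
{
    return states * alphabet * entry_bytes;
}
```

For the hypothetical 1000-state scanner with 2-byte entries, an 8-bit alphabet (256 symbols) needs 512,000 bytes, a 16-bit alphabet (65,536 symbols) needs 131,072,000 bytes, and the full code point range (0x110000 symbols) needs 2,228,224,000 bytes. A UTF-8 scanner stays in the first case, because it only ever sees bytes.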

Another alternative would be to fit Unicode into at least 24-bit (in effect
32-bit) words, but then one has to get into the question of table
compression techniques.

 Hans Aberg


