help-flex

Fw: Does flex support UTF-8


From: Chuck Carmack
Subject: Fw: Does flex support UTF-8
Date: Wed, 31 Mar 2004 07:37:53 -0600


I'm copying in Martin's note to me that I referred to; sorry for not sending it to the list.

I see.  What I did was simply write patterns that match single UTF-8 characters.  They look something like the following, but you might want to re-read the UTF-8 specification because I wrote this so long ago.  You will, of course, need to combine these, since you want Unicode strings, not just single characters.
 
UB     [\200-\277]
%%

[\300-\337]{UB}             { /* do something */ }
[\340-\357]{UB}{2}          { /* do something */ }
[\360-\367]{UB}{3}          { /* do something */ }
[\370-\373]{UB}{4}          { /* do something */ }
[\374-\375]{UB}{5}          { /* do something */ }
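The same lead-byte classification can also be written in C, for example to double-check a token's bytes after the scanner has matched them. This is a hypothetical sketch: the names utf8_seq_len and utf8_valid are made up for illustration. Unlike the patterns above, it follows RFC 3629. Sequences are at most four bytes long, so the \370-\375 lead bytes are rejected, as are overlong forms (including the \300 and \301 lead bytes), UTF-16 surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/* Length of the well-formed UTF-8 sequence at s (n bytes available),
   or 0 if it is not well-formed under RFC 3629. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                                 /* \000-\177: ASCII */
        return 1;
    else if (s[0] >= 0xC2 && s[0] <= 0xDF) { len = 2; cp = s[0] & 0x1F; } /* \302-\337 */
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) { len = 3; cp = s[0] & 0x0F; } /* \340-\357 */
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) { len = 4; cp = s[0] & 0x07; } /* \360-\364 */
    else
        return 0;  /* stray {UB}, \300/\301, or an obsolete 5/6-byte lead */

    if (n < len)
        return 0;                                    /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)                   /* must be {UB}: \200-\277 */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong 3/4-byte forms, surrogates, and values past U+10FFFF. */
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
        return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp > 0x10FFFF)
        return 0;
    return len;
}

/* 1 if the n bytes at s are a concatenation of well-formed sequences. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    while (n > 0) {
        size_t k = utf8_seq_len(s, n);
        if (k == 0)
            return 0;
        s += k;
        n -= k;
    }
    return 1;
}
```

Inside a flex action the matched bytes are in yytext and their count in yyleng, so a call such as utf8_valid((const unsigned char *)yytext, (size_t)yyleng) would check a whole token at once.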

 

 - - Martin

----- Forwarded by Chuck Carmack/Rochester/IBM on 03/31/2004 07:37 AM -----

Hans Aberg <address@hidden>

03/30/2004 05:22 PM

To: Chuck Carmack/Rochester/address@hidden
cc: address@hidden
Subject: Re: Fw: Does flex support UTF-8

At 11:06 -0600 2004/03/30, Chuck Carmack wrote:
>I think we have a plan using UTF-8 in our patterns that was suggested by
>Martin.  All we need to do is allow chars > 0x7f in an "identifier" (for
>example a, b, c, d below).  We have a UTF-8 validator and a converter to
>UTF-16 that we can have flex call (or have bison call... I'm a newbie on
>this).  We just need the raw bytes from flex for an identifier token.
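In a flex action the raw bytes of the matched token are available as yytext, with their count in yyleng, so handing an identifier to a converter only needs those two values. The interface of the validator and converter mentioned above is not shown, so the function below is a hypothetical stand-in (utf8_to_utf16 is a made-up name). It assumes its input has already passed the existing UTF-8 validator and does no error checking of its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n bytes of already-validated UTF-8 into UTF-16 code units.
   Writes at most cap units to out and returns the number of units the
   full conversion needs; a result larger than cap means out was truncated. */
static size_t utf8_to_utf16(const unsigned char *s, size_t n,
                            uint16_t *out, size_t cap)
{
    size_t i = 0, k = 0;

    while (i < n) {
        uint32_t cp;
        unsigned char b = s[i];

        if (b < 0x80) {                              /* 1 byte */
            cp = b;
            i += 1;
        } else if (b < 0xE0) {                       /* 2 bytes */
            cp = ((uint32_t)(b & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i += 2;
        } else if (b < 0xF0) {                       /* 3 bytes */
            cp = ((uint32_t)(b & 0x0F) << 12)
               | ((uint32_t)(s[i + 1] & 0x3F) << 6) | (s[i + 2] & 0x3F);
            i += 3;
        } else {                                     /* 4 bytes */
            cp = ((uint32_t)(b & 0x07) << 18)
               | ((uint32_t)(s[i + 1] & 0x3F) << 12)
               | ((uint32_t)(s[i + 2] & 0x3F) << 6) | (s[i + 3] & 0x3F);
            i += 4;
        }

        if (cp < 0x10000) {                          /* BMP: one unit */
            if (k < cap)
                out[k] = (uint16_t)cp;
            k += 1;
        } else {                                     /* surrogate pair */
            cp -= 0x10000;
            if (k + 1 < cap) {
                out[k]     = (uint16_t)(0xD800 + (cp >> 10));
                out[k + 1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
            }
            k += 2;
        }
    }
    return k;
}
```

A rule action could then call something like utf8_to_utf16((const unsigned char *)yytext, (size_t)yyleng, buf, bufsize) before returning the identifier token to bison.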

I did not see this suggestion; perhaps you forgot to cc the Flex list.

But in your tweaking, if you find a good way to implement UTF-8, please
report it back here, because I feel sure the Flex developers will be
interested. (I am not a Flex developer myself.)

But when working with Unicode, I figure that the idea must be to let Flex
handle one Unicode encoding internally, gulping up raw bytes or words.
Then, if one needs another encoding on input, it is probably best to hook
up an external translator. So the trick will be to find which Unicode
encoding works best with Flex. I think the most promising candidates are
UTF-8 and UTF-24.
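If UTF-8 were the single internal encoding, an external translator for UTF-16 input might look like the sketch below. It is hypothetical (utf16_to_utf8 is not part of flex) and converts a buffer of UTF-16 code units to UTF-8 bytes, rejecting unpaired surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 code units to UTF-8. out must hold at least 3*n bytes
   (a BMP unit needs up to 3 bytes; a surrogate pair, 2 units, needs 4).
   Returns the number of bytes written, or (size_t)-1 on an unpaired
   surrogate. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
    size_t i = 0, k = 0;

    while (i < n) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i == n || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* lone low surrogate */
            return (size_t)-1;
        }

        if (cp < 0x80) {
            out[k++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[k++] = (unsigned char)(0xC0 | (cp >> 6));
            out[k++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[k++] = (unsigned char)(0xE0 | (cp >> 12));
            out[k++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[k++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[k++] = (unsigned char)(0xF0 | (cp >> 18));
            out[k++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[k++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[k++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return k;
}
```

The translated bytes could then be handed to a byte-oriented scanner, for example with flex's yy_scan_bytes(), so the patterns only ever see UTF-8.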

Also note that you should avoid styled text on the GNU mailing lists, as
not all mail readers can display it.

 Hans Aberg


