help-flex

RE: Flex and 32-bits characters


From: Hans-Bernhard Broeker
Subject: RE: Flex and 32-bits characters
Date: Mon, 26 Aug 2002 11:04:13 +0200 (MET DST)

On Mon, 26 Aug 2002, Mark Weaver wrote:

> So now maybe I'm matching names with an expression like:
> 
> [A-Za-z]+[A-Za-z0-9_]*
> 
> How do I say that for a unicode flex?  

Seems to me the obvious answer would be:

  [[:alpha:]]([[:alnum:]]|_)*

Flex does have character classes, and Unicode is definitely one case where
they should take precedence over old-fashioned [a-z] & friends.
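
To make the difference from hard-coded ranges concrete, here is a minimal
sketch in plain C of the same "letter, then letters, digits or underscore"
rule, written with the wide-character classification functions from
<wctype.h> instead of [A-Za-z]. The helper name ident_len is hypothetical,
and this is not what flex generates. Which characters count as alphabetic
depends on the current locale, and the width of wchar_t varies by platform.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: length of the identifier at the start of s,
 * or 0 if s does not start with one.
 * Mirrors the pattern  alpha (alnum | '_')*  using locale-aware classes. */
size_t ident_len(const wchar_t *s)
{
    assert(s != NULL);

    /* First character must be alphabetic in the current locale. */
    if (!iswalpha((wint_t)s[0]))
        return 0;

    size_t n = 1;
    /* The terminating L'\0' is neither alphanumeric nor '_', so the
     * loop stops at the end of the string at the latest. */
    while (iswalnum((wint_t)s[n]) || s[n] == L'_')
        n++;
    return n;
}
```

For example, ident_len(L"foo_1 bar") stops at the space. Letters outside
ASCII are only recognised after a suitable setlocale() call.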

> Oh nearly forgot.  Yes, keeping track of UTF-16 strings is a little bit of a
> pain, but it's not too vast.  There are macros provided that will help you
> iterate through a string.  

Iteration isn't enough.  Not by a wide margin, I think.  The real problem
is free-flow navigation.  Flex expects strings to be freely accessible
arrays. It expects to be able to index state arrays by input character, so
I don't see how it will ever be able to work with a variable-length
representation.
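
As an illustration of "indexing state arrays by input character", here is
a toy table-driven matcher for [a-z]+ over 8-bit input. It is only a
sketch: the table layout, the state names and match_lower_word are
made up for this example and are not flex's actual generated tables. The
point is the single lookup next[state][c] per input unit. With 16-bit
units each row would need 65536 entries, and a character stored as a
surrogate pair would no longer correspond to one lookup.

```c
#include <assert.h>
#include <stddef.h>

/* States of the toy automaton. DEAD is 0 so that a zero-initialised
 * table rejects every symbol that is not set explicitly. */
enum { DEAD = 0, START = 1, WORD = 2, NSTATES = 3, NSYMBOLS = 256 };

/* Hypothetical helper: length of the longest prefix of s matching [a-z]+.
 * Assumes an ASCII-compatible character set where 'a'..'z' are contiguous. */
size_t match_lower_word(const char *s)
{
    static unsigned char next[NSTATES][NSYMBOLS];
    static int built = 0;

    assert(s != NULL);

    if (!built) {
        for (int c = 'a'; c <= 'z'; c++) {
            next[START][c] = WORD;
            next[WORD][c] = WORD;
        }
        built = 1;
    }

    int state = START;
    size_t n = 0;
    while (s[n] != '\0') {
        /* One table lookup per input unit, indexed directly by its value. */
        state = next[state][(unsigned char)s[n]];
        if (state == DEAD)
            break;
        n++;
    }
    return n;
}
```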

> And I would simply pass yyleng as the correct character count, and
> yytext as the UTF-16 string.  The user can take it from there.

Not efficiently.  Let's say the user needs a copy of the current yytext
for later reference.  yyleng alone doesn't tell him how much memory to
allocate. So he'll have to run (a UTF-16 version of) strlen() over the
result, just as if yyleng hadn't been available in the first place.  
Either that, or waste memory.  AFAICS, both length (=number of characters)
and size (=number of bytes) of the current yytext would have to be
exported by flex.  And having the two of them would probably confuse
beginning users endlessly.
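
A small sketch of the extra pass this forces on the user, assuming yytext
were a well-formed UTF-16 string and yyleng its character count as
proposed above. The helper name utf16_size_in_bytes is hypothetical and is
not part of flex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes occupied by the first nchars
 * characters of the UTF-16 string s (terminator not included).
 * Assumes s is well formed and holds at least nchars characters. */
size_t utf16_size_in_bytes(const uint16_t *s, size_t nchars)
{
    assert(s != NULL);

    size_t units = 0;
    for (size_t c = 0; c < nchars; c++) {
        /* A high surrogate (0xD800..0xDBFF) starts a two-unit character;
         * everything else is a single 16-bit unit. */
        units += (s[units] >= 0xD800 && s[units] <= 0xDBFF) ? 2 : 1;
    }
    return units * sizeof(uint16_t);
}
```

A copy of the token would then need utf16_size_in_bytes(text, length) bytes
plus room for a terminator, and computing that costs a full scan of the
token each time.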

-- 
Hans-Bernhard Broeker (address@hidden)
Even if all the snow were burnt, ashes would remain.
