RE: Flex and 32-bits characters


From: Mark Weaver
Subject: RE: Flex and 32-bits characters
Date: Mon, 26 Aug 2002 13:16:02 +0100

> > How do I say that for a unicode flex?
>
> Seems to me the obvious answer would be:
>
>   [[:alpha:]]([[:alnum:]]|_)*

Err yes.  That is obvious enough, isn't it ;)

> Iteration isn't enough.  Not by a wide margin, I think.  The real problem
> is free-flow navigation.  Flex expects strings to be freely accessible
> arrays. It expects to be able to index state arrays by input character, so
> I don't see how it will ever be able to work with a variable-length
> representation.

This is where my complete lack of knowledge of the flex internals shows;
could you fill me in?
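
For anyone following along, here is how I picture the indexing concern in
the quoted paragraph.  This is only a toy table-driven matcher written in
Python for illustration, not flex's actual generated code; the names
(`transitions`, `match_length`, the state constants) are made up.  Each
state has one table column per possible input symbol, which is cheap for
256 byte values but would need 0x110000 columns per state if indexed
directly by a 32-bit code point.

```python
# Toy table-driven matcher for [A-Za-z][A-Za-z0-9_]* over 8-bit input.
# Hypothetical names; this is NOT what flex generates, just the general idea.
import string

NUM_SYMBOLS = 256                 # one column per possible byte value
DEAD, START, IDENT = 0, 1, 2      # IDENT is the only accepting state

# transitions[state][byte] -> next state; every entry defaults to DEAD.
transitions = [[DEAD] * NUM_SYMBOLS for _ in range(3)]
for ch in string.ascii_letters:
    transitions[START][ord(ch)] = IDENT
for ch in string.ascii_letters + string.digits + "_":
    transitions[IDENT][ord(ch)] = IDENT


def match_length(data: bytes) -> int:
    """Return the length of the longest identifier prefix of data (0 if none)."""
    state, matched = START, 0
    for i, byte in enumerate(data):
        state = transitions[state][byte]   # direct index by input symbol
        if state == DEAD:
            break
        matched = i + 1                    # any live state here is accepting
    return matched


print(match_length(b"foo_1 bar"))  # 5
print(match_length(b"9abc"))       # 0
```

The lookup only works because every input symbol has a fixed width and can
be used as an array index; a variable-length encoding would have to be
decoded first, or matched unit by unit.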

Here we have a slightly bigger problem anyway, which UTF-32 does not solve.
Consider a g with an acute accent.  A user might reasonably want to match a
word containing this character.  However, in Unicode it may be encoded as a
combining character sequence (a base letter followed by a combining accent),
which is more than one code point, even in UTF-32.  So UTF-32 doesn't fix
the fundamental problem.
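
A small sketch of the point, using Python's standard `unicodedata` module
purely for illustration (the helper `utf32_code_points` is a made-up name).
A 'g' followed by U+0301 COMBINING ACUTE ACCENT takes two 32-bit units.
Normalizing to NFC happens to merge that particular pair into the single
code point U+01F5, but a pair such as 'q' plus U+0301 has no precomposed
form, so normalization alone does not make the problem go away.

```python
import unicodedata


def utf32_code_points(text: str) -> int:
    """Number of 32-bit units needed to store text as UTF-32 (no BOM)."""
    return len(text.encode("utf-32-le")) // 4


g_acute = "g\u0301"   # base letter + combining acute accent
q_acute = "q\u0301"   # same accent on a letter with no precomposed form

print(utf32_code_points(g_acute))                                # 2
print(utf32_code_points(unicodedata.normalize("NFC", g_acute)))  # 1
print(utf32_code_points(unicodedata.normalize("NFC", q_acute)))  # 2
```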

Take a look at:

http://www.unicode.org/unicode/reports/tr18/

which is basically the result of someone else having worked all this out for
us ;)  Perhaps Mr Davis would like to help us out!

>
> > And I would simply pass yyleng as the correct character count, and
> > yytext as the UTF-16 string.  The user can take it from there.
>
> Not efficiently.  Let's say the user needs a copy of the current yytext
> for later reference.  yyleng alone doesn't tell him how much memory to
> allocate. So he'll have to run (a UTF-16 version of) strlen() over the
> result, just as if yyleng hadn't been available in the first place.
> Either that, or waste memory.  AFAICS, both length (=number of characters)
> and size (=number of bytes) of the current yytext would have to be
> exported by flex.  And having the two of them would probably confuse
> beginning users endlessly.

Well pointed out.  Beginners seem to get endlessly confused by flex anyway
;)
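
To make the length versus size distinction concrete, here is a rough
sketch in Python (for illustration only; `length_and_size` is a
hypothetical helper, not anything flex exports).  It assumes the token is
held as UTF-16 and contains one character outside the Basic Multilingual
Plane, which needs a surrogate pair.

```python
def length_and_size(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-16 code units, bytes) for text."""
    utf16 = text.encode("utf-16-le")      # no BOM, 2 bytes per code unit
    return len(text), len(utf16) // 2, len(utf16)


# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF (a surrogate pair in UTF-16)
token = "a\U0001D11E"
characters, code_units, size_bytes = length_and_size(token)

print(characters)   # 2
print(code_units)   # 3
print(size_bytes)   # 6
```

A character count of 2 says nothing about the 6 bytes a copy would need,
which is exactly why both numbers would have to be exported.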

>
> --
> Hans-Bernhard Broeker (address@hidden)
> Even if all the snow were burnt, ashes would remain.
