RE: Flex and 32-bits characters
From: Mark Weaver
Subject: RE: Flex and 32-bits characters
Date: Mon, 26 Aug 2002 13:16:02 +0100
> > How do I say that for a unicode flex?
>
> Seems to me the obvious answer would be:
>
> [[:alpha:]]([[:alnum:]]|_)*
Err yes. That is obvious enough, isn't it ;)
> Iteration isn't enough. Not by a wide margin, I think. The real problem
> is free-flow navigation. Flex expects strings to be freely accessible
> arrays. It expects to be able to index state arrays by input character, so
> I don't see how it will ever be able to work with a variable-length
> representation.
This is where my complete lack of knowledge of the flex internals shows;
could you fill me in?
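To make the point about indexing state arrays by input character concrete,
here is a minimal C sketch of a table-driven matcher for ASCII identifiers.
It is only an illustration under my own assumptions, not flex's generated
code: the table layout and the names next_state, build_identifier_dfa and
match_identifier are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scanner table: one row per state, one column
 * per possible input unit. With 8-bit input that is 256 columns. */
enum { N_STATES = 3, N_SYMBOLS = 256, DEAD = 0 };

/* Zero-initialised, so every transition starts out as DEAD. */
static unsigned char next_state[N_STATES][N_SYMBOLS];

/* State 1 = start, state 2 = inside an identifier (accepting).
 * Assumes an ASCII-compatible execution character set. */
void build_identifier_dfa(void)
{
    int c;
    for (c = 'a'; c <= 'z'; c++) { next_state[1][c] = 2; next_state[2][c] = 2; }
    for (c = 'A'; c <= 'Z'; c++) { next_state[1][c] = 2; next_state[2][c] = 2; }
    for (c = '0'; c <= '9'; c++) next_state[2][c] = 2;
    next_state[2]['_'] = 2;
}

/* Length of the longest identifier at the start of text, or 0 if none.
 * Note the direct array index by the input byte on every step. */
size_t match_identifier(const char *text)
{
    unsigned char state = 1;
    size_t i = 0, last_accept = 0;

    assert(text != NULL);
    while (text[i] != '\0') {
        state = next_state[state][(unsigned char)text[i]];
        if (state == DEAD)
            break;
        i++;
        if (state == 2)
            last_accept = i;
    }
    return last_accept;
}
```

With one byte per character each step is a single array lookup. With a
variable-length encoding one character spans several input units, and with
UTF-32 a naive table like this would need 0x110000 columns per state.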
Here we have a slightly bigger problem anyway, one that UTF-32 does not
solve.
Consider a g with an acute accent. A user might well want to match a word
containing this character, which sounds reasonable. However, in Unicode it
can be encoded as a combining character sequence (the base letter followed
by a combining accent), which is more than one code point, even in UTF-32.
So UTF-32 doesn't fix the fundamental problem.
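As a rough illustration, the C sketch below stores 'g' followed by U+0301
COMBINING ACUTE ACCENT as UTF-32 and counts it two ways. The function names
are hypothetical, and is_combining_mark is a deliberate simplification that
only knows the Combining Diacritical Marks block (U+0300 to U+036F); a real
implementation would consult the Unicode character database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'g' followed by U+0301 COMBINING ACUTE ACCENT, 0-terminated UTF-32. */
static const uint32_t g_acute[] = { 0x0067, 0x0301, 0 };

/* Number of code points in a 0-terminated UTF-32 string. */
size_t utf32_len(const uint32_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Simplified test: only the Combining Diacritical Marks block. */
int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Approximate count of user-perceived characters: every code point
 * that is not a combining mark starts a new one. */
size_t approx_user_chars(const uint32_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != 0; s++)
        if (!is_combining_mark(*s))
            n++;
    return n;
}
```

For g_acute, utf32_len reports two code points while approx_user_chars
reports one, so a pattern that matches "one character" per code point would
split what the user sees as a single letter.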
Take a look at:
http://www.unicode.org/unicode/reports/tr18/
which is basically the result of someone else having worked all this out for
us ;) Perhaps Mr Davis would like to help us out!
>
> > And I would simply pass yyleng as the correct character count, and
> > yytext as the UTF-16 string. The user can take it from there.
>
> Not efficiently. Let's say the user needs a copy of the current yytext
> for later reference. yyleng alone doesn't tell him how much memory to
> allocate. So he'll have to run (a UTF-16 version of) strlen() over the
> result, just as if yyleng hadn't been available in the first place.
> Either that, or waste memory. AFAICS, both length (=number of characters)
> and size (=number of bytes) of the current yytext would have to be
> exported by flex. And having the two of them would probably confuse
> beginning users endlessly.
Well pointed out. Beginners seem to get endlessly confused by flex anyway
;)
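To illustrate the length versus size distinction, here is a small C sketch
for a UTF-16 buffer. It assumes well-formed UTF-16 and a hypothetical
scanner that hands back a count of 16-bit code units; the function names
are mine and nothing here is part of flex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "A" followed by U+1D11E MUSICAL SYMBOL G CLEF, which needs a
 * surrogate pair (0xD834 0xDD1E) in UTF-16: 3 code units in total. */
static const uint16_t sample[] = { 0x0041, 0xD834, 0xDD1E };

/* Length: number of code points in the first `units` code units.
 * Assumes well-formed input, so every low (trailing) surrogate
 * belongs to the code point started by the unit before it. */
size_t utf16_code_points(const uint16_t *s, size_t units)
{
    size_t n = 0, i;

    assert(s != NULL || units == 0);
    for (i = 0; i < units; i++)
        if (s[i] < 0xDC00 || s[i] > 0xDFFF)
            n++;
    return n;
}

/* Size: number of bytes needed to copy `units` code units. */
size_t utf16_bytes(size_t units)
{
    return units * sizeof(uint16_t);
}
```

Here the sample has a length of 2 characters but a size of 6 bytes, so a
character count alone would under-allocate a copy of the text.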
>
> --
> Hans-Bernhard Broeker (address@hidden)
> Even if all the snow were burnt, ashes would remain.