Re: Flex and 32-bits characters


From: Hans Aberg
Subject: Re: Flex and 32-bits characters
Date: Mon, 26 Aug 2002 19:08:08 +0200

At 11:46 -0400 2002/08/26, Antoine Fink wrote:
>As I said (see below), the part of converting from and to different
>encodings (ASCII, UTF-8, UCS-4...) is already resolved. I only need Flex
>to read in 32-bit chars, and I will write the lexing rules to interpret
>them. It's more a memory/data structure issue than an interpreting/encoding
>problem. As long as Flex can read 32 bits (whether it's UCS-4 or not), I'll
>be happy :)

So that is also what I think: Flex will need a 32-bit option (or rather
21 bits, plus whatever padding one prefers).
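
To make the "21 bits plus padding" remark concrete, here is a minimal
sketch in C. The names lex_char, LEX_CHAR_BITS, LEX_CHAR_MAX and
lex_char_is_valid are hypothetical and are not part of Flex; the only
fact relied on is that Unicode code points run from 0 to 0x10FFFF, which
fits in 21 bits, so a 32-bit unit leaves 11 bits of padding.

```c
#include <stdint.h>

/* Hypothetical character type for a 32-bit scanner; not a Flex type. */
typedef uint32_t lex_char;

#define LEX_CHAR_BITS 21u        /* bits actually needed per code point */
#define LEX_CHAR_MAX  0x10FFFFu  /* highest Unicode code point */

/* Returns 1 if c is a usable code point: within the 21-bit range and
   not a UTF-16 surrogate (0xD800..0xDFFF), which is not a character. */
static int lex_char_is_valid(lex_char c)
{
    if (c > LEX_CHAR_MAX)
        return 0;
    if (c >= 0xD800u && c <= 0xDFFFu)
        return 0;
    return 1;
}
```

A scanner reading 32-bit units could reject anything for which
lex_char_is_valid returns 0 before it ever reaches the tables.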

>> I think what you mention is the major candidate for implementing Unicode
>> onto Flex: Hook up code converters (like C++ std::codecvt), so that
>> internally Flex only sees say UTF-32. This is the only way to handle the
>> many different possible encodings, plus the problem of variable width
>> characters.
>
>Hm.. I'm sorry, I must have forgotten to specify this: the solution I seek
>must use the C language and not C++... I agree that codecvt would be helpful
>here, but again, in my current situation, the conversion is already done
>(back and forth).

I mention C++ because Flex will have to support C++, so it may be too
much work to seek out C-only implementations. The context was really
variable-width characters, not fixed-width 32-bit characters.

>> One interesting alternative might be to make Flex produce a very compact
>> NFA machine table, which is converted to DFA states and cached as needed.

>I know about the character-type indexed tables. There can be more than one
>possible work-around (hash tables, cached compressed tables, DFAs..) but
>in fact, the goal behind all of this 32-bits-in-Flex thing is to convert a
>regular expression to a DFA. We are quite experienced with DFAs and
>transducers so this could be a little easier for us, but then again, we
>have to look closely at the problem (if this is the way we want to go,
>that is, digging into Flex's own source code..)

Right. The argument was presented as a counterweight to the argument that
one should use UTF-16 only, with static 2^16 tables. On hardware where
the CPU runs much faster than the memory access clock, smaller,
compressed tables that are expanded as needed may be faster than the
traditional static table, which requires more memory accesses.
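
One way to picture a table that is "expanded as needed" is a two-stage
layout in which a state's transitions are split into pages that are only
allocated when first written. The following is a minimal sketch under
that assumption, not Flex's actual table format; sparse_row, row_new,
row_get, row_set, row_free and the page size of 256 are all hypothetical
choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS      8u
#define PAGE_SIZE      (1u << PAGE_BITS)                /* 256 entries */
#define MAX_CODE_POINT 0x10FFFFu
#define NUM_PAGES      ((MAX_CODE_POINT >> PAGE_BITS) + 1u)
#define NO_TRANSITION  (-1)

/* Transitions of one DFA state; a NULL page means "no transitions here". */
typedef struct {
    int32_t *pages[NUM_PAGES];
} sparse_row;

/* Allocate a row with every page absent. Returns NULL on failure. */
static sparse_row *row_new(void)
{
    sparse_row *row = malloc(sizeof *row);
    if (row != NULL)
        for (uint32_t i = 0; i < NUM_PAGES; i++)
            row->pages[i] = NULL;
    return row;
}

/* Next state for code point c, or NO_TRANSITION if none is recorded. */
static int32_t row_get(const sparse_row *row, uint32_t c)
{
    assert(row != NULL);
    if (c > MAX_CODE_POINT)
        return NO_TRANSITION;
    const int32_t *page = row->pages[c >> PAGE_BITS];
    return page ? page[c & (PAGE_SIZE - 1u)] : NO_TRANSITION;
}

/* Record a transition, allocating the page on first use.
   Returns 0 on success, -1 on a bad code point or allocation failure. */
static int row_set(sparse_row *row, uint32_t c, int32_t next_state)
{
    assert(row != NULL);
    if (c > MAX_CODE_POINT)
        return -1;
    int32_t **slot = &row->pages[c >> PAGE_BITS];
    if (*slot == NULL) {
        *slot = malloc(PAGE_SIZE * sizeof **slot);
        if (*slot == NULL)
            return -1;
        for (uint32_t i = 0; i < PAGE_SIZE; i++)
            (*slot)[i] = NO_TRANSITION;
    }
    (*slot)[c & (PAGE_SIZE - 1u)] = next_state;
    return 0;
}

/* Release every allocated page and the row itself. */
static void row_free(sparse_row *row)
{
    if (row == NULL)
        return;
    for (uint32_t i = 0; i < NUM_PAGES; i++)
        free(row->pages[i]);
    free(row);
}
```

A lookup costs two memory accesses instead of one, but a state that only
mentions a handful of characters touches a handful of pages rather than
a full-width row. The page directory itself is still a few thousand
pointers per state, so a real implementation would probably share or
compress it as well.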

>> There is a "Unicode Flex" on the Internet:
>>     Unicode Flex: ftp://ftp.lauton.com/pub/flex-2.5.4-unicode-patch.tar.gz
>> In reality, though, I think it only changes char to wchar_t, and assumes
>> that the latter type is 16 bits.
>> If you want to experiment with current Flex, it is at:
>>     Flex Beta (2.5.15): ftp://ftp.uncg.edu/people/wlestes/
>
>Yep, this patch helps only for wchar_t (16-bit) characters. I might
>consider doing such a patch (using the same approach) but for 32-bit
>chars...

But you are not likely to be able to implement a 2^32 table. So then you
must install some table cutoff code. And then you are already on the table
compression path.
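
As an illustration of what "table cutoff code" could look like, here is
a minimal sketch in C: code points below a cutoff index a dense array
directly, and everything above it is found by binary search in a sorted
list of ranges. This is an assumption about one possible design, not
something taken from Flex or from the Unicode patch; cutoff_row,
range_edge, cutoff_row_init, cutoff_next and the cutoff of 256 are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CUTOFF        256u   /* code points below this use the dense table */
#define NO_TRANSITION (-1)

/* One transition covering the inclusive code point range [lo, hi]. */
typedef struct {
    uint32_t lo, hi;
    int32_t  next_state;
} range_edge;

/* Transitions of one DFA state. */
typedef struct {
    int32_t           dense[CUTOFF];
    const range_edge *ranges;      /* sorted by lo, non-overlapping */
    size_t            num_ranges;
} cutoff_row;

/* Start with no dense transitions and the given range list. */
static void cutoff_row_init(cutoff_row *row, const range_edge *ranges,
                            size_t num_ranges)
{
    assert(row != NULL);
    for (uint32_t i = 0; i < CUTOFF; i++)
        row->dense[i] = NO_TRANSITION;
    row->ranges = ranges;
    row->num_ranges = num_ranges;
}

/* Next state for code point c, or NO_TRANSITION if none matches. */
static int32_t cutoff_next(const cutoff_row *row, uint32_t c)
{
    assert(row != NULL);
    if (c < CUTOFF)
        return row->dense[c];          /* fast path: direct index */

    size_t lo = 0, hi = row->num_ranges;
    while (lo < hi) {                  /* binary search over the ranges */
        size_t mid = lo + (hi - lo) / 2;
        if (c < row->ranges[mid].lo)
            hi = mid;
        else if (c > row->ranges[mid].hi)
            lo = mid + 1;
        else
            return row->ranges[mid].next_state;
    }
    return NO_TRANSITION;
}
```

The dense part keeps the common low-range case as fast as the
traditional table, while the memory used above the cutoff grows with the
number of ranges in the rules rather than with the size of the character
set, which is already a form of table compression.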

  Hans Aberg





