Re: Flex and 32-bits characters
From: Hans Aberg
Subject: Re: Flex and 32-bits characters
Date: Mon, 26 Aug 2002 19:08:08 +0200
At 11:46 -0400 2002/08/26, Antoine Fink wrote:
>As I said (see below), the part of converting from and to different
>encodings (ascii, utf-8, ucs-4...) is already resolved. I only need Flex
>to read in 32-bits chars, and I will write the lexing rules to interpret
>it. It's more a memory/data structure issue than an interpreting/encoding
>problem. As long as Flex can read 32-bits (whether it's ucs-4 or not), I'll
>be happy :)
So that is also what I think: Flex will need a 32-bit (or rather 21 bit
plus favorite padding) option.
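
A minimal sketch of the "21 bit plus padding" remark, assuming a
hypothetical lex_char_t type (not anything Flex provides): the highest
Unicode code point is U+10FFFF, which needs 21 bits, so a 32-bit unit
carries 11 bits of padding.

```c
#include <stdint.h>

/* Hypothetical name: one 32-bit storage unit per scanner input character. */
typedef uint32_t lex_char_t;

#define UNICODE_MAX 0x10FFFFu   /* highest Unicode code point */

/* Count the bits needed to represent v (0 for v == 0). */
static int bits_needed(uint32_t v) {
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Return 1 if c is a Unicode scalar value: in range and not a surrogate. */
static int is_scalar_value(lex_char_t c) {
    return c <= UNICODE_MAX && !(c >= 0xD800u && c <= 0xDFFFu);
}
```

Here bits_needed(UNICODE_MAX) evaluates to 21, and any 32-bit value above
UNICODE_MAX is rejected by is_scalar_value().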
>> I think what you mention is the major candidate for implementing Unicode
>> onto Flex: Hook up code converters (like C++ std::codecvt), so that
>> internally Flex only sees say UTF-32. This is the only way to handle the
>> many different possible encodings, plus the problem of variable width
>> characters.
>
>Hm.. I'm sorry, I must have forgotten to specify this: the solution I seek
>must use the C language and not C++... I agree that codecvt would be helpful here,
>but again, in my current situation, the conversion is already done (back
>and forth).
I mention C++, because Flex will have to support C++. Then it may be too
much work seeking specialty C implementations. The context was really
variable width characters, not fixed width 32-bit characters.
>> One interesting alternative might be to make Flex produce a very compact
>> NFA machine table, which is converted to DFA states and cached as needed.
>I know about the character-type indexed tables. There can be more than one
>possible work-around (hash tables, cached compressed tables, DFA's..) but
>in fact, the goal behind all of this 32-bits-in-flex thing is to convert a
>regular expression to a DFA. We are quite experienced with DFA's and
>transducers so this could be a little easier for us, but then again, we
>have to look closely at the problem (if this is the way we want to go,
>that is, digging into Flex' own source code..)
Right. The argument was presented as a counterweight to the argument that
one should use UTF-16 only with static 2^16 tables. On hardware
architectures where the CPU runs much faster than the memory access clock,
smaller, compressed tables that are expanded as needed may be faster than
the traditional static table, which requires more memory accesses.
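
One way to picture a compact table that is expanded on demand is sketched
below. This is only an illustration under stated assumptions, not how
Flex works: the types range_t, page_t and row_t, the page size and the
cache size are all hypothetical. Each state keeps its transitions as a
short sorted list of code point ranges, and a 256-entry page is filled
in only when a character from that page is first seen.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_TRANSITION (-1)
#define PAGE_BITS 8
#define PAGE_SIZE (1u << PAGE_BITS)
#define CACHE_SLOTS 16u   /* hypothetical: small direct-mapped page cache */

/* Compact form: code points lo..hi (inclusive) lead to state `next`. */
typedef struct { uint32_t lo, hi; int16_t next; } range_t;

/* One expanded page: next states for 256 consecutive code points. */
typedef struct {
    int valid;
    uint32_t page;
    int16_t next[PAGE_SIZE];
} page_t;

/* Transitions out of one state. */
typedef struct {
    const range_t *ranges;      /* sorted by lo, non-overlapping */
    size_t nranges;
    page_t cache[CACHE_SLOTS];  /* zero-initialised, filled on first use */
} row_t;

/* Slow path: binary search in the compact range list. */
static int16_t compact_lookup(const row_t *row, uint32_t c) {
    size_t lo = 0, hi = row->nranges;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < row->ranges[mid].lo)
            hi = mid;
        else if (c > row->ranges[mid].hi)
            lo = mid + 1;
        else
            return row->ranges[mid].next;
    }
    return NO_TRANSITION;
}

/* Fast path: expand the page containing c on a cache miss, then index it. */
static int16_t next_state(row_t *row, uint32_t c) {
    uint32_t page = c >> PAGE_BITS;
    page_t *p = &row->cache[page % CACHE_SLOTS];
    if (!p->valid || p->page != page) {
        for (uint32_t i = 0; i < PAGE_SIZE; i++)
            p->next[i] = compact_lookup(row, (page << PAGE_BITS) | i);
        p->page = page;
        p->valid = 1;
    }
    return p->next[c & (PAGE_SIZE - 1)];
}
```

Memory use per state is bounded by the cache size rather than by the
width of the character type; whether this actually beats a static table
depends on the input and the hardware, and would have to be measured.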
>> There is a "Unicode Flex" on the Internet:
>> Unicode Flex: ftp://ftp.lauton.com/pub/flex-2.5.4-unicode-patch.tar.gz
>> In reality, I think though that it only changes char to wchar_t, and assumes
>> that the latter type is 16 bit.
>> If you want to experiment with current Flex, it is at:
>> Flex Beta (2.5.15): ftp://ftp.uncg.edu/people/wlestes/
>
>Yep, this patch helps only for wchar_t (16 bits) characters. I might
>consider doing such a patch (using the same approach) but for 32-bits
>chars...
But you are not likely to be able to implement a 2^32 table. So then you
must install some table cutoff code. And then you are already on the table
compression path.
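
The size argument and the cutoff idea can be sketched as follows. The
names (CUTOFF, cutoff_row_t, wide_range_t) and the 16-bit entry size are
assumptions made for illustration only. A fully dense row of 16-bit
entries takes 128 KiB for 16-bit characters but 8 GiB for 32-bit
characters, and that is per state. With a cutoff, only characters below
it are indexed directly; everything above goes through a compact list.

```c
#include <stddef.h>
#include <stdint.h>

#define CUTOFF 256u          /* hypothetical: dense table covers 0..255 */
#define NO_TRANSITION (-1)

/* Code points lo..hi (inclusive) at or above CUTOFF lead to `next`. */
typedef struct { uint32_t lo, hi; int16_t next; } wide_range_t;

typedef struct {
    int16_t dense[CUTOFF];     /* direct index for c < CUTOFF */
    const wide_range_t *wide;  /* compact list for c >= CUTOFF */
    size_t nwide;
} cutoff_row_t;

/* Bytes in one fully dense row of 16-bit entries (bits must be < 64). */
static uint64_t dense_row_bytes(unsigned bits) {
    return ((uint64_t)1 << bits) * sizeof(int16_t);
}

/* Direct index below the cutoff, linear range scan above it. */
static int16_t cutoff_next(const cutoff_row_t *row, uint32_t c) {
    if (c < CUTOFF)
        return row->dense[c];
    for (size_t i = 0; i < row->nwide; i++)
        if (c >= row->wide[i].lo && c <= row->wide[i].hi)
            return row->wide[i].next;
    return NO_TRANSITION;
}
```

The range list above the cutoff is already a simple form of table
compression, which is the point being made.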
Hans Aberg