RE: Bison & Unicode
From: Hans Aberg
Subject: RE: Bison & Unicode
Date: Wed, 20 Dec 2000 01:03:55 +0100
At 14:13 -0600 0-12-19, Keefe, Dan wrote:
>While I agree that this would be a possible solution, I think that keeping
>a parser (Bison) and a lexer (Flex) separate makes more sense. Currently
>Flex (or lex) is responsible for tokenization and makes these tokens available
>for Bison (or yacc). Since Bison only cares about what type of token it
>gets in order to determine what rule to follow, there should be no
>dependency between the current state in the parser and the token input
>stream structure since the token type can be determined in the lexer
>exclusively. On the other hand, semantic actions need to avail
>themselves of the text (or converted value of the text) of the current
>token. In this situation, it must be the case that the parser has
>knowledge of at least the data type being used by the lexer in order to
>pass this data to other functional parts of the program. It seems to me
>that exposing the lexer's data type is the fundamental problem and this
>could be done by commonly declared type for the lexer and parser.
>Although it may be nice to also expose the encoding scheme and code page
>to the parser, this is certainly not necessary since this can be
>encapsulated exterior to the parser.
>
>I think that embedding lexical analysis capability within a parser would
>reduce the level of abstraction of these separate concerns and cause more
>problems than it would be worth.
Which lexical analyzer do you intend to use with Unicode? :-)
-- By the way, there is a combined lexical-analyzer/compiler-compiler,
http://www.antlr.org/.
-- But in my post I suggested that Bison be invoked twice, so
lexing/parsing would still be separate (the yyparse of one becomes the yylex
of the other). That said, I think that if one creates a "language hierarchy",
with the sentences of the language below being the tokens of the one above,
it can be reduced to a single grammar, by simply making the sentence-tokens
into variables (nonterminals).
-- I have no idea what one would gain by doing such a thing; I just want to
know if there is some obvious reason for not trying it, such as efficiency.
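
To make the two-level idea concrete, here is a minimal hand-written sketch in Python, not Bison output. The lower level recognizes "sentences" over Unicode code points and hands each one up as a token, playing the role of yylex for the upper level. The names Token, lower_parse and upper_parse, and the toy arithmetic grammar, are assumptions made up for illustration; the commonly declared Token type stands in for the shared data type discussed in the quoted text.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Token:
    """Hypothetical type declared once and shared by both levels."""
    kind: str  # token type: all the upper level needs to choose a rule
    text: str  # Unicode text of the token, for semantic actions


def lower_parse(chars: str) -> Iterator[Token]:
    """Lower level: sentences over code points become tokens."""
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isspace():
            i += 1
        elif c.isdecimal():  # any Unicode decimal digit, not only ASCII
            j = i
            while j < len(chars) and chars[j].isdecimal():
                j += 1
            yield Token("NUM", chars[i:j])
            i = j
        elif c in "+*":
            yield Token(c, c)
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r}")
    yield Token("EOF", "")


def upper_parse(tokens: Iterator[Token]) -> int:
    """Upper level: expr -> term ('+' term)*, term -> NUM ('*' NUM)*."""
    tok = next(tokens)

    def advance() -> None:
        nonlocal tok
        tok = next(tokens)

    def number() -> int:
        if tok.kind != "NUM":
            raise SyntaxError(f"expected NUM, got {tok.kind}")
        value = int(tok.text)  # semantic action uses the token text
        advance()
        return value

    def term() -> int:
        value = number()
        while tok.kind == "*":
            advance()
            value *= number()
        return value

    value = term()
    while tok.kind == "+":
        advance()
        value += term()
    if tok.kind != "EOF":
        raise SyntaxError(f"unexpected token {tok.kind}")
    return value


# The output of one level is the input of the other.
result = upper_parse(lower_parse("2 + 3 * 4"))
```

The upper level only ever inspects tok.kind when choosing a rule and touches tok.text only in the semantic action, so the encoding details stay in the lower level. Whether stacking two generated parsers this way costs anything noticeable compared with a dedicated lexer is not something this sketch measures.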
Hans Aberg
* Email: Hans Aberg <mailto:address@hidden>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>