UTF-8/Unicode Bison
From: Hans Aberg
Subject: UTF-8/Unicode Bison
Date: Sun, 09 Jan 2005 14:40:41 +0100
User-agent: Microsoft-Outlook-Express-Macintosh-Edition/5.0.6
There seems to be a simple way to extend Bison to Unicode. Essentially, this
amounts to giving meaning to the '...' construct when it contains Unicode
characters. One way is to treat such a literal as a UTF-8 multibyte
sequence, which Bison would then treat as a sequence of character tokens.
If the .y grammar file is assumed to be in UTF-8, then all that is needed is
to give 'c1 ... ck' a meaning for a suitable character sequence, by simply
translating it into the character token sequence 'c1' ... 'ck'.
As for the yylex handshaking, I see two possibilities. The first is a UTF-8
mode, where a multibyte sequence is returned one byte at a time, in a
succession of yylex calls. The second is a Unicode mode, where yylex returns
the full Unicode code point as a UTF-32 value. Bison would then start its
token numbers above 0x10FFFF, the highest possible Unicode code point. If
yylex returns a Unicode code point, the Bison parser translates it into a
UTF-8 sequence, which is then processed as normal.
Hans Aberg