[Groff] Re: Unicode, EBCDIC, Latin-2, JIS for groff
From: Eric Fischer
Date: Fri, 10 Mar 2000 14:40:58 -0600 (CST)
> Question: How far along is the Unicode input project?
Here's what I've done so far:
* The file iterator recognizes valid UTF-8 patterns in the input,
and when they are encountered they get transmuted into \U'number'.
Latin-1 characters (which is to say, eight-bit characters that are
not part of a legal UTF-8 sequence) are also temporarily translated
into \U sequences; ASCII characters are passed through unchanged.
Accepting Latin-2 or whatever based on a command line option would
be easy to add; accepting EBCDIC would also be easy if everyone could
agree on what EBCDIC characters should map to what Unicode characters.
* The tokenization routine recognizes \U and converts anything outside
the range 0x00 to 0xFF into \[char0xNNNN] or \[char0xNNNNNNNN] as
appropriate.
This makes non-Latin-1 characters second-class citizens (they can't be
used in the names of macros, etc.), but I was intimidated by the task
of finding every place in the program that depends on characters being
at most eight bits wide.
* An extension to the ligature mechanism joins Unicode combining accents
to their base characters as a single character whenever possible.
* I've been working on more general support for accents (for the cases
where there isn't a single Unicode character that represents the
accented letter or where a character has multiple accents) but this
doesn't work very well yet.
* I haven't done anything with right-to-left or reordered characters.
As I understand it, Plan 9 troff doesn't support these (or combining
accents) either.
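To make the first item concrete, here is a minimal Python sketch of the input rule it describes: a byte below 0x80 passes through, a complete and valid UTF-8 sequence becomes one \U escape, and any other eight-bit byte is taken as Latin-1. The function names are hypothetical, and writing the number inside \U'...' as hexadecimal is an assumption (the message does not say how the number is spelled). This is not groff's actual file iterator.

```python
def utf8_sequence_length(lead: int) -> int:
    """Expected length of a UTF-8 sequence starting with `lead`, or 0 if `lead` cannot start one."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def transmute_input(data: bytes) -> str:
    """Hypothetical sketch: ASCII unchanged, UTF-8 and stray Latin-1 bytes -> \\U'HEX' (hex is assumed)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII characters are passed through unchanged.
            out.append(chr(b))
            i += 1
            continue
        n = utf8_sequence_length(b)
        cp = None
        if n and i + n <= len(data):
            try:
                # Strict decoding rejects bad continuation bytes, overlong forms and surrogates.
                text = data[i:i + n].decode("utf-8")
                if len(text) == 1:
                    cp = ord(text)
            except UnicodeDecodeError:
                pass
        if cp is not None:
            out.append("\\U'%04X'" % cp)
            i += n
        else:
            # Not part of a legal UTF-8 sequence: treat this single byte as Latin-1.
            out.append("\\U'%04X'" % b)
            i += 1
    return "".join(out)
```

With these assumptions, both `transmute_input(b"caf\xc3\xa9")` (UTF-8) and `transmute_input(b"caf\xe9")` (Latin-1) yield `caf\U'00E9'`, which is why a Latin-1 fallback can coexist with UTF-8 input without a command-line switch.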
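The tokenization step from the second item might look like the following hypothetical sketch, which again assumes hexadecimal numbers inside \U'...'. Code points from 0x00 to 0xFF become ordinary eight-bit characters; anything larger becomes a \[char0x...] name. Reading "as appropriate" as four hex digits up to 0xFFFF and eight digits beyond that is also an assumption.

```python
import re

# Matches \U'HEX' escapes; the hexadecimal spelling is an assumption, not groff's documented syntax.
U_ESCAPE = re.compile(r"\\U'([0-9A-Fa-f]+)'")


def tokenize_unicode(text: str) -> str:
    """Hypothetical sketch: keep 0x00-0xFF as plain characters, name everything else \\[char0x...]."""
    def replace(match: "re.Match[str]") -> str:
        cp = int(match.group(1), 16)
        if cp <= 0xFF:
            # Still fits in eight bits, so it can be an ordinary input character.
            return chr(cp)
        # Four hex digits for the Basic Multilingual Plane, eight beyond it.
        width = 4 if cp <= 0xFFFF else 8
        return "\\[char0x%0*X]" % (width, cp)

    # A function replacement is inserted literally, so the backslashes survive as-is.
    return U_ESCAPE.sub(replace, text)
```

For example, `tokenize_unicode("\\U'20AC'")` gives `\[char0x20AC]`, while `\U'00E9'` collapses back to a single eight-bit character. Only the names produced this way, not the characters themselves, are usable in places that require eight-bit characters, which mirrors the second-class status described above.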
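For the combining-accent item, Python's standard `unicodedata` module offers a rough stand-in: NFC normalization composes a base letter plus combining accents into a single precomposed character when Unicode defines one, and otherwise leaves the sequence alone. The leftover case is what the more general accent support in the next item has to handle. This uses a different technique from groff's ligature extension, and the function name is hypothetical; it only illustrates the intended result.

```python
import unicodedata
from typing import Tuple


def join_accents(base: str, accents: str) -> Tuple[str, bool]:
    """Compose base + combining accents via NFC; report whether a single character resulted."""
    composed = unicodedata.normalize("NFC", base + accents)
    # True when Unicode has one precomposed character for the whole cluster.
    return composed, len(composed) == 1
```

`join_accents("e", "\u0301")` returns `("\u00e9", True)`, while `join_accents("q", "\u0301")` returns the unchanged two-character sequence and `False`, because Unicode has no precomposed q with acute.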
> Additionally, I suggest using UTF-8 exclusively as the external
> encoding if, say, the command-line option `-u' is used.
If you want the *output* to be UTF-8 as well as the input, this is also
going to require changes to all the postprocessors. It is what Plan 9
troff does, though.
eric