[Groff] unicode support - questions
From: Bruno Haible
Subject: [Groff] unicode support - questions
Date: Mon, 23 Jan 2006 16:53:04 +0100
User-agent: KMail/1.5
Hi,
So far, I have a first draft of a patch that makes groff work with Unicode
fonts without having to first register thousands of characters. Before
submitting the patch slice after slice, may I have your opinion about four
questions?
1) In nametoindex.cpp and troff/charinfo.h, the terms "ascii_char" and
"ascii_code" are used for unibyte characters in the input encoding.
As far as I understand,
- values >= 128 are possible and valid,
- when the "latin1" device or "cp1047" device or "latin2" device
(found in some Linux distributions) is used, values >= 128
denote characters of this encoding.
So I would like to rename these to "single_char" and "single_char_code"
respectively. Is that OK? Do you find "unibyte" a better term?
2) When CP1047 is used, and commands like .trin \[char72]\[,c] are active,
does the font::name_to_index API see the character name before or
after the translation? I.e. does it see "char72" or ",c"?
3) My current patch creates two subclasses 'enumerated_font' and
'unicode_font' of 'class font'.
An enumerated font has all its characters enumerated in the font file.
A unicode font covers all combined Unicode characters (consisting of a
base character and zero or more combining characters).
The subclasses in the HTML and TTY backends inherit from 'unicode_font',
whereas the others inherit from 'enumerated_font'.
Is it conceivable that a driver/backend might want to use both kinds
of font? In that case I would merge both classes back into 'class font'
and use a boolean is_unicode flag to distinguish the cases. The code
becomes less pretty this way, but it would avoid a possible problem in
some future drivers/backends.
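To make the two alternatives in question 3 concrete, here is a minimal sketch, not the actual patch. All names ending in "_sketch" are hypothetical, the real 'class font' has a much larger interface, and a "character" is simplified here to a single code point rather than a base character plus combining characters.

```cpp
#include <cassert>
#include <set>

// Variant A: two subclasses of a common base (hypothetical stand-in
// for groff's 'class font').
class font_sketch {
public:
  virtual ~font_sketch() {}
  virtual bool contains(int code) const = 0;
};

// Every glyph must be listed explicitly, as in a font description file.
class enumerated_font_sketch : public font_sketch {
  std::set<int> glyphs;
public:
  void add(int code) { glyphs.insert(code); }
  bool contains(int code) const override { return glyphs.count(code) != 0; }
};

// Covers the whole code point range without any registration.
// (Simplification: surrogates and combining sequences are ignored.)
class unicode_font_sketch : public font_sketch {
public:
  bool contains(int code) const override {
    return code >= 0 && code <= 0x10FFFF;
  }
};

// Variant B: one merged class with a boolean flag, so that a single
// driver could mix both kinds of font.
class merged_font_sketch {
  bool is_unicode;
  std::set<int> glyphs;  // used only when !is_unicode
public:
  explicit merged_font_sketch(bool unicode) : is_unicode(unicode) {}
  void add(int code) {
    assert(!is_unicode);  // enumerating glyphs makes no sense otherwise
    glyphs.insert(code);
  }
  bool contains(int code) const {
    if (is_unicode)
      return code >= 0 && code <= 0x10FFFF;
    return glyphs.count(code) != 0;
  }
};
```

In variant A the kind of font is fixed by the class a backend inherits from; in variant B every member function has to branch on the flag, which is the loss of prettiness mentioned above.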
4) Currently the API of nametoindex.cpp has a different implementation
at the end of troff/input.cpp. My current patch needs to go back from
the index to the character name, and so an additional inverse table
mapping index -> character name needs to be introduced. This takes up
memory and causes extra memory references. I would be inclined to
replace this "int index" with a pointer to an abstract class, say
abstract_char, of which the 'class charinfo' (on the troff side) and
'class backend_char' (for the backends) would be subclasses. This
would not only consume less memory but also make the code more robust
(as it is easier to misuse an 'int' accidentally). What do you think
about this?
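A rough sketch of the idea in question 4 follows. It is an illustration under assumptions, not the proposed code: the "_sketch" classes and the name() and char_name() functions are hypothetical, and the real 'class charinfo' carries far more state than a name.

```cpp
#include <cassert>
#include <string>

// Hypothetical common base replacing the plain "int index".
class abstract_char_sketch {
public:
  virtual ~abstract_char_sketch() {}
  virtual const std::string &name() const = 0;
};

// troff side (stand-in for 'class charinfo').
class charinfo_sketch : public abstract_char_sketch {
  std::string nm;
public:
  explicit charinfo_sketch(const std::string &n) : nm(n) {}
  const std::string &name() const override { return nm; }
};

// Backend side (stand-in for the proposed 'class backend_char').
class backend_char_sketch : public abstract_char_sketch {
  std::string nm;
public:
  explicit backend_char_sketch(const std::string &n) : nm(n) {}
  const std::string &name() const override { return nm; }
};

// With a pointer, the name is reachable directly from the object;
// no separate index -> name table has to be built or consulted.
const std::string &char_name(const abstract_char_sketch *c) {
  assert(c != nullptr);
  return c->name();
}
```

A function that takes an abstract_char_sketch pointer cannot be handed an arbitrary integer by mistake, which is the robustness argument made above.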
Bruno