[Groff] unicode support

From:	Bruno Haible
Subject:	[Groff] unicode support - questions
Date:	Mon, 23 Jan 2006 16:53:04 +0100
User-agent:	KMail/1.5

Hi,

So far, I have a first draft of a patch that makes groff work with Unicode
fonts without having to first register thousands of characters. Before
submitting the patch slice after slice, may I have your opinion about four
questions?

  1) In nametoindex.cpp and troff/charinfo.h, the terms "ascii_char" and
     "ascii_code" are used for unibyte characters in the input encoding.
     As far as I understand,
       - values >= 128 are possible and valid,
       - when the "latin1" device or "cp1047" device or "latin2" device
         (found in some Linux distributions) is used, values >= 128
         denote characters of this encoding.
     So I would like to rename these to "single_char" and "single_char_code"
     respectively. Is that OK? Or do you find "unibyte" a better term?

  2) When CP1047 is used, and commands like .trin \[char72]\[,c] are active,
     does the font::name_to_index API see the character name before or
     after the translation? I.e. does it see "char72" or ",c"?

  3) My current patch creates two subclasses 'enumerated_font' and
     'unicode_font' of 'class font'.

     An enumerated font has all its characters enumerated in the font file.
     A unicode font covers all combined Unicode characters (consisting of a
     base character and zero or more combining characters).

     The subclasses in the HTML and TTY backends inherit from 'unicode_font',
     whereas the others inherit from 'enumerated_font'.

     Is it imaginable that a driver/backend might want to use both kinds
     of font? In that case I would merge both classes back into 'class font'
     and use a boolean is_unicode flag to distinguish the cases. The code
     becomes less pretty this way, but it would avoid a possible problem in
     some future drivers/backends.
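
     To make the two alternatives in question 3 concrete, here is a rough
     sketch, not the actual patch. Only the names 'font', 'enumerated_font',
     'unicode_font' and 'is_unicode' come from the text above; the methods
     contains(), add_glyph() and is_unicode_name(), the class 'flagged_font',
     and the assumption that Unicode glyph names look like uXXXX or
     uXXXX_XXXX are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, much simplified stand-in for groff's 'class font'.
class font {
public:
  virtual ~font() {}
  // True if the font can supply a glyph with this name.
  virtual bool contains(const std::string &glyph_name) const = 0;
};

// Alternative A, first subclass: every glyph is listed in the font file.
class enumerated_font : public font {
public:
  void add_glyph(const std::string &glyph_name, int index) {
    assert(!glyph_name.empty());
    glyphs[glyph_name] = index;
  }
  bool contains(const std::string &glyph_name) const {
    return glyphs.find(glyph_name) != glyphs.end();
  }
private:
  std::map<std::string, int> glyphs;
};

// Alternative A, second subclass: nothing is listed; any well-formed
// Unicode glyph name (assumed form: 'u' followed by groups of 4 to 6
// uppercase hex digits separated by '_') is accepted.
class unicode_font : public font {
public:
  static bool is_unicode_name(const std::string &s) {
    if (s.size() < 2 || s[0] != 'u')
      return false;
    std::size_t digits = 0;
    for (std::size_t i = 1; i < s.size(); i++) {
      char c = s[i];
      if (c == '_') {
        if (digits < 4)
          return false;
        digits = 0;
      } else if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')) {
        if (++digits > 6)
          return false;
      } else
        return false;
    }
    return digits >= 4;
  }
  bool contains(const std::string &glyph_name) const {
    return is_unicode_name(glyph_name);
  }
};

// Alternative B: a single class with a boolean flag. Every lookup
// has to branch on the flag, which is the "less pretty" part.
class flagged_font {
public:
  explicit flagged_font(bool unicode) : is_unicode(unicode) {}
  void add_glyph(const std::string &glyph_name, int index) {
    glyphs[glyph_name] = index;
  }
  bool contains(const std::string &glyph_name) const {
    if (is_unicode)
      return unicode_font::is_unicode_name(glyph_name);
    return glyphs.find(glyph_name) != glyphs.end();
  }
private:
  bool is_unicode;
  std::map<std::string, int> glyphs;
};
```

     With alternative A a backend picks its kind of font at compile time by
     choosing a base class; with alternative B it could decide per font at
     run time, at the cost of a branch in each member function.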

  4) Currently the API of nametoindex.cpp has a different implementation
     at the end of troff/input.cpp. My current patch needs to go back from
     the index to the character name, and so an additional inverse table
     mapping index -> character name needs to be introduced. This takes up
     memory and causes extra memory references. I would be inclined to
     replace this "int index" with a pointer to an abstract class, say
     abstract_char, of which the 'class charinfo' (on the troff side) and
     'class backend_char' (for the backends) would be subclasses. This
     would not only consume less memory but also make the code more robust
     (as it is easier to misuse an 'int' accidentally). What do you think
     about this?
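
     Roughly, the two designs in question 4 compare as in the sketch below.
     This is only an illustration under assumptions: 'abstract_char' and
     'backend_char' are the names proposed above, but the name() accessor
     and the helper classes 'index_table' and 'char_registry' are
     hypothetical and do not exist in groff.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Design A: characters are plain ints, so going back from an index to
// the name needs a second, inverse table.
class index_table {
public:
  int name_to_index(const std::string &name) {
    std::map<std::string, int>::iterator it = forward.find(name);
    if (it != forward.end())
      return it->second;
    int idx = static_cast<int>(inverse.size());
    forward[name] = idx;
    inverse.push_back(name);   // extra storage for the reverse lookup
    return idx;
  }
  const std::string &index_to_name(int idx) const {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < inverse.size());
    return inverse[idx];
  }
private:
  std::map<std::string, int> forward;
  std::vector<std::string> inverse;
};

// Design B: characters are pointers to an abstract class. The object
// itself carries the name, so no inverse table is needed.
class abstract_char {
public:
  virtual ~abstract_char() {}
  virtual const std::string &name() const = 0;  // hypothetical accessor
};

// Backend-side subclass; 'class charinfo' would play this role in troff.
class backend_char : public abstract_char {
public:
  explicit backend_char(const std::string &n) : name_(n) {}
  const std::string &name() const { return name_; }
private:
  std::string name_;
};

// Hands out one stable object per name. std::map never moves its
// elements, so the returned pointers stay valid while the registry lives.
class char_registry {
public:
  abstract_char *name_to_char(const std::string &name) {
    return &chars.insert(std::make_pair(name, backend_char(name)))
                .first->second;
  }
private:
  std::map<std::string, backend_char> chars;
};
```

     In design B an abstract_char * cannot be mixed up with an unrelated
     integer such as a byte value or a loop counter; the compiler rejects
     such code, whereas with 'int index' it compiles silently.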

Bruno
