Re: [Groff] Re: unicode support, part 14: unicode fonts
From: Werner LEMBERG
Subject: Re: [Groff] Re: unicode support, part 14: unicode fonts
Date: Thu, 10 Aug 2006 08:14:14 +0200 (CEST)

> For Unicode fonts (which ought to be increasingly the norm), the
> proposal to write out all glyph properties in the font file seems
> odd; as far as I understand the point of Bruno's Unicode fonts
> versus enumerated fonts is to avoid the need to write out properties
> in font files which are really properties of the Unicode code
> points. Can these properties not be autogenerated from
> UnicodeData.txt (and others, e.g. EastAsianWidth.txt) and used
> automatically for all Unicode fonts? Glyph classes would then be
> useful for efficient internal storage, but there would be no urgent
> need to represent them in the font files.
Please bear in mind that groff, similar to TeX, doesn't store
character information; everything is related to glyphs.  I won't
accept a solution which works for a particular device only.  For
example, take a Japanese PS font: you can't safely assume that the
font's `full-width' glyphs are full-width at all, and treating them
as such gives poor typographical output.  We *need* glyph classes.
Of course, EastAsianWidth.txt and other Unicode data files can be
used to autogenerate the font description files for devutf8 and
devhtml, but I don't want to store the data hardcoded in troff.
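
To illustrate the autogeneration idea, here is a minimal Python sketch
which reads lines in the EastAsianWidth.txt format and prints charset
entries for a fixed-cell device.  It is not an existing groff tool.
The function names, the cell width of 24 units, and the `uXXXX' glyph
names are assumptions for illustration; the output only loosely
follows the `name metrics type code' layout described in
groff_font(5).

```python
CELL_UNITS = 24  # assumed width of one terminal cell, in device units


def parse_east_asian_width(lines):
    """Yield (first, last, prop) for each data line in EastAsianWidth.txt format."""
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not data:
            continue
        code_range, prop = (field.strip() for field in data.split(";")[:2])
        first, _, last = code_range.partition("..")
        yield int(first, 16), int(last or first, 16), prop


def charset_lines(ranges, cell_units=CELL_UNITS):
    """Yield one hypothetical charset entry per code point in the given ranges."""
    for first, last, prop in ranges:
        # Wide (W) and Fullwidth (F) entries occupy two cells, the rest one.
        width = cell_units * (2 if prop in ("W", "F") else 1)
        for code in range(first, last + 1):
            yield "u%04X\t%d\t0\t0x%04X" % (code, width, code)


# A few sample lines in the data file's format, not the full file.
SAMPLE = [
    "# sample in EastAsianWidth.txt format",
    "0041;Na # LATIN CAPITAL LETTER A",
    "3001..3002;W # IDEOGRAPHIC COMMA..IDEOGRAPHIC FULL STOP",
    "FF01;F # FULLWIDTH EXCLAMATION MARK",
]

if __name__ == "__main__":
    for entry in charset_lines(parse_east_asian_width(SAMPLE)):
        print(entry)
```

The width data ends up in the generated font description file, so
troff itself needs no built-in table, and a font for another device
remains free to give the same glyphs different metrics.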
> It feels like groff is quite close to being able to render CJK
> reasonably well - the major omissions seem to be width handling and
> kinsoku shori (is that an accurate assessment?)
This is correct.
> (In addition, the Debian patches also create an "ascii8" device,
> which is a curious little hack that effectively passes through
> characters encoded according to the current locale - so if the input
> to ascii8 is ISO-8859-2, then you get ISO-8859-2 output. At
> present, man uses this device for Czech, Croatian, Hungarian,
> Polish, Russian, Slovak, and Turkish. Obviously this device is
> typographically dubious at best, so I'll replace it by use of
> preconv/soelim/whatever and an iconv postprocessing step;
> latin2.tmac and latin5.tmac would work as well but those appear to
> be largely superseded by preconv.)
latin2.tmac and friends are *not* superseded; you need them for proper
hyphenation.  Have a look at my recent answer to a mail called `koi8-r
hyphenation revisited'.
Werner