Re: [Groff] ubuntu, groff and utf-8
From: Michail Vidiassov
Subject: Re: [Groff] ubuntu, groff and utf-8
Date: Tue, 8 Mar 2005 11:26:26 +0300
Dear Werner,
you wrote:
This is correct, unfortunately. groff doesn't yet support UTF8 input.
You have to convert your file first to something groff can understand.
Below is a small perl script which does that. Note that it doesn't
`fake' glyphs, this is, it doesn't construct, say, `Amacron' from an
`A' and a `macron' glyph. Any volunteer for this?
======================================================================
#! /usr/bin/perl -w
#
# uni2groff.pl
#
# Convert input in UTF8 encoding to something groff 1.19 or greater
# can understand. It simply converts all Unicode values >= U+0080
# to the form \[uXXXX].
#
# Usage:
#
# perl uni2groff.pl < infile > outfile
#
# You need perl 5.6 or greater.
use strict;
binmode(STDIN, ":utf8");
while (<>) {
  s/(\P{InBasicLatin})/sprintf("\\[u%04X]", ord($1))/eg;
  print;
}
# EOF
It seems there is a problem with this script.
If there is an `Amacron' in the data, the script produces `u0100'.
But glyphs in groff are named in decomposed form;
the glyph name for `Amacron' is `u0041_0304'.
You can see this from the unicode_decomposed hash in afmtodit
and from uniglyph.cpp and glyphuni.cpp.
Thus your script has to be made a bit longer by including the
unicode_decomposed hash ;)
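
A minimal sketch of what such a longer script might look like. It
assumes that canonical decomposition (NFD) from the standard
Unicode::Normalize module is an acceptable stand-in for groff's own
unicode_decomposed hash; the two tables may differ in corner cases, so
the result should be checked against afmtodit. The helper names
groff_glyph_name and convert_line are hypothetical and are not part of
groff or of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from groff): name a character after its
# canonical decomposition, e.g. U+0100 becomes \[u0041_0304].
# Assumption: NFD stands in for the unicode_decomposed hash.
sub groff_glyph_name {
    my ($char) = @_;
    my @code_points = map { sprintf("%04X", ord($_)) } split(//, NFD($char));
    return "\\[u" . join("_", @code_points) . "]";
}

# Replace every character outside ASCII with its \[u...] glyph name.
sub convert_line {
    my ($line) = @_;
    $line =~ s/([^\x00-\x7F])/groff_glyph_name($1)/eg;
    return $line;
}

# Sample input: LATIN CAPITAL LETTER A WITH MACRON followed by "bc".
print convert_line("\x{0100}bc\n");   # prints \[u0041_0304]bc
```

To filter a file as the original script does, the sample line could be
replaced by binmode(STDIN, ":utf8") and a loop that prints
convert_line($_) for each input line.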
And maybe it is a good idea to replace (optionally, where possible)
the Unicode glyph names with the (approximately two-character) groff
glyph names, the way it is done in input.cpp, using
unicode_to_glyph_list and following the precedents from latin?.tmac.
The reason is to make the output more portable and human-readable.
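
The optional replacement could be sketched roughly as follows. The
%short_name hash is a hypothetical four-entry excerpt using glyph names
documented in groff_char(7); it is not the actual unicode_to_glyph_list
table, and a real script would need the full list. Characters without a
short name fall back to a decomposed \[uXXXX_YYYY] name, again assuming
NFD from Unicode::Normalize approximates groff's decomposition table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Illustrative excerpt only: a few glyph names from groff_char(7).
# The real table is unicode_to_glyph_list; this hash is hypothetical.
my %short_name = (
    0x00A9 => 'co',   # copyright sign
    0x00DF => 'ss',   # sharp s
    0x00E4 => ':a',   # a with dieresis
    0x00E9 => "'e",   # e with acute accent
);

# Prefer the short groff glyph name; otherwise use the decomposed
# Unicode glyph name (assumption: NFD matches groff's decomposition).
sub glyph_escape {
    my ($char) = @_;
    my $code_point = ord($char);
    return "\\[" . $short_name{$code_point} . "]"
        if exists $short_name{$code_point};
    my @parts = map { sprintf("%04X", ord($_)) } split(//, NFD($char));
    return "\\[u" . join("_", @parts) . "]";
}

# Replace every character outside ASCII with a groff glyph escape.
sub convert_line {
    my ($line) = @_;
    $line =~ s/([^\x00-\x7F])/glyph_escape($1)/eg;
    return $line;
}

print convert_line("caf\x{E9} \x{0100}\n");   # prints caf\['e] \[u0041_0304]
```

Keeping the short names in a separate hash would make it easy to turn
the replacement off, since the fallback alone already yields valid
glyph names.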
Sincerely, Michail
PS. Are you sure that mapping `la' and `ra' in the devutf8 fonts (and
other places) to 0x27E8 (MATHEMATICAL LEFT ANGLE BRACKET) and 0x27E9
is a good idea? I do not think many fonts have those mathematical
symbols, while `la' and `ra' are often used in roff files in non-math
contexts.