Why does Groff decompose Unicode glyphs in intermediate output?

From:	Robin Haberkorn
Subject:	Why does Groff decompose Unicode glyphs in intermediate output?
Date:	Sun, 10 Nov 2024 06:13:32 +0300 (MSK)
User-agent:	Alpine 2.26 (BSF 649 2022-06-02)

Dear groffers,

can anybody explain why in Groff 1.23.0:

# echo -n 'й' | preconv -eutf-8
.lf 1 -
\[u0439]

But:

# echo -n 'й' | preconv -eutf-8 | groff -wall -Z -Tutf8
x T utf8
x res 240 24 40
x init
x F -
p1
x font 1 R
f1
s10
V40
H0
md
DFd
Cu0438_0306
H24
n40 0
x trailer
V2640
x stop

In other words, while preconv gave the expected U+0439, Groff transforms this into a base character followed by a combining character (u0438_0306). This is then converted back into U+0439 by grotty:


# echo -n 'й' | preconv -eutf-8 | groff -wall -Z -Tutf8 | grotty | hexdump -C
00000000  d0 b9 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a   |................|
00000010  0a 0a 0a 0a 0a 0a 0a 0a  0a 0a 0a 0a 0a 0a 0a 0a   |................|
*
00000040  0a 0a 0a 0a                                        |....|
00000044

I am writing my own Groff postprocessor [1] and this gives me headaches. Is there any algorithm to convert the combining characters back to single codepoints, or am I supposed to use large translation tables for that? Somehow grotty is obviously doing it, but I haven't yet read the source code.

There appears to be a Unicode composition algorithm in iconv(). GLib wraps this as g_unichar_compose(). It appears I would have to wrap this in my programming language (SciTECO) as well if I'd like to support all of the glyphs with diacritics in my postprocessor.
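To illustrate the kind of recomposition being asked about, here is a minimal Python sketch. It is not grotty's actual method, which has not been checked here. It assumes the glyph names in the `C` command always have the form `uXXXX_YYYY...` seen in the output above, and it relies on the standard library's NFC normalization instead of hand-written translation tables. The function name `compose_glyph_name` is made up for this example.

```python
import unicodedata


def compose_glyph_name(name: str) -> str:
    """Turn a groff-style glyph name such as 'u0438_0306' into NFC text.

    Assumption: the name is 'u' followed by underscore-separated
    hexadecimal code points, as in the 'Cu0438_0306' line above.
    """
    if not name.startswith("u"):
        raise ValueError(f"not a uXXXX glyph name: {name!r}")
    # Each underscore-separated field is one code point in hexadecimal.
    decomposed = "".join(chr(int(field, 16)) for field in name[1:].split("_"))
    # NFC performs canonical composition: a base character plus combining
    # marks is merged wherever Unicode defines a precomposed code point.
    return unicodedata.normalize("NFC", decomposed)


composed = compose_glyph_name("u0438_0306")
# Prints: U+0439 d0 b9 (the same two bytes that open the hexdump above)
print(f"U+{ord(composed):04X}", composed.encode("utf-8").hex(" "))
```

Sequences for which Unicode defines no precomposed code point stay decomposed after NFC, so a postprocessor built this way would still need to cope with combining characters in its output.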

IMHO groff shouldn't decompose characters that haven't been decomposed in its input.


Best regards,
Robin

[1]: https://github.com/rhaberkorn/sciteco/blob/master/doc/grosciteco.tes
