From: Deri James
Subject: [bug #63074] [troff] need a way to embed non-Basic Latin glyphs in device control commands
Date: Tue, 27 Sep 2022 16:47:50 -0400 (EDT)

Follow-up Comment #11, bug #63074 (project groff):

The messages which started this bug ("special characters are not defined")
have very little to do with the message you recently suppressed. They occur
because groff starts up with the TR font, which has no Cyrillic glyphs. If the
macro package executes a .if 'text'text' comparison before the font is
switched to U-TR, troff formats both text portions in order to compare their
output, and if either "text" includes glyphs not present in the current font
you will receive this warning.

You can demonstrate this with:


[derij@pip bug-63074]$ printf ".if \'\\[u041D]\'\\[u041D]\' .nop" | groff -Tpdf -z
troff: <standard input>:1: warning: can't find special character 'u041D'


But if you add -f U-T to the groff command, the warning goes away:


[derij@pip bug-63074]$ printf ".if \'\\[u041D]\'\\[u041D]\' .nop" | groff -Tpdf -z -f U-T


The mom macro set has such an .if statement in its .TITLE macro, which is
called before .FAMILY takes effect.

Now to deal with why Cyrillic glyphs do not appear in the bookmark panel, but
do appear in the text of the document. The text uses the embedded fonts,
which contain the Cyrillic glyphs mapped to appropriate code points. The
bookmark panel uses whatever system font you have configured for window
text. The system font will have Cyrillic glyphs, but they will be addressed by
Unicode code points, not the 8-bit codes available to a Type 1 PostScript
font.

I suspect, although I have not tried it, that if you set your system to a
legacy Russian encoding rather than the UTF variant it would be possible to
get Cyrillic into the bookmark panel.

The PDF standard allows two encodings for strings in PDFs. We are using
PDFDocEncoding, which is a superset of ISO Latin 1 and does not include
Cyrillic. The alternative is UTF-16 (UTF-8 is not supported); the string must
start with a byte order mark (BOM), and this would allow any Unicode character
to appear in bookmarks. The reason I used the 8-bit encoding is that the groff
.asciify request converts the \[uXXXX] escapes back to ASCII for me and, as a
bonus, drops all other escapes from the string which cannot be represented as
ASCII. So a string such as "\fB\s'+2p'foo\s'-2p'\fP" would be converted to
"foo". The only niggle was the warning message (now suppressed) each time it
dropped a node such as "\fB".

If I dropped the .asciify from pdf.tmac, all the \[uXXXX] strings would reach
the postprocessor, gropdf, which could then assemble a UTF-16 string from the
hex numbers. As a proof of concept I made some changes to pdf.tmac and gropdf,
and "pdfmom -k -f U-T mom-ru.mom" produced the attached PDF. There is still a
fair bit to do. The biggest job is to sanitise the string: remove unwanted
escapes, convert any glyph-producing escapes such as \C and \N back to UTF-16
characters, and convert Basic Latin characters to UTF-16. I suspect a deep
dive into the asciify routine in groff will be helpful.
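
The sanitising step might look roughly like the following Python sketch. It is
not the proof-of-concept change to gropdf; the pattern and the function name
to_utf16_hex are assumptions made for illustration. It handles only \[uXXXX],
font and size escapes, and ordinary characters. \C, \N and other
glyph-producing escapes are not converted, and any other escape is simply
dropped.

```python
import re

# Hypothetical tokenizer covering only the escape forms discussed above.
TOKEN = re.compile(
    r"""
      \\\[u(?P<hex>[0-9A-Fa-f]{4,6})\]        # \[uXXXX] glyph escape: keep
    | \\f(?:\[[^\]]*\]|\(..|.)                # font escape (\fB, \f(BI, \f[B]): drop
    | \\s(?:'[^']*'|\[[^\]]*\]|[+-]?\d)       # size escape (\s'+2p', \s[12], \s+2): drop
    | \\\[[^\]]*\]                            # other bracketed escape: drop
    | \\.                                     # any other escape character: drop
    | (?P<char>.)                             # ordinary character: keep
    """,
    re.VERBOSE | re.DOTALL,
)


def to_utf16_hex(groff_string: str) -> str:
    """Strip unwanted escapes and return BOM + UTF-16BE as upper-case hex."""
    kept = []
    for match in TOKEN.finditer(groff_string):
        if match.group("hex"):
            kept.append(chr(int(match.group("hex"), 16)))
        elif match.group("char"):
            kept.append(match.group("char"))
        # Every other alternative is an escape that is dropped.
    text = "".join(kept)
    return (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper()


print(to_utf16_hex(r"\fB\s'+2p'foo\s'-2p'\fP"))  # FEFF0066006F006F
print(to_utf16_hex(r"\[u041D]a"))                # FEFF041D0061
```

Basic Latin characters come out as two-byte UTF-16 code units alongside the
Cyrillic ones, so the whole bookmark string ends up in a single encoding.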


(file #53761)

    _______________________________________________________

Additional Item Attachment:

File name: mom-ru.pdf                     Size:211 KB
    <https://file.savannah.gnu.org/file/mom-ru.pdf?file_id=53761>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/