groff

Re: PDF outline not capturing Cyrillic text


From: Robin Haberkorn
Subject: Re: PDF outline not capturing Cyrillic text
Date: Fri, 23 Jun 2023 21:17:58 +0300

Hello Peter,

I am now also running into Cyrillic-related issues with pdfmark. I am using
ms for the time being. The bug also affects link texts autogenerated via
`.pdfhref L`.
In the simplest case, preconv turns your Cyrillic characters into escapes
which are apparently not interpreted further by pdfmark (or anything that
follows).
I see text like "[u0421][u043F]..." in my outline.
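
To illustrate where those escapes come from, here is a minimal Python sketch
that approximates what preconv does to non-ASCII input. It is not preconv's
actual code: the function name `preconv_like` is hypothetical, and it assumes
that every character outside ASCII is rewritten as a `\[uXXXX]` escape with at
least four uppercase hex digits.

```python
def preconv_like(text: str) -> str:
    """Approximate preconv (assumption): keep ASCII, escape the rest as \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        # ASCII passes through unchanged; everything else becomes a groff
        # Unicode escape with at least four uppercase hex digits.
        out.append(ch if cp < 0x80 else "\\[u%04X]" % cp)
    return "".join(out)


if __name__ == "__main__":
    # "Сп" is a made-up sample; it happens to match the two code points
    # quoted from the outline above.
    print(preconv_like("Сп"))
```

If the backslashes are then dropped somewhere between the macro package and
the output driver, the result would look like the "[u0421][u043F]..." text
seen in the outline.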

I believe that this is why you have .pdfmomclean in MOM. Do I understand
correctly that this is supposed to turn the escapes back into Latin-1?
This is presumably mainly the work of .asciify, which would be misnamed anyway.
It does not work with Cyrillic at all, which is not surprising.
That's also why you don't get "mojibake garbage" in the outline. None of the
Cyrillic characters end up in intermediate output.

It also explains why I previously had no problems with German Unicode characters
(that was using MOM) - they can be converted back into Latin-1.

Manually editing the ps:exec lines in the intermediate output and inserting
Unicode characters there does not produce the desired results, which is also
not surprising.

So it seems that the main problem really lies in grops and/or gropdf which
should ideally work with the Unicode escapes produced by preconv.
I am not sure if we would still need .pdfmomclean. But whatever useful stuff it
currently does, it should probably be in pdfmark.tmac (and/or pdf.tmac?) 
instead.
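
As a rough sketch of what "working with the Unicode escapes" could mean for an
outline title: the PDF format allows text strings to be stored as UTF-16BE
prefixed with the byte order mark FE FF, so the escapes could in principle be
decoded and re-encoded that way. The Python below only illustrates that idea.
The function names are hypothetical, it is not how grops or gropdf are
implemented, and it ignores composite escapes such as `\[u0438_0306]`.

```python
import re

# Matches groff Unicode escapes such as \[u0421] (4 to 6 hex digits).
_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def groff_escapes_to_text(s: str) -> str:
    """Replace each \\[uXXXX] escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def pdf_text_string(s: str) -> str:
    """Encode a string as a PDF hex string: UTF-16BE with a leading FE FF BOM."""
    data = b"\xfe\xff" + groff_escapes_to_text(s).encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


if __name__ == "__main__":
    print(pdf_text_string("\\[u0421]\\[u043F]"))
```

For the two escapes quoted earlier this yields `<FEFF0421043F>`, which is the
kind of value a /Title entry in a pdfmark outline item would need to carry for
a viewer to show the Cyrillic text.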

Best regards,
Robin


