Re: PDF outline not capturing Cyrillic text
From: Deri
Subject: Re: PDF outline not capturing Cyrillic text
Date: Fri, 23 Jun 2023 22:40:42 +0100
On Friday, 23 June 2023 19:17:58 BST Robin Haberkorn wrote:
> Hello Peter,
>
> I am also now stumbling across Cyrillic-related issues with pdfmark. I am
> using ms for the time being. The bug also affects autogenerating link texts
> given via `.pdfhref L`.
> In the most simple case, preconv will turn your Cyrillic characters into
> escapes which are apparently not further interpreted by pdfmark (or
> anything that follows). I see text like "[u0421][u043F]..." in my outline.
>
> I believe that this is why you have .pdfmomclean in MOM. Do I understand
> correctly that this is supposed to turn the escapes back into Latin-1?
> This is presumably mainly the work of .asciify, which would be misnamed
> anyway. It does not work with Cyrillic at all, which is no surprise.
> That's also why you don't get "mojibake garbage" in the outline. None of the
> Cyrillic characters end up in intermediate output.
>
> It also explains why I previously had no problems with German Unicode
> characters (that was using MOM) - they can be converted back into Latin-1.
>
> Manually editing the ps:exec lines in the intermediate output and inserting
> Unicode characters there does not produce the desired results, which is
> also not surprising.
>
> So it seems that the main problem really lies in grops and/or gropdf which
> should ideally work with the Unicode escapes produced by preconv.
> I am not sure if we would still need .pdfmomclean. But whatever useful stuff
> it currently does, it should probably be in pdfmark.tmac (and/or pdf.tmac?)
> instead.
>
> Best regards,
> Robin
Hi Robin,
The features you require are coming. This is an example of Russian with
bookmarks in Cyrillic. I'm afraid I don't know what it means and I have
forgotten where I got the text.
Cheers
Deri
Attachment: Rus2.pdf
Description: Adobe PDF document
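
As a footnote to the thread, the escape round trip Robin describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how preconv, pdfmark or gropdf are implemented: the helper names are hypothetical, the encoder only approximates preconv's handling of non-ASCII input, and the decoder ignores composite escapes such as \[u0438_0306]. The last helper relies on the fact that a PDF text string may be stored as UTF-16BE with a leading FE FF byte order mark, which is one way an outline title can carry Cyrillic.

```python
import re

# Hypothetical helper: roughly what preconv does to non-ASCII input.
# ASCII passes through; everything else becomes a \[uXXXX] escape.
def to_groff_escapes(text: str) -> str:
    return "".join(
        ch if ord(ch) < 0x80 else f"\\[u{ord(ch):04X}]" for ch in text
    )

# Matches simple escapes only (assumption: no composite \[uXXXX_YYYY] forms).
_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")

def _decode(match: re.Match) -> str:
    code = int(match.group(1), 16)
    # Leave anything outside the Unicode range untouched.
    return chr(code) if code <= 0x10FFFF else match.group(0)

# Hypothetical helper: turn \[uXXXX] escapes back into characters.
def from_groff_escapes(text: str) -> str:
    return _ESCAPE.sub(_decode, text)

# Hypothetical helper: hex form of a PDF text string, UTF-16BE with BOM.
def pdf_outline_hex(title: str) -> str:
    return "<" + ("\ufeff" + title).encode("utf-16-be").hex().upper() + ">"

if __name__ == "__main__":
    escaped = to_groff_escapes("Сп")
    print(escaped)
    print(pdf_outline_hex(from_groff_escapes(escaped)))
```

If the backslashes are lost along the way, the first printed line is the kind of "[u0421][u043F]" text that shows up in the outline; decoding the escapes before the title is written out avoids that.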