[bug #63076] [tmac] add Russian language support


From: G. Branden Robinson
Subject: [bug #63076] [tmac] add Russian language support
Date: Sun, 18 Sep 2022 12:37:41 -0400 (EDT)

Update of bug #63076 (project groff):

                Category:                 General => Macro - others/general 
                 Summary: Adding Russian language to groff => [tmac] add
Russian language support

    _______________________________________________________

Follow-up Comment #3:

[comment #0 original submission:]
> I would love to help the project by adding support for Russian language (in
a similar manner to Italian, German and so on, by providing hyphenation rules
and strings in Russian for macro packages). The Russian language is spoken by
around 258 million people in the world, so I think this addition may
potentially help a lot of new and old groff users. What are your thoughts?

I see nothing objectionable here.

[comment #1 comment #1:]
> When I wrote it, I forgot that the built-in fonts do not support Cyrillic
symbols and you always have to use a 3rd-party font to write in Russian.

Yes, but that's true of Chinese and Japanese as well, yet we have the
localization macro files _zh.tmac_ and _ja.tmac_.

> So now I'm not sure if adding strings and hyphenation rules makes sense...
Unless I also add Cyrillic symbols to the built-in fonts.

I don't think that is necessary.  I don't know if you saw my lengthy feedback
to a recent request to support UTF-16-encoded fonts in grops, but some similar
considerations are present.  People sometimes think they have to bite off more
of a task than they really do.

Specifically, for Russian language support, the main things to check are the
ones you need to customize: correctly localized strings, and correct
hyphenation.

The really good news is that both are straightforward to verify for a fluent
Russian speaker even without Cyrillic fonts installed.  All that is required
is a partial understanding of GNU _troff_'s device-independent output format
("grout", I like to call it) and either the ability to recognize Unicode code
points for Cyrillic letters or a crude filter that will translate them for
spot-checking at a terminal.

Consider the "simple example" from this message to the _groff_ mailing list
about a month ago
<https://lists.gnu.org/archive/html/groff/2022-08/msg00045.html>.

If you can follow it, you are well on your way to knowing all the "grout" you
need to verify Russian language support.

For example, with a reasonable _ru.tmac_ file in place, I expect to be able to
produce output much like this using "groff -kZ -Tutf8".


x T utf8
x res 240 24 40
x init
x F -
p1
x font 1 R
f1
s10
V40
H0
md
DFd
Cu0431
H24
Cu043B
h24
Cu0430
h24
Cu0433
h24
Cu043E
h24
Chy
h24
n40 0
V80
H0
Cu0434
H24
Cu0430
h24
Cu0440
h24
Cu044F
h24
n40 0
x trailer
V2640
x stop


(If you're wondering how I got that, I gave _groff_ input consisting of a
Cyrillic word familiar to me, turned off adjustment, set the line length to
8n, and manually stuck in a hyphen where it seemed to make sense to me as an
English speaker, knowing very little of the morphology of the Slavic-language
specimen.)

(The above output can be given as input to _grotty_ on a UTF-8-capable
terminal whose font has Cyrillic coverage, which is probably the practical
means of checking Russian language support, but I wanted to illustrate how
the task could be achieved even in a plain ASCII environment.)
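
The "crude filter" mentioned earlier could look something like the Python
sketch below.  It is not part of groff; the names decode_grout, NAMED_GLYPHS,
and SAMPLE are made up for illustration.  It assumes that glyphs arrive as
"C" commands named uXXXX (or uXXXX_YYYY for a base letter plus a combining
mark), that "t" commands carry plain text, and that an "n" command ends an
output line.  All motion and device-control commands are ignored.

```python
import re
import unicodedata

# Special-character names that are not of the uXXXX form.  Only the one
# used in the sample is listed; this table is an assumption to extend.
NAMED_GLYPHS = {"hy": "-"}

# uXXXX, optionally followed by _YYYY components (decomposed glyph names).
UNICODE_GLYPH = re.compile(r"u[0-9A-F]{4,6}(?:_[0-9A-F]{4,6})*")


def decode_grout(grout):
    """Return the text lines encoded in a string of device-independent output."""
    lines, current = [], []
    for raw in grout.splitlines():
        cmd = raw.strip()
        if cmd.startswith("C"):
            name = cmd[1:]
            if UNICODE_GLYPH.fullmatch(name):
                chars = "".join(chr(int(h, 16)) for h in name[1:].split("_"))
                # Recompose base letter + combining mark where possible.
                current.append(unicodedata.normalize("NFC", chars))
            else:
                current.append(NAMED_GLYPHS.get(name, "?"))
        elif cmd.startswith("t"):
            current.append(cmd[1:])
        elif cmd.startswith("n"):
            lines.append("".join(current))
            current = []
    if current:
        lines.append("".join(current))
    return lines


# The glyph and newline commands from the sample output, motions omitted.
SAMPLE = """\
Cu0431
Cu043B
Cu0430
Cu0433
Cu043E
Chy
n40 0
Cu0434
Cu0430
Cu0440
Cu044F
n40 0
"""

if __name__ == "__main__":
    for line in decode_grout(SAMPLE):
        print(line)
```

On a terminal that can display Cyrillic, this prints the two output lines so
that the hyphenation point can be judged by eye.  Unknown glyph names show up
as "?" rather than being silently dropped.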

Validating the localized strings is even easier; they simply need to be input
in the source files in a natural way (UTF-8 and all) and proofread for
spelling errors.  I cannot imagine a GNU _troff_ failure mode where these
input characters will be remapped or resequenced.

The "Manipulating Hyphenation" section/node of the _groff_ Texinfo manual
should prove useful for constructing the list of hyphenation codes.  There is
an example in German already which illustrates the basic principle; each
lowercase Cyrillic letter will need a "mapping" to itself, and then each
uppercase counterpart can be mapped to the lowercase one.
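
To save some typing, the list of hyphenation codes could be generated rather
than written by hand.  The Python sketch below is only an illustration of the
mapping just described; the function names are made up, and writing the
letters as \[uXXXX] special character escapes is my assumption (whether the
hcode request handles every Cyrillic letter in that form, or needs the
characters input some other way, has to be verified against GNU _troff_).

```python
def groff_escape(ch):
    """Spell a character as a groff \\[uXXXX] special character escape."""
    return "\\[u%04X]" % ord(ch)


def russian_hcode_requests():
    """Build one .hcode request per letter of the modern Russian alphabet.

    Each lowercase letter is mapped to itself, and its uppercase
    counterpart is mapped to the lowercase letter.
    """
    # U+0430..U+044F covers the basic lowercase letters; U+0451 is "yo",
    # which sits outside that contiguous range.
    lowercase = [chr(c) for c in range(0x0430, 0x0450)] + ["\u0451"]
    requests = []
    for lower in lowercase:
        upper = lower.upper()
        requests.append(
            ".hcode {l} {l}  {u} {l}".format(
                l=groff_escape(lower), u=groff_escape(upper)
            )
        )
    return requests


if __name__ == "__main__":
    print("\n".join(russian_hcode_requests()))
```

The output is 33 lines, one per letter, ready to paste into a draft
_ru.tmac_ for testing.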

The adaptation of TeX hyphenation patterns is the part that seems like it
might be hardest to me.  All of our current hyphenation pattern files are in
ASCII or an ISO-8859 encoding.  I don't know if _iconv_(1) is up to
transforming them to a form that GNU _troff_'s hyphenation pattern file reader
is prepared to accept.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/env.cpp#n3784

Making _groff_ support KOI8-R might be easier than getting it to interpret
UTF-8-encoded hyphenation patterns, funny as that may sound.
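
For what it's worth, the transcoding step by itself is not hard.  The Python
sketch below (the function name is made up) re-encodes UTF-8 pattern data as
KOI8-R using the codecs in Python's standard library.  It says nothing about
whether GNU _troff_'s pattern file reader would accept the result; that is
the open question.

```python
def utf8_to_koi8r(data):
    """Re-encode UTF-8 bytes as KOI8-R bytes.

    Raises UnicodeEncodeError if the input contains a character that
    KOI8-R cannot represent, so problems are not silently hidden.
    """
    return data.decode("utf-8").encode("koi8_r")


if __name__ == "__main__":
    # A made-up fragment for illustration, not a real hyphenation pattern.
    fragment = "\u0431\u043b\u0430\u0433\u043e".encode("utf-8")
    converted = utf8_to_koi8r(fragment)
    # Each Cyrillic letter takes two bytes in UTF-8 but one in KOI8-R.
    print(len(fragment), len(converted))
```

Every character in the result occupies a single byte, which is the property
that makes an 8-bit encoding attractive for a reader that works byte by byte.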

On the other hand, we can integrate the macro file part of this first because
it is independent and easy (saith I).  Without hyphenation pattern files,
hyphenation simply won't be done.  That won't get us professional-grade
typography, but it will improve significantly on the status quo.

(Mainly as a note to myself, one thing to check is what data type is used for
hyphenation codes.  If it's a char, we should fix that.)


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63076>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



