From: ropers
Subject: Re: Groff UTF-8 support? - Groff documentation section 5.1.9 Input Encodings
Date: Fri, 8 Mar 2024 01:14:57 +0000
On 07/03/2024, Dave Kemper wrote:
> Hi Ian, thanks for your attention to the groff manual!
Thank you very much, Dave, for your helpful and informative replies. :-)
> On 3/7/24, ropers <ropers@gmail.com> wrote:
>> "latin1" sounds awfully ISO-8859-1ish, and (I fear) not very much like
>> the Latin-1 Supplement Unicode block
>
> Correct. Since there are two different things that include "Latin-1"
> in their name, perhaps this wording could be more explicit. On the
> other hand, the context is input encodings, and a Unicode block is not
> itself an input encoding.
It might be preferable to demine rather than rely on contextual hints
as to the presence of UXO:
$ diff -u groff.texi.orig groff.texi
--- groff.texi.orig 2024-03-05 18:20:59.940460376 +0000
+++ groff.texi 2024-03-08 00:21:12.782360544 +0000
@@ -5509,9 +5509,10 @@
@cindex ISO @w{8859-1} (@w{Latin-1}), input encoding
@cindex input encoding, @w{Latin-1} (ISO @w{8859-1})
@pindex latin1.tmac
-ISO @w{Latin-1}, an encoding for Western European languages, is the
-default input encoding on non-@acronym{EBCDIC} platforms; the file
-@file{latin1.tmac} is loaded at startup.
+ISO 8859-1, aka @w{Latin-1}, an extended ASCII encoding chiefly for
+Western European languages, is still @code{groff}'s default input encoding on
+non-@acronym{EBCDIC} platforms; the file @file{latin1.tmac} is loaded
+at startup.
@end table
@noindent
@@ -5533,9 +5534,9 @@
@cindex ISO @w{8859-2} (@w{Latin-2}), input encoding
@cindex input encoding, @w{Latin-2} (ISO @w{8859-2})
@pindex latin2.tmac
-To use ISO @w{Latin-2}, an encoding for Central and Eastern European
-languages, invoke @w{@samp{.mso latin2.tmac}} at the beginning of your
-document or supply @samp{-mlatin2} as a command-line argument to
+To use ISO 8859-2, aka @w{Latin-2}, an encoding for Central and Eastern
+European languages, invoke @w{@samp{.mso latin2.tmac}} at the beginning of
+your document or supply @samp{-mlatin2} as a command-line argument to
@code{groff}.
@item latin5
@@ -5544,8 +5545,8 @@
@cindex ISO @w{8859-9} (@w{Latin-5}), input encoding
@cindex input encoding, @w{Latin-5} (ISO @w{8859-9})
@pindex latin5.tmac
-To use ISO @w{Latin-5}, an encoding for the Turkish language, invoke
-@w{@samp{.mso latin5.tmac}} at the beginning of your document or
+To use ISO 8859-9, aka @w{Latin-5}, an encoding for the Turkish language,
+invoke @w{@samp{.mso latin5.tmac}} at the beginning of your document or
supply @samp{-mlatin5} as a command-line argument to @code{groff}.
@item latin9
@@ -5554,9 +5555,9 @@
@cindex ISO @w{8859-15} (@w{Latin-9}), input encoding
@cindex input encoding, @w{Latin-9} (ISO @w{8859-15})
@pindex latin9.tmac
-ISO @w{Latin-9} succeeds @w{Latin-1}; it includes a Euro sign and better
-glyph coverage for French. To use this encoding, invoke @w{@samp{.mso
-latin9.tmac}} at the beginning of your document or supply
+ISO 8859-15, aka @w{Latin-9}, succeeds @w{Latin-1}; it includes a Euro sign
+and better glyph coverage for French. To use this encoding, invoke
+@w{@samp{.mso latin9.tmac}} at the beginning of your document or supply
@samp{-mlatin9} as a command-line argument to @code{groff}.
@end table
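An aside on the numbering, since it is easy to trip over: the "Latin-N" nicknames do not track the ISO 8859 part numbers (Latin-5 is part 9, Latin-9 is part 15, and part 5 is Cyrillic). A minimal Python sketch, assuming nothing beyond the standard library codecs, shows how the same byte decodes under different parts. The helper name `decode_byte` is made up for illustration and has nothing to do with groff itself.

```python
# Hypothetical helper: decode a single byte value under several encodings.
def decode_byte(value, encodings):
    raw = bytes([value])
    return {enc: raw.decode(enc) for enc in encodings}

# Byte 0xA4 is the generic currency sign in Latin-1 (ISO 8859-1)
# but the Euro sign in Latin-9 (ISO 8859-15).
currency = decode_byte(0xA4, ["iso8859-1", "iso8859-15"])

# Byte 0xD0 is Eth in ISO 8859-1, a Cyrillic letter in ISO 8859-5,
# and G with breve (Turkish) in ISO 8859-9, i.e. Latin-5.
letter = decode_byte(0xD0, ["iso8859-1", "iso8859-5", "iso8859-9"])

for table in (currency, letter):
    for enc, ch in table.items():
        print(f"{enc:>10}: U+{ord(ch):04X}")
```

Python's own codec aliases follow the same mapping: `latin5` resolves to ISO 8859-9 and `latin9` to ISO 8859-15.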
Attention!
I have not actually previewed this!
Truth be told, info(1) is Greek to me. I've tried
$ info groff.texi #, which made it say "Cannot find node 'Top'." at
the bottom (pun intended?), and then I couldn't figure out how to
actually view the groff info manual. Not that I've tried much, but
still.
IMNSHO it is incredibly ironic, and--if one could hurt a program's
feelings--almost insulting for groff's manual to be maintained in info
format. Not exactly dogfooding, no? At the peril of slighting the
local champion, my opinions on info(1) reduce to <xkcd.com/912>, and I
suspect
$ info mcas
is a synonym for
$ kill -9 346 #,
and in light of his prescience, I remain unconvinced *Primer* wasn't
based on the exploits of one Randall Munroe + colleague.
>> which makes me wonder if Current Year's
>> groff/troff itself (absent pre-piped converters) can at all handle
>> multi-byte character sets in general, or UTF-8 in particular.
>
> It cannot. This is a longstanding wishlist item: "improving Unicode
> support" was put into the Groff Mission Statement when it was drafted
> 10 years ago. Ten years before that, groff's then-maintainer posted
> to this list: "Volunteers are highly welcome to extend groff from 8bit
> to 32bit input characters"
Based on my admittedly not quite unlimited insight into Unicode
issues, if taken literally, a call "to extend groff from
8bit to 32bit input characters" strikes me as an already outmoded if
not stillborn strategy. It might be much better to go all-in on
variable-width encoding, read: UTF-8, just like everybody else.
Whatever limited *strictly internal* use there may still be for UTF-32
in some buffers, structs or variables, anything not UTF-8 is probably
best kept to a minimum.
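To make the variable-width point concrete, here is a small Python sketch (not groff code; the function name `utf8_widths` is invented for illustration). It lists how many bytes each character needs in UTF-8, and compares the total with a fixed four bytes per character in UTF-32.

```python
def utf8_widths(text):
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, ord(ch), len(ch.encode("utf-8"))) for ch in text]

# ASCII letter, Latin-1 letter, Cyrillic letter, katakana, emoji.
sample = "A\u00e9\u0416\u30ab\U0001F600"

for ch, code_point, width in utf8_widths(sample):
    print(f"U+{code_point:04X} takes {width} byte(s) in UTF-8")

utf8_size = len(sample.encode("utf-8"))
# "utf-32-le" is used so that no byte order mark is counted.
utf32_size = len(sample.encode("utf-32-le"))
print(f"UTF-8: {utf8_size} bytes, UTF-32: {utf32_size} bytes")
```

Every code point still fits in a 32-bit integer once decoded, which is presumably the only place a fixed-width form would be needed.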
But perhaps I'm barking at shadows here. Nothing in this
<https://lists.gnu.org/r/groff/2004-05/msg00074.html> is smoking-gun
evidence that would compel a jury of me, myself and I to conclude
Werner et al. WEREN'T aware of that already, or if not then, then
certainly now.
> (http://lists.gnu.org/r/groff/2004-05/msg00026.html).
>
> But this is a monumental task, and one groff developer has written of
> some of its difficulties
> (http://savannah.gnu.org/bugs/?40720#comment4).
I was a few paragraphs into that before I realised the author of the
above comment is Ingo Schwarze, an OpenBSD dev I've previously talked
to, and whose judgement on this I trust A LOT.
> In short, it's not for lack of desire that groff lacks this feature.
>
> With any luck, you'll follow the Branden Track, where you start off by
> poking a little at groff's documentation and are soon hacking away at
> the code base. You might be the volunteer Werner asked for 20 years
> ago ;-)
Not to be a negative Nancy, but just to be straight with you and set
expectations: Probably not. Even if I, at long last, might yet prove
competent enough to make a significant contribution in code to the
open source community, I am less likely to make that to a GNU GPL
project -- I'm more of a BSD (ISC/OpenBSD) fan. Of course, to my
understanding it's not BSD licenses that are incompatible with GPL
ones, so any contribution could still reach you regardless of
philosophical differences if not legalistic bikeshedding.
I really only dove into the groff manual thanks to an observed
(kernel.org) ascii(7) man page bug I only have a partial fix for,
which is why I'm still reading, all of which I'll possibly talk about
at a later date.
>> Also, this sounds a lot like Current Year's groff(1) even WITH
>> pipe-connected UTF-8 converters/drivers (which may be what's referred
>> to at the bottom of that section) couldn't actually support anything
>> like, say, Cyrillic or katakana or whatever,
>
> Groff added Cyrillic support last year
> (http://savannah.gnu.org/bugs/?63076). It includes some CJK support
> but expanding this is an ongoing project
> (http://savannah.gnu.org/bugs/?62830). If you have expertise in this
> realm and can address some of the outstanding questions in that
> ticket, please chime in.
I'm not totally ignorant of UTF-8 in particular, but depending on your
expectations, I'm possibly also not so hugely competent for the former
to be a massively modest understatement.
I will say that if anyone following along at home is struggling to get
their head around UTF-8, this post by Graham Douglas might be an
excellent starting point:
<http://www.readytext.co.uk/?p=1284>
Thanks and regards,
Ian
(Ian Ropers)