Re: Issue in man page ascii.7
From: G. Branden Robinson
Subject: Re: Issue in man page ascii.7
Date: Mon, 5 Dec 2022 08:06:26 -0600
Hi Alex,
At 2022-12-05T13:35:42+0100, Alejandro Colomar wrote:
> On 12/5/22 09:15, G. Branden Robinson wrote:
> > [[The fix]] would be something like this:
> >
> > -3: # 3 C S c s 3: ! + 5 ? I S ] g q {\n"
> > +3: # 3 C S c s 3: !\& + 5 ?\& I S ] g q {\n"
> > -6: & 6 F V f v 6: $ . 8 B L V \\` j t \\(ti\n"
> > +6: & 6 F V f v 6: $ .\& 8 B L V \\` j t \\(ti\n"
>
> Thanks!
You're welcome, but I think we might have talked past each other below.
> Sure, I try to do it consistently. If I Cc you is a "just read it if
> you want, not forced, maybe you're busy and someone else on groff@
> picks it up". :)
Works for me. :)
> > what's going on here
[the problem that Helge reported]
> > is actually a GNU tbl(1) bug.
> >
> > https://savannah.gnu.org/bugs/?61909
> I think I'll keep this as a WONTFIX.
>
> The man-pages don't have stable releases (i.e., what you get at the
> time your distro releases is what you'll get forever), so stable users
> will have this bug unfixed forever until they dist-upgrade, even if I
> fixed it.
>
> Soon (we hope), groff 1.23.0 will be released, so next OS releases
> (e.g., Bookworm) won't have this bug (and many others that you fixed).
>
> So, the only problem is for those who use stable distros, but somehow
> install the fresh man-pages.
No, that is not the case. Two things are true today: there _aren't_
dummy characters \& after the sentence-ending punctuators [!?.] that are
followed by multiple space characters in the ascii(7) page, _and_ every
known released version of GNU tbl incorrectly applies the configured
inter-sentence space to the second space character after such
punctuators. So people are getting incorrect output _now_ from this
table, and from any others that regex-match "[.!?]  " (two spaces) in
ordinary text blocks, if their configured inter-sentence space amount is
not the default.
That last condition is in fact common for non-Anglophone users of groff.
Let me show you a simple exhibit and then I'll drown you with more
background.
---snip---
$ cat EXPERIMENTS/iss.man
.TH foo 1 2022-12-05 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
.TS
L.
Foo.  Bar.
.TE
.ss 12 0
.TS
L.
Baz.  Qux.
.TE
.TS
L.
Hep.\&  Sid.
.TE
$ nroff -t -man EXPERIMENTS/iss.man # groff 1.22.4 (Debian)
foo(1) General Commands Manual foo(1)
Name
foo - frobnicate a bar
Description
Foo.  Bar.
Baz. Qux.
Hep.  Sid.
groff test suite 2022‐12‐05 foo(1)
$ ./build/test-groff -t -man -Tascii EXPERIMENTS/iss.man # groff Git
foo(1) General Commands Manual foo(1)
Name
foo - frobnicate a bar
Description
Foo.  Bar.
Baz.  Qux.
Hep.  Sid.
groff test suite 2022-12-05 foo(1)
---snip---
So, a table entry _lacking_ the dummy character escape sequence \& is
exposed to the old groff bug, which, I suppose, existed on every system
in the wild until last week and still exists wherever a released version
of groff is installed. (This bug is not man(7)-specific. It will affect
any groff document regardless of macro package.)
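One rough way to look for exposed entries is to scan only the table regions of a page for a sentence-ending punctuator followed by two or more spaces. The following is a minimal sketch, assuming a POSIX sh and awk; the function name find_exposed is made up for illustration, and the scan neither parses tbl's tab() option nor tells text blocks apart from other entries.

```shell
# find_exposed (hypothetical helper): print "line-number:text" for every
# line between .TS and .TE that contains [.!?] followed by two or more
# spaces.  An entry that already has \& after the punctuator does not
# match, because the character after the punctuator is then a backslash.
find_exposed() {
    awk '
        /^\.TS/ { in_table = 1; next }
        /^\.TE/ { in_table = 0; next }
        in_table && /[.!?]  / { print FNR ":" $0 }
    ' "$@"
}
```

Something like "find_exposed ascii.7" would then list candidate lines for review; each hit still needs a human look before anything is changed.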
Lengthy background
==================
The difference in output was prompted by this line.
.ss 12 0
The formatter's default is equivalent to this.
.ss 12 12
The function of the number "12" is not obvious here; it arises from
traditions of mechanical typography, and the arguments are measured in
twelfths of the current font's space width. What it _means_ is, "put one
word space between each word and put one (additional) word space between
sentences on the same output line".
Yeah, but nobody should be manipulating the inter-sentence spacing in a
man page, right? Right. But, localization files...
$ git grep 'ss 12 0' tmac
tmac/cs.tmac:.ss 12 0
tmac/de.tmac:.ss 12 0
tmac/fr.tmac:.ss 12 0
tmac/groff_man.7.man.in:\&.ss 12 0 \e" See groff(@MAN7EXT@).
tmac/it.tmac:.ss 12 0
tmac/sv.tmac:.ss 12 0
Not to mention the fact that this request could appear in a troffrc or
man.local file. In short, this is a user-configurable parameter and a
portable man page should not assume the inter-sentence spacing amount.
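To see whether anything on a given system changes the setting, one can search the macro and configuration files for the request. This is a sketch only, assuming a grep that supports -r (GNU and BSD grep do); list_ss is a hypothetical name, and it simply lists every .ss request without deciding which one wins at run time.

```shell
# list_ss (hypothetical helper): show each .ss request found in the given
# files or directories, prefixed with file name and line number.
# The pattern allows optional space between the control character and
# the request name, as roff does.
list_ss() {
    grep -rn '^\.[[:space:]]*ss[[:space:]]' "$@"
}
```

The arguments are whatever directories or files hold your troffrc, man.local, and localization macros; those locations vary by installation.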
\& works to hide the bug even on old (well, current :-/ ) GNU tbl
because it suppresses the detection of sentence endings altogether.
\& does have other semantics in tbl(1) tables; it is used to align
the units place in columns using a numeric format (classifier "N" rather
than "L" or "C", for instance), but I've never in my life seen that
format used in a man page. (It is also hard to grep for without gagging
on false positives.) But, in principle, telling people just to work
around the bug by adding \& in _all_ circumstances is a bad idea for
this reason.[1]
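For pages known to have no numerically formatted columns, the workaround can be applied mechanically. The sketch below assumes a POSIX sed; protect_tables is a hypothetical name. It inserts \& only on lines between .TS and .TE, and it does not inspect column classifiers, so its output should be reviewed by hand for any "N" columns, where the added \& would become the alignment point.

```shell
# protect_tables (hypothetical helper): copy a roff document to standard
# output, inserting \& after each [.!?] that is followed by two or more
# spaces, but only within .TS/.TE regions.  In the replacement text,
# \\ yields a backslash and \& yields a literal ampersand.
protect_tables() {
    sed '/^\.TS/,/^\.TE/ s/\([.!?]\)  /\1\\\&  /g' "$@"
}
```

Running it twice is harmless: once \& follows the punctuator, the pattern no longer matches.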
There's a lot of bloody history around inter-sentence spacing, enough
that we have to cover the subject in the groff Texinfo manual,[2] and it
is compounded by luminaries like the general editor of the Chicago
Manual of Style lying to the public about that history. groff maintains
compatibility with AT&T troff in this area.
In Europe, supplemental inter-sentence space is _not_ common, and I
gather there is some kind of official European Union style guide that
militates against it. It is binding only upon official EU publications,
but many organizations have adopted it nonetheless--it saves the expense
of maintaining a style guide of one's own, and plenty of people
in the U.K. who voted for and celebrate BrExit nevertheless slavishly
follow EU prescriptions in this area.
> That can be random people that install random packages from source, or
> contributors to the pages. For both of them, I specify the
> dependencies in the INSTALL file, so I hope they don't blame me too
> much; they should ask their distributor about backporting groff 1.23.0
> for installing the pages from source, or install groff from source, or
> be happy with small glitches like this :)
I understand if you don't want to mess with a belt-and-suspenders
approach, but I want to make sure you're making an informed decision. :)
> However, things like .MR concern me more.
Me too. I'm trying to contain my expectations because history is
replete with nice new features that suffered deaths of neglect.
(warning: inside baseball^W^Wgroff internals)
Right now even email and web URLs in man pages aren't hyperlinked in
PDF, and that's silly. So I'm trying to orthogonalize man(7) hyperlink
support so I can couple it to gropdf(1)'s "pdfmark" support.
Or I would be working on it, if the under-documented "pdfhref" macro
weren't structured to make it a pain in the ass. I guess whoever
designed that didn't expect someone to format link text in a diversion.
Also I discovered an exciting new (old) bug when formatting HTML. :(
Anyway, once that is done, I can integrate Deri James's cool trick for
converting "local" man page cross references into PDF bookmarks, so you
can do something like, hypothetically,[3] produce a 380-page compilation of
60 man(7) and mdoc(7) documents that have hyperlinked cross-references
to each other, and present "man:blah(1)" hyperlinks for pages outside
that collection.
I might fail at orthogonalizing, but I'll do my damnedest to at least
get this _working_. ("groff 1.24: the same but with elegance"... :-| )
> I'd be happy doing some radical changes and requiring 1.23.0 as a bare
> minimum, and use MR right after the Bookworm release.
[insert Kang and Kodos clip]
> Hopefully that triggers backporting of groff; maybe you can do that as
> a future maintainer of the Debian package? :P
Maybe, but I would prefer to delegate that sort of task. Build a team
wherever you can. Either way, a backport is feasible, and more likely
to happen, only if groff 1.23 proves not to have many surprising
regressions from 1.22.4.
I've gone to considerable lengths to avoid that: I have automated test
#152 in my working copy now. (groff 1.22.4 had three.)
> > [1] (groff insider stuff)
>
> The parentheses in here help a lot with long messages :)
I fear "tl;dr" was coined around 1999 by people exposed to my emails.
Regards,
Branden
[1] tbl uses the _leftmost_ `\&` in a numerically formatted entry as the
alignment position. For instance, imagine a business that produced
formatted reports by accepting text input from a terminal^Wweb
form. Also assume that the report generator wasn't too fastidious
about tidying up that input.
.\" nroff -t | cat -s
.TS
tab(@);
C S
C S
L N.
Amy's Kennels
Boarded Animals, Week of 2022-12-05
Size@Name and check-in weight (kg)
Large@Max 25.6
\^@Sassy. 44.8
Small@Henrietta 6.24
\^@T. J. Peepers.\& (chinchilla) 3.03
.TE
This is not a _well_-designed table, but it is a _plausible_ one. Well,
almost.[4] But adding another \& later at the "real" position where the
decimal point should be aligned will not help, because the leftmost one
controls.
[2] https://git.savannah.gnu.org/cgit/groff.git/tree/doc/groff.texi?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n4976
[3] https://git.savannah.gnu.org/cgit/groff.git/tree/doc/doc.am?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n257
[4] I'd like to meet the web-form-using kennel service staffer who
knew to sneak *roff escape sequences into the input. But we all
know that failure to validate input is as common as street litter.