[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: mdoc(7) prologue regressions
From: |
Ingo Schwarze |
Subject: |
Re: mdoc(7) prologue regressions |
Date: |
Sun, 31 Jul 2022 21:14:03 +0200 |
Hi Branden,
G. Branden Robinson wrote on Sat, Jul 16, 2022 at 02:21:49AM -0500:
> At 2022-06-27T00:29:08+0200, Ingo Schwarze wrote:
>> The first issue i identified is a group of regressions in the
>> behaviour of the mdoc(7) prologue macros .Dt and .Os.
>> The regressions aren't particularly severe because all that i found
>> so far only trigger when the document uses these macros incorrectly.
>> All the same, i'd like to report them such that we can decide
>> whether we want to fix some or all of them.
>>
>> I suspect that this commit might be responsible but admit
>> that i did not prove this suspicion by testing right before
>> and right after the commit. I only tested that the behaviour
>> changed as described below from groff-1.22.4 to groff-current:
>>
>> commit a1e6c19176d38823d8dc6c9a619a493ca90bdca4
>> Author: G. Branden Robinson <g.branden.robinson@gmail.com>
>> Date: Sun Oct 3 23:15:12 2021 +1100
>>
>> [andoc,man,mdoc]: Fix Savannah #61266.
>>
>> Resolve problems in batch rendering of man pages to PDF arising from
>> entanglement of end-of-input traps, page location traps, continuous
>> rendering mode, and andoc's reloading of the (m)an and (m)doc packages.
>> [...]
> That commit was an immense pain to get "right". As I feared, my words
> from later in the commit message have come back to haunt me.
>
> Refactoring is needed: some macros and registers have misleading names,
> there is some code duplication in mdoc, and some of the trap management
> problems are solved in slightly different ways in man(7) and mdoc(7),
> perhaps unnecessarily. We also need some test scripts to protect us
> from regressions. But this fixes the rendering problems.
>
> I didn't do the regression tests. But it probably would not have
> occurred to me at the time to test the incorrect usage modes of the
> mdoc(7) macros.
>
> For all of these issues, I have the same pair of questions:
> is that a regression or just a difference?
That's a matter of interpretation.
I might call it a regression if the old behaviour is clearly
more useful than the new, and a mere difference otherwise.
> Is there a specification for this scenario?
No. The closest we have to a "specification" for mdoc(7) is
the manual pages in the mandoc and groff repositories, and
mdoc(7) for example saya:
The prologue, which consists of the Dd, Dt, and Os macros in that
order, is required for every document.
So forgetting or repeating them or mixing up their order is clearly
invalid syntax. Still, it might be useful to handle such syntax
errors gracefully. If we decide which behaviour we want, i can
fix up the mandoc code and the mandoc testsuite accordingly.
I expect changes in mandoc will be significantly easier than in
groff because coding in C is usually *much* easier than writing
code for identical funxtionality in roff(7).
>> 1. When there are two .Dt macros in the prologue, the last one used
>> to win, setting the page title, section number, and section title.
>> Now, the first one wins, setting these fields.
Not sure whether either of these is better. Maybe the question to
ask is: which one works better with having multiple manual pages
in the same input file? Not sure.
Would there be value in having duplicate .Dt behave in a way similar
to duplicate .TH? Not sure either.
>> 2. When a .Dt macro occurs in the body of the page (as opposed to
>> in the prologue), it used to be ignored. Now, it causes a
>> large number of blank lines in the output.
I would call that one a regression because the new behaviour is
clearly not useful and likely annoying for the user.
>> Both issue 1 and issue 2 can be seen with this test file:
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Dt/dupe.in?rev=1.4&content-type=text/plain
>>
>> 3. When the first .Dt macro comes late, the page title used to be
>> set to "UNTITLED". Now, it is set to the empty string.
I would also call that a regression because UNTITLED is more effective
for alerting the author that something is amiss, even if they failed
to notice the related error message.
>> Both issue 2 and issue 3 can be seen with this test file:
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Dt/late.in?rev=1.2&content-type=text/plain
>>
>> 4. If there is no .Dt macro at all, the page title used to be
>> set to "UNTITLED". Now, it is set to the empty string, see:
>>
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Dt/missing.in?rev=1.2&content-type=text/plain
Dito, old behaviour seems more useful to me.
>> 5. When the usual order of .Dt and .Os is exchanged,
>> the .Dt macro is now completely ignored, setting the page title
>> to the empty string and the section title to "LOCAL", see
>>
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Dt/order.in?rev=1.2&content-type=text/plain
Arguably a regression because ignoring page content merely because
macros are out of order is not very graceful.
>> 6. The same regression as for issue 5 occurs when there are two .Os
>> macros in the order .Dd .Os .Dt .Os, see
>>
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Os/dupe.in?rev=1.3&content-type=text/plain
Same argument as issue 5 above.
>> 7. When the .Os macro comes late - i.e. in the body of the page
>> rather than at the usual place in the prologue -
>> the header line now appears at that place in the middle of the
>> body and no longer at the top of the manual page where it belongs, see
>>
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Os/late.in?rev=1.2&content-type=text/plain
Also feels like a regression to me: having the page header line intermixed
with the page body does not feel very graceful.
>> 8. When the .Os macro is completely missing, the header line is no
>> loger printed at all, see
>>
>> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/regress/usr.bin/mandoc/mdoc/Os/missing.in?rev=1.2&content-type=text/plain
Even that feels like a regression to me. Wouldn't it be more useful
to treat missing .Os just like .Os without an argument, even if the
documentation says the macro is mandarory?
Do you agree with the above assessment?
Or which changes do you think we should regard as intentional?
>> The most severe issue is probably number 8 because forgetting the .Os
>> macro, or thinking it might be optional, might even happen in real-world
>> manual pages. The next most severe would then be issue 5 because
>> mixing up the order might also happen in practice.
>>
>> Number 7 is also somewhat unfortunate. While not quite as likely to
>> happen as putting the .Os macro at the wrong place *inside* the prologue,
>> the effect produced is very ugly. Similarly, issue 2 is unlikely
>> to occur in practice, but the effect is also very ugly.
>>
>> The remaining issues 1, 3, 4, and 6 are less severe. But in case we
>> decide that some of the more severe regressions need fixing, maybe
>> properly fixing them all might not cause that much extra work?
>> In any case, i thought listing them all might potentially be useful.
> I'll have a fresh look at my changes to groff mdoc in this commit and
> see if I can find good spots to recover more gracefully.
>
> I have to admit I'm tempted to either let these fail, or, if I can find
> a good place to stick a sanity-checking hook, simply refuse to render
> the page if these macros, documented as mandatory, are missing.
I'd adwise against throwing fatal errors, in particular when these
can be avoided. For users, fatal errors are extremely annoying.
If a broken manual pages somehow slipped through QA and made it into
a release and onto a user's system, it is so much better if the
user is still able to read it, even if formatting is somewhat
degraded. Even for authors, getting an error message and a rendering
is arguably more useful during documentation development than a fatal
error alone.
For that reason, i made sure mandoc(1) *never* throws a fatal error
after the input file has been opened. Of course, it may still throw
a fatal error for operating system level failures, for example if
the input file does not exist or malloc(3) fails.
> But, maybe replacing the missing content with shouty-caps stuff
> like "UNTITLED" and "LOCAL" suffices to clue the user in.
Additionally providing diagnostic messages would be even better.
But while the mandoc test suite makes sure that neither groff
nor mandoc formatted output changes unexpectedly and while it also
checks some mandoc messages, it does *not* attempt to validate groff
diagnostic messages. Given that you spend considerable effort on
improving groff diagnostics, adding regressions tests for these in
the mandoc test suite would be a bad idea.
> Perhaps also changing the fallback operating system string
> from "BSD" to "GNU" will more effectively agitate misusers of
> mdoc(7) into correcting their ways.
That shouldn't matter much because the fallback operating system
is typically set by whoever packages groff and not taken from
upstream. For example, the OpenBSD port of groff sets the
default operating system to "OpenBSD".
But given that groff is a GNU product, i certainly wouldn't
object to setting the default operating system to GNU upstream,
even tough the original implementation of mdoc(7) was a BSD product.
> (Relatedly, I don't understand why anyone thought it was a good idea for
> the volume titles for mdoc(7) man pages in a GNU project to all announce
> themselves as being from a BSD manual even if they're rendering man
> pages that have nothing to do with BSD. This name is _not_ derived from
> any argument to the `Os` macro, nor configured based on the build host's
> OS identity, but hard-coded in `doc-volume-operating-system`. If this
> string were made empty or deleted, the volume titles would exactly match
> those used by groff man(7).)
I suspect that has merely been carried forward from Cynthia
Livingston's original /usr/share/tmac/doc-common, which said:
.de Os
.ds oS Null
.if "\\$1"" \{\
. ds oS 4.4BSD
.\}
> Anyway, handling the repeated cases seems like it should be easier, by
> testing a flag register in each of .Dt, .Dd, and .Os, and clearing that
> register again as part of the end-of-input macro.
>
> groff man(7) in groff Git behaves pretty badly if `TH` is omitted,
> whereas groff 1.22.4 degrades much more gracefully.
>
> I am however loath to give up the (in my opinion) immense improvements
> I've managed to hammer into place for the rendering of multiple man(7)
> and mdoc(7) documents with one groff command.
Yes, i agree that's clear progress.
There might be lessons to learn for mandoc(1), too:
While
mandoc /usr/share/man/man1/*.1
works just fine,
cat /usr/share/man/man1/*.1 | mandoc
treats it all as one single giant manual page, which works quite
badly when there is a mixture of mdoc(7) and man(7).
That's a different topic, though. :-)
Yours,
Ingo