man pages, defensive programming, and bibliographic formats
From: G. Branden Robinson
Subject: man pages, defensive programming, and bibliographic formats
Date: Sun, 26 Jan 2020 00:22:08 +1100
User-agent: NeoMutt/20180716
(was: 01/01: tbl(1): Note origin of tbl.)
Hi folks,
Ingo and I had an inadvertent off-list exchange. I thought I'd loop in
the other developers and interested parties.
At 2020-01-24T23:59:13+0100, Ingo Schwarze wrote:
> G. Branden Robinson wrote on Sat, Jan 25, 2020 at 02:40:40AM +1100:
> > Did you mean to send this privately?
>
> No, omitting <address@hidden> was an oversight.
> But better this way than the other way round and publicly post
> a private letter...
>
> > At 2020-01-23T22:54:19+0100, Ingo Schwarze wrote:
> >> Branden Robinson wrote:
>
> >>> +.SH "See Also"
>
> >> Quoting of .SH arguments is not needed.
>
> > I know. It's a deliberate style choice of mine to make man(7) easier
> > to learn. When people give arguments to macro calls, they need to
> > be aware of whitespace. By getting into the habit of quoting or
> > escaping macro arguments containing whitespace, their knowledge will
> > migrate more easily between, say, .SH and .BR.
>
> I don't really buy that. Look at how macros behave with respect to
> white space in arguments.
>
> Macros that don't take arguments in the first place:
>
> .EX .EE .PP .TQ .YS .DT
>
> Macros that take only a single argument which cannot contain
> whitespace anyway:
>
> .MT .ME .RS .RE .SY .UR .UE .PD .UC
>
> More macros where the first argument cannot reasonably contain
> whitespace:
>
> .OP .TH .TP .HP .AT
>
> Macros where multiple arguments are treated symmetrically:
>
> .B .I .SM .SB .SH .SS
>
> Macros where multiple arguments alternate meaning:
>
> .BI (& five friends), .IP
>
> So, for almost all macros, you just don't need to worry about argument
> quoting at all. The macros .BI (& friends) and .IP are really the
> only two odd ones out, and people need to understand *why* these two
> are so unusual and why quoting is required when arguments contain
> whitespace for these two unusual macros.
>
> It makes nothing simpler to make people worry about whitespace on
> macro lines in general when for the vast majority of macros, it just
> doesn't matter at all.
My rebuttal to this is that while this is a good analysis of the
relative frequency of macro names which require careful whitespace
handling (i.e., quoting or escaping of whitespace in arguments) with
respect to the man(7) _namespace_, it is poorly representative of the
frequency with which the various man(7) macros are actually used.
Along with the paragraphing macros, the font-styling macros are
among the very most frequently used.
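To make the distinction concrete, here is a rough Python model of how a
symmetric macro such as .B and an alternating macro such as .BI treat the
same argument text. It is a simplification and not groff's actual parser:
the names split_args and render are made up for illustration, and only
the plain double-quoted case is handled.

```python
def split_args(rest):
    """Split macro argument text on spaces, keeping double-quoted runs
    together. Simplified model: no escapes, no doubled quotes."""
    args, cur, in_quotes, started = [], [], False, False
    for ch in rest:
        if ch == '"':
            in_quotes = not in_quotes
            started = True
        elif ch == " " and not in_quotes:
            if started:
                args.append("".join(cur))
            cur, started = [], False
        else:
            cur.append(ch)
            started = True
    if started:
        args.append("".join(cur))
    return args


def render(macro, rest):
    """Return (font, text) pairs for a .B or .BI call (hypothetical helper)."""
    args = split_args(rest)
    if macro == "B":
        # Symmetric: every argument gets the same font, joined by spaces.
        return [("bold", " ".join(args))]
    if macro == "BI":
        # Alternating: the font depends on the argument's position.
        fonts = ("bold", "italic")
        return [(fonts[i % 2], arg) for i, arg in enumerate(args)]
    raise ValueError(f"macro not modelled: {macro}")


print(render("B", "See Also"))     # quoting makes no difference here
print(render("B", '"See Also"'))
print(render("BI", "See Also"))    # unquoted: the second word changes font
print(render("BI", '"See Also"'))  # quoted: one argument, one font
```

In this model the two .B calls give the same result, while the two .BI
calls do not, which is the asymmetry both sides of the argument agree on.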
I threw together a script to count up man(7) macro use in our corpus:
$ MANS=$(find !(build|EXPERIMENTS) -name "*.man"|sort)
$ man-macro-frequency-counter $MANS | sort -rn -k2
B: 7236
I: 5179
TP: 2512
BR: 2316
IR: 1431
P: 983
RE: 841
BI: 792
IP: 691
SH: 484
RS: 475
LP: 340
OP: 284
TQ: 261
EX: 231
EE: 231
RI: 221
SS: 209
SY: 181
PP: 174
RB: 130
YS: 127
IB: 98
UR: 65
UE: 65
TH: 61
MT: 52
ME: 52
SM: 0
SB: 0
The script is attached.
Note that I have not finished my project of cleansing the pages of
unnecessary font escapes, so some of the font-style macros are
under-counted.
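For readers who do not have the attachment to hand, a counter along
these lines could be sketched in Python as follows. This is not the
attached script; the macro list is taken from the names mentioned in
this thread, and the names MAN_MACROS, CALL, and count_macros are
assumptions for illustration.

```python
import re
import sys
from collections import Counter

# Assumed set of man(7) macro names, collected from this thread.
MAN_MACROS = {
    "EX", "EE", "PP", "P", "LP", "TQ", "YS", "DT", "MT", "ME", "RS", "RE",
    "SY", "UR", "UE", "PD", "UC", "OP", "TH", "TP", "HP", "AT", "B", "I",
    "SM", "SB", "SH", "SS", "BI", "BR", "IB", "IR", "RB", "RI", "IP",
}

# A macro call is a control line: a dot or apostrophe in column one,
# optional blanks, then a name that ends at whitespace or end of line.
CALL = re.compile(r"^[.'][ \t]*([A-Za-z]+)(?=\s|$)")


def count_macros(lines):
    """Count calls to known man(7) macros in an iterable of input lines."""
    counts = Counter()
    for line in lines:
        match = CALL.match(line)
        if match and match.group(1) in MAN_MACROS:
            counts[match.group(1)] += 1
    return counts


def main(paths):
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(count_macros(handle))
    # Print "NAME: COUNT" lines, most frequent first.
    for name, count in totals.most_common():
        print(f"{name}: {count}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A sketch like this only sees macro calls on control lines, so font
changes made with inline escapes are not counted, the same limitation
noted above.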
> > It's for similar reasons that I do this:
> >
> > The mf macros \&.foo and \&.bar should not be called within a
> > \&.pp context.
> >
> > The zero-width space escapes on the first line are not necessary;
> > but it's a good habit to use them anyway, because what happens if
> > you recast and reflow the sentence in your text editor such that one
> > of those ends up starting a line?
>
> When semi-automatically transforming code, you need to check the
> result in any case.
I agree. Code defensively and then validate the output. :)
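As an illustration of both halves of that, here is a small Python
sketch that reflows a sentence and reports any output line that would
begin with a roff control character. It uses Python's textwrap as a
stand-in for a text editor's reflow command, which is an assumption;
the function name risky_lines and the width of 14 are arbitrary choices
for the example.

```python
import textwrap

# Characters that make roff treat an input line as a control line.
CONTROL_CHARS = (".", "'")


def risky_lines(text, width):
    """Reflow text to the given width and return the wrapped lines that
    would begin with a roff control character."""
    wrapped = textwrap.wrap(text, width=width)
    return [line for line in wrapped if line.startswith(CONTROL_CHARS)]


bare = ("The mf macros .foo and .bar should not be called "
        "within a .pp context.")
escaped = (r"The mf macros \&.foo and \&.bar should not be called "
           r"within a \&.pp context.")

print(risky_lines(bare, 14))     # reports the line that starts with .foo
print(risky_lines(escaped, 14))  # reports nothing
```

With the bare sentence, one reflowed line starts with ".foo" and would
be read as a macro call; with the \& escapes in place, no line starts
with a control character at any width.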
> > In my view, this sort of thing is not cargo-culting, but defensive
> > programming.
>
> In my view, unnecessary escaping just makes text harder to understand
> by making it look mysterious. How many people will think the
> superfluous escaping is actually somehow required? It's likely to
> cause fear, uncertainty, and doubt.
>
> How is the (arbitrary) rule "a dot needs escaping at the beginning
> and end of each word" easier to learn than the (accurate) rule "it
> needs escaping at the beginning and end of each input line"? They
> seem both the same difficulty to me, except that the accurate rule
> needs to be invoked far more rarely (less work and obfuscation)
> and also becomes obvious once you understand how request/macro and
> sentence end detection works, whereas the arbitrary rule is, well,
> arbitrary and needs yet another argument for understanding it even
> after understanding the root cause of the problem.
You and I disagree on this and I'd like to solicit the views of the
folks on the mailing list.
> > (1) The stylistic format of such bibliographic entries; and
>
> Sure. I would probably settle on a more standard form like
>
> Michael E. Lesk and Lorinda L. Cherry,
> Tbl -- A Program to Format Tables,
> AT&T Bell Laboratories, Murray Hill, 1989,
> http://doc.cat-v.org/unix/v10/10thEdMan/tbl.pdf
>
> or something like that, in any case starting with the authors, then
> the title, then the rest.
>
> > (2) now that I understand the basics of refer(1), suck all the
> > citations into an index file, ship it, and have our pages use it
> > where necessary.
>
> That sounds like overkill. Something like refer(1) becomes useful when
> you write many dozens of journal articles citing thousands of other
> articles. For about ten to twenty documents citing fewer than a
> handful of sources each, setting up the machinery is more hassle and
> less flexible than doing it by hand. Also, it makes maintenance
> harder for people not used to refer(1).
I'm not wedded to point (2).
> I'm quite sure we don't want the installed manual pages to .so
> anything.
Agreed on that point. refer(1) doesn't make that necessary, though, as
I understand it.
> Would it be an improvement to automatically generate the final version
> of the manual pages in some way using a database? I doubt it.
Again, shouldn't be necessary.
> >> [ snipped some reasons why you want to annotate the citation ]
[I've added back in most of what that stuff was --GBR]
I have further commentary on the exchange below; I just want to present
it to the list to solicit views on source citation in our man pages.
>> > Commenting on cited books or articles in the SEE ALSO section is
>> > very rare, and it will almost never happen for more than one
>> > article in the same manual page. So there is really no need to set
>> > the reference itself as a list tag and the comment as a list body.
>> > It doesn't even look particularly good.
>> >
>> > So at the very least, we could just remove the .TP and the \cs.
>>
>> You're right that there's not much precedent here. The good news for
>> you is that I'm not settled on this format. I'd like to get cites to
>> all the important Bell Labs white papers into our man page corpus,
>> and then standardize two things:
>>
>> (1) The stylistic format of such bibliographic entries; and
>> (2) now that I understand the basics of refer(1), suck all the
>> citations into an index file, ship it, and have our pages use it
>> where necessary.
>>
>> > But i think going a step further is even better because nothing
>> > in the comment really matters:
>> >
>> > - gratis version: no need to say that, it's obvious when you can
>> > download it for free, and if you would have to pay for it,
>> > we would hardly include a URI, at least not without warning
>> > that it needs payment
>>
>> groff is a GNU project, and generally eschews non-free documentation,
>> except for historical/academic works, for which even RMS has some
>> tolerance. (The FSF won't distribute them, but it doesn't try to
>> pretend they don't exist, unlike some proprietary operating systems.)
>>
>> I think it's worth pointing this out so contributors know where a
>> manual might need a freely-licensed alternative to be written.
>>
>> > - from UNIX v10: no need to say that, it's said in the document
>> > itself and doesn't matter for the manual page at hand nor for
>> > the decision of a reader to look at it
>>
>> So many people seem to think that Research Unix stopped with V7 (or
>> maybe 32V) that I found it noteworthy.
>>
>> > - early implementation: misleading, because the implementation
>> > described in the cited document is almost exactly a decade
>> > younger than the original implementation
>>
>> ...and three times as many years have passed since those two
>> instants. :)
>>
>> > - Uriel Pereira: totally irrelevant, that website merely saved
>> > a copy of the document
>>
>> Something about my bibliographic instincts tells me I need to
>> characterize or describe the site being linked to somehow.
>
> I think i understand why you feel that some of those details may be
> interesting from a historical perspective, i like considering history
> myself. But here, we are not even talking about a HISTORY section.
> People look at SEE ALSO because they wonder how to use tbl(1), not
> because they wonder whether v10 was a great UNIX or what rms's opinion
> on non-free documentation is or whether Uriel Pereira created some
> website before he died. I'd simply prefer to stay on topic as much as
> possible, in particular outside HISTORY.
>
> > Bottom line: please regard the exact layout and content of the
> > bibliographic information I'm adding as "in flux".
> >
> > To hammer out those questions of content and style, let's loop the
> > list in on my points (1) and (2) above.
>
> Sure. Maybe you want to suggest something in a well-organized manner
> rather than me picking apart a mail?
>
> There is no hurry to re-fix tbl(1). That can easily be done once
> you have settled on a nice format.
>
> But note that i have very rarely, if ever, seen lists of references
> with annotations attached to each cited article. Even less so in
> manual pages: in a research journal, having two pages of citations
> at the end of an article may be useful or even required, but in a
> manual page, references ought to be kept concise.
Regards,
Branden
man-macro-frequency-counter
Description: Text document
signature.asc
Description: PGP signature