[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: .B, .I disable hyphenation?
From: |
G. Branden Robinson |
Subject: |
Re: .B, .I disable hyphenation? |
Date: |
Mon, 13 Sep 2021 03:27:50 +1000 |
User-agent: |
NeoMutt/20180716 |
Hi, Alex!
At 2021-09-12T14:56:39+0200, Alejandro Colomar (man-pages) wrote:
> Hi Branden,
>
> Usually, when a manual page highlights a term, either in bold or
> italics, it usually is a special identifier (macro, function, command
> name or argument), for which hyphenation can hurt readability and even
> worse, turn it into a different valid identifier.
>
> What about disabling hyphenation for .B and .I?
> Are there any inconveniences in doing so that I can't see?
The problem that arises is that the font styling macros are
presentational, not semantic, so it's hard to know whether someone is
using them for emphasis or to suggest syntactical information. This is
why you made a statistical argument ("usually").
I'm hesitant to adopt your suggestion, because if we did make this
change, it would be difficult to override in the other direction, e.g.,
the page author is thinking, "yes, I'm putting emphasis here--hyphenate
the words as necessary, darn it!". Because hyphenation had already been
suppressed, it would have to be manually added back in to every word in
the arguments to .B and .I. Most writers of English, for good reason,
cannot be bothered to learn where the hyphenation points are and
understandably leave that chore to a computer. (Moreover, U.S. and
Commonwealth English seem to apply different hyphenation rules.)
In my opinion it is easier, in terms of maintaining flexibility and
getting reliable results, to do what I do in the groff man page
corpus--disable hyphenation on a per-word basis when necessary. To be
concrete, I populated the shell variable "MANS" with the man source
document files in the groff tree, and then performed this grep.
$ grep '^\.[BIR]\([BIR]\) \\%' $MANS
I got 434 matches in groff Git HEAD. Here are 3 of them.
./src/utils/lkbib/lkbib.1.man:.IR \%@g@indxbib (@MAN1EXT@)
./src/utils/lkbib/lkbib.1.man:.IR \%@g@refer (@MAN1EXT@),
./src/utils/lkbib/lkbib.1.man:.IR \%@g@lookbib (@MAN1EXT@),
A whopping number of these are like the above: they are man page cross
references. The `MR` macro I've been talking about for (over?) a year
now would render this usage of \% unnecessary, because MR would be
semantic and we know we _never_ want to hyphenate the name of a man
page[1].
The manual suppression of hyphenation is not necessary if you know a
word won't be hyphenated. A trick that's been passed around on the
groff list is to have a shell one-liner handy that tells you all of the
automatic hyphenation points groff thinks a word has.
Here's the version of the "hyphen" script I use.
#!/bin/sh
for W
do
printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' | tr -d '\n'
echo
done
I don't have to remember or reason out which of "indxbib", "refer", or
"lookbib" will be hyphenated. I can ask groff.
$ hyphen indxbib refer lookbib
in‐dxbib
re‐fer
look‐bib
Yup, they all need hyphens, so a leading \% is advised. [Aside: What's
that "@g@" thing, you may ask? Like the man page section number, it is
not groff syntax, but fodder for a sed script that replaces it during
make(1) with the command prefix configured by the person who builds
groff. (When groff was first written in 1989-1991, it often had to
share a disk with a proprietary troff installation, and needed to stay
out of the latter's way.) Since I can't know at man page maintenance
time what the builder will choose for a prefix, I have to assume that it
is something that is hyphenable, and so I suppress its hyphenation.]
By contrast, I don't need to suppress hyphenation for the following.
[Aside: These command names also don't collide with historical troff
names, so they don't need the command prefix, either.]
./src/utils/tfmtodit/tfmtodit.1.man:.IR groff (@MAN1EXT@),
./src/utils/tfmtodit/tfmtodit.1.man:.IR grodvi (@MAN1EXT@),
In my view, this is really not much work; I spend much more time
thinking about and recasting at the sentence or paragraph level--or
realizing there's some concept that we haven't explained adequately at
all and drafting a presentation of it...and, for that matter, composing
emails like this one--than I do worrying about hyphenation points.
Nevertheless, I recognize that many contributors of man pages to the
Linux man-pages project are _profoundly_ uninterested in typography--in
fact they may have hyphenation disabled altogether in their man page
renderer[3]--and regard every single thing they are required to learn
about *roff or man(7) syntax as one more nudge in the direction of
Markdown or some other alternative that they imagine will deliver them
to an effortless utopia where documentation practically writes itself.
I acknowledge that placement of these hyphenation control escapes looks
tedious (and it is, slightly). If we want to fix this in the man(7)
macro language, then, in my opinion, the right way is to cross the
Rubicon and add more semantic macros. I have never forwarded a serious
proposal along these lines because I still have full-thickness burns
over 90% of my body from exposure to DocBook 25 years ago. The problem
that mortified me is that as soon as people get their hands on a
semantic tag they have, all too often, deployed it the highest
syntactical level of the implementation language they can locate. In
HTML, for example, that is the element.
If we had a pair of macros that meant "my argument is a keyword" or "my
argument represents user-replaceable text", respectively, then we could
easily and reliably solve the problem at the level you're tempted to.
(Though as a matter of fact, I would _not_ disable hyphenation for
nonliterals...why should we? They don't need to be copy and pasted
as-is--if they are, they get replaced anyway by definition--and
descriptive nonliterals are sometimes long[4], as anyone who's read a
few BNF grammars can attest.)
The smallest, tightest solution I have been managed to contemplate that
does not bloat the name space of man(7) is something along these likes.
.KW keyword [tag-space]
.VA metavar [tag-space]
Here is a straw-man example.
.KW strlen function
and
.KW strnlen
return
.KW size_t type
and take an argument
.VA s variable
that is expected to be a
.KW "const char *" type
...which would render as
strlen and strnlen return size_t and take an argument s that is expected
to be a const char *
"strlen", "strnlen", "size_t" and "const char *" would be styled as
directed elsewhere (with defaults in an.tmac or an-ext.tmac, but
possibly overridden in man.local to suite distributor or site tastes).
I wanted to end the example sentence with a period, but right away we
hit one of the problems that has prevented me from advancing this
proposal, which is the question of how to handle adjacent
punctuation...add yet another macro argument for it, or encourage usage
of the output line continuation escape \c, which historically terrifies
people? Support multiple optional arguments, and force people to learn
to quote empty macro arguments, an inconvenience that man(7) largely
already spares them from if they practice good style[5]?
In case it needs to be pointed out, I think it's impractical for
man(7)--as a macro package--to prescribe descriptors for the "tag" name
space. mdoc(7) somewhat notoriously maintains large catalogs of a
proliferating number of BSD-descended operating system names and
releases, a source of ongoing tedium and maintainability
frustrations[6]. DocBook's attempt to boil this ocean is what drove me
away from it and I don't want to bloat groff man(7) with something
that's going to demand community consensus--and, I expect, some amount
of heated debate--to resolve. The virtues of _having_ a tag name space
are, I trust, well understood, and their availability is a point Ingo
takes some justified pride in with the support thereof in mandoc(1).
The Linux man-pages project is much better suited than the groff project
is to design and promulgate a set of canonical tags; to point out just
one blind spot, groff doesn't ship _any_ section 2 or 3 man pages,
whereas these sections are Linux man-pages' bread and butter (though the
long-neglected section 7 is looking better all the time and at last
fulfilling its decades-old potential).
I don't have answers to the questions I've raised, so in the meantime, I
practice the discipline of using the hyphenation control escape sequence
with the font style macros.
To conclude this epistle with some possible next steps to take, I
foresee a few possibilities.
1. Despair of popularizing this knowledge. Encourage people to continue
to do as they have always done, and trust more detail-oriented
contributors like yourself to clean up .B and .I calls with
hyphenation control escapes as required.
2. Teach people about correct usage of the \% escape in man-pages(7),
and remind contributors about this subject about as often as you have
to do regarding semantic newlines.
3. Lobby for a change to man(7) implementations as you originally
suggested. I know I've voiced some resistance to this idea, but your
bigger challenge may be getting a hold of any maintainers of
non-groff man(7) implementations to even field the proposal. On the
other hand, if groff and mandoc are all you care about, you've
already reached the right people. :)
4. Have Linux man-pages provide its own implementations of .B and .I to
do what you want. (Every Linux man-pages document could use the
`.so` request to load such overrides.) This might represent an
irreconcilable conflict between your project's needs and groff, and
I'm pretty sure no one wants to see that happen, but in the spirit of
frankness I have to point out that this is a possibility, and one
that may not have occurred to many Linux man-pages contributors.
5. Cross the Rubicon and develop semantic macros for man(7). The payoff
here is huge but the effort required will not be small.
(Implementation is not the hard part; socializing the change and
providing a smooth transition/deployment path for umpteen
distributors who won't ship Linux man-pages releases in synchrony
with any other particular thing will be much more challenging, I
predict. And that's not even counting the issue of standardizing a
lexicon for the tag name space.)
6. [ObIngoSchwarze: Switch to mdoc(7).]
Regards,
Branden
[1] Erlang developers may disagree.[2] :-|
[2] https://savannah.gnu.org/bugs/?43532
[3] Or would, if they knew it was possible. See the `HY` register in
the "Options" section of groff_man(7) or the `--nh` option of man-db
man(1).
[4] Here's an example from groff_font(5) in groff Git HEAD.
papersize format‐or‐dimension‐pair‐or‐file‐name
Set the dimensions of the physical output medium according
to the argument, which is either a standard paper format,
a pair of dimensions, or the name of a plain text file
containing either of the foregoing. Recognized paper for‐
mats are the ISO and DIN formats A0–A7, B0–B7, C0–C7, and
D0–D7; the U.S. formats letter, legal, tabloid, ledger,
statement, and executive; and the envelope formats com10,
monarch, and DL. Case is not significant for the argument
if it holds predefined paper types.
Alternatively, the argument can be a custom paper size in
the format length,width (with no spaces before or after
the comma). Both length and width must have a unit ap‐
pended; valid units are “i” for inches, “c” for centime‐
ters, “p” for points, and “P” for picas. Example:
“12c,235p”. An argument that starts with a digit is al‐
ways treated as a custom paper format.
Finally, the argument can be a file name (e.g., /etc/pa‐
persize); if the file can be opened, troff reads the first
line and attempts to match the above forms. No comment
syntax is supported.
More than one argument can be specified; troff scans from
left to right and uses the first valid paper specifica‐
tion.
[5] https://man7.org/linux/man-pages/man7/groff_man_style.7.html#Notes
[6] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=867123
signature.asc
Description: PGP signature