[Groff] Some statistics on man page macro usage


From: G. Branden Robinson
Subject: [Groff] Some statistics on man page macro usage
Date: Sat, 6 May 2017 00:30:49 -0400
User-agent: NeoMutt/20170113 (1.7.2)

Hi folks,

I wanted to gather some data on the relative frequencies of man macros,
so I did.

With my own Debian Stretch system's installed man pages as a population
to sample, here are my findings.

The information below is the macro name followed by the number of
occurrences across all installed pages, followed by a comment explaining
the macro, since some of them are a bit obscure.

TH : 6598       # title heading; mandatory
SH : 48773      # section heading
SS : 11633      # subsection
B  : 53372      # bold
I  : 30134      # italic
SM : 686        # small
BI : 9351       # alternating bold and italic
BR : 43565      # alternating bold and roman
IB : 340        # alternating italic and bold
IR : 13349      # alternating italic and roman
RB : 3349       # alternating roman and bold
RI : 2514       # alternating roman and italic
SB : 60         # small bold
LP : 3847       # new paragraph
P  : 2613       # new paragraph
PP : 67814      # new paragraph
TP : 55832      # tagged paragraph
IP : 64701      # indented paragraph
RS : 27185      # relative-indent start
RE : 27183      # relative-indent end
TQ : 607        # additional tag for tagged paragraph (groff_man(7))
EX : 652        # example begin (groff_man(7))
EE : 655        # example end (groff_man(7))
MT : 108        # "mail-to" (email address) begin (groff_man(7))
ME : 108        # "mail-to" (email address) end (groff_man(7))
UR : 264        # URL begin (groff_man(7))
UE : 276        # URL end (groff_man(7))
OP : 350        # option (groff_man(7))
SY : 82         # synopsis begin (groff_man(7))
YS : 72         # synopsis end (groff_man(7))
HP : 2845       # hanging paragraph
URL: 23         # URL (Linux man-pages man(7)) [1]
AT : 0          # use AT&T footer (groff_man(7))
BT : 0          # print footer string (groff_man(7))
DT : 26         # (go back to) default tabs (groff_man(7))
PD : 8295       # (set) paragraph distance (groff_man(7))
UC : 16         # use UCB footer (groff_man(7))
TS : 1400       # tbl(1) table start
TE : 1400       # tbl(1) table end
EQ : 6          # eqn(1) equation start
EN : 6          # eqn(1) equation end
PS : 3          # pic(1) picture start
PE : 3          # pic(1) picture end
IX : 60339      # "index entry"; spewed aggressively by pod2man
?? : 6722       # macro matching [0-9A-Z]{2} and not in above list

The standout result for me here is the staggering number of non-standard
macros I found.  At first I suspected a bug, but when I ran the same
script over the git HEAD of the man-pages repo, that value was 0.  A
closer look revealed that most of these were pod2man's effusive emission
of ".IX" entries.

Note that the pattern used to catch the "??" macros would _not_ detect
macros with digits or lowercase characters in the names, or macro name
lengths other than 2.  This made it easier to avoid counting actual
requests, and avoided counting mdoc macros as "unknown".
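
For anyone who wants to try something similar, here is a rough sketch of
the classification described above.  It is not the script I actually
ran; the names KNOWN, CALL, UNKNOWN and count_macros are made up for
illustration, the "??" pattern is taken literally from that row of the
table, and the control-line regex is a simplifying assumption (it
ignores the "'" control character and anything fancier).

```python
import re
from collections import Counter

# Macro names from the table above; anything else that matches the
# two-character pattern is tallied under "??".
KNOWN = set(
    "TH SH SS B I SM BI BR IB IR RB RI SB LP P PP TP IP RS RE TQ EX EE "
    "MT ME UR UE OP SY YS HP URL AT BT DT PD UC TS TE EQ EN PS PE IX".split()
)

# Assumed shape of a macro call: '.' in column one, optional blanks,
# then an alphanumeric name.
CALL = re.compile(r"^\.[ \t]*([0-9A-Za-z]+)")

# The "??" pattern exactly as written in the table.
UNKNOWN = re.compile(r"[0-9A-Z]{2}")


def count_macros(lines):
    """Tally known macros by name and unrecognized ones under '??'."""
    counts = Counter()
    for line in lines:
        m = CALL.match(line)
        if not m:
            continue  # text line, comment, or something this sketch skips
        name = m.group(1)
        if name in KNOWN:
            counts[name] += 1
        elif UNKNOWN.fullmatch(name):
            counts["??"] += 1
        # Everything else (requests such as "sp", mdoc macros such as
        # "Dd") is deliberately left uncounted.
    return counts


if __name__ == "__main__":
    sample = [".TH FOO 1", ".SH NAME", "foo \\- bar", ".ZZ x", ".sp", ".Dd today"]
    print(count_macros(sample))
```

Because requests like "sp" are lowercase and mdoc macros like "Dd" are
mixed case, neither matches the two-character pattern, so both fall
through without being counted as unknown.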

man(7)'s "URL" macro is barely used: at 23 calls, it sees less than
one-tenth the usage of groff_man(7)'s UR/UE pair (264 UR calls).  Even
the man-pages project itself never uses .URL, not even in its own page
documenting it--instead, that very page marks up URLs with UR/UE when it
marks them up at all.  I submitted a patch to them to un-document .URL
so that it can be "retired" from the domain of the man macro language.

What surprises you about the above list?

Detailed data, for the above and for the man-pages subset, are attached.

Regards,
Branden

Attachment: debian-stretch-box-man-page-stats.txt.gz
Description: application/gzip

Attachment: man-pages-stats.txt.gz
Description: application/gzip

Attachment: signature.asc
Description: PGP signature

