bug-bash

Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character


From: G. Branden Robinson
Subject: Re: Tilde (~) in bash(1) is typeset incorrectly as Unicode character
Date: Wed, 26 Jul 2023 10:35:50 -0500

Hi Thomas,

At 2023-07-26T10:47:05+0200, Thomas ten Cate wrote:
> In the bash manual page (`man bash`), the ASCII tilde character '~'
> (0x7e) is replaced by the Unicode character '˜' (U+02DC SMALL TILDE):
> 
>     $ man bash | grep 'additional binary operator'
>                   An additional binary operator, =˜, is available,
> 
> The same happens for the use of ~ as a shorthand for the home
> directory. This makes the manual page incorrect, and difficult to
> search.
> 
> It looks like there is an ASCII tilde character in the man page's
> source code:
> 
>     $ gunzip -c /usr/share/man/man1/bash.1.gz | grep 'additional binary operator'
>     An additional binary operator, \fB=~\fP, is available, with the same
> 
> I don't know the first thing about groff, but `man groff_char`
> suggests that ~ is indeed rendered as "modifier tilde", and that one
> should write \(ti to obtain an actual tilde character.

I know a little about groff.  Your advice is fine for man pages that
target only groff[1] and/or mandoc[2], but not Heirloom Doctools
troff,[3] neatroff[4] or Plan 9 troff (in its original form or as
maintained in Plan 9 from User Space[5]), and not legacy implementations
descended from AT&T troff that are, as far as I can tell, unmaintained
by the few Unix System V vendors that still exist.[6][7]

Many projects don't need to worry about such extreme portability in
their man pages, but GNU Bash arguably does.  (I'm open to correction.)

Furthermore, in the *roff language itself, as originally implemented by
Joe Ossanna (and re-implemented by Brian Kernighan), there is no good
way to test for the existence of a special character.[8]

As a first stab at it, I'd divide the world into two camps: (a) groff
and mandoc(1), and (b) everything else, and not worry about (b).

The bash(1) man page has an extensive preamble already that still
includes a workaround for 4.3BSD(!), so adding a little bit to it to
accommodate systems developed since 1990 might not be too disruptive.
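
To make the idea concrete, here is a minimal sketch of what such a
preamble addition could look like.  It is not the attached diff.  The
string names `Ti` and `Ha` are hypothetical, picked only because they
don't collide with the `ti` request, and the sketch assumes that
mandoc(1) reports the `.g` register as nonzero the way groff does.

```roff
.\" Sketch only; the string names Ti and Ha are hypothetical.
.\" Portable defaults: the plain ASCII characters, for camp (b).
.ds Ti ~
.ds Ha ^
.\" groff sets the read-only register .g to 1.  Formatters that set
.\" it get the special characters that render as ASCII tilde and
.\" caret; on anything else the register reads as 0 and the branch
.\" is skipped.
.if \n(.g \{\
.  ds Ti \(ti
.  ds Ha \(ha
.\}
.\" Usage in the page body, in place of a literal "~":
An additional binary operator, \fB=\*(Ti\fP, is available, with the same
```

A formatter that does not set `.g` never interprets `\(ti` or `\(ha`,
so camp (b) keeps exactly the output it produces today.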

I'm attaching a straw man diff to the bash(1) page.  If Chet likes it,
I'm happy to prepare one against the bash devel branch.

bash(1) also attempts to select a font named "CW" in places, which is
another portability problem (it's a Unix System III [and later] troff
font name that was available on _some_ output devices).  But I'd like to
see how we get over this bridge before I try to cross that one.  :)

> I'm guessing the manpage is generated from texinfo, so if this is
> actually a bug in texinfo, feel free to forward this email to
> bug-texinfo at gnu.org.

I don't think that's actually true.  As far as I know, Chet maintains
Bash's Texinfo docs and man pages in parallel by hand.

Regards,
Branden

[1] https://www.gnu.org/software/groff/
[2] https://mandoc.bsd.lv/
[3] https://github.com/n-t-roff/heirloom-doctools
[4] https://github.com/aligrudi/neatroff
[5] https://github.com/9fans/plan9port

[6] HP-UX 11 appears to still ship an AT&T/DWB or System V troff.
    Solaris 10 does, but it is nearing end-of-life and Solaris 11
    replaced its troff (of similar lineage as HP-UX's) with groff.

[7] It is also not hard to make AT&T-descended troffs support the
    `ha` and `ti` special characters.  For instance, here's a patch to
    Documenter's Workbench (DWB) 3.3 troff's "Latin1" output device.

--- R.orig      2023-07-26 09:55:30.527340674 -0500
+++ R   2023-07-26 09:58:49.658662373 -0500
@@ -68,6 +68,7 @@
 bs     "
 ]      33      3       93
 ^      33      2       147
+ha     "
 ---    47      2       94
 ---    50      1       95
 `      33      2       96
@@ -101,6 +102,7 @@
 ---    20      2       124
 }      48      3       125
 ~      33      2       148
+ti     "
 ---    54      0       126
 \`     33      2       145
 ga     "

    But in the 30+ years since groff emerged on the scene, I'm not
    aware of a single such troff having done this.

[8] A clever *roff hacker could try using the output comparison operator
    and width computation escape sequence to measure the width of a
    candidate special character, but this would not be reliable.  The output
    drivers of AT&T device-independent troff appear to format
    unrecognized characters as blanks (putting horizontal motions on the
    output).  (groff does not, throwing an error diagnostic instead.)[9]
    But if a special character did exist and happened to be the same
    width as such a blank character, this test would produce a false
    negative.  Worse, on nroff-mode devices, including the terminal
    emulators where 99% of all man page reading is done, _all_ glyphs are
    the same width, so you'd get false negatives all the time.

[9] This is a groff/AT&T troff difference that I don't think is
    documented by groff.  Maybe I should fix that.

Attachment: bash.1.diff
Description: Text Data