Re: 1.23: UTF-8 device: more display oddities
From: G. Branden Robinson
Subject: Re: 1.23: UTF-8 device: more display oddities
Date: Fri, 16 Sep 2022 17:32:36 -0500
At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
> |Letting aside the hyphen-minus -> hyphen thing that i fixed for me
> |locally, there is also the problem that
> |
> | ` U+0060, GRAVE ACCENT, "backtick"
> |
> |is displayed as
> |
> | ‘ U+2018, LEFT SINGLE QUOTATION MARK
>
> Also
>
> ~ U+007E, TILDE
>
> is displayed as
>
> ˜ 02DC, SMALL TILDE
>
> which here sits at the height of an accent here, for example the
>
> ^ 005E, CIRCUMFLEX ACCENT
>
> Putting it all together it really looks totally odd here:
>
> i=`echo '~/home^run'`
>
> becomes
>
> i=‘‘echo ’˜/homeˆrun’‘’
>
> How is anyone supposed to document a sh(1)ell-style manual with
> mdoc(7) (i do not know about man(7)) with these settings?
By reading the manual, Steffen.
UTF-8 content follows.
groff_char(7):
...
On ISO systems, code points in the range 33–126 comprise a common
set of printable glyphs in all of the aforementioned ISO
character encoding standards. It is this character set and (with
some noteworthy exceptions) the corresponding glyph repertoire
for which AT&T troff was implemented.
...
The table below presents the seven exceptional code points with
their typical keycap engravings, their glyph mappings and
semantics in roff systems, and the escape sequences producing the
Unicode basic Latin character they replace. The first, the
neutral double quote, is a partial exception because it does
represent itself, but since the roff language also uses it to
quote macro arguments, groff supports a special character escape
sequence as an alternative form so that the glyph can be easily
included in macro arguments without requiring the user to master
the quoting rules that AT&T troff required in that context.
(Some requests, like ds, also treat " non‐literally.)
Furthermore, not all of the special character escape sequences
are portable to AT&T troff and all of its descendants; these
groff extensions are presented using its special character form
\[], whereas portable special character escape sequences are
shown in the traditional \( form. \- and \e are portable to all
known troffs. \e means “the glyph of the current escape
character”; it therefore can produce unexpected output if the ec
request is used. On devices with a limited glyph repertoire,
glyphs in the “keycap” and “appearance” columns on the same row
of the table may look identical; except for the neutral double
quote, this will not be the case on more‐capable devices. Review
your document using as many different output devices as possible.
┌──────────────────────────────────────────────────────────────────────┐
│Keycap   Appearance and meaning     Special character and meaning     │
├──────────────────────────────────────────────────────────────────────┤
│"        " neutral double quote     \[dq]       neutral double quote  │
│'        ’ closing single quote     \[aq]       neutral apostrophe    │
│-        ‐ hyphen                   \- or \[-]  minus sign/Unix dash  │
│\        (escape character)         \e or \[rs] reverse solidus       │
│^        ˆ modifier circumflex      \(ha        circumflex/caret/“hat”│
│`        ‘ opening single quote     \(ga        grave accent          │
│~        ˜ modifier tilde           \(ti        tilde                 │
└──────────────────────────────────────────────────────────────────────┘
There is also the "Portability" section of groff_man(7) [groff 1.22.4]
or groff_man_style(7) [groff 1.23].
Several special characters are also widely portable. AT&T troff
did not define the reverse solidus or quotation characters listed
below, but any of its descendants, like Plan 9 or Solaris troff,
can support them by defining their glyphs in font description
files; see groff_font(5).
\- Minus sign or basic Latin hyphen‐minus. This escape
sequence produces the Unix command‐line option dash in the
output. “-” is a hyphen in the roff language; some output
devices replace it with U+2010 (hyphen) or similar.
\(aq Basic Latin neutral apostrophe. Some output devices
replace “'” with a right single quotation mark.
\(oq
\(cq Opening (left) and closing (right) single quotation marks.
Use these for paired directional single quotes, ‘like
this’.
\(dq Basic Latin quotation mark (double quote). Use in macro
calls to prevent “"” from being interpreted as beginning a
quoted argument, or simply for readability.
.TP
.BI "split \(dq" text \(dq
\(lq
\(rq Left and right double quotation marks. Use these for
paired directional double quotes, “like this”.
\(em Em‐dash. Use for an interruption—such as this one—in a
sentence.
\(en En‐dash. Use to separate the ends of a range,
particularly between numbers; for example, “the digits
1–9”.
\(ga Basic Latin grave accent. Some output devices replace “`”
with a left single quotation mark.
\(ha Basic Latin circumflex accent (“hat”). Some output
devices replace “^” with U+02C6 (modifier letter
circumflex accent) or similar.
\(rs Reverse solidus (backslash). The backslash is the default
escape character in the roff language, so it does not
represent itself in output. Also see \e below.
\(ti Basic Latin tilde. Some output devices replace “~” with
U+02DC (small tilde) or similar.
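Applied to the sh(1) line quoted at the top of this message, the escape sequences listed above could be used roughly as follows. This is only a sketch for a man(7) page: the paragraph wording is made up, and the choice of .PP, .RS/.RE, and .EX/.EE around the example is an assumption for illustration (.EX and .EE are extensions, not part of the original AT&T man macros).

```roff
.\" Hypothetical fragment of a man(7) page documenting shell syntax.
.\" Each troublesome keycap is replaced by its portable special
.\" character so that the output stays Basic Latin on every device:
.\"   `  ->  \(ga     '  ->  \(aq     ~  ->  \(ti     ^  ->  \(ha
.PP
Assign the output of a command to a variable:
.RS
.EX
i=\(gaecho \(aq\(ti/home\(harun\(aq\(ga
.EE
.RE
```

Formatting such a fragment once with "groff -man -Tutf8" and once for another device, such as PostScript or PDF, is a reasonable way to check that the grave accent, apostrophe, tilde, and circumflex come out as intended and copy-and-paste correctly.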
Or you can just do the brute force thing. From groff 1.23's "PROBLEMS"
file:
----------------------------------------------------------------------
* When viewing man pages, some characters on my UTF-8 terminal emulator
look funny or copy-and-paste wrong. Why?
Some Unicode Basic Latin ("ASCII") input characters are mapped to
non-Basic Latin code points in output for consistency with other output
devices, like PDF. See groff_man_style(7) and groff_char(7) for correct
input conventions and background. If you use the correct groff special
character escape sequences to input them, you will get correct output no
matter what device the input is formatted for.
However, many man pages are written in ignorance of the correct special
characters to obtain the desired glyphs. You can conceal these errors
by adding the following to your site-local man(7) configuration. The
file is called "man.local"; its installation directory depends on how
groff was configured when it was built.
--- start ---
.if '\*[.T]'utf8' \{\
. char ' \[aq]
. char - \-
. char ^ \[ha]
. char ` \[ga]
. char ~ \[ti]
.\}
--- end ---
You may also wish to do the same for "mdoc.local".
In man pages (only), groff maps the minus sign special character '\-' to
the Basic Latin hyphen-minus (U+002D) because man pages require this
glyph and there is no historically established *roff input character,
ordinary or special, for obtaining it when a hyphen and minus sign are
both separately available. To obtain a true minus sign, use the special
character escape sequences '\(mi' or '\[mi]'.
----------------------------------------------------------------------
Didn't I already share this information with you?
Hmm, yes, I did.[1]
Possibly groff_mdoc(7) could use a "Portability" section as well. I
happen to be in the midst of some major revisions of that page, but on
the other hand your refusal to read the documentation I have already
served up to you on a platter does nothing to supply motivation.
Typography isn't for everyone. There's always Markdown. It might
better fit the write-only attitude you have manifested in your
contributions to this mailing list.
Regards,
Branden
[1] https://lists.gnu.org/archive/html/groff/2022-09/msg00050.html