Re: About verbatim dashes in PostScript output
From: G. Branden Robinson
Subject: Re: About verbatim dashes in PostScript output
Date: Sat, 28 Oct 2023 08:57:51 -0500
Hi Jan,
At 2023-10-28T15:18:05+0200, Jan Engelhardt wrote:
> A recent LWN.net article <https://lwn.net/Articles/947941/> (paywalled
> for a while)
For the benefit of those reading this in the future, the article should
be free to read starting about 2 November 2023.
> pointed at https://bugs.debian.org/1041731 and the topic of
> "-" vs "\-".
>
> Given the following input:
>
> -\-\[u002D]\[u2013]\[u2014]+\[u2212]
>
> Feeding it through `groff -Tutf8`, I get
>
> ‐−-–—+−
> <U+2010><U+2212><U+002D><U+2013><U+002B><U+2014>
>
> groff_char(7) says \- maps to "minus sign/Unix dash". Ambiguous, but
> ok, it is what it is.
Yes. We're kind of trapped here; AT&T troff always documented `\-`
specifically and exclusively as a "minus sign". Not a "hyphen-minus" or
something like that. The "Unix dash" term might have been my invention
to try to advise the same people who aren't listening to me in that LWN
thread.
> Is there a better way though than to explicitly use \[u002D] to get a
> guaranteed U+002D?
Not a better one, no. (There's a worse one, involving `\N`.)
_Unless_ you're using man(7) or mdoc(7), your document can:
1. Remap \- to \[u002D] with `tr` or `char`; or
2. Define a string to interpolate \[u002D].
Man pages should not do either of these, because they will just make a
bad situation worse, causing more man pages to be inconsistent with each
other, Albert Cahalan-style.
> Second, I turn to PostScript output that is generated by
> `groff -Tps`. One observes:
>
> troff:<standard input>:1: warning: special character 'u002D' not defined
>
> (Converting the PS to PDF and opening that with evince), the rendered
> view shows a hyphen, a minus, an endash, an emdash, and another minus
> but rendered in a different vertical position which does not line up
> with the '+' sign.
Let's see, your input was...
> -\-\[u002D]\[u2013]\[u2014]+\[u2212]
That should be, in order:
a. a hyphen (U+2010)
b. a minus sign (U+2212) from the "current font" (likely a text font)
c. a hyphen-minus (U+002D)
d. an en dash (U+2013)
e. an em dash (U+2014)
f. a plus sign (U+002B) from the "current font" (likely a text font)
g. and a minus sign (U+2212) from the "special font".
A shorter way to say \[u2212] is \[mi] (or `\(mi`; it's a venerable
special character identifier going back to Ossanna troff).
GNU troff maps certain Unicode code points back to special characters
first.
https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/uniglyph.cpp?h=1.23.0#n392
groff_char(7) attempts to explain why all this "text font" and "special
font" business exists.
Notes describes the glyph, elucidating the mnemonic value of
the glyph name where possible.
[...]
Entries marked with “***” denote glyphs used for
mathematical purposes. On typesetting devices, such
glyphs are typically drawn from a special font (see
groff_font(5)). Often, such glyphs lack bold or italic
style forms or have metrics that look incongruous in
ordinary prose. A few which are not uncommon in running
text have “text variants”, which should work better in
that context. Conversely, a handful of glyphs that are
normally drawn from a text font may be required in
mathematical equations. Both sets of exceptions are
noted in the tables where they appear (“Logical symbols”
and “Mathematical symbols”).
Basic Latin
[...]
The vertical bar is overloaded; the \[ba] and \[or] escape
sequences may render differently. See subsection “Mathematical
symbols” below for special variants of the plus, minus, and
equals signs normally drawn from this range.
Mathematical symbols
[...]
Observe the two varieties of the plus‐minus, multiplication, and
division signs; \[+-], \[mu], and \[di] are normally drawn from
the special font, but have text font variants. Also be aware of
three glyphs available in special font variants that are normally
drawn from text fonts: the plus, minus, and equals signs. These
variants may differ in appearance or spacing depending on the
device and font selected.
...and the entire "History" section.
> Third, when one copy-pastes the string shown in evince, I get back:
>
> -−–—+−
> <U+002D><U+2212><U+2013><U+2014><U+002B><U+2212>
>
> I expected to receive:
>
> <U+2010><U+002D><U+2013><U+2014><U+002B><U+2212>
>
> so that copypasting commands from PS/PDF would work "right"
> similarly as it does for manpages when they use \-.
That is because \- is not a "hyphen-minus" (except in man pages, where
we are forced to remap it for practical reasons). The C/A/T typesetter
that the Bell Labs CSRC acquired didn't _have_ a "hyphen-minus" glyph.
It had a hyphen, a minus sign, and an em dash. So, to troff, \- is a
minus sign, and when you format `\-` without a man page macro package,
that is what you get.
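To double-check what a copy-and-paste actually delivered, it can help to
print each character's code point next to its Unicode name. The following
is only a sketch in Python, not anything groff provides; the helper name
`describe` is made up for illustration, and the `pasted` string is simply
the sequence Jan reported getting back from evince.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    `describe` is a hypothetical helper for this illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The six characters reported as pasted from the PDF viewer.
pasted = "\u002d\u2212\u2013\u2014\u002b\u2212"

for code_point, name in describe(pasted):
    print(code_point, name)
```

The second entry comes out as MINUS SIGN rather than HYPHEN-MINUS, which
matches what `\-` means to troff outside the man page macro packages.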
If you add \[pl] to your list, _that_ plus sign's crossbar should line
up with the U+2212 minus sign, and if it doesn't, I'd be curious to see
the output of "groff -Tps -Z".[1] (But it's always possible for a font
to be buggy.)
Does this clear things up? Please tell me if there is anything not
making sense, or any way I can improve the groff_char(7) man page.
Regards,
Branden
[1] For me, it's doing what it should.
$ printf -- '-\\-\\[u002D]\\[u2013]\\[u2014]+\\[u2212]\\[pl]\n' | groff -Tps -Z | tail
troff:<standard input>:1: warning: special character 'u002D' not defined
x font 11 S
f11
Cmi
h5490
Cpl
h5490
n12000 0
x trailer
V792000
x stop