Re: [Groff] ASCII dash in UTF-8 locale
From: Tadziu Hoffmann
Subject: Re: [Groff] ASCII dash in UTF-8 locale
Date: Sat, 24 Jan 2015 01:41:08 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
> > Heirloom troff and groff both render \- as en dash,
> > not minus sign, in PDF output.
> If you use groff's native pdf driver (-Tpdf) I believe
> minus is rendered, can be searched for and copy/pasted.
> The postscript driver also outputs a "minus" so I suspect
> it is the ghostscript conversion to pdf which is changing it.
Here on my system, ghostscript keeps the minus when converting
to PDF. The input file
    .sp 3c
    minus: \-
    .br
    en-dash: \(en
when processed by groff (using the default -Tps) and converted
to PDF using ghostscript results in the following page content
in the PDF:
    10 0 0 10 0 0 cm BT
    /R7 10 Tf
    1 0 0 1 72 744.851 Tm
    (minus: <AD>)Tj
    12 TL
    (en-dash: \211)'
    ET
where <AD> stands for a single byte (hex AD, decimal 173), matching
groff's "text.enc", which says minus is to be encoded at position 173.
Likewise, \211 is the octal escape for position 137. The font
"R7" is a Times-Roman subset with the encoding
    /BaseEncoding /WinAnsiEncoding
    /Differences [ 137 /endash 173 /minus ]
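
As a quick check of the byte values, here is a minimal Python sketch
(not part of the original setup; the dictionary and the function name
glyph_name are only illustrative) that looks up the two codes from the
content stream in the /Differences array quoted above:

```python
# Byte values as they appear in the PDF content stream.
minus_byte = 0xAD      # shown as <AD> in the stream
endash_byte = 0o211    # shown as the octal escape \211

# The /Differences array of the font's encoding, written as a dict.
# (Illustrative only; a real PDF reader also consults /BaseEncoding.)
differences = {137: "endash", 173: "minus"}


def glyph_name(code):
    """Return the glyph name the /Differences array assigns to a code."""
    return differences.get(code)


print(minus_byte, glyph_name(minus_byte))    # 173 minus
print(endash_byte, glyph_name(endash_byte))  # 137 endash
```

So both bytes in the page content land on the glyph names one would
expect from text.enc.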
Acroread (version 9) clearly renders the minus and the en-dash
differently.
When the text is copied and pasted in a UTF-8 locale, Acroread
delivers the two characters as the byte sequences <e28892> and
<e28093>, i.e., 'MINUS SIGN' (U+2212) and 'EN DASH' (U+2013).
In an ISO8859-1 locale both (like the hyphen) are pasted
as <2d>, i.e., "hyphen-minus".
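
The pasted byte sequences can be checked with a few lines of Python.
This is only a sketch for verifying the values quoted above; the helper
name describe is made up for the example, and "latin-1" is Python's
codec name for ISO8859-1:

```python
import unicodedata


def describe(hex_bytes, encoding):
    """Decode a hex byte string and list code point and Unicode name."""
    text = bytes.fromhex(hex_bytes).decode(encoding)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


print(describe("e28892", "utf-8"))    # [('U+2212', 'MINUS SIGN')]
print(describe("e28093", "utf-8"))    # [('U+2013', 'EN DASH')]
print(describe("2d", "latin-1"))      # [('U+002D', 'HYPHEN-MINUS')]
```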
If you want cut-and-pasteable ASCII command lines in PDF files,
I think the easiest way is to set up a hacked "code" font with
renamed glyphs. Alternatively, you can try adding a
GlyphNames2Unicode dictionary.