Re: [Groff] Broken chars


From: Werner LEMBERG
Subject: Re: [Groff] Broken chars
Date: Sun, 20 Mar 2005 08:48:33 +0100 (CET)

> I have observed that groff changes certain characters in manpages.
> [...]
> 
> For example, in perlcheat.1, the $| is changed into '$' and a
> vertical bar if the locale is UTF-8.
> 
> real:   | (U+0x7C)
> output: │ (U+0x2502)
> 
> This also to a number of other characters, including the backward
> apostrophe (accent grave) ` (0x60) which is transformed into ‘
> (U+0x2018).  This is very bad for copy+paste, and if your screen
> font does not have all the UTF8 characters (especially the case on
> bare 80x25 tty1 terminal), it does not even show any apostrophe, but
> a block to indicate that 0x2018 is not available in this font.
> 
> Is this a (big) bug in groff, or intention?

In

  /usr/[local/]share/groff/<version>/font/devutf8/R

you can see which output codes are used for which input characters.
Looking into perlcheat.1, you can find this (converted on my platform
with Pod::Man v 1.37):

  .tr \(*W-|\(bv\*(Tr

The .tr request translates characters.  In this particular case, it
translates `|' to `\(bv'.  `bv' is equivalent to `braceex' in PS
output, and is by default mapped to U+23AA.  I have no idea why you
get U+2502 instead.  And I have no idea why Pod::Man uses `bv' at all.
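
As a minimal, untested sketch of how such a translation behaves (the
input text is made up for illustration and is not taken from
perlcheat.1):

  .\" Map `|' to the special character \(bv on output.
  .tr |\(bv
  a | b
  .br
  .\" Map `|' back to itself to undo the translation.
  .tr ||
  a | b

Formatted with `groff -Tutf8', the first line should show the `bv'
glyph and the second one a plain `|'.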

Regarding the grave accent mapped to U+2018, here is the comment
from groff_char(7):

   `  the ISO Latin-1 `Grave Accent' (code 96) prints as <U+2018>,
      a left single quotation mark; the original character can be
      obtained with `\`'.

   '  the ISO Latin-1 `Apostrophe' (code 39) prints as <U+2019>, a
      right single quotation mark; the original character can be
      obtained with `\(aq'.

For typesetting this is the right choice, since those two characters
are normally used this way, similar to TeX.
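
For illustration, a small input fragment (my own example, not taken
from any man page) contrasting the two forms described in
groff_char(7):

  .\" Typographic quotes: U+2018 and U+2019 with -Tutf8.
  `quoted text'
  .br
  .\" Plain ASCII characters, e.g. for code meant for copy and paste.
  \`date\` and \(aqliteral\(aq

With `groff -Tutf8' the first text line should come out with curly
quotes, the second one with U+0060 and U+0027.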

Distributions can override this.  For example, in my SuSE 9.1, I have
this in /usr/share/groff/site-tmac/tmac.andocdb:

  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char  - \N'45'
  .  char  ' \N'39'
  .\}
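
A similar line could presumably be added for the grave accent.  Note
that this is an untested assumption modelled on the lines above; it
is not part of the SuSE file:

  .if '\*[.T]'utf8' \{\
  .  char  ` \N'96'
  .\}

This would make a plain ` in the input print as code 96 on the utf8
device, at the cost of losing the typographic left quote.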

To summarize:

  . Mapping `|' to the `bv' entity is strange.  If you use a plain `|'
    in a troff input file, you actually get a plain `|'!  This looks
    like a bug in Pod::Man.

  . The ` and ' characters in groff input files always indicate left
    and right single quotation marks.  U+0060 and U+0027 can be
    accessed as \` and \(aq.  Ideally, this is fixed in Pod::Man too,
    if you use a `verbatim' mode, by translating those characters
    temporarily.  Otherwise, as shown above, this can be changed in
    the configuration file of the man macros.
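
    A rough sketch of such a temporary translation around a verbatim
    block (untested; the shell line is a hypothetical example, and I
    don't claim that Pod::Man should emit exactly this):

      .nf
      .\" Map ` and ' to the plain ASCII glyphs for the verbatim text.
      .tr `\`'\(aq
      echo 'today is' `date`
      .\" Restore the default meaning afterwards.
      .tr ``''
      .fi

    Note that a text line must not start with ' since groff would
    treat it as a control line.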


   Werner


PS: Why the heck is `perlcheat.man' and all other non-program man
    pages of perl in man section 1?
