
Re: [Groff] hyphen vs. minus sign


From: Werner LEMBERG
Subject: Re: [Groff] hyphen vs. minus sign
Date: Sat, 15 Sep 2007 11:11:48 +0200 (CEST)

> The groff_char.7 documentation also lists the backquote u0060 and
> the apostrophe u0027.

Ah, yes.  Silly me.

> Adding this to unicode.tmac fixes these:
> 
> .char ` \[oq]
> .char ' \[cq]

OK.

> IIRC, in the C++ code, the font handling for the html and utf8
> devices is the same. Therefore I tried to add to html.tmac:
> .mso unicode.tmac
> and this fixes it!

OK, done.

> One problem is still left: What is now the recommended way to write a
> shell command line, in a way that is copy&pastable from at least the utf8
> and html outputs?
> - If I write "foo --help"   in the utf8 output we get u2010 twice.
> - If I write "foo \-\-help" in the utf8 output we get u2212 twice.

A very good question.  The standard solution is described in the
PROBLEMS file:

  * The UTF-8 output of grotty has strange characters for the minus,
    the hyphen, and the right quote.  Why?

  The Unicode characters used (U+2212 for the minus sign and U+2010
  for the hyphen) are the correct ones, but many programs can't search
  them properly.  The same is true for the right quote (U+201D).  To
  map those characters back to the ASCII characters, insert the
  following code snippet into the `troffrc' configuration file:

  .if '\*[.T]'utf8' \{\
  .  char \- \N'45'
  .  char  - \N'45'
  .  char  ' \N'39'
  .\}
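The troffrc snippet remaps the glyphs inside troff, before grotty ever sees them. As a rough illustration of its effect, and of why it is a hack, here is a Python post-processing approximation applied to text grotty has already written; it is not how the snippet itself works, and the function name `to_ascii_dashes` is hypothetical. The right quote is left out to keep the sketch to the two dash characters. Note that every genuine minus sign gets flattened as well.

```python
# Hypothetical post-filter: it approximates the effect of the troffrc snippet
# on text grotty has already written; it is not what the snippet does.
ASCII_DASHES = str.maketrans({
    "\u2212": "-",  # MINUS SIGN, produced by \-
    "\u2010": "-",  # HYPHEN, produced by a plain -
})


def to_ascii_dashes(text):
    """Replace U+2212 and U+2010 with ASCII hyphen-minus U+002D."""
    return text.translate(ASCII_DASHES)


print(to_ascii_dashes("foo \u2212\u2212help"))  # the command line becomes pastable
print(to_ascii_dashes("x \u2212 1"))             # but a real minus sign is lost too
```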

However, this is an ugly hack and doesn't solve the underlying issue.
I believe this problem can't be solved with the current means.

> - If I write "foo \[u002D]\[u002D]help" then in the utf8 output
>   we get twice u002D, as desired, but in the html processing I get
>   "warning: can't find special character `u002D'". Hmm??
>
> It already took me some effort to convince the Linux manpages
> maintainer that \- should be used for copy&pastable commands in
> manpages. Do I now have to recommend that he use \[u002D] instead?

Using \[u002D] doesn't work with the latin1 device...
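The `\[uXXXX]` notation names a Unicode code point directly, which is why the utf8 output contains a plain U+002D. A minimal Python sketch of that naming convention follows, assuming the simple form of `u` plus four to six uppercase hex digits (composite glyph names are ignored). It does not model whether a particular device, such as html or latin1, actually has a glyph for the result, which is exactly where the problems above appear.

```python
import re

# Assumed form of the escape: \[u followed by 4 to 6 uppercase hex digits].
# Composite glyph names (those containing '_') are not handled by this sketch.
UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def expand_unicode_escapes(text):
    """Replace each \\[uXXXX] escape with the character it names."""
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(expand_unicode_escapes(r"foo \[u002D]\[u002D]help"))
print(expand_unicode_escapes(r"foo \[u2212]\[u2212]help"))
```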

I see two possible solutions.

  . Define a new grotty (pseudo) font `CR' which is the same as all
    other fonts but contains an additional line

      \-  24  0  0x002D

    This is the solution which Gaius has implemented for grohtml
    already (however, he always uses 0x002D for \-, not only for
    fixed-width fonts -- something which should probably be changed).

    I can imagine that most man pages already use \f[CR] for
    displaying verbatim text (groff man pages being a notable
    exception), so this should be rather straightforward.

  . Introduce a new escape, say, `\=', which maps to U+002D.  We would
    thus have

       -   U+2010
       \-  U+2212
       \=  U+002D

    Alternatively, we could exchange the meaning of \- and \=, having

       -   U+2010
       \-  U+002D
       \=  U+2212
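Both proposals can be compared as lookup tables. The Python sketch below is a toy model, not groff code. It parses the proposed font line with a hypothetical `parse_charset_line` helper, reading the fields as name, width, type, and code in the groff_font(5) layout. It simulates text set in `\f[CR]` by passing the CR overrides explicitly instead of parsing font escapes, and it treats `\=` as the proposed, not yet existing, escape. For the alternative variant it assumes the swap simply exchanges the two mappings, so that `\-` yields U+002D.

```python
def parse_charset_line(line):
    """Split a groff_font(5)-style charset line: name metrics type code."""
    name, metrics, glyph_type, code = line.split()[:4]
    width = int(metrics.split(",")[0])  # first metric field is the width
    return name, width, int(glyph_type), int(code, 0)  # base 0 accepts 0x...


# Current utf8 behaviour as described in this thread.
CURRENT = {"-": 0x2010, "\\-": 0x2212}

# Proposal 1: a CR pseudo font whose only difference is the extra \- line.
name, _, _, code = parse_charset_line("\\-  24  0  0x002D")
CR_OVERRIDES = {name: code}

# Proposal 2: a hypothetical new escape \=, in both variants from the message.
PROPOSAL_A = {"-": 0x2010, "\\-": 0x2212, "\\=": 0x002D}
PROPOSAL_B = {"-": 0x2010, "\\-": 0x002D, "\\=": 0x2212}


def render(text, table, overrides=None):
    """Map '-', '\\-' and '\\=' tokens to code points; copy everything else."""
    lookup = {**table, **(overrides or {})}
    out, i = [], 0
    while i < len(text):
        # A backslash starts a two-character escape; anything else is one char.
        token = text[i:i + 2] if text[i] == "\\" else text[i]
        out.append(chr(lookup[token]) if token in lookup else token)
        i += len(token)
    return "".join(out)


print(render(r"foo \-\-help", CURRENT))                # roman font today
print(render(r"foo \-\-help", CURRENT, CR_OVERRIDES))  # as if inside \f[CR]
print(render(r"foo \=\=help", PROPOSAL_A))
print(render(r"foo \-\-help", PROPOSAL_B))
```

The CR approach leaves existing sources untouched as long as they already switch to \f[CR] for verbatim text, while the escape approach needs man pages to be edited but works independently of the font.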

Sigh.  How do other applications solve this mess?


     Werner



