groff

Re: 1.23: UTF-8 device produces mysterious characters


From: G. Branden Robinson
Subject: Re: 1.23: UTF-8 device produces mysterious characters
Date: Mon, 12 Sep 2022 09:46:41 -0500

Hi Steffen,

At 2022-09-12T15:43:00+0200, Steffen Nurpmeso wrote:
> I have problems with the UTF-8 device, it shows
> 
>   on‐main‐loop‐tick
> instead of
>   on-main-loop-tick
> 
> ie U+2010 instead of hyphen-minus U+002D.
> 
> The above does not feel right, and searching is impossible!
> I would expect U+2010 HYPHEN in hyphenation, but not as a regular
> combiner aka delimiter joined words as are used very often in
> German, for example.

There are a few points to raise about this.  The first is a question.

1.  You don't expect a hyphenated word to use a hyphen?
2.  This is not a "1.23"-specific issue, despite what your subject line suggests.

$ groff --version | head -n 1
GNU groff version 1.22.4
$ echo 'long-term' | groff -Tutf8 | od -c
0000000   l   o   n   g 342 200 220   t   e   r   m  \n  \n  \n  \n  \n
0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
*
0000100  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
0000115

3.  If you're secretly in a man page context but didn't disclose that,
    then, yes, this is a change from groff 1.22.4.  The hyphen-minus,
    neutral apostrophe, and grave accent no longer map differently for
    man(7) and mdoc(7) than for any other macro package.  (\- still does
    and there is no prospect of that changing, since there is no *roff
    special character defined for the "ASCII hyphen-minus", and it is
    essential to express this precise character in man pages.  These
    issues have been discussed at some length on this mailing list over
    the past three years.)

4.  "on-main-loop-tick" doesn't look like a natural-language word to me;
    it looks like an identifier in a programming language (maybe some
    dialect of Lisp).  If that is the case, those hyphens need to be
    spelled "\-" in the man page source.  This has always been true in
    man pages, going back to 1979.

    Take, for example:
      $ grep '\\-[A-Za-z]' ~/src/unix/v7/usr/man/man1/bc.1
      .B \-c
      .B \-l
      .B \-l
      .B \-l
      .B \-c

5.  Searching is not impossible.

    5a. Searching for a word that is broken and hyphenated across lines
        is no more impossible than it always was.  On occasions when I
        have to do this, I break out sed(1) or perl(1).

    5b. Literals that might be of interest in man pages should be
        entered with hyphenation suppressed in the input, for instance
        by starting the word with the \% escape sequence.  The groff man
        pages in 1.23 do this much more conscientiously than in past
        releases.  This is to avoid confusing users who might wonder
        whether a hyphen is to be interpreted literally or not.

    5c. You can disable automatic hyphenation altogether when rendering
        man pages by setting the HY register to zero ('-rHY=0'); see
        groff_man(7).  This feature has been around for many years.

    5d. groff's mdoc(7) implementation did not recognize the `HY`
        register in groff 1.22.4 and earlier.  It does now, though.

    5e. For me, anyway, searching within less(1) using a pattern with a
        dot where the hyphen goes works fine, even though there are
        three bytes in the input stream instead of one.  Evidently
        less(1) is smart enough to match the multibyte UTF-8 sequence as
        a single character.  For instance, I can match "line-ending" in
        the roff(7) page while paging it with
        "groff -Tutf8 -man | less -R" by entering "/line.ending" within
        less(1).
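To make points 2 and 5e concrete, here is a minimal Python sketch (an illustration only, not part of groff) showing that the "342 200 220" in the od(1) dump is the UTF-8 encoding of U+2010 HYPHEN, and why a one-character wildcard still finds "line-ending" when a search works on decoded characters rather than raw bytes.  That less(1) behaves like the decoded case is an assumption consistent with 5e, and presumably depends on running in a UTF-8 locale.

```python
import re

HYPHEN = "\u2010"   # U+2010 HYPHEN, as emitted by -Tutf8 in point 2
HYPHEN_MINUS = "-"  # U+002D HYPHEN-MINUS, the ASCII character

def octal_bytes(s: str) -> list:
    """Return the UTF-8 bytes of s as octal strings (od -c shows non-ASCII bytes this way)."""
    return [format(b, "o") for b in s.encode("utf-8")]

print(octal_bytes(HYPHEN))        # ['342', '200', '220']: three bytes
print(octal_bytes(HYPHEN_MINUS))  # ['55']: one byte (od -c prints it as '-')

rendered = "line" + HYPHEN + "ending"

# Decoded text: "." matches one code point, so U+2010 counts as one character.
print(bool(re.search("line.ending", rendered)))  # True

# Raw bytes: "." matches one byte, so a single dot cannot span U+2010.
raw = rendered.encode("utf-8")
print(bool(re.search(rb"line.ending", raw)))     # False
print(bool(re.search(rb"line...ending", raw)))   # True
```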
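For point 5a, a rough Python counterpart to the sed(1)/perl(1) approach might undo line-end hyphenation before searching.  It assumes that a hyphenation break in -Tutf8 output appears as U+2010 at the end of a line, followed by a newline and the next line's indentation; the function name `join_broken_words` is hypothetical.

```python
import re

HYPHEN = "\u2010"  # U+2010 HYPHEN, used by -Tutf8 at hyphenation breaks

def join_broken_words(text: str) -> str:
    """Rejoin words split across lines by automatic hyphenation.

    Hypothetical helper.  Assumes a break is U+2010 at end of line,
    then a newline and leading spaces/tabs on the continuation line.
    Caveat: a word that really contains a hyphen and happened to break
    at that hyphen loses it, so use this for searching, not rendering.
    """
    return re.sub(HYPHEN + r"\n[ \t]*", "", text)

page = "       Automatic hyphen\u2010\n       ation can split words.\n"
print(join_broken_words(page))  # "       Automatic hyphenation can split words."
```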
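For points 4 and 5b, here is a minimal sketch of a hypothetical helper, `roff_literal`, that a program generating man page source might use to escape literals such as option names and identifiers.  Under these assumptions, backslashes become `\e`, hyphen-minus becomes `\-` (which, per point 3, still renders as U+002D in man pages), and `\%` is optionally prefixed so the word is not hyphenated automatically.  It is an illustration, not something groff ships.

```python
def roff_literal(text: str, protect: bool = True) -> str:
    r"""Escape a literal (option name, identifier) for man page source.

    Hypothetical helper.  Backslash becomes \e, hyphen-minus becomes \-,
    and, if protect is true, \% is prefixed so groff does not hyphenate
    the word automatically.
    """
    # Escape backslashes first so the backslashes added below survive.
    out = text.replace("\\", "\\e").replace("-", "\\-")
    if protect:
        out = "\\%" + out
    elif out.startswith((".", "'")):
        # If the literal begins an input line, \& keeps a leading dot or
        # apostrophe from being read as a control character.
        out = "\\&" + out
    return out

print(roff_literal("on-main-loop-tick"))          # \%on\-main\-loop\-tick
print(".B " + roff_literal("-c", protect=False))  # .B \-c
```

The second call reproduces the `.B \-c` idiom from the v7 bc(1) page quoted in point 4.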

I hope this clears some things up.

Regards,
Branden
