Re: 1.23: UTF-8 device produces mysterious characters
From: G. Branden Robinson
Subject: Re: 1.23: UTF-8 device produces mysterious characters
Date: Mon, 12 Sep 2022 09:46:41 -0500
Hi Steffen,
At 2022-09-12T15:43:00+0200, Steffen Nurpmeso wrote:
> I have problems with the UTF-8 device, it shows
>
> on‐main‐loop‐tick
> instead of
> on-main-loop-tock
>
> ie U+2010 instead of hyphen-minus U+002D.
>
> The above does not feel right, and searching is impossible!
> I would expect U+2010 HYPHEN in hyphenation, but not as a regular
> combiner aka delimiter joined words as are used very often in
> German, for example.
There are a few points to raise about this. The first is a question.
1. You don't expect a hyphenated word to use a hyphen?
2. This is not a "1.23"-specific issue as your subject line suggests.
$ groff --version | head -n 1
GNU groff version 1.22.4
$ echo 'long-term' | groff -Tutf8 | od -c
0000000 l o n g 342 200 220 t e r m \n \n \n \n \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000100 \n \n \n \n \n \n \n \n \n \n \n \n \n
0000115
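As a cross-check on the dump above, the three octal bytes between "long" and "term" (342 200 220) can be confirmed to be the UTF-8 encoding of U+2010 HYPHEN. This is only a minimal sketch, assuming a POSIX printf(1), od(1), and tr(1); the variable name is made up for illustration.

```shell
# Re-emit the three octal bytes od(1) showed and view them in hex.
# bytes_hex is a hypothetical name used only in this sketch.
bytes_hex=$(printf '\342\200\220' | od -An -tx1 | tr -d ' \n')
echo "$bytes_hex"   # prints e28090
```

E2 80 90 is the three-byte UTF-8 form of code point U+2010, so the output is a hyphen, not corruption.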
3. If you're secretly in a man page context but didn't disclose that,
then, yes, this is a change from groff 1.22.4. The hyphen-minus,
neutral apostrophe, and grave accent no longer map differently for
man(7) and mdoc(7) than for any other macro package. (\- still does
and there is no prospect of that changing, since there is no *roff
special character defined for the "ASCII hyphen-minus", and it is
essential to express this precise character in man pages. These
issues have been discussed at some length on this mailing list over
the past three years.)
4. "on-main-loop-tick" doesn't look like a natural-language word to me--it
looks like an identifier in a programming language (maybe some
dialect of Lisp). If that is the case, those hyphens need to be
spelled "\-" in the source code. This has always been true in man
pages, going back to 1979.
Take
$ grep '\\-[A-Za-z]' ~/src/unix/v7/usr/man/man1/bc.1
.B \-c
.B \-l
.B \-l
.B \-l
.B \-c
for example.
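The inverse check, looking for hyphens that were left unescaped, might be sketched as follows. The two-line sample is hypothetical, not taken from any real page, and the pattern is a rough heuristic: it also flags ordinary prose hyphens such as "long-term" and misses a hyphen at the very start of a line.

```shell
# Hypothetical man page fragment: the first option is escaped, the second is not.
sample=$(printf '%s\n' '.B \-c' '.B -l')
# List lines where a hyphen followed by a letter is not preceded by a backslash.
unescaped=$(printf '%s\n' "$sample" | grep -n '[^\\]-[A-Za-z]')
echo "$unescaped"   # prints 2:.B -l
```

Each hit still needs a human decision about whether the hyphen is a literal or part of ordinary text.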
5. Searching is not impossible.
5a. Searching for a word that is broken and hyphenated across lines
is no more impossible than it always was. On occasions when I
have to do this, I break out sed(1) or perl(1).
5b. Literals that might be of interest in man pages should be
entered with hyphenation suppressed in the input. The groff man
pages in 1.23 do this much more conscientiously than in past
releases. This is to avoid confusing users who might wonder if
a hyphen is to be interpreted literally or not.
5c. You can disable automatic hyphenation altogether when rendering
man pages. See the '-rHY' option in groff_man(7). This feature
has been around for many years.
5d. groff's mdoc(7) implementation did not recognize the `HY`
register in groff 1.22.4 and earlier. It does now, though.
5e. For me, anyway, searching within less(1) using the pattern with
a dot where the hyphen goes works fine, even though there are 3
bytes in the input stream instead of one. Evidently less(1) is
smart enough. For instance, I can match "line-ending" in the
roff(7) page while paging it with "groff -Tutf8 -man | less -R"
by entering "/line.ending" within less(1).
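Where a search tool is not UTF-8 aware, one possible workaround is to map U+2010 back to hyphen-minus before searching. The following is a minimal sketch under stated assumptions: a POSIX printf(1) and sed(1), and a made-up sample string standing in for formatter output. It is not something groff does for you.

```shell
# U+2010 HYPHEN as its three UTF-8 bytes (octal 342 200 220).
hyphen=$(printf '\342\200\220')
# Made-up sample standing in for formatter output.
rendered=$(printf 'line%sending\n' "$hyphen")
# LC_ALL=C makes sed treat the input as plain bytes.
normalized=$(printf '%s\n' "$rendered" | LC_ALL=C sed "s/$hyphen/-/g")
echo "$normalized"   # prints line-ending
```

The same sed expression could sit in a pipeline between the formatter and the pager, at the cost of losing the distinction between the two characters.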
I hope this clears some things up.
Regards,
Branden