Re: 1.23: UTF-8 device: more display oddities


From: G. Branden Robinson
Subject: Re: 1.23: UTF-8 device: more display oddities
Date: Fri, 16 Sep 2022 19:51:35 -0500

At 2022-09-17T01:00:26+0200, Steffen Nurpmeso wrote:
> G. Branden Robinson wrote in
>  <20220916223236.lmkf3brdwotdn2fd@illithid>:
>  |At 2022-09-16T23:56:58+0200, Steffen Nurpmeso wrote:
>  |> How is anyone supposed to document a sh(1)ell-style manual with
>  |> mdoc(7) (i do not know about man(7)) with these settings?
>  |
>  |By reading the manual, Steffen.
> 
> Ok, and you put a lot of effort in it in the last years.

I'd feel more appreciated if I saw more evidence of you reading it.

> But the point is: last week it looked _entirely_ different,

You chose last week to upgrade from a nearly eight-year-old release.[1]

Did you read groff's NEWS file?

> and the locale has not changed!  The manual has not changed either.

I know for a fact that "the manual" has changed substantially since
groff 1.22.3.  I did a significant amount of work on groff documentation
prior to the 1.22.4 release.

Are you referring to some other manual?

> Just to remind you that the hyphen-minus -> hyphen change was commited
> in March _this_ year.

Yes.  After I spent 2+ years advocating it on this mailing list and, as
a small portion of my work, reviewing groff's own ~60 man pages for
correct glyph usage.

> So it you -- you are changing things backward incompatibly!

No, I am aligning things more closely between typesetters and terminal
devices, to reflect the increasing capabilities of terminal devices on
Unix systems since about the year 2000.

You can restore man pages to the appearance you desire by using the same
character encodings you did when you became accustomed to them: ASCII or
ISO Latin-1.  Yes, even using bleeding edge groff Git HEAD to format
them.

>  |UTF-8 content follows.
>  |
>  |groff_char(7):
>  ...
> 
> Please note again i am doing mdoc(7) here, not mom or ms or my own
> macros.

Using mdoc(7) is no reason not to read groff_char(7).  mdoc(7) is a
groff macro package.  It does not alter the syntax or repertoire of
groff special characters.

>  |There is also the "Portability" section of groff_man(7) [groff
>  |1.22.4] or groff_man_style(7) [groff 1.23].
>  |
>  |       Several special characters are also widely portable.  AT&T
>  |       troff
>  ...
> 
> But there is nothing special.

"Special character" is a piece of *roff terminology.  It is startling to
me that you are not already aware of this.

If you'd take a moment to refrain from your multiple expostulations of
"WOW!!!", catch your breath, and oxygenate your brain sufficiently to
read the groff_char(7) man page, you might learn this.

> Input characters are mapped away differently than before.

See above.

>   ...
>  |       \(ha   Basic Latin circumflex accent (“hat”).  Some output
>  |              devices replace “^” with U+02C6 (modifier letter
>  |              circumflex accent) or similar.
>  ...
>  |       \(ti   Basic Latin tilde.  Some output devices replace “~” with
>  |              U+02DC (small tilde) or similar.
> 
> But why?

Why what?  Why do "some devices replace"...?  That's Ingo's wording, if
I recall correctly, but the reason is that some output devices have
larger glyph repertoires than others.  This observation has been
commonplace to *roff users at least since Typesetter roff was written in
about 1972.

I don't think I'd use the term "replace"; every *roff output device
defines a mapping from characters to glyphs.  In this sense, every
character gets "replaced".  Maybe I'll adjust that wording.
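
To illustrate the point about per-device mappings, here is a minimal
Python sketch.  It is not how groff stores this information (real
devices use font description files); the table names and the `render`
helper are hypothetical, and the entries are limited to the characters
discussed in this thread.

```python
# Hypothetical per-device glyph tables, for illustration only.
# Keys are input characters or special character escapes; values are
# the glyphs (here, Unicode characters) the device would emit.
ASCII_DEVICE = {
    "^": "^",         # small repertoire: one glyph serves both roles
    "~": "~",
    "\\(ha": "^",
    "\\(ti": "~",
}

UTF8_DEVICE = {
    "^": "\u02c6",    # modifier letter circumflex accent
    "~": "\u02dc",    # small tilde
    "\\(ha": "^",     # Basic Latin circumflex accent ("hat")
    "\\(ti": "~",     # Basic Latin tilde
}


def render(tokens, device):
    """Map each input token to its glyph; unknown tokens pass through."""
    return "".join(device.get(tok, tok) for tok in tokens)


# The same input yields different glyphs on different devices.
print(render(["cd", " ", "~"], ASCII_DEVICE))
print(render(["cd", " ", "~"], UTF8_DEVICE))
print(render(["cd", " ", "\\(ti"], UTF8_DEVICE))
```

On the device with the smaller repertoire the distinction between "~"
and \(ti is invisible; on the larger one it is not.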

> And furthermore: why -Tutf8 that lives on and with fixed-width
> monospace fonts in practically all cases.

I cannot parse this.  Please try to express yourself in standard
English.

> And why differently than before?

See above.

>  |Or you can just do the brute force thing.  From groff 1.23's
>  |"PROBLEMS" file:
> 
> But this changes manuals written over the last decades to
> something completely different, Branden.

Not correctly written man pages.

> I am coming from 1.22.3.  It looked entirely different last week.

You said this already.

> You cannot expect all those people to rewrite all their manuals

https://www.medicalnewstoday.com/articles/320844

I predict the level of effort for most pages to be minimal (some may not
require revision at all), and speaking as someone who has undertaken a
multi-year project to _rewrite documentation_ for groff specifically, I
am thoroughly persuaded that fixing glyph usage errors in man pages is
among the easiest revisions of documentation that a person can
undertake.  If you find this task too daunting, then I cannot help but
anticipate that much more significant flaws in your documentation will
go unaddressed.

The presence of incorrect glyphs is likely to frustrate copy-and-paste
operations, or look mildly strange, but is not, in most cases, going to
be a significant barrier to people trying to apply man pages because in
every case, ASCII glyphs are _easier to type_.
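
A small Python sketch of the copy-and-paste problem, assuming a page
whose author wrote a bare "-" and "~" where \- and \(ti were wanted.
The sample strings and the `describe` helper are made up for the
example.

```python
import unicodedata

# What a reader might copy from a UTF-8 terminal rendering of the
# incorrectly marked-up page, versus what they would actually type.
pasted = "ls \u2010l \u02dc"
typed = "ls -l ~"


def describe(text):
    """Return the Unicode names of the non-ASCII characters in text."""
    return [unicodedata.name(ch) for ch in text if not ch.isascii()]


print(pasted == typed)
print(describe(pasted))
```

The two strings look similar but compare unequal, which is why a shell
rejects the pasted one.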

In any event I suspect most man pages will get fixed, if at all, because
readers will report bugs.  I've met too many software engineers to
expect that most of them, having once written a man page, will think it
worth their while to go back and review their work at intervals.

> because you feel like mapping monospace -Tutf8 to be en par with
> -Tpdf with all its font powers (used or not)?

You are revealing ignorance here.  Nothing about the UTF-8 character
encoding implies a fixed- or variable-width typeface.  UTF-8 is a method
for encoding integers.  Yes, just that.  The semantic value of an
integer thus encoded is a separate matter.
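
A minimal Python sketch of that claim; the `utf8_bytes` helper name is
mine.  It takes an integer and shows the bytes UTF-8 uses to encode it.
Nothing in the result says anything about typefaces or glyph widths.

```python
def utf8_bytes(code_point):
    """Return the UTF-8 encoding of an integer code point as hex bytes."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02x}" for byte in encoded)


# Basic Latin tilde, small tilde, and hyphen, respectively.
for cp in (0x7E, 0x2DC, 0x2010):
    print(f"U+{cp:04X} -> {utf8_bytes(cp)}")
```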

> I really do not understand these decisions.

They've been discussed at length on this mailing list.  You can search
its archives, or, if you've alienated Dave Kemper less than you have me,
perhaps he can point you to some relevant history.  Shall we lay odds on
whether you'd trouble yourself to read such materials if they were
offered you?

> Please note also mandoc (at least the version i have here) renders
> it the way i _expect_.

You might be disappointed when at some point in the future you get
around to updating your mandoc package, then.[2]

> Maybe there is a reason why now also Apple i think switches away
> from groff to mandoc?

Because Apple hates the GNU GPL and they hate GPLv3 even more than
GPLv2.  I'm confident that nothing I've done has anything to do with
their decision.  Moreover, Apple is a Fortune 50 company.  If you expect
any such firm to make technology selection decisions based primarily on
_technical_ merit, then you possess a naïveté that makes you a likely
mark for cryptocurrency scams.

>  ...
>  |* When viewing man pages, some characters on my UTF-8 terminal
>  |  emulator look funny or copy-and-paste wrong.  Why?
>  |
>  |Some Unicode Basic Latin ("ASCII") input characters are mapped to
>  |non-Basic Latin code points in output for consistency with other
>  |output devices, like PDF.  See groff_man_style(7) and groff_char(7)
>  |for correct
>  ...
> 
> Uh!

That's right.  I've researched AT&T troff glyph repertoires
extensively and documented many of my findings in one of the man pages,
groff_char(7), that you won't read "because you're an mdoc(7) user".

>  |However, many man pages are written in ignorance of the correct
>  |special characters to obtain the desired glyphs.  You can conceal
>  |these errors
> 
> Heh!  _Exactly_!

If you want to conceal them, conceal them.  I've given you the recipe
for doing so.  But don't expect groff to do so by default as long as I
have some influence on the matter.

> You know, if you would provide a commented-out setting to change
> the decade old default behaviour to what you feel is more modern,
> or "better", _then_ i could understand it.

I've done that for other things, like OSC 8 enablement, though by the
time groff 1.23.0 final is nigh, maybe fixed pagers will be preponderant
after all.

I didn't do it here because the main consequence is not to frustrate
anyone's understanding of man page text--few readers who see '˜' in a
man page example are going to type anything but '~'.

The main consequence is to expose problems in man(7) documents drafted
by authors who are ignorant of the broader typesetting issues.

And to draw reactionaries like yourself out of the woodwork who, despite
deliberately choosing a UTF-8 character encoding in their terminal
environments, would really prefer to see ASCII or ISO Latin-1.  I guess
except for colorful emojis or something.

> I mean i produce backward incompatible changes myself all the
> time, but i give plenty of hints.

I ask again, did you read groff's NEWS file when upgrading?

>  |--- end ---
>  |
>  |You may also wish to do the same for "mdoc.local".
>  |
>  |In man pages (only), groff maps the minus sign special character
>  |'\-' to the Basic Latin hyphen-minus (U+002D) because man pages
>  |require this glyph and there is no historically established *roff
>  |input character,
> 
> This commit of yours is from March 2022, and it changes behaviour
> that was maybe stable for ~32 years,

First of all, UTF-8 did not exist 32 years ago, in 1990.[3]  The only
glyphs whose representations have changed are those serving multiple
functions in earlier character encoding standards and de-unified by
Unicode.

Secondly, the behavior that was "stable for 32 years" was consequent to
a change made in 2009 by Werner.

https://git.savannah.gnu.org/cgit/groff.git/commit/?id=98acc924f4e32cfc2209df5db0c21921df8cc7ac

So rather than undertaking a revolutionary act, I am restoring the
status quo ante.

Interestingly, I note that this change was made just prior to groff's
relicensing to GPLv3 and the 1.20 release.

https://git.savannah.gnu.org/cgit/groff.git/log/?ofs=5300

This throws new light on your petulant attempt to characterize Apple's
recent replacement of groff with mandoc as a response to my development
work.  It's more of a boomerang; the many Mac OS X/macOS users stuck for
years on groff 1.19, beyond which the firm refused to tread, would have
been seeing the very glyphs you're complaining about, other things being
equal.  (I don't have access to an Apple machine.  It's possible Apple
patched this, or that the terminal's locale was Latin-1 instead of
UTF-8.)

> i have forgotten when mdoc was first seen.
> 
> To repeat: your change invalidates _all_ mdoc manuals ever written
> since mdoc(7) sprang into existence at the beginning of the 90s!
> 
> WOW!!!

I think Ingo can probably speak to these characterizations.

>  |Didn't I already share this information with you?
> 
> Well, i think i stop reporting such errors then.

Since much of your correspondence is replete with evidence that you
don't read, don't listen, and don't answer questions put to you by
people who are trying to help you,[4] that may not be a bad idea.

--Branden

[1] https://ftp.gnu.org/gnu/groff/
[2] https://savannah.gnu.org/bugs/?62494
[3] https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
[4] https://lists.gnu.org/archive/html/groff/2022-09/msg00061.html
