Re: add __MVS__ to list of systems that should default to UTF-8 in localcharset.c

On Thu, Jun 1, 2023 at 8:23 AM Bruno Haible <bruno@clisp.org> wrote:

Hi Mike,

> It looks right but I do see 3 warnings:
>
> troff: man7/groff_char.7:1051: warning: can't find special character 'bs'
> troff: man7/groff_char.7:1192: warning: can't find special character
> 'radicalex'
> troff: man7/groff_char.7:1195: warning: can't find special character
> 'sqrtex'

I get these warnings too, on a GNU system; so, you can ignore them.

> For example, in the Arrows section I see:
>
> Arrows
>
> l l l l lx. Output Input PostScript Unicode Notes _
> ← \[<-] arrowleft u2190 horizontal arrow left +
> → \[->] arrowright u2192 horizontal arrow right +
> ↔ \[<>] arrowboth u2194 T{ horizontal arrow in both direc‐
> tions T} ↓ \[da] arrowdown u2193 vertical arrow down +
> ↑ \[ua] arrowup u2191 vertical arrow up +
> ↕ \[va] arrowupdn u2195 T{ vertical arrow in both
> directions
> T} ⇐ \[lA] arrowdblleft u21D0 horizontal double arrow
> left
> ⇒ \[rA] arrowdblright u21D2 horizontal double arrow right
> ⇔ \[hA] arrowdblboth u21D4 T{ horizontal double arrow in

Nice! That's how it's supposed to be (in an environment that can display
Unicode).

> and this change being discussed is what gets me to proper UTF-8 rendering
> (although perhaps there is a better way to fix this)

That is what I claim: there is a better way to fix this.

When I run the groff command in different locales:
$ LC_ALL=de_DE.UTF-8 groff -Tutf8 -mandoc man7/groff_char.7 > ~/out1
$ LC_ALL=de_DE.ISO-8859-1 groff -Tutf8 -mandoc man7/groff_char.7 > ~/out2
the output files ~/out1 and ~/out2 are identical.

Therefore, once the option -Tutf8 has been passed to groff, the locale's
encoding is irrelevant.
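
The experiment above can be repeated with a small helper. This is only a
sketch: the function name same_in_locales is made up for illustration, and
it assumes that mktemp and cmp are available. Note that if neither locale is
installed on the system, both runs fall back to the "C" locale and the
comparison proves nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff or man): run one command under two
# locales and compare the bytes written to standard output.
# Usage: same_in_locales LOCALE1 LOCALE2 COMMAND [ARGS...]
# Returns 0 if both outputs are identical, 1 if they differ.
same_in_locales() {
    loc1=$1 loc2=$2
    shift 2
    out1=$(mktemp) || return 2
    out2=$(mktemp) || { rm -f "$out1"; return 2; }
    # 'env' sets LC_ALL only for the child process, not for this shell.
    env LC_ALL="$loc1" "$@" > "$out1" 2>/dev/null
    env LC_ALL="$loc2" "$@" > "$out2" 2>/dev/null
    cmp -s "$out1" "$out2"
    status=$?
    rm -f "$out1" "$out2"
    return $status
}

# Only attempt the groff comparison where groff and the page are present.
if command -v groff >/dev/null 2>&1 && [ -r man7/groff_char.7 ]; then
    if same_in_locales de_DE.UTF-8 de_DE.ISO-8859-1 \
           groff -Tutf8 -mandoc man7/groff_char.7; then
        echo "groff -Tutf8 output is identical in both locales"
    else
        echo "groff -Tutf8 output differs between the locales"
    fi
fi
```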

When you run "man groff_char", these pieces of software are involved:
A) man
B) the gnulib parts included in 'man'
C) groff
D) the gnulib parts included in 'groff'

The experiment above shows that C) and D) don't need changes.

I believe the fix needs to be in A), not B).

It is likely that A) does a call to nl_langinfo(CODESET) or locale_charset(),
to decide which options to pass to groff and potentially whether to call
iconv. This is perfectly normal, because when the console / xterm / terminal
can only display ISO-8859-1 characters, it would be wrong if 'man' sent
arbitrary Unicode characters to the console.
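
To make the idea concrete, here is a rough sketch of such a decision in
shell. It is NOT the actual code of 'man', which has not been inspected
here; the function name pick_groff_device and the mapping are assumptions
made for illustration. "locale charmap" is the command-line way to ask for
the locale's codeset, and utf8, latin1 and ascii are groff output devices.

```shell
#!/bin/sh
# Hypothetical sketch, not man's real logic: map the locale's codeset name
# to the groff output device that a pager front end might select.
pick_groff_device() {
    case $1 in
        UTF-8|utf-8|UTF8|utf8)
            echo utf8 ;;
        ISO-8859-1|iso-8859-1|ISO8859-1|iso8859-1)
            echo latin1 ;;
        *)
            # Unknown or empty codeset: fall back to plain ASCII output.
            echo ascii ;;
    esac
}

# Ask the current locale for its codeset name.
codeset=$(locale charmap 2>/dev/null)
echo "codeset=${codeset:-unknown} device=$(pick_groff_device "$codeset")"
```

If the codeset reported on z/OS is ISO-8859-1, a decision of this kind
would select latin1 rather than utf8, which would match the symptom.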

So, the questions are:
1) How is it possible that on z/OS most of the ASCII-based software forms
an ISO-8859-1 environment, yet the UTF-8 encoded groff output displays
just fine?
2) How to teach 'man' about this particular environment?

Thanks for the detailed response. I will dig in.

Bruno

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Is there a way to get Gmail to change its default? 😁

From:	Mike Fulton
Subject:	Re: add __MVS__ to list of systems that should default to UTF-8 in localcharset.c
Date:	Thu, 1 Jun 2023 10:04:08 -0700