bug-gnulib
From: Mike Fulton
Subject: Re: add __MVS__ to list of systems that should default to UTF-8 in localcharset.c
Date: Thu, 1 Jun 2023 10:04:08 -0700



On Thu, Jun 1, 2023 at 8:23 AM Bruno Haible <bruno@clisp.org> wrote:
Hi Mike,

> It looks right but I do see 3 warnings:
>
> troff: man7/groff_char.7:1051: warning: can't find special character 'bs'
> troff: man7/groff_char.7:1192: warning: can't find special character 'radicalex'
> troff: man7/groff_char.7:1195: warning: can't find special character 'sqrtex'

I get these warnings too, on a GNU system; so, you can ignore them.

> For example, in the Arrows section I see:
>
>        Arrows
>
>        l l l l lx.  Output    Input     PostScript     Unicode   Notes _
>        ←    \[<-]     arrowleft u2190     horizontal arrow left +
>        →    \[->]     arrowright     u2192     horizontal arrow right +
>        ↔    \[<>]     arrowboth u2194     T{ horizontal arrow in both direc‐
>        tions T} ↓    \[da]     arrowdown u2193     vertical arrow down +
>        ↑    \[ua]     arrowup   u2191     vertical arrow up +
>        ↕    \[va]     arrowupdn u2195     T{ vertical arrow in both directions
>        T} ⇐    \[lA]     arrowdblleft   u21D0     horizontal double arrow left
>        ⇒    \[rA]     arrowdblright  u21D2     horizontal double arrow right
>        ⇔    \[hA]     arrowdblboth   u21D4     T{ horizontal double arrow in

Nice! That's how it's supposed to be (in an environment that can display
Unicode).

> and this change being discussed is what gets me to proper UTF-8 rendering
> (although perhaps there is a better way to fix this)

That's what I claim.

When I run the groff command in different locales:
  $ LC_ALL=de_DE.UTF-8 groff -Tutf8 -mandoc man7/groff_char.7 > ~/out1
  $ LC_ALL=de_DE.ISO-8859-1 groff -Tutf8 -mandoc man7/groff_char.7 > ~/out2
the output files ~/out1 and ~/out2 are identical.

Therefore, once the option -Tutf8 has been passed to groff, the locale's
encoding is irrelevant.
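
A minimal sketch of how the comparison above could be scripted, assuming groff is installed and both locales exist on the system. The function names render_in_locale and outputs_identical are made up for illustration; only groff's -Tutf8 and -mandoc options and cmp -s are taken as given:

```shell
# Hypothetical helper: render a man page source with groff's UTF-8 device
# under a given locale.
#   $1 = locale name, $2 = man page source, $3 = output file
render_in_locale() {
  LC_ALL="$1" groff -Tutf8 -mandoc "$2" > "$3"
}

# Hypothetical helper: succeed only if both files are byte-identical.
# cmp -s prints nothing and reports the result through its exit status.
outputs_identical() {
  cmp -s "$1" "$2"
}
```

Usage, with the same files as above:
  $ render_in_locale de_DE.UTF-8 man7/groff_char.7 ~/out1
  $ render_in_locale de_DE.ISO-8859-1 man7/groff_char.7 ~/out2
  $ outputs_identical ~/out1 ~/out2 && echo identical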

When you run "man groff_char", these pieces of software are involved:
  A) man
  B) the gnulib parts included in 'man'
  C) groff
  D) the gnulib parts included in 'groff'

The experiment above shows that C) and D) don't need changes.

I believe the fix needs to be in A), not B).

It is likely that A) does a call to nl_langinfo(CODESET) or locale_charset(),
to decide which options to pass to groff and potentially whether to call
iconv. This is perfectly normal, because when the console / xterm / terminal
can only display ISO-8859-1 characters, it would be wrong if 'man' sent
arbitrary Unicode characters to the console.
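
To illustrate the kind of decision described here, a rough sketch follows. It is not taken from the 'man' sources: the function pick_groff_device, the list of codeset spellings, and the fallback to the ascii device are all assumptions. `locale charmap` is used as the shell-level way to ask for the current locale's codeset, roughly what nl_langinfo(CODESET) returns in C:

```shell
# Hypothetical helper: map a locale codeset name to a groff output device.
# The spellings matched here are an assumption; real systems use more.
pick_groff_device() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8)  echo utf8 ;;
    ISO-8859-1|ISO8859-1)   echo latin1 ;;
    *)                      echo ascii ;;  # conservative fallback (assumed)
  esac
}

# Possible use (not executed here):
#   device=$(pick_groff_device "$(locale charmap)")
#   groff -T"$device" -mandoc man7/groff_char.7
```

With such a mapping, a locale reporting ISO-8859-1 would never receive -Tutf8 output, which is the behaviour in question for z/OS.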

So, the questions are:
  1) How is it possible that on z/OS most of the ASCII-based software forms
     an ISO-8859-1 environment, yet the UTF-8 encoded groff output displays
     just fine?
  2) How to teach 'man' about this particular environment?

Thanks for the detailed response. I will dig in.

Bruno

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Is there a way to get Gmail to change its default? 😁
