Re: Texinfo 7.0.93 pretest available


From: Eli Zaretskii
Subject: Re: Texinfo 7.0.93 pretest available
Date: Tue, 10 Oct 2023 14:55:09 +0300

> From: Gavin Smith <gavinsmith0123@gmail.com>
> Date: Mon, 9 Oct 2023 20:39:59 +0100
> Cc: Bruno Haible <bruno@clisp.org>, bug-texinfo@gnu.org
> 
> > IOW, unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
> 
> It is supposed to attempt to force the locale to a UTF-8 locale.  You
> can see the code in xspara_init that attempts to change the locale.  There
> is also a comment before xspara_add_text:
> 
>   "This function relies on there being a UTF-8 locale in LC_CTYPE for
>   mbrtowc to work correctly."

You cannot force MS-Windows into using the UTF-8 locale (with the
possible exception of very recent Windows versions, which AFAIK still
don't support UTF-8 in full).

You also cannot force an arbitrary POSIX system into using UTF-8,
because such a locale might not be installed.
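
To make the failure mode concrete, here is a rough sketch (not the
code in xspara_init) of what probing for a UTF-8 LC_CTYPE looks like
on a POSIX system.  The function name try_utf8_ctype and the candidate
locale names are assumptions for illustration only; setlocale simply
returns NULL for a locale that is not installed, so the caller needs a
fallback that does not depend on the locale at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: try each name in the NULL-terminated list
   CANDIDATES as the LC_CTYPE locale.  Return 1 as soon as one of them
   is installed and reports a UTF-8 codeset; otherwise restore the
   previous LC_CTYPE locale and return 0.  POSIX only: nl_langinfo is
   not available on MS-Windows.  */
static int
try_utf8_ctype (const char *const *candidates)
{
  assert (candidates != NULL);

  /* Remember the current locale so a failed probe leaves no trace.  */
  const char *cur = setlocale (LC_CTYPE, NULL);
  char *saved = cur ? strdup (cur) : NULL;

  for (size_t i = 0; candidates[i] != NULL; i++)
    {
      /* NULL means the locale is unknown or not installed.  */
      if (setlocale (LC_CTYPE, candidates[i]) == NULL)
        continue;

      /* The name alone proves nothing; check the actual codeset.  */
      const char *codeset = nl_langinfo (CODESET);
      if (strcmp (codeset, "UTF-8") == 0 || strcmp (codeset, "utf8") == 0)
        {
          free (saved);
          return 1;
        }
    }

  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }
  return 0;
}
```

A caller might pass a list such as { "C.UTF-8", "en_US.UTF-8", NULL }
(example names only).  A return value of 0 is the case discussed here:
no amount of trying will produce a UTF-8 locale on that system.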

> For MS-Windows there is the w32_setlocale function that may use something
> different:
> 
>   /* Switch to the Windows U.S. English locale with its default
>      codeset.  We will handle the non-ASCII text ourselves, so the
>      codeset is unimportant, and Windows doesn't support UTF-8 as the
>      codeset anyway.  */
>   return setlocale (category, "ENU");
> 
> mbrtowc has its own override which handles UTF-8.
> 
> As far as this relates to wcwidth, there used to be an MS-Windows specific
> stub implementation of this, removed in commit 5a66bc49ac032 (Patrice Dumas,
> 2022-08-19) which added a gnulib implementation of wcwidth:
> 
> diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
> index 93924a623c..bf4ef91650 100644
> --- a/tp/Texinfo/XS/xspara.c
> +++ b/tp/Texinfo/XS/xspara.c
> @@ -206,13 +206,6 @@ iswspace (wint_t wc)
>    return 0;
>  }
>  
> -/* FIXME: Provide a real implementation.  */
> -int
> -wcwidth (const wchar_t wc)
> -{
> -  return wc == 0 ? 0 : 1;
> -}
> -
>  int
>  iswupper (wint_t wi)
>  {
> 
> 
> If this simple stub is preferable to the Gnulib implementation for
> MS-Windows (e.g. it makes the tests pass), we could re-add it.

We can do that, but I think we should first explore a better
alternative: use UTF-8 functions everywhere, without relying on the
locale-aware functions of libc, such as wcwidth.  For example, instead
of wcwidth, we could use uc_width.

Is it feasible to use UTF-8 in texi2any disregarding the locale, and
use libunistring or something similar for the few functions we need in
the extensions that are required to deal with non-ASCII characters?
If we can do that, it will work on all systems, including Windows.
(This is basically what Emacs does, but it does that on a much greater
scale, which is unnecessary in texi2any.)
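
As a minimal sketch of the idea, assuming the text is already UTF-8:
decode the bytes by hand and look up the width per code point, without
calling setlocale, mbrtowc or wcwidth.  In real code the width lookup
would be libunistring's uc_width (declared in <uniwidth.h>, taking a
code point and an encoding name).  To keep the example self-contained,
toy_width below is a hypothetical and deliberately incomplete stand-in
for it, and utf8_decode and utf8_display_width are illustrative names,
not existing texi2any functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence from the N bytes at S into *CP.  Return
   the number of bytes consumed, or 0 if the input is malformed or
   truncated.  Simplified: overlong forms and surrogates are not
   rejected.  No locale is consulted.  */
static size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  assert (s != NULL && cp != NULL);
  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }

  size_t len;
  uint32_t c;
  if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
  else
    return 0;                     /* stray continuation or invalid byte */

  if (n < len)
    return 0;
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  *cp = c;
  return len;
}

/* Toy stand-in for uc_width: 0 for NUL and the basic combining marks,
   2 for a few East Asian wide ranges, 1 for everything else.  Control
   characters are not treated specially.  */
static int
toy_width (uint32_t cp)
{
  if (cp == 0 || (cp >= 0x0300 && cp <= 0x036F))
    return 0;
  if ((cp >= 0x1100 && cp <= 0x115F) || (cp >= 0x2E80 && cp <= 0xA4CF)
      || (cp >= 0xAC00 && cp <= 0xD7A3) || (cp >= 0xF900 && cp <= 0xFAFF)
      || (cp >= 0xFE30 && cp <= 0xFE4F) || (cp >= 0xFF00 && cp <= 0xFF60)
      || (cp >= 0xFFE0 && cp <= 0xFFE6))
    return 2;
  return 1;
}

/* Sum the column widths of the NUL-terminated UTF-8 string S.
   Return -1 if S is not valid UTF-8.  */
static int
utf8_display_width (const char *s)
{
  assert (s != NULL);
  const unsigned char *p = (const unsigned char *) s;
  size_t left = strlen (s);
  int width = 0;

  while (left > 0)
    {
      uint32_t cp;
      size_t used = utf8_decode (p, left, &cp);
      if (used == 0)
        return -1;
      width += toy_width (cp);
      p += used;
      left -= used;
    }
  return width;
}
```

The result depends only on the bytes of the string, so it is the same
whatever LC_CTYPE happens to be, on Windows as well as elsewhere.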
