bug-gnulib

Re: c32width gives incorrect return values in C locale


From: Gavin Smith
Subject: Re: c32width gives incorrect return values in C locale
Date: Sat, 11 Nov 2023 22:15:39 +0000

On Sat, Nov 11, 2023 at 09:06:41PM +0100, Bruno Haible wrote:
> [CCing bug-gnulib]
> Indeed, the c32* functions by design work only on those Unicode characters
> that can be represented as multibyte sequences in the current locale.
> 
> I'll document this better in the Gnulib manual.
> 
> Since you want texinfo to work on UTF-8 encoded text with characters outside
> the repertoire of the current locale, you'll need the libunistring functions,
> documented in
> <https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html>.
> Namely, replace c32width with uc_width.

Thanks, that seems to work perfectly.

I also changed c32isupper to uc_is_upper.  The gnulib manual stated
(node "isupper"):

  ‘c32isupper’
       This function operates in a locale dependent way, on 32-bit wide
       characters.  In order to use it, you first have to convert from
       multibyte to 32-bit wide characters, using the ‘mbrtoc32’ function.
       It is provided by the Gnulib module ‘c32isupper’.
  
  ...
  
  ‘uc_is_upper’
       This function operates in a locale independent way, on Unicode
       characters.  It is provided by the Gnulib module
       ‘unictype/ctype-upper’.

- and we wanted the "locale independent way".
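
As a rough illustration of the conversion step the manual describes for the
c32* functions, here is a minimal sketch using only the standard C11
mbrtoc32.  The helper name decode_first_char is hypothetical and not part
of Gnulib or texinfo.  The point is that the result depends on the current
LC_CTYPE locale, which is what the uc_* functions avoid.

```c
#include <assert.h>  /* assert, for checks in calling code */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memset */
#include <wchar.h>   /* mbstate_t */
#include <uchar.h>   /* char32_t, mbrtoc32 (C11) */

/* Hypothetical helper: decode the first multibyte character of S (at most
   N bytes) according to the current LC_CTYPE locale.
   Returns the number of bytes consumed, or -1 if the bytes are invalid or
   incomplete in that locale, or if they encode the null character.  */
static int
decode_first_char (const char *s, size_t n, char32_t *out)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);   /* start in the initial state */

  size_t ret = mbrtoc32 (out, s, n, &state);
  if (ret == (size_t) -1 || ret == (size_t) -2 || ret == (size_t) -3
      || ret == 0)
    return -1;
  return (int) ret;
}
```

In the default "C" locale, decode_first_char ("A", 1, &c) consumes one byte
and stores 0x41 in c.  For a UTF-8 sequence such as "\xC3\xA9" the outcome
depends on the locale in effect, so no particular result should be assumed.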

I did not understand why uc_width was said to be "locale dependent":

  "These functions are locale dependent."

- from 
<https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.

I also don't understand the purpose of the "encoding" argument of
uc_width -- can this always be "UTF-8"?

I'm also unclear on the exact relationship between the types char32_t,
ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
but u8_mbtouc writes to a char32_t variable.  In the code I committed,
I used a cast to ucs4_t when calling uc_width.
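
To make the cast concrete, here is a minimal sketch under the assumption
that ucs4_t is a plain 32-bit unsigned integer (as I believe unitypes.h
defines it).  The typedef ucs4_t_sketch and the helper to_ucs4 are
stand-ins invented for illustration and are not libunistring names.  If
that assumption holds, the cast only changes the type name, not the value.

```c
#include <assert.h>  /* static_assert, assert */
#include <stdint.h>  /* uint32_t */
#include <uchar.h>   /* char32_t (C11) */

/* Stand-in for libunistring's ucs4_t, ASSUMED here to be uint32_t.  */
typedef uint32_t ucs4_t_sketch;

/* char32_t is at least 32 bits wide, so no code point is lost.  */
static_assert (sizeof (char32_t) >= sizeof (ucs4_t_sketch),
               "char32_t must hold every 32-bit code point");

/* Hypothetical helper: convert a char32_t to the code point type,
   rejecting surrogates and values above U+10FFFF.
   Returns 0 on success, -1 if IN is not a Unicode scalar value.  */
static int
to_ucs4 (char32_t in, ucs4_t_sketch *out)
{
  if (in > 0x10FFFF || (in >= 0xD800 && in <= 0xDFFF))
    return -1;
  *out = (ucs4_t_sketch) in;   /* value-preserving cast */
  return 0;
}
```

The range check is an extra design choice in this sketch.  A bare cast, as
in the committed code, is sufficient when the value comes from a decoder
that already guarantees a valid code point.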


