bug#54124: fmt inserts garbage in certain cases?
From: Pádraig Brady
Subject: bug#54124: fmt inserts garbage in certain cases?
Date: Wed, 23 Feb 2022 17:55:49 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Thunderbird/97.0
On 23/02/2022 10:58, JD wrote:
> Hi!
> I have fmt from coreutils 8.32.1 installed via MacPorts.
> If I run the following command: `echo х х х х х х х х х х х х х х х х х х х х х
> х х х х х | gfmt -sw 10` (which is just echoing 26 Cyrillic 'х' ('kha')
> letters), I get the following results:
> https://i.imgur.com/yRx7uuz.png (iTerm2)
> https://i.imgur.com/7oQ0UPz.png (iTerm2 if passed via `more`)
> https://i.imgur.com/UlLrEMy.png (Alacritty)
> And if I delete just two 'х' letters, like this: `echo х х х х х х х х х х х х
> х х х х х х х х х х х х | gfmt -sw 10`, everything shows just fine:
> https://i.imgur.com/DwuWxyx.png
> Would be grateful for any advice :)
The issue here is that (on macOS 10.15.7 at least),
isspace(0x85) returns true for UTF-8 locales
(but not for "C" or "iso8859-1" locales).
BTW iscntrl() returns true for 0x85 on all non-C locales
on both Linux and macOS.
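For context, 'х' is U+0445, which UTF-8 encodes as the two bytes 0xD1 0x85, so a byte-wise isspace() that accepts 0x85 can split the character in the middle. A minimal sketch of a probe for this follows; the helper name isspace_0x85_in is hypothetical, and any UTF-8 locale name you pass (for example "en_US.UTF-8") is an assumption about what is installed on your system.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: report what isspace(0x85) returns under the named
   locale.  Returns 1 or 0, or -1 if the locale could not be selected. */
int isspace_0x85_in(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not available on this system */
    return isspace(0x85) != 0;  /* 0x85 is within unsigned char range */
}
```

Calling it with "C" should give 0 everywhere; calling it with a UTF-8 locale name is where the macOS result described above would be expected to differ from Linux.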
Now gnulib says wrt isspace() that:
"This function's behaviour depends on the locale, but does not support
the multibyte characters that occur in strings in locales with
@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales)."
I think isspace(0x85) returning true on macOS is a bug,
but we should probably avoid isspace() in fmt altogether
given its inconsistency with multibyte locales.
The attached uses c_isspace() instead.
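As a rough illustration of the idea (not the attached patch itself), a locale-independent check only ever treats the six ASCII whitespace characters as separators, so the bytes of a UTF-8 sequence are never mistaken for a word break. Both ascii_isspace and count_words below are hypothetical names for this sketch; ascii_isspace is meant as a stand-in for gnulib's c_isspace() and assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for gnulib's c_isspace(): true only for space,
   \t, \n, \v, \f and \r, whatever the current locale is. */
bool ascii_isspace(unsigned char c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Count whitespace-separated words in a byte string, splitting only on
   ASCII whitespace so multibyte characters stay intact. */
size_t count_words(const char *s)
{
    assert(s != NULL);
    size_t words = 0;
    bool in_word = false;

    for (; *s != '\0'; s++) {
        if (ascii_isspace((unsigned char) *s)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;     /* first byte of a new word */
            words++;
        }
    }
    return words;
}
```

With this predicate, the byte pair 0xD1 0x85 is always kept together as one word, since neither byte is ASCII whitespace.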
cheers,
Pádraig
fmt-utf8-macOS.patch
Description: Text Data