bug-coreutils

bug#54124: fmt inserts garbage in certain cases?


From: Pádraig Brady
Subject: bug#54124: fmt inserts garbage in certain cases?
Date: Wed, 23 Feb 2022 17:55:49 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Thunderbird/97.0

On 23/02/2022 10:58, JD wrote:
> Hi!
>
> I have fmt from coreutils 8.32.1 installed via MacPorts.
>
> If I run the following command (which just echoes 26 Cyrillic 'х' ('kha')
> letters):
>
>   `echo х х х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10`
>
> I get the following results:
>
> https://i.imgur.com/yRx7uuz.png (iTerm2)
> https://i.imgur.com/7oQ0UPz.png (iTerm2 if passed via `more`)
> https://i.imgur.com/UlLrEMy.png (Alacritty)
>
> And if I delete just two 'х' letters, like this:
>
>   `echo х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10`
>
> everything shows just fine: https://i.imgur.com/DwuWxyx.png
>
> Would be grateful for any advice :)

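The issue here is that (on macOS 10.15.7 at least)
isspace(0x85) returns true for UTF-8 locales
(but not for "C" or "iso8859-1" locales).
Each 'х' (U+0445) is encoded in UTF-8 as the two bytes 0xD1 0x85,
so fmt treats the second byte as a word separator and splits
the character in half, producing invalid output.
BTW iscntrl() returns true for 0x85 on all non-C locales
on both Linux and macOS.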
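
A small sketch of the byte-level cause, assuming only standard C: it derives the UTF-8 bytes of U+0445 from the 110xxxxx 10xxxxxx two-byte pattern and prints what the current locale's isspace() and iscntrl() say about each byte. The helper names `utf8_encode_2byte` and `dump_ctype` are hypothetical, and the printed classifications depend on the platform and locale.

```c
#include <ctype.h>
#include <locale.h>  /* setlocale(), needed by callers */
#include <stddef.h>
#include <stdio.h>

/* Encode code point CP, which must be in U+0080..U+07FF, as the two
   UTF-8 bytes 110xxxxx 10xxxxxx and store them in OUT.  */
static void
utf8_encode_2byte (unsigned int cp, unsigned char out[2])
{
  out[0] = (unsigned char) (0xC0 | (cp >> 6));   /* top 5 bits */
  out[1] = (unsigned char) (0x80 | (cp & 0x3F)); /* low 6 bits */
}

/* Print how the current locale's isspace() and iscntrl() classify each
   of the N bytes in BUF.  The results are platform dependent.  */
static void
dump_ctype (const unsigned char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    printf ("0x%02X isspace=%d iscntrl=%d\n", (unsigned int) buf[i],
            isspace (buf[i]) != 0, iscntrl (buf[i]) != 0);
}
```

For example, after `setlocale (LC_ALL, "")` in a UTF-8 locale, `utf8_encode_2byte (0x0445, b)` fills `b` with 0xD1 0x85, and `dump_ctype (b, 2)` shows the locale's verdict on each byte. Going by the report above, the 0x85 line would be expected to show `isspace=1` on macOS 10.15.7.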

Now the gnulib documentation says this about isspace():

"This function's behaviour depends on the locale, but does not support
the multibyte characters that occur in strings in locales with
@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales)."
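
The point of that note is granularity: isspace() sees one byte at a time, while in a locale with MB_CUR_MAX > 1 one character may span several bytes. A multibyte-aware check would decode a whole character first and only then classify it. The sketch below does that with the standard mbrtowc() and iswspace(); it is an illustration only, not what the attached patch does, `mb_space_len` is a hypothetical helper name, and iswspace() is itself locale dependent.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* If the first character of the N-byte buffer S is whitespace in the
   current locale, return its length in bytes; otherwise return 0.
   The character is decoded with mbrtowc() before it is classified, so
   a continuation byte such as 0x85 is never examined on its own.  */
static size_t
mb_space_len (const char *s, size_t n)
{
  mbstate_t state;
  wchar_t wc;
  size_t r;

  memset (&state, 0, sizeof state);
  r = mbrtowc (&wc, s, n, &state);

  /* (size_t) -1: invalid sequence; (size_t) -2: incomplete; 0: NUL.  */
  if (r == (size_t) -1 || r == (size_t) -2 || r == 0)
    return 0;

  return iswspace ((wint_t) wc) ? r : 0;
}
```

A caller scanning a line would advance by the returned length when it is nonzero and treat the span as a separator; everything else stays part of a word.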

I think isspace(0x85) returning true on macOS is a bug,
but we should probably avoid isspace() in fmt altogether
given its inconsistent behaviour in multibyte locales.
The attached patch uses c_isspace() instead, which matches
only ASCII whitespace regardless of the current locale.
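
A rough sketch of why a locale-independent predicate avoids the split: gnulib's c_isspace() accepts only space, \t, \n, \v, \f and \r, so no byte >= 0x80 can end a word. The code below does not use gnulib; `ascii_isspace` is a hypothetical stand-in with the same character set, and `leading_word_len` is a hypothetical byte-oriented word scanner, only similar in spirit to what fmt does.

```c
#include <stddef.h>

/* Hypothetical stand-in for gnulib's c_isspace(): true only for the
   six ASCII whitespace characters, whatever the current locale.  */
static int
ascii_isspace (int c)
{
  switch (c)
    {
    case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
      return 1;
    default:
      return 0;
    }
}

/* Return the length in bytes of the leading word of S: the bytes
   before the first byte for which IS_SEP returns nonzero.  */
static size_t
leading_word_len (const char *s, int (*is_sep) (int))
{
  size_t n = 0;
  while (s[n] != '\0' && !is_sep ((unsigned char) s[n]))
    n++;
  return n;
}
```

With `ascii_isspace`, `leading_word_len ("\xD1\x85 \xD1\x85", ascii_isspace)` returns 2, keeping the whole 'х'. Passing the standard `isspace` instead would return 1 in a locale where isspace(0x85) is true, as reported on macOS, cutting the letter in half.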

cheers,
Pádraig

Attachment: fmt-utf8-macOS.patch
Description: Text Data

