bug-coreutils


From: Bernhard Voelker
Subject: bug#53145: Re: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?)
Date: Wed, 12 Jan 2022 13:25:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

On 1/12/22 12:19, zendas via GNU coreutils Bug Reports wrote:
> I have considered handling this problem directly by assuming three bytes
> per character instead, but I have two doubts. First, wc -m counts the
> characters correctly in the same environment (but cut can't?). Second, my
> script's goal is to recognize Chinese, so it is more likely to be run on
> platforms that support a Chinese environment. In addition, the fixed
> three-byte approach cannot handle mixed full-width and half-width content.
> It would need a lot of checks and conversions, which would greatly increase
> the possibility of errors.

As Bob wrote, some downstream distributions have had multi-byte support in
cut(1) for many years, e.g. RHEL/Fedora and SUSE/openSUSE.

E.g. here on my openSUSE system:

  $ echo "你好啊" | LC_ALL=zh_CN.UTF-8 cut -c 1
  你
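
Where cut(1) lacks that multi-byte support, a byte-based workaround that
does not assume a fixed three bytes per character is possible. The following
is only a minimal sketch, assuming the input is valid UTF-8;
first_utf8_char is a hypothetical helper name, not part of coreutils. It
reads the lead byte with od(1), derives the sequence length from it, and
then lets cut -b select that many bytes:

```shell
# Hypothetical helper (not part of coreutils): print the first UTF-8
# character of the string given as $1. Assumes valid UTF-8 input.
first_utf8_char() {
  # Read the lead byte as an unsigned decimal number.
  fuc_lead=$(printf '%s' "$1" | od -An -tu1 -N1 | tr -d ' \n')
  [ -n "$fuc_lead" ] || return 0          # empty input: print nothing

  # The lead byte encodes the length of the UTF-8 sequence.
  if   [ "$fuc_lead" -lt 128 ]; then fuc_n=1   # 0xxxxxxx: ASCII (half-width)
  elif [ "$fuc_lead" -ge 240 ]; then fuc_n=4   # 11110xxx
  elif [ "$fuc_lead" -ge 224 ]; then fuc_n=3   # 1110xxxx: e.g. most CJK
  elif [ "$fuc_lead" -ge 192 ]; then fuc_n=2   # 110xxxxx
  else fuc_n=1                                 # stray continuation byte
  fi

  # Select bytes, not characters, so the result is locale-independent.
  printf '%s\n' "$1" | LC_ALL=C cut -b "1-$fuc_n"
}
```

With this sketch, first_utf8_char "你好啊" prints 你 and
first_utf8_char "a你好" prints a, so mixed full-width and half-width
content is handled without hard-coding three bytes.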

Have a nice day,
Berny




