bug#53145: Reply: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?)

bug-coreutils

From:	Bernhard Voelker
Subject:	bug#53145: Reply: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?)
Date:	Wed, 12 Jan 2022 13:25:17 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

On 1/12/22 12:19, zendas via GNU coreutils Bug Reports wrote:
> I have considered handling this directly as three bytes per character
> instead, but I have two doubts. First, in the same environment wc -m counts
> the characters correctly, so why can't cut? Second, my script is meant to
> recognize Chinese, so it will most likely run on platforms that support a
> Chinese locale. In addition, a fixed three-byte approach cannot handle mixed
> full-width and half-width content: I would need a lot of checks and
> conversions, which would greatly increase the chance of errors.
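
To illustrate why a fixed byte width breaks down, here is a minimal Python sketch (Python is used only for illustration; the helper names utf8_lengths, slice_fixed_3_bytes and slice_chars are hypothetical). It assumes UTF-8 input, as in a zh_CN.UTF-8 locale. Chinese characters and full-width forms such as Ｂ take 3 bytes each, while half-width ASCII takes 1, so a "3 bytes per character" rule cuts mixed text in the wrong place. Decoding first and then counting characters, which is what wc -m does in a multi-byte locale, does not have that problem.

```python
# Hypothetical helpers for illustration; assumes UTF-8 encoded input.

def utf8_lengths(text: str) -> list:
    """Return how many UTF-8 bytes each character of text occupies."""
    return [len(ch.encode("utf-8")) for ch in text]


def slice_fixed_3_bytes(data: bytes, n: int) -> bytes:
    """Naive approach: take n 'characters' assuming each is exactly 3 bytes."""
    return data[: 3 * n]


def slice_chars(data: bytes, n: int) -> str:
    """Character-aware approach: decode first, then take n characters."""
    return data.decode("utf-8")[:n]


mixed = "a你Ｂ好"  # half-width 'a', Chinese 你, full-width Ｂ, Chinese 好
raw = mixed.encode("utf-8")

print(utf8_lengths(mixed))   # [1, 3, 3, 3]
print(len(raw), len(mixed))  # 10 4 (bytes vs. characters)
print(slice_chars(raw, 2))   # a你
# The naive slice stops after 6 bytes, i.e. in the middle of Ｂ, so it is
# not valid UTF-8 on its own; this decode only succeeds because of "replace".
print(slice_fixed_3_bytes(raw, 2).decode("utf-8", errors="replace"))
```

The character-aware version needs no width tables or conversions; the only assumption is that the input encoding is known.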

As Bob wrote, some downstream distributions, e.g. RHEL/Fedora and SUSE/openSUSE,
have shipped multi-byte support in cut(1) for many years.

E.g. here on my openSUSE system:

  $ echo "你好啊" | LC_ALL=zh_CN.UTF-8 cut -c 1
  你
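
Where such a patched cut is not available, the same selection can be approximated in a script. The following Python sketch is not the RHEL/Fedora or SUSE patch; it is a hypothetical helper (cut_chars, parse_list) that assumes UTF-8 input and supports the common cut LIST forms N, N-M, N- and -M, separated by commas. Like cut, it emits the selected characters in input order. At the time of this thread, upstream cut -c was documented in the coreutils manual as behaving the same as -b, i.e. selecting bytes.

```python
import sys


def parse_list(spec: str) -> list:
    """Parse a cut(1)-style LIST such as '1', '1-3', '2-', '-2' or '1,3-4'
    into (start, end) pairs; end is None for an open-ended range."""
    ranges = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            start = int(lo) if lo else 1
            end = int(hi) if hi else None
        else:
            start = end = int(part)
        if start < 1 or (end is not None and end < start):
            raise ValueError(f"invalid range: {part!r}")
        ranges.append((start, end))
    return ranges


def cut_chars(line: str, spec: str) -> str:
    """Select characters (not bytes) at the 1-based positions in spec,
    keeping input order as cut does."""
    ranges = parse_list(spec)
    return "".join(
        ch
        for pos, ch in enumerate(line, start=1)
        if any(start <= pos and (end is None or pos <= end) for start, end in ranges)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read raw bytes and decode as UTF-8 explicitly, independent of the locale.
    for raw in sys.stdin.buffer:
        text = raw.decode("utf-8").rstrip("\n")
        sys.stdout.buffer.write((cut_chars(text, sys.argv[1]) + "\n").encode("utf-8"))
```

Saved under a hypothetical name such as cut_chars.py, it could be run as `echo "你好啊" | python3 cut_chars.py 1`, which should print 你 like the openSUSE example. Decoding explicitly as UTF-8 avoids depending on which locales are installed, at the cost of rejecting input in other encodings.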

Have a nice day,
Berny

Current Thread

bug#53145: "cut" can't segment Chinese characters correctly?, zendas, 2022/01/09
- bug#53145: "cut" can't segment Chinese characters correctly?, Bob Proulx, 2022/01/09
  - bug#53145: "cut" can't segment Chinese characters correctly?, zendas, 2022/01/09
    - bug#53145: Reply: Re: bug#53145: "cut" can't segment Chinese characters correctly?, zendas, 2022/01/10
- Message not available
  - bug#53145: Reply: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?), zendas, 2022/01/12
    - bug#53145: Reply: bug#53145: Acknowledgement ("cut" can't segment Chinese characters correctly?), Bernhard Voelker <=
