bug-coreutils



From: Icenowy Zheng
Subject: bug#25455: uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale
Date: Mon, 16 Jan 2017 04:01:05 +0800

Problem:
When the input consists of lines that each contain only a Chinese full-width
punctuation mark or a Japanese kana, and the locale is zh_CN.UTF-8, uniq treats
all of those lines as identical and wrongly removes lines that differ.

Reproduce steps:

Run the following command:

```
printf "%s\n" , 。 : ¥ あ か ア カ a b c , . : $ | LC_ALL=zh_CN.UTF-8 uniq
```

Comments:
The printf command prints:
```
,
。
:
¥
あ
か
ア
カ
a
b
c
,
.
:
$
```

Every line is different.

However, after piping through uniq, the output is:
```
,
a
b
c
,
.
:
$
```

Under the zh_TW.UTF-8 locale the problem also occurs, but under ja_JP.UTF-8 or
C it does not.
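
To make the per-locale comparison easier to repeat, here is a rough sketch (not part of the original test) that runs the same fifteen lines through uniq under each of the four locales mentioned and prints how many lines survive. The variable names `input`, `loc` and `kept` are only illustrative, and the loop assumes the locales are actually installed (check with `locale -a`); a missing locale would not reproduce the problem.

```shell
# The fifteen one-character lines from the report: full-width punctuation,
# kana, ASCII letters, and ASCII punctuation. '$' is quoted to keep it literal.
input=$(printf '%s\n' , 。 : ¥ あ か ア カ a b c , . : '$')

# Run uniq under each locale and count the lines that remain.
# No two adjacent input lines are byte-identical, so 15 is the correct count.
for loc in zh_CN.UTF-8 zh_TW.UTF-8 ja_JP.UTF-8 C; do
    kept=$(printf '%s\n' "$input" | LC_ALL="$loc" uniq | wc -l)
    echo "$loc: $((kept)) of 15 lines kept"
done
```

On the affected system the zh_CN.UTF-8 row would be expected to show 8, matching the output above, while the C row should show 15. Running uniq with `LC_ALL=C` may therefore serve as a workaround when byte-wise comparison is acceptable.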

Version info:
```
$ uniq --version
uniq (GNU coreutils) 8.26
... ...
$ /lib/libc.so.6 
GNU C Library (2.24-2_AOSC_OS) stable release version 2.24, by Roland McGrath et al.
... ...
```

Architecture:

The problem reproduces on both the x86_64 and armv7l architectures.




