From: Icenowy Zheng
Subject: bug#25455: uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale
Date: Mon, 16 Jan 2017 04:01:05 +0800
Problem:
When processing lines that each contain only a single Chinese full-width punctuation mark or Japanese kana character under the zh_CN.UTF-8 locale, uniq treats all of these lines as identical and wrongly removes lines that are actually different.
Steps to reproduce:
Run the following command:
```
printf "%s\n" , 。 : ¥ あ か ア カ a b c , . : $ | LC_ALL=zh_CN.UTF-8 uniq
```
Comments:
The printf command prints:
```
,
。
:
¥
あ
か
ア
カ
a
b
c
,
.
:
$
```
Every line is different.
However, uniq outputs only:
```
,
a
b
c
,
.
:
$
```
The problem also occurs under the zh_TW.UTF-8 locale, but not under ja_JP.UTF-8 or C.
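To compare the locales side by side, the reproducer above can be wrapped in a loop that counts the lines uniq keeps. This is a minimal sketch, not part of the original report: the locale list is simply the four locales mentioned above, and it assumes they are installed. A locale that is not installed (check `locale -a`) typically leaves the program in the C locale, which would hide the problem. Since no two adjacent input lines are identical, a correct uniq should keep all 15.

```shell
#!/bin/sh
# Sketch: rerun the reproducer from this report under several locales.
# uniq only merges *adjacent* equal lines, and no two adjacent input lines
# are equal, so any count below 15 means uniq merged distinct lines.
# Assumption: the listed locales are installed; otherwise the result
# will look like the C locale.

for loc in C ja_JP.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8; do
  # Same input as the reproducer; tr strips any padding wc may add.
  n=$(printf '%s\n' , 。 : ¥ あ か ア カ a b c , . : $ \
      | LC_ALL="$loc" uniq | wc -l | tr -d ' ')
  printf '%-12s %s of 15 lines kept\n' "$loc" "$n"
done
```

If the zh_CN.UTF-8 and zh_TW.UTF-8 rows report fewer than 15 lines while C and ja_JP.UTF-8 report all 15, that matches the behaviour described in this report.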
Version info:
```
$ uniq --version
uniq (GNU coreutils) 8.26
... ...
$ /lib/libc.so.6
GNU C Library (2.24-2_AOSC_OS) stable release version 2.24, by Roland McGrath
et al.
... ...
```
Architecture:
The test fails on both x86_64 and armv7l.