bug#25455: uniq considers all the full-width punctuation and Japanese ka

bug-coreutils

From:	Icenowy Zheng
Subject:	bug#25455: uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale
Date:	Mon, 16 Jan 2017 04:01:05 +0800

Problem:
When each input line consists of a single Chinese full-width punctuation mark
or Japanese kana and the locale is zh_CN.UTF-8, uniq treats all of those lines
as identical and wrongly removes lines that are actually different.

Steps to reproduce:

Run the following command:

```
printf "%s\n" ， 。 ： ￥ あ か ア カ a b c , . : $ | LC_ALL=zh_CN.UTF-8 uniq
```

Comments:
The printf command prints:
```
，
。
：
￥
あ
か
ア
カ
a
b
c
,
.
:
$
```

Every line is different.
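
One way to check this independently of any locale's collation rules is to deduplicate the same lines in the C locale, where comparison is bytewise. This is only a minimal sketch; the function name `distinct_sample_lines` is made up for illustration and is not part of coreutils.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct sample lines using bytewise
# comparison (LC_ALL=C), so no locale collation rules are involved.
distinct_sample_lines() {
    printf '%s\n' ， 。 ： ￥ あ か ア カ a b c , . : '$' |
        LC_ALL=C sort -u | wc -l | tr -d ' '
}

distinct_sample_lines   # prints 15: all 15 lines are distinct byte strings
```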

However, after the lines pass through uniq, the output is:
```
，
a
b
c
,
.
:
$
```

Under the zh_TW.UTF-8 locale the problem also occurs, but under ja_JP.UTF-8
or C it does not.
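
The comparison across locales could be scripted roughly as follows. This is a sketch, not a definitive test: `count_uniq_lines` is a hypothetical helper name, and it assumes the listed locales are installed (check with `locale -a`). If a locale is missing, uniq will most likely stay in the C locale and report all 15 lines.

```shell
#!/bin/sh
# Hypothetical helper: print how many of the 15 sample lines uniq keeps
# when run under the locale given as the first argument.
count_uniq_lines() {
    printf '%s\n' ， 。 ： ￥ あ か ア カ a b c , . : '$' |
        LC_ALL="$1" uniq | wc -l | tr -d ' '
}

# Locales named in this report; a count below 15 means uniq merged
# lines that are actually different.
for loc in C ja_JP.UTF-8 zh_CN.UTF-8 zh_TW.UTF-8; do
    printf '%s: %s of 15 lines kept\n' "$loc" "$(count_uniq_lines "$loc")"
done
```

Going by the output shown in this report, an affected system would keep only 8 lines under zh_CN.UTF-8, while C would keep all 15.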

Version info:
```
$ uniq --version
uniq (GNU coreutils) 8.26
... ...
$ /lib/libc.so.6 
GNU C Library (2.24-2_AOSC_OS) stable release version 2.24, by Roland McGrath et al.
... ...
```

Architecture:

The problem reproduces on both the x86_64 and armv7l architectures.
