bug-gnu-utils

sed bug: sed cannot handle some chinese characters


From: Gang Chen
Subject: sed bug: sed cannot handle some chinese characters
Date: Sun, 07 Jan 2007 14:42:39 +0800
User-agent: Icedove 1.5.0.9 (X11/20061220)

Hi

I found a bug: sed cannot handle some Chinese symbol characters.
For example, the text file "foo" has the content "，／123"
(the first two characters are fullwidth symbol characters in GBK encoding).

Here's the HEX dump for foo
$ hexdump -Cv foo
00000000 a3 ac a3 af 31 32 33 0a |....123.|
00000008
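To reproduce this without a GBK-capable editor, the test file can be recreated from the hexdump. This is a minimal sketch, assuming a POSIX shell whose printf accepts octal escapes (\243 is 0xa3, \254 is 0xac, \257 is 0xaf). It uses od to check the bytes because hexdump is not installed everywhere.

```shell
#!/bin/sh
# Write the eight bytes shown by hexdump above:
#   a3 ac     first GBK symbol character
#   a3 af     second GBK symbol character
#   31 32 33  "123"
#   0a        newline
printf '\243\254\243\257123\n' > foo

# Print the bytes in hex for comparison with the hexdump output
od -An -tx1 foo
```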

However, sed "s/[^0-9]//g" does not filter out those characters; the output is unchanged:
$ sed "s/[^0-9]//g" foo | hexdump -Cv
00000000 a3 ac a3 af 31 32 33 0a |....123.|
00000008

By contrast, "s/[0-9]//g" removes the digits as expected:
$ sed "s/[0-9]//g" foo | hexdump -Cv
00000000 a3 ac a3 af 0a |.....|
00000005

$ sed --version
GNU sed version 4.1.5
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.

Note: This issue only occurs in a Chinese locale (e.g. LC_ALL=zh_CN.gbk).
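Because the problem depends on the locale, one way to check a given system is to run the same command under the C locale, where every byte is a separate character and [^0-9] should match each of the four high bytes, and then under the GBK locale. This is a minimal sketch, assuming a zh_CN.gbk locale is installed (see `locale -a`); if it is not, the second run typically falls back to the C locale and tells you nothing about the bug.

```shell
#!/bin/sh
# Recreate the test file from the hexdump (two GBK symbols, then "123")
printf '\243\254\243\257123\n' > foo

# C locale: each byte is matched on its own, so the expected output is 31 32 33 0a
LC_ALL=C sed 's/[^0-9]//g' foo | od -An -tx1

# GBK locale: according to the report, sed 4.1.5 leaves a3 ac a3 af in place here
LC_ALL=zh_CN.gbk sed 's/[^0-9]//g' foo | od -An -tx1
```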

Gang Chen