bug-coreutils

uniq prints invalid unique lines multiple times


From: Reuben Thomas
Subject: uniq prints invalid unique lines multiple times
Date: Mon, 23 Feb 2004 21:46:27 +0100 (CET)

When I run uniq from coreutils 5.0 with LANG=en_GB.UTF-8 on a glibc 2.3.2
system on a file that (I think) is not valid UTF-8, I get a confusing
result: the two lines in the file are identical, but uniq prints both of
them and returns an exit code of 0. If I run

LANG=C uniq <file>

I get the expected single line of output. With LANG=en_GB.UTF-8, I would
expect uniq either to return an error (because the file is not valid
text) or to print a single line (if it's being lenient).

The only way I might be wrong is if the file can be interpreted as a UTF-8
file with two non-identical lines, but I don't think it can.

I attach the relevant file, and display it below (the \200 is a literal
top-bit-set byte, value octal 0200). The file ends with a linefeed, in
case you're wondering!

ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
ushort f(char,double,char,double):('a',0.2,'\200',0.4)->65506
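
To back up the claim that the file cannot be read as UTF-8, here is a
minimal sketch in Python 3. It rebuilds the two lines with \x80 standing in
for the literal octal-0200 byte, checks whether they decode as UTF-8, and
collapses adjacent duplicates by raw byte comparison, which is roughly what
I would expect from LANG=C. The helper names is_valid_utf8 and uniq_bytes
are hypothetical names of my own, not anything from coreutils, and this says
nothing about how uniq itself compares lines.

```python
# The line from the report; \x80 is the literal top-bit-set byte (octal 0200).
line = b"ushort f(char,double,char,double):('a',0.2,'\x80',0.4)->65506\n"
data = line * 2  # the file: two identical lines, ending with a linefeed


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def uniq_bytes(raw: bytes) -> bytes:
    """Collapse adjacent identical lines, comparing raw bytes only."""
    out = []
    prev = None
    for ln in raw.splitlines(keepends=True):
        if ln != prev:
            out.append(ln)
        prev = ln
    return b"".join(out)


print(is_valid_utf8(data))       # a lone 0x80 is a continuation byte
print(uniq_bytes(data) == line)  # byte comparison leaves a single line
```

A lone 0x80 byte is a UTF-8 continuation byte with no lead byte before it,
so the decode fails, while the byte-wise comparison sees two equal lines.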

-- 
http://www.mupsych.org/~rrt/ | The only person worth beating is yourself

Attachment: minitests.output.i686-pc-linuxlibc6
Description: Text document