bug-textutils

RE: there is a bug with UNIX command join


From: Robert Wolf
Subject: RE: there is a bug with UNIX command join
Date: Mon, 23 Jun 2003 10:47:07 -0400

Thanks for replying so quickly. 
I tried
$ join -t  \012  -v 2 j1 j2
$ join -t '\012' -v 2 j1 j2
$ join -t "\012" -v 2 j1 j2

All three versions do the same wrong thing: they include the 'eee' line,
which is the last line of the file j1 and a middle line of j2, when they
should not include it.
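
To see what the shell actually passes to join in each of the three forms above, it helps to print the argument itself. This is only a minimal sketch; the variable names a, b and c are made up for illustration and are not part of the original commands.

```shell
# Capture each spelling exactly as the shell would pass it to join -t.
a=\012      # unquoted: the shell removes the backslash
b='\012'    # single quotes: kept literally
c="\012"    # double quotes: the backslash is kept before a digit

# printf '%s' prints its arguments without interpreting escapes.
printf 'unquoted:      [%s]\n' "$a"
printf 'single-quoted: [%s]\n' "$b"
printf 'double-quoted: [%s]\n' "$c"
```

In none of the three cases does the argument contain a line feed: the first is the three characters 012, and the other two are a backslash followed by 012.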

I also tried
$ comm -13 j1 j2

However, it does the same wrong thing as the previous three, again including
the 'eee' line. I also suspect it might have something to do with the 'eee'
line being the last line of the first file.

-----------------------------------------------------------

I tried your suggestion:
$ NL=$(printf "\n")

$ echo "$NL" | od -o
0000000 000012
0000001

$ join -t "$NL" -v 2 j1 j2

But it did something weird: it eliminated the 'eee' line, which is good, but
then it replaced every long sequence of spaces with a single space. It
looks like the program treated the space as a field separator character and
ignored the line feed.
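
One likely explanation for the od output above is that command substitution strips trailing newlines, so NL ends up empty and the single 012 byte comes from echo itself. The sketch below checks the length of the variable and shows one common workaround (append a guard character, then strip it). The name NL_REAL is hypothetical, and how join treats an empty or newline separator is not covered here.

```shell
# $(...) removes trailing newlines, so this assignment leaves NL empty.
NL=$(printf "\n")
printf 'length of NL: %s\n' "${#NL}"

# Workaround: protect the newline with a guard character, then remove the guard.
NL_REAL=$(printf '\nx')
NL_REAL=${NL_REAL%x}
printf 'length of NL_REAL: %s\n' "${#NL_REAL}"
```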

-----------------------------------------------------------

I tried one more thing: I created two other test files with much shorter
lines and tried all four commands.
$ join -t  \012  -v 2 s1 s2
$ join -t '\012' -v 2 s1 s2
$ join -t "\012" -v 2 s1 s2
$ comm -13 s1 s2

And guess what? They all worked! I believe the problem is that the lines in
the files j1 and j2, which are almost 300 characters long, are too long for
these commands to handle.

It would be nice if these commands, i.e. join & comm, could handle much
longer lines, say a default of 4096 characters, and a new option to specify
a larger line size, say up to 65535 characters. Another question: can these
programs handle files that are 30 MB in size and have long lines?
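
The line-length theory can be tested directly with synthetic data. The sketch below builds two small sorted files whose lines are just over 300 characters, with an 'eee' line last in the first file and in the middle of the second, as described above. The file names long1 and long2 and the padding are made up for the test; the real j1 and j2 are not reproduced here.

```shell
# Build a 300-character padding string (hypothetical test data).
pad=$(printf '%300s' '' | tr ' ' 'x')

tmp=$(mktemp -d)
# Both files are sorted; "eee" is last in long1 and in the middle of long2.
printf '%s\n' "aaa$pad" "ccc$pad" "eee$pad" > "$tmp/long1"
printf '%s\n' "aaa$pad" "ddd$pad" "eee$pad" "fff$pad" > "$tmp/long2"

# Lines found only in the second file; cut trims them to keep the output readable.
result=$(LC_ALL=C comm -13 "$tmp/long1" "$tmp/long2" | cut -c1-3)
printf '%s\n' "$result"

rm -rf "$tmp"
```

If this prints only ddd and fff, line length alone is probably not what keeps 'eee' in the output, and it may be worth comparing the two 'eee' lines byte for byte (for example with od -c).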


-----Original Message-----
From: address@hidden [mailto:address@hidden]
Sent: June 20, 2003 10:49 PM
To: Robert Wolf
Cc: 'address@hidden'
Subject: Re: there is a bug with UNIX command join


Robert Wolf wrote:
> $ join -t \012 -v 2 j1 j2
> 
>  <<j1>>  <<j2>> 
> The output should be only the lines in j2 that do not exist in j1.

For one thing I am not convinced that the \012 will be doing what you
think it will be doing here.  Usually you need to handle quoted
characters like that specially with the shell.  Something like this.

  NL=$(printf "\n")
  join -t "$NL" -v 2 j1 j2

> Essentially I have two sorted files, and I just want the lines from the 2nd
> file that are not in the 1st file.

Hmm...  Perhaps you are really looking for 'comm -13 j1 j2' here?

Bob

