Re: Diff: obtain only the different lines of the newest file
From: Davide Brini
Subject: Re: Diff: obtain only the different lines of the newest file
Date: Thu, 12 Aug 2010 09:56:41 +0100
User-agent: KMail/1.13.5 (Linux/2.6.34-gentoo-r1; KDE/4.4.5; x86_64; ; )
On Thursday 12 Aug 2010 08:44:35 Bob Proulx wrote:
> Kimahri Ronso wrote:
> > My question is about the diff command.
> >
> > I have 2 files containing almost the same information with around 70,000
> > records.
> >
> > What I would like to know is if there is a possibility to obtain only the
> > different lines from the second file without anything else.
>
> Instead of using 'diff', you might find 'comm' a better fit for this
> task.
>
> Compare sorted files FILE1 and FILE2 line by line.
>
> With no options, produce three-column output. Column one
> contains lines unique to FILE1, column two contains lines unique to
> FILE2, and column three contains lines common to both files.
>
> -1 suppress lines unique to FILE1
>
> -2 suppress lines unique to FILE2
>
> -3 suppress lines that appear in both files
>
> Here is an example. Given:
>
> $ cat /tmp/a
> one
> two
> three
>
> $ cat /tmp/b
> one
> two
> three
> four
> five
> six
>
> Then:
>
> $ comm -13 /tmp/a /tmp/b
> four
> five
> six
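I think this only works by accident: comm requires sorted input, and
neither /tmp/a nor /tmp/b is sorted ("three" sorts before "two", and
"five" before "four"). Here I get this: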
$ comm -13 /tmp/a /tmp/b
four
comm: file 2 is not in sorted order
five
six
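If the files cannot be sorted on disk, one option is to sort them on the fly. The following is a minimal sketch, assuming bash (the <( ) process substitution is a bash feature, not POSIX sh) and reusing the /tmp/a and /tmp/b example files from above. Note that the result comes out in sorted order, not in the order the lines had in the original file.

```shell
#!/usr/bin/env bash
# Recreate the example files from this thread.
printf '%s\n' one two three > /tmp/a
printf '%s\n' one two three four five six > /tmp/b

# Sort both inputs on the fly with process substitution, so comm
# sees sorted data and emits no "not in sorted order" warning.
# -1 suppresses lines unique to /tmp/a, -3 suppresses common lines,
# leaving only the lines unique to /tmp/b.
comm -13 <(sort /tmp/a) <(sort /tmp/b)
```

With the example data this should print "five", "four" and "six", in that (sorted) order.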
> > I just need to know the content of the changed lines in the newest file.
>
> For that you would need to determine the newest file first and then
> handle it appropriately. Something like this, untested:
>
> if [ $(stat --format %Y /tmp/a) -lt $(stat --format %Y /tmp/b) ]; then
> comm -13 /tmp/a /tmp/b
> else
> comm -13 /tmp/b /tmp/a
> fi
>
> The stat with %Y emits the modification time as an integer number of
> seconds and that is compared to determine the newest file.
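The same idea can be wrapped in a small function. This is a sketch, not the approach from the message above: it swaps GNU stat for bash's -nt file test ("is newer than"), and it sorts both inputs because comm requires sorted data. The function name newer_only is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines that appear only in the more
# recently modified of the two files given as arguments.
newer_only() {
    local old=$1 new=$2
    # If the first argument is actually the newer file, swap them.
    if [[ $old -nt $new ]]; then
        local tmp=$old
        old=$new
        new=$tmp
    fi
    # comm needs sorted input, so sort both files on the fly.
    comm -13 <(sort -- "$old") <(sort -- "$new")
}

# Example: give /tmp/b a later timestamp (touch -t is POSIX).
printf '%s\n' one two three > /tmp/a
printf '%s\n' one two three four five six > /tmp/b
touch -t 201008110000 /tmp/a
touch -t 201008120000 /tmp/b

newer_only /tmp/b /tmp/a   # argument order does not matter
```

If both files have the same modification time, -nt is false and the arguments are used in the order given.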
Here's an awk solution, assuming the newer file has previously been determined
(for example with stat as you suggest):
awk 'NR==FNR{a[$0];next} !($0 in a)' oldfile newfile
That prints the lines of "newfile" that do not appear anywhere in "oldfile",
in their original order; unlike comm, it does not need sorted input.
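Here is the awk one-liner run on the same example files, with comments on how it works. One known caveat: if oldfile is empty, NR==FNR stays true while reading newfile too, so every line is stored and nothing is printed.

```shell
#!/bin/sh
# Same example files as earlier in the thread.
printf '%s\n' one two three > /tmp/a
printf '%s\n' one two three four five six > /tmp/b

# NR (overall record number) equals FNR (record number within the
# current file) only while awk reads the first file. For those lines,
# store the whole line as a key of array "a" and skip to the next line.
# For the second file, the bare pattern !($0 in a) prints every line
# that was never stored as a key.
awk 'NR==FNR{a[$0];next} !($0 in a)' /tmp/a /tmp/b
```

This should print "four", "five" and "six" in file order. Since the whole of oldfile is kept in memory, a file of around 70,000 lines should not be a problem.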
--
D.