help-gnu-utils

Re: diff 20 files that are mostly equal


From: Bob Proulx
Subject: Re: diff 20 files that are mostly equal
Date: Tue, 14 Apr 2009 11:26:02 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

avilella wrote:
> I would like to compare ~20 files that are mostly the same, but some
> of them have 2-3 different lines in a couple of places. I can do a
> diff for every pair, but I would like to have one representation for
> all files that is a consensus file then with extra tagged lines for
> the differences. Is there any tool that does that? What would people
> recommend?

I don't know of any tool that does that directly.  And diff'ing every
pair of 20 files means about 190 comparisons, which would generate a
lot of messy output.

What I tend to do in that kind of situation is run md5sum (or any of
the other *sum utilities) over the whole list of files and sort by
checksum.  Identical files have identical checksums, so they end up
grouped together, and files that differ land in separate groups.
'uniq -c' can then count the members of each group, and a final
numeric sort puts the most common version first and the rare variants
last.

  $ md5sum ./* | sort -k1,1
  118721e880107e6bac4d8b6f42c472d4  ./5
  118721e880107e6bac4d8b6f42c472d4  ./6
  29c450ee7a45cf7aa4e8ebe165925fd5  ./7
  3e234925eeb1b48960dcbf43050f4b23  ./1
  3e234925eeb1b48960dcbf43050f4b23  ./2
  3e234925eeb1b48960dcbf43050f4b23  ./3
  3e234925eeb1b48960dcbf43050f4b23  ./4

  $ md5sum ./* | sort -k1,1 | awk '{print$1}' | uniq -c
  2 118721e880107e6bac4d8b6f42c472d4
  1 29c450ee7a45cf7aa4e8ebe165925fd5
  4 3e234925eeb1b48960dcbf43050f4b23

  $ md5sum ./* | sort -k1,1 | awk '{print$1}' | uniq -c | sort -nr
  4 3e234925eeb1b48960dcbf43050f4b23
  2 118721e880107e6bac4d8b6f42c472d4
  1 29c450ee7a45cf7aa4e8ebe165925fd5
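
Once the groups are known, one possible next step toward the
"consensus plus differences" view is to treat the most common version
as the reference and diff one file from each other group against it.
The following is only a rough sketch of that idea.  The function name
diff_against_consensus is made up for illustration, and it assumes
the file names contain no whitespace.

```shell
# Hypothetical helper: print a unified diff of one file from each
# variant group against the most common version found in directory $1.
# Assumes file names contain no whitespace.
diff_against_consensus() {
    dir=${1:-.}

    # One line per file, "checksum  name", grouped by checksum.
    sums=$(md5sum "$dir"/* | sort -k1,1)

    # Checksum shared by the largest number of files.
    top=$(printf '%s\n' "$sums" | awk '{print $1}' | uniq -c |
          sort -nr | awk 'NR==1 {print $2}')

    # The first file carrying that checksum serves as the reference copy.
    ref=$(printf '%s\n' "$sums" | awk -v s="$top" '$1 == s {print $2; exit}')
    echo "reference: $ref"

    # Diff the first file of every other group against the reference.
    printf '%s\n' "$sums" |
    awk -v s="$top" '$1 != s && !seen[$1]++ {print $2}' |
    while read -r f; do
        diff -u "$ref" "$f" || true   # diff exits 1 when the files differ
    done
}
```

Calling it as 'diff_against_consensus .' in the directory holding the
files prints the chosen reference file followed by one diff per
variant, so 20 mostly equal files produce a handful of small diffs
instead of one for every pair.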

Perhaps something like that might be useful for you as well?

Bob



