Re: Sort and delete duplicate messages
From: Ralph Corderoy
Subject: Re: Sort and delete duplicate messages
Date: Mon, 04 May 2020 09:55:39 +0100
Hi,
Ken wrote:
> > I know that 'sortm -textfield Subject' will sort messages according
> > to the subject field. Having run that command, is there a way to
> > then delete the first duplicate of each message in the list such
> > that if 1 and 2 are duplicates and 6 and 7 are duplicates you would
> > delete messages 2 and 7 leaving 1 and 6?
>
> I want to say you could do something with piping the output of scan
> into "uniq -d -f <num>". Might require a custom scan format, but that
> seems relatively simple.
>
> Hm, a quick test:
>
> % scan -format '%(msg) %{subject}' | uniq -d -f 1
>
> suggests that it prints the first one, not later ones, so that isn't
> exactly what you want. Might be a good starting point, though? You
> could probably do something with uniq -c and pipe that to an awk
> script that did what you wanted.
awk's probably easiest, after deciding what counts as an equivalent
subject field.
$ ls
1 2 3 4
$ sed -n l *
subject: foo bar$
subject: foo$
bar$
subject: xyzzy $
subject: fo=?utf-8?Q?=6f?= bar$
$
$ scan -width 0 -format '%(decode{subject}):%{subject}:%(putlit{subject}):' +.
foo bar:foo bar: foo bar:
foo bar:foo bar: foo
bar:
xyzzy:xyzzy: xyzzy:
foo bar:fo=?utf-8?Q?=6f?= bar: fo=?utf-8?Q?=6f?= bar:
$
$ scan -width 0 -format '%(msg) %(decode{subject})' +.
1 foo bar
2 foo bar
3 xyzzy
4 foo bar
$
$ scan -width 0 -format '%(msg) %(decode{subject})' +. |
> awk '{m=$1; sub(/[^ ]* /, "", $0)} NR>1 && $0==l {print m} {l=$0}'
2
$
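The one-liner is terse, so here is the same logic spelt out as a minimal
sketch. The function name dup_msgs is made up for illustration, and printf
stands in for the scan output so the snippet runs without nmh installed.
It saves the message number, strips it from the line, and prints the
number whenever the remaining subject equals the previous line's.

```shell
# dup_msgs: read "<msg> <subject>" lines, as produced by the scan
# command above, and print the number of every message whose subject
# is identical to that of the line before it.
# (dup_msgs is a hypothetical name, not part of nmh.)
dup_msgs() {
    awk '
        {
            m = $1                  # remember the message number
            sub(/[^ ]* /, "", $0)   # drop it, leaving only the subject
        }
        NR > 1 && $0 == l { print m }   # same subject as the previous line
        { l = $0 }                      # remember this subject for the next line
    '
}

# Stand-in for the scan output shown above.
printf '%s\n' '1 foo bar' '2 foo bar' '3 xyzzy' '4 foo bar' | dup_msgs
```

Only adjacent lines are compared, which is why message 4 isn't reported
above: 3 sits between it and 2. That's where the earlier sortm comes in,
as it places equal subjects next to one another. It would be wise to
eyeball the list of numbers before passing it to anything that deletes.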
--
Cheers, Ralph.