bug-datamash

Adding dot product operation to GNU Datamash


From: Tim Rice
Subject: Adding dot product operation to GNU Datamash
Date: Sat, 6 Aug 2022 01:30:22 +0000

Hey all,

I've been thinking about this for a while: it would be nice to have an 
operation that multiplies the corresponding entries of two columns and returns 
the sum of these products, i.e. the dot product (or scalar product) of the two 
columns.

At the moment, you can get the same result by combining GNU Datamash with 
GNU Awk:

```
$ awk '{print $1 * $2}' /tmp/data.txt | datamash sum 1
```

Or you could do it all in gawk if you want:

```
$ awk '{sum += $1 * $2} END{print sum}' /tmp/data.txt
```
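As a quick sanity check of what the operation should return, here is a minimal sketch that wraps the awk one-liner in a shell function and runs it on made-up data. The function name `dot_product` and the sample values are only illustrative assumptions; nothing here is part of Datamash.

```shell
# Hypothetical helper (not part of GNU Datamash): dot product of
# columns 1 and 2 of whitespace-separated input (stdin or files).
dot_product() {
    awk '{ sum += $1 * $2 } END { print sum }' "$@"
}

# Made-up sample data: 1*4 + 2*5 + 3*6 = 32
printf '1 4\n2 5\n3 6\n' | dot_product
```

This should print 32 for the three sample rows.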

But I think doing it all in GNU Datamash allows a more intuitive command:

```
$ datamash -W dotprod 1:2 < /tmp/data.txt
```

A proposed implementation is attached. Please let me know if you see any 
problems with it.

If this looks good, then it should be trivial to also add a weighted mean. That 
would just be the dot product divided by the sum of the weights column. (But 
which of the two columns should be treated as the weights? Maybe that needs an 
extra option?)
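To make the weighted-mean idea concrete, here is a rough awk sketch. It assumes column 1 holds the values and column 2 holds the weights, which is exactly the choice left open above, and the function name `weighted_mean` is hypothetical, not an existing or proposed Datamash operation.

```shell
# Hypothetical helper (not part of GNU Datamash): weighted mean of
# column 1, ASSUMING column 2 holds the weights.
weighted_mean() {
    awk '{ num += $1 * $2; den += $2 }
         END {
             # Fail rather than divide by a zero weight sum.
             if (den != 0) print num / den; else exit 1
         }' "$@"
}

# Made-up sample data: (10*1 + 20*3) / (1 + 3) = 17.5
printf '10 1\n20 3\n' | weighted_mean
```

Swapping the roles of the two columns only means exchanging `$1` and `$2` in the `den` term, which is why an extra option (or a fixed convention) would be needed.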

~ Tim

Attachment: dotprod.diff
Description: Text document

