Re: Adding dot product operation to GNU Datamash

bug-datamash

From:	Erik Auerswald
Subject:	Re: Adding dot product operation to GNU Datamash
Date:	Sat, 6 Aug 2022 19:57:28 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

Hi,

On 06.08.22 03:30, Tim Rice wrote:

I've been thinking about this for a while: it would be nice to have an operation which multiplies the corresponding records of two columns and returns the sum of these products. Aka the dot product or scalar product of the two columns.
At the moment, you could do something similar by combining GNU Datamash with GNU Awk:
```
$ awk '{print $1 * $2}' /tmp/data.txt | datamash sum 1
```

Or you could do it all in gawk if you want:

```
$ awk '{sum += $1 * $2} END{print sum}' /tmp/data.txt
```

But I think doing it all in GNU Datamash allows a more intuitive command:

```
$ datamash -W dotprod 1:2 < /tmp/data.txt
```
A proposed implementation is attached. Please let me know if you see any problems with it.


I looked at the diff and did not see any obvious problems.  I do
not see a reason not to add that operation either.

If this looks good, then it should be trivial to also add a weighted mean. That will just be like the dot product except for dividing the result by one of the column sums. (But which column should be preferred for that? Maybe need to pass an extra option?)


It might suffice to always divide by the sum of the first column,
if the code keeps the order of the given fields.  I think it does,
but I did not verify this.

This would allow using "weighted_mean 1:2" or "weighted_mean 2:1"
to divide by the sum of column 1 or column 2, respectively.

("weighted_mean" is just a placeholder, of course, I just needed
some name to illustrate the idea.)
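
To illustrate the field-order idea without touching Datamash itself, here
is a rough awk sketch.  The shell function "weighted_mean" and the sample
numbers are made up for this example and are not part of GNU Datamash;
the function simply divides the dot product by the sum of whichever
column is named first.

```shell
# Hypothetical helper, NOT a GNU Datamash operation: weighted mean of two
# whitespace-separated columns read from stdin.
#   $1 = column whose sum is the divisor (the "first" field)
#   $2 = the other column
weighted_mean() {
    awk -v w="$1" -v v="$2" '
        { dot += $w * $v; wsum += $w }           # running dot product and divisor
        END { if (wsum != 0) print dot / wsum }  # print nothing if the sum is zero
    '
}

# Made-up sample data: three records with two columns each.
printf '1 10\n1 20\n2 30\n' | weighted_mean 1 2   # divides by the sum of column 1
printf '1 10\n1 20\n2 30\n' | weighted_mean 2 1   # divides by the sum of column 2
```

With this input the dot product is 90 in both cases, so the first call
prints 22.5 (90 / 4) and the second prints 1.5 (90 / 60).  Swapping the
field order is enough to choose the divisor, so no extra option would be
needed.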

Br,
Erik
