bug-datamash

Re: Basic calculation mistakes (e.g. mean/median)


From: Andreas Sommer
Subject: Re: Basic calculation mistakes (e.g. mean/median)
Date: Wed, 11 Nov 2020 08:07:44 +0100


On 2020-11-10 14:48, Brandon Invergo wrote:

> Andreas Sommer writes:
>
>> $ seq 1 3 | datamash -H mean 1 median 1
>> mean(1) median(1)
>> 2.5     2.5
>>
>> $ seq 1 3 | datamash -R 5 -H mean 1 median 1
>> mean(1) median(1)
>> 2.50000 2.50000
>>
>> ---
>>
>> $ seq 1 4 | datamash -H -R 2 mean 1 median 1
>> mean(1) median(1)
>> 3.00    3.00
>>
>> Until that gets fixed, it means I can't trust the tool :(
>
> All of those results are correct.  The -H option is synonymous with
> --header-in and --header-out, so the first row (containing the value 1)
> is being treated as a header row not a data row.


Well, that explains a lot. I had strongly expected that `-H` would only print
headers, without side effects. Hiding `--header-out` behind a long option seems
strange. Also, other Unix-y tools often use uppercase as negation, e.g. `zfs
list -H` = without printing column headers.
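
To make the arithmetic concrete, here is a minimal sketch that reproduces both
results with only seq, tail and awk rather than datamash itself. The `mean`
helper is a hypothetical name introduced for illustration; it only mimics what
happens to the input when the first line is or is not consumed as a header.

```shell
#!/bin/sh
# Hypothetical helper (not part of datamash): mean of column 1 via awk.
mean() { awk '{ s += $1 } END { if (NR) print s / NR }'; }

# All three lines treated as data, as when no header option is given
# (or only --header-out, which does not consume any input line).
all_rows=$(seq 1 3 | mean)

# First line consumed as a header, which is what -H
# (--header-in plus --header-out) does to the input.
header_dropped=$(seq 1 3 | tail -n +2 | mean)

echo "all rows:       $all_rows"
echo "header dropped: $header_dropped"
```

The second value matches the 2.5 that datamash reported above, since only the
rows 2 and 3 remain once the leading 1 is taken as a header.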

Anyway, I have the solution now, and the developers can take this as a wish to
disambiguate the short options. I guess that you don't want to change this
parameter, but the documentation should clearly hint at it. The website (e.g.
https://www.gnu.org/software/datamash/examples/) typically first shows an
example `seq ... | datamash [without -H]` and in the next paragraph
`<somefile datamash -H ...`. Looking at the documented example output, a reader
like me might think that `-H` means "print headers".

Thanks a lot!
-Andreas


