Re: Basic calculation mistakes (e.g. mean/median)
From: Andreas Sommer
Subject: Re: Basic calculation mistakes (e.g. mean/median)
Date: Wed, 11 Nov 2020 08:07:44 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

On 2020-11-10 14:48, Brandon Invergo wrote:
> Andreas Sommer writes:
>> $ seq 1 3 | datamash -H mean 1 median 1
>> mean(1) median(1)
>> 2.5 2.5
>> $ seq 1 3 | datamash -R 5 -H mean 1 median 1
>> mean(1) median(1)
>> 2.50000 2.50000
>> ---
>> $ seq 1 4 | datamash -H -R 2 mean 1 median 1
>> mean(1) median(1)
>> 3.00 3.00
>> Until that gets fixed, it means I can't trust the tool :(
>
> All of those results are correct. The -H option is synonymous with
> --header-in and --header-out, so the first row (containing the value 1)
> is being treated as a header row not a data row.

Well, that explains a lot. I had strongly expected that `-H` would print
headers without side effects. Hiding `--header-out` in a long option seems
strange. Also, other Unix-y tools often use uppercase as negation, e.g. `zfs
list -H` = without printing column headers.
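
To make the effect concrete, here is a minimal sketch of why the numbers came out as 2.5. The `tail` and `awk` pipeline is only a stand-in I am assuming for illustration (it is not how datamash works internally): it drops the first line the way a header row would be consumed, then averages the rest. The last command uses the `--header-out` option named above and runs only if datamash is installed; I am not showing its output here since I have not pasted a real run.

```shell
# Mean of what is left of `seq 1 3` once the first line is consumed
# as a header row (the values 2 and 3).
mean_without_first=$(seq 1 3 | tail -n +2 | awk '{ s += $1 } END { print s / NR }')
echo "mean without first row: $mean_without_first"

# Mean of all three values (1, 2 and 3), for comparison.
mean_all=$(seq 1 3 | awk '{ s += $1 } END { print s / NR }')
echo "mean of all rows: $mean_all"

# With datamash itself, if it is installed: --header-out asks for an
# output header line without treating the first input line as a header.
if command -v datamash >/dev/null 2>&1; then
    seq 1 3 | datamash --header-out mean 1 median 1
fi
```

So with `-H` the first value is silently dropped from the calculation, while `--header-out` alone should keep all input rows as data.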
Anyway, I have the solution now and the developers can take this as a wish to disambiguate the
short options. I can guess that you don't want to change this parameter, but the
documentation should clearly hint at it. The website (e.g.
https://www.gnu.org/software/datamash/examples/) typically first shows an example `seq ... |
datamash [without -H]` and in the next paragraph `<somefile datamash -H ...` – looking at
the documented example output, a reader like me might think that `-H` means "print
headers".
Thanks a lot!
-Andreas