Re: Suggestion: add the possibility to apply multiple operations to a si
From: Tomas Peitl
Subject: Re: Suggestion: add the possibility to apply multiple operations to a single column (or multiple columns)
Date: Mon, 7 Nov 2022 09:30:29 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
Hi Tim,
Thanks for the reply.
> The main thing to be careful of with a "mean,max,count"-style
> operation is how it would interact with groupby or crosstab. E.g. I
> wonder if "datamash groupby 1 mean,max,count 2" makes sense in any way.
> Ranges like 1-2,4 could be less straightforward, especially when
> combined with the former idea of providing multiple operations
> simultaneously. When preparing a test for "mean,max,count 1-2,4",
> should the test output columns like "mean_1, max_1, count_1, mean_2,
> max_2, count_2, mean_4, max_4, count_4", or "mean_1, mean_2, mean_4,
> max_1, max_2, max_4, count_1, count_2, count_4", or something else?
Good points, I didn't even realize a decision was needed here.
Perhaps it's more natural to think of 'mean,max' as a single
combined operation, i.e. the ordering "mean 1 max 1 mean 2 max 2",
though I originally had the other ordering in mind. Either way, anyone
who needs this today and types it out verbosely already has to make
the same choice; the only difference is that datamash would have to
pick a default for everyone. There could even be a command-line switch
to toggle the ordering.
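To make the two orderings concrete, here is a rough Python sketch of the expansion I have in mind. It is not datamash code, and the function names are made up purely for illustration; it only turns the shorthand into the verbose "op column" list one would otherwise type by hand.

```python
def expand_columns(spec):
    """Expand a column spec such as "1-2,4" into [1, 2, 4]."""
    columns = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            columns.extend(range(int(start), int(end) + 1))
        else:
            columns.append(int(part))
    return columns


def expand_operations(ops, spec, by_column=True):
    """Return (operation, column) pairs in one of the two candidate orders.

    by_column=True  groups by column:    mean 1 max 1 mean 2 max 2 ...
    by_column=False groups by operation: mean 1 mean 2 max 1 max 2 ...
    """
    operations = ops.split(",")
    columns = expand_columns(spec)
    if by_column:
        return [(op, col) for col in columns for op in operations]
    return [(op, col) for op in operations for col in columns]


def to_verbose(pairs):
    """Render the pairs as the verbose argument list typed out by hand."""
    return " ".join(f"{op} {col}" for op, col in pairs)


if __name__ == "__main__":
    print(to_verbose(expand_operations("mean,max,count", "1-2,4", by_column=True)))
    print(to_verbose(expand_operations("mean,max,count", "1-2,4", by_column=False)))
```

The by_column flag plays the role of the hypothetical command-line switch: the default groups all operations for one column together, and flipping it groups all columns for one operation together.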
> Is there any chance you could provide a preliminary patch and tests
> which would get the ball rolling? You could break it up into two
> patches, one for adding column ranges, and one for "lambda-ing"
> multiple operations over a column.
Perhaps, but I can't make any promises.
Cheers,
Tomas