[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Adding range support to groupby
From: |
Erik Auerswald |
Subject: |
Re: Adding range support to groupby |
Date: |
Sat, 6 Aug 2022 12:35:33 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 |
Hi Shawn and Tim,
On 05.08.22 23:06, Tim Rice wrote:
On Fri, Aug 05, 2022 at 02:42:12AM -0700, Shawn Wagner wrote:
Attached patch allows for usage like `groupby 1-4` instead of `groupby
1,2,3,4`. Thoughts before I commit it?
Cool idea. I had a couple of very minor nits like some of the column
alignment in the tests was off. It does make it easier to eyeball tests
when everything lines up just so. Also, c3 in the crosstab tests
shouldn't be changed, but instead there should be a new test there.
Then I thought that we should also test having multiple columns instead
of just two.
I've attached a patch which addresses those thoughts.
I think you just added tests in that diff. I did not look at the
details, but I like the addition of those tests. :-)
Let's wait for Erik to also weigh in before committing it. He has a good
eye for detail, and might notice additional things that will make it
better.
I have read the diff and had a cursory look at existing tests (with
"git grep ... tests/"). Skimming the current documentation, I did
not find a documented restriction regarding using field ranges for
"groupby" or "crosstab" (which would just need to be removed IMHO).
I like the functionality, and the diff looks OK. :-)
Please take the comments below into consideration as suggestions:
Since this affects "crosstab" aka "ct" too, that should be mentioned
in NEWS as well.
(Does this change affect further operations? I did not test, neither
did I audit the code.)
I concur with Tim with keeping the 'c3' test and adding a new one
based on 'c3' that only replaces "1,2" with "1-2".
Then there should be a test that verifies that "crosstab 1-3" results
in an error since crosstab requires exactly two fields. There are
tests with just one and four fields in tests/datamash-crosstab.pl,
I think it could be added near those tests.
I would like the new "invalid field range" tests for "groupby" to
be duplicated with "crosstab".
By the way, testing the range out on crosstab helped me notice another
minor bug. If the output has wide columns (eg when using unique),
crosstab's column alignment ends up all out of whack. I've added a note
to myself to fix that sometime. It might be difficult, though.
I did not look at this, but AFAIUI GNU Datamash does not align
columns, it just separates them with one TAB character.
A different program, e.g., column(1) (in Debian GNU/Linux from
package "bsdmainutils"), can be used to visually align output
columns.
Is there really an issue here?
Thanks,
Erik