Hi Shawn and Tim,
On 05.08.22 23:06, Tim Rice wrote:
> On Fri, Aug 05, 2022 at 02:42:12AM -0700, Shawn Wagner wrote:
>>
>> Attached patch allows for usage like `groupby 1-4` instead of `groupby
>> 1,2,3,4`. Thoughts before I commit it?
>
> Cool idea. I had a couple of very minor nits like some of the column
> alignment in the tests was off. It does make it easier to eyeball tests
> when everything lines up just so. Also, c3 in the crosstab tests
> shouldn't be changed, but instead there should be a new test there.
>
> Then I thought that we should also test having multiple columns instead
> of just two.
>
> I've attached a patch which addresses those thoughts.
I think you just added tests in that diff. I did not look at the
details, but I like the addition of those tests. :-)
> Let's wait for Erik to also weigh in before committing it. He has a good
> eye for detail, and might notice additional things that will make it
> better.
I have read the diff and had a cursory look at existing tests (with
"git grep ... tests/"). Skimming the current documentation, I did
not find a documented restriction regarding using field ranges for
"groupby" or "crosstab" (which would just need to be removed IMHO).
I like the functionality, and the diff looks OK. :-)
Please take the comments below into consideration as suggestions:
Since this affects "crosstab" aka "ct" too, that should be mentioned
in NEWS as well.
crosstab and groupby use the same function for getting columns; that's the only reason it's affected. Given it only works with two columns I didn't see a point to mentioning it in the documentation, as doing so might make people think it's supposed to work with more.
(Does this change affect further operations? I did not test, neither
did I audit the code.)
I don't believe so.
I concur with Tim with keeping the 'c3' test and adding a new one
based on 'c3' that only replaces "1,2" with "1-2".
Then there should be a test that verifies that "crosstab 1-3" results
in an error since crosstab requires exactly two fields. There are
tests with just one and four fields in tests/datamash-crosstab.pl,
I think it could be added near those tests.
Sounds good.
I would like the new "invalid field range" tests for "groupby" to
be duplicated with "crosstab".
> By the way, testing the range out on crosstab helped me notice another
> minor bug. If the output has wide columns (eg when using unique),
> crosstab's column alignment ends up all out of whack. I've added a note
> to myself to fix that sometime. It might be difficult, though.
I did not look at this, but AFAIUI GNU Datamash does not align
columns, it just separates them with one TAB character.
A different program, e.g., column(1) (in Debian GNU/Linux from
package "bsdmainutils"), can be used to visually align output
columns.
Is there really an issue here?
Thanks,
Erik