Re: New program: rand(1)


From: Tim Rice
Subject: Re: New program: rand(1)
Date: Sun, 21 Aug 2022 23:47:23 +0000

Hey Shawn,

Thanks for your input.

> I do kind of feel like datamash should focus on working with existing data,
> not generating new sets, and that this rand might be better suited as a
> separate project, but not strongly enough to protest very much.

I thought about it. I figured there was precedent with programs like 
decorate(1) to implement non-core functionality in such an extra program. 
Unlike decorate, rand will make certain aspects of datamash(1) development 
better.

I can really see the argument both ways, so I'm open to more pressure to spin 
off rand into a separate project if people think that is best :)

We should get it right now, while we're still in the early part of the next release
cycle. It's no big deal to revert the current implementation with a new commit that says
something like "rand(1): Move this to a separate project.". Our decision now
helps set a demarcation for the future between the features that will be seen as
appropriate for the GNU Datamash project and those that will not.


Arguments for keeping rand(1) in GNU Datamash (which I will abbreviate as GD):

* Ensure GD ships with a way for developers to benchmark its performance and 
test some of its statistical functions like jarque.

* The core mission of GD is to provide fast and easy ways to munge data and generate 
statistical summaries from the CLI. People doing this kind of work will often also be 
after "fake" data. By including the extra functionality in GD, they don't need 
to install yet another package. GD becomes even more useful to people who want that
functionality without becoming less useful to people who don't, and we avoid splitting
effort between multiple projects.

* There is precedent for GD to not be just one program, but a suite of data tools. Two 
and a half years ago, GD went from one program to two programs. Five or ten years from
now, GD may be not just two programs but half a dozen or more, the "coreutils of 
data". So long as these programs suit the general theme of data utilities that work 
in CLI pipelines, I think it is one possible future worth considering.


Arguments for spinning rand(1) off into a project called something like GNU 
Rand:

* We've already seen that rand(1) wants to encumber GD with an additional 
dependency, and perhaps should be seen as simply a CLI wrapper to a subset of 
GSL. This alone implies that rand is not a good fit for GD.

* Toggling GSL support will make GD more complicated to maintain. In general, 
each extra program will encumber future maintainers with additional 
responsibilities. I would rather not be resented by future generations for 
adding a program they would prefer not to support :)

* Perhaps GD should remain focused purely on processing existing data. Keep 
each GNU package as orthogonal to all the others as possible. This ensures 
everything remains lightweight and composable. This has not seemed to be a 
primary concern of other GNU projects, but I am not unsympathetic to the 
argument.


I will put a moratorium on rand(1) development for a bit to allow more time for 
feedback. If no clear consensus emerges, I'll make a decision after a couple of 
weeks. It may be a decision to spin off rand into a separate project, or it may 
be a decision to proceed with toggling GSL support as suggested by Erik.

~ Tim


