Subject: Re: New program: rand(1)
From: Tim Rice
Date: Sun, 21 Aug 2022 23:47:23 +0000
Hey Shawn,
Thanks for your input.
> I do kind of feel like datamash should focus on working with existing data,
> not generating new sets, and that this rand might be better suited as a
> separate project, but not strongly enough to protest very much.
I thought about it. I figured that decorate(1) set a precedent for implementing
non-core functionality as a separate, extra program. Unlike decorate, rand will
also improve certain aspects of datamash(1) development.
> I can really see the argument both ways, so I'm open to more pressure to spin
> off rand into a separate project if people think that is best :)
We should get this right now, while we're still early in the next release
cycle. It's no big deal to revert the current implementation with a new commit that says
something like "rand(1): Move this to a separate project." Our decision now
helps set a demarcation for the future between features that will be seen as
appropriate for the GNU Datamash project and those that will not.
Arguments for keeping rand(1) in GNU Datamash (which I will abbreviate as GD):
* Ensure GD ships with a way for developers to benchmark its performance and
test some of its statistical functions like jarque.
* The core mission of GD is to provide fast and easy ways to munge data and generate
statistical summaries from the CLI. People doing this kind of work will often also
want "fake" data. By including the extra functionality in GD, they don't need
to install yet another package. GD becomes even more useful to people who want that
functionality, and no less useful to people who don't, and we avoid splitting effort
between multiple projects.
* There is precedent for GD to be not just one program, but a suite of data tools. Two
and a half years ago, GD went from one program to two. Five or ten years from
now, GD may comprise not just two programs but half a dozen or more: the "coreutils of
data". So long as these programs suit the general theme of data utilities that work
in CLI pipelines, I think this is one possible future worth considering.
Arguments for spinning rand(1) off into a project called something like GNU
Rand:
* We've already seen that rand(1) wants to encumber GD with an additional
dependency, and perhaps it should be seen as simply a CLI wrapper around a subset of
GSL (the GNU Scientific Library). This alone implies that rand is not a good fit for GD.
* Toggling GSL support will make GD more complicated to maintain. In general,
each extra program will encumber future maintainers with additional
responsibilities. I would rather not be resented by future generations for
adding a program they would prefer not to support :)
* Perhaps GD should remain focused purely on processing existing data. Keep
each GNU package as orthogonal to all the others as possible. This ensures
everything remains lightweight and composable. This has not seemed to be a
primary concern of other GNU projects, but I am not unsympathetic to the
argument.
I will put a moratorium on rand(1) development for a bit to allow more time for
feedback. If no clear consensus emerges, I'll make a decision after a couple of
weeks. It may be a decision to spin off rand into a separate project, or it may
be a decision to proceed with toggling GSL support as suggested by Erik.
~ Tim