Re: New program: rand(1)
From: Tim Rice
Subject: Re: New program: rand(1)
Date: Sat, 20 Aug 2022 23:16:04 +0000
> Currently implemented are unif (continuous Uniform distribution), exp
> (Exponential distribution), and norm (Normal distribution). I expect to
> implement additional distributions in the coming weeks.
Hmm, this may require more thought.
Several of the probability distributions I was thinking of require the
incomplete beta function to simulate efficiently. I thought this could be
easily copy-pasted from other free software such as the GNU Scientific Library
(GSL) or R. Now that I've actually taken a stab at it, I feel less confident.
I found the R implementation to be inscrutable. It seems to be inspired by an
algorithm that was published in the Communications of the ACM in the 1960s, with
little discussion of the mathematical underpinnings it relies on. The algorithm
pre-dated the first edition of Abramowitz & Stegun by a year.
The GSL implementation is clearer, because it evidently relies on the
continued-fraction expansion from Abramowitz & Stegun 26.5. However, there is a
bit of a rabbit-hole of one function depending on another and another ad nauseam,
so you can't just copy-paste one file. There are at least a thousand lines of code
to curate.
I see a few options and haven't decided between them yet, so feedback would be welcome:
* Limit the scope of rand(1) to just what is currently implemented? This avoids
baroquities like the incomplete beta function altogether, at a cost to feature
completeness. It's easy to do, but unsatisfying.
* Make GNU Datamash depend on GSL? This is also a fairly easy option. There are other
benefits too: GSL comes with a broad suite of functionality, which may be useful in
future GNU Datamash development. However, it is a fairly drastic change that will require
adjustment both by packagers and developers. I am conscious of the advice in the GNU
Coding Standards: "Do not induce new dependencies on other software lightly."
* Continue the work of integrating copy-pasted code from GSL into GNU Datamash?
Aside from my immediate exasperation with this effort, there is an additional
cost that future improvements to the external code won't necessarily make their
way into our copy. Furthermore, as we continue implementing new features for
GNU Datamash, we may see more and more copy-pasting from GSL going on. The
longer we wait before making GSL a dependency, the more effort may be required
down the track.
* Implement something from scratch? I am not completely averse to this, but it
increases duplication of effort between different GNU projects. I am also
worried that with fewer eyes on GNU Datamash than GSL, I will introduce bugs
that are not an issue in other implementations.
I guess the first and biggest decision is whether to make GSL a dependency. Let
me know whether you think it would be a good idea or bad idea.
~ Tim