[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: binocdf inaccuracy in Octave
From: |
Daniel J Sebald |
Subject: |
Re: binocdf inaccuracy in Octave |
Date: |
Mon, 08 Jul 2013 15:10:06 -0500 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16 |
On 07/08/2013 12:33 PM, Daniel J Sebald wrote:
On 07/08/2013 12:29 PM, Daniel J Sebald wrote:
On 07/08/2013 11:38 AM, Rik wrote:
7/8/13
Dr. Klein,
I think this is actually a much easier problem to solve that it first
appeared. In the the file binocdf.m the formula used to calculate the
CDF is
cdf(k) = 1 - betainc (p, tmp + 1, n - tmp);
According to Wikipedia
(http://en.wikipedia.org/wiki/Binomial_distribution) the CDF for the
binomial distribution is
\textstyle I_{1-p}(n - k, 1 + k)
or
I(1-p, n-k, 1+k)
So it appears that we simply have the arguments wrong to the betainc
function.
But it is also true that
\textstyle I_{x}(a, b) = I_{1-x}(b, a)
Oops, make that
I_{x}(a,b) = 1 - I_{1-x}(b,a)
Dan
http://en.wikipedia.org/wiki/Regularized_incomplete_beta_function#Incomplete_beta_function
In other words, the two expressions you are comparing are mathematically
equivalent. Where is the discrepancy then? Numerical issues?
We should be a little careful here. There was a discrepancy at one of
the extreme ends of the function. Your modification may have just moved
that discrepancy to the other end of the range, i.e., when the argument
is 1e-3 as opposed to 1-1e-3. And we aren't 100% sure which is correct
(but it is looking like Octave because of the strange string of zeros at
the front), Erlang or Octave. Octave is using a FORTRAN routine:
double
betainc (double x, double a, double b)
{
double retval;
F77_XFCN (xdbetai, XDBETAI, (x, a, b, retval));
return retval;
}
So, we should introduce some tests, if possible, that check results
inherent to Octave's results. There are some tests, which I haven't
explored, as part of the betainc involving "eps".
There doesn't seem to be any tests as part of binocdf that check for
accuracy, mainly just sanity of the inputs. One or more tests along the
lines of what Alex referred to should be useful. That is, compute the
CDF two ways, one from binocdf and one from cumulating the probability
mass function
p = 1-1e-3;
pval = binopdf ([0:50]', 50, p);
cumsum (pval) - binocdf ([0:50]', 50, p)
and check those results. Here is what I'm seeing:
p = 1-1e-3;
pval = binopdf ([0:50]', 50, p);
cumsum (pval) - binocdf ([0:50]', 50, p)
ans =
1.00000000000007e-150
4.99510000000035e-146
1.22260117600006e-141
1.95424813815765e-137
2.29399723360430e-133
2.10841676614637e-129
1.57977022596915e-125
9.92031009822021e-122
5.32697826475548e-118
2.48350747999872e-114
1.01724999177793e-110
3.69550684776946e-107
1.19987796084329e-103
3.50394897537906e-100
9.25151215525190e-97
2.21822759875234e-93
4.84771698248160e-90
9.68615422923810e-87
1.77409990314262e-83
2.98511415284120e-80
4.62253801912459e-77
6.59738446954312e-74
8.68836583529609e-71
1.05672286748147e-67
1.18770467184289e-64
1.23406745773931e-61
1.18550990718313e-58
1.05282245558334e-55
8.64033611769894e-53
6.54884440079011e-50
4.58011354682620e-47
2.95231633143266e-44
1.75142047732177e-41
9.54507616566026e-39
4.76856227562884e-36
2.17814373131045e-33
9.06845557842062e-31
3.42871128091703e-28
1.17213664214074e-25
3.60414974466429e-23
9.90534035957436e-21
2.41464425822922e-18
-3.79109725646771e-17
5.02325782207112e-17
-7.98300326864362e-18
5.36716049465165e-17
-1.89714998454670e-17
2.55465136891897e-18
1.04083408558608e-17
1.94289029309402e-16
4.44089209850063e-16
Rik, that string of zeros at the front of binocdf () has me wondering if
the issue here is that the FORTRAN routine computing betainc () is
single precision float and not double precision float. Note that the
binopdf.m script uses the following computations:
pdf(k) = exp (gammaln (n+1) - gammaln (x(k)+1) - gammaln (n-x(k)+1)
+ x(k)*log (p) + (n-x(k))*log (1-p));
Are these all double precision operations?
Dan