Parameters for gsl_cdf_fdist_Q for p-value to compare nested models
From: Stephan Lorenzen
Subject: Parameters for gsl_cdf_fdist_Q for p-value to compare nested models
Date: Thu, 22 Dec 2022 10:14:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0
Dear list,
I want to compare how well different nested models fit my data, but I am
not sure how to choose the parameters, and the more I google, the more
confused I get. Since tests on my real data gave p-values that were far
too small, I decided to run tests on random data. The p-value is supposed
to tell me how much better the full model fits the data, i.e. how much
signal for the additional parameters is hidden in the data. Since I use
random data (there is no signal at all), I would expect the p-values to
be uniformly distributed between 0 and 1 when I compare the full model
against the nested one.
I do 10000 runs, each with 20 data points whose X and Y values are
normally distributed random numbers (from gsl_ran_gaussian). For each
run I do two fits:
M1: y = a1x + a0 -> params1=1 (a0 does not count), df1=20-1-1=18
M2: y = a2x^2 + a1x + a0 -> params2=2 (a0 does not count), df2=20-2-1=17
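For reference, here is a minimal sketch of how the two residual sums and their degrees of freedom could be computed in plain C, without GSL. It uses the common convention that every fitted coefficient counts, the intercept included, so the residual df is n - (degree + 1); for 20 points that gives 18 and 17, the same values as above. The names residual_df, poly_ssr and MAX_COEF are made up for this illustration, and the normal-equations solver is only one possible way to do the fit, not necessarily the one used for the runs described here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COEF 3  /* supports degree 0..2, enough for M1 and M2 */

/* Residual degrees of freedom: data points minus ALL fitted coefficients,
 * intercept included (assumes ordinary least squares with an intercept). */
int residual_df(int n_points, int degree)
{
    return n_points - (degree + 1);
}

/* Absolute value helper, so the sketch does not need libm. */
double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Fit y = sum_k c[k] * x^k by least squares (normal equations) and
 * return the sum of squared residuals. */
double poly_ssr(const double *x, const double *y, size_t n, int degree)
{
    int m = degree + 1;
    assert(degree >= 0 && m <= MAX_COEF && n > (size_t)m);

    /* Augmented normal-equation matrix [X'X | X'y]. */
    double A[MAX_COEF][MAX_COEF + 1] = {{0.0}};
    for (size_t i = 0; i < n; i++) {
        double pw[MAX_COEF];
        pw[0] = 1.0;
        for (int k = 1; k < m; k++) pw[k] = pw[k - 1] * x[i];
        for (int r = 0; r < m; r++) {
            for (int c = 0; c < m; c++) A[r][c] += pw[r] * pw[c];
            A[r][m] += pw[r] * y[i];
        }
    }

    /* Gauss-Jordan elimination with partial pivoting. */
    for (int col = 0; col < m; col++) {
        int piv = col;
        for (int r = col + 1; r < m; r++)
            if (abs_d(A[r][col]) > abs_d(A[piv][col])) piv = r;
        for (int c = 0; c <= m; c++) {
            double t = A[col][c];
            A[col][c] = A[piv][c];
            A[piv][c] = t;
        }
        assert(A[col][col] != 0.0);  /* singular design, e.g. too few distinct x */
        for (int r = 0; r < m; r++) {
            if (r == col) continue;
            double f = A[r][col] / A[col][col];
            for (int c = col; c <= m; c++) A[r][c] -= f * A[col][c];
        }
    }

    /* The matrix is now diagonal: coefficient k is A[k][m] / A[k][k]. */
    double ssr = 0.0;
    for (size_t i = 0; i < n; i++) {
        double fit = 0.0, pw = 1.0;
        for (int k = 0; k < m; k++) {
            fit += (A[k][m] / A[k][k]) * pw;
            pw *= x[i];
        }
        ssr += (y[i] - fit) * (y[i] - fit);
    }
    return ssr;
}
```

With this sketch, err1 would be poly_ssr(x, y, 20, 1) and err2 would be poly_ssr(x, y, 20, 2). Because M1 is nested in M2, err2 should never exceed err1 beyond rounding error when both fits are true least-squares minima.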
I then calculate both errors (sum of squared residuals) and calculate F:
F = ((err1-err2) / (df1-df2)) / (err2/df2)
and calculate the p-value using
p=gsl_cdf_fdist_Q(F, df1-df2, df2)
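Written out as C, the F calculation above might look like the following sketch. The helper name nested_f_stat is hypothetical. The block deliberately stops at F so that it compiles without GSL; the p-value would then come from gsl_cdf_fdist_Q(F, df1 - df2, df2), whose arguments are the F value, the numerator degrees of freedom and the denominator degrees of freedom, in that order.

```c
#include <assert.h>

/* Hypothetical helper: F statistic for comparing a reduced model
 * (err1, df1) against a full model (err2, df2), where errN is the sum of
 * squared residuals and dfN the residual degrees of freedom. */
double nested_f_stat(double err1, int df1, double err2, int df2)
{
    assert(df1 > df2 && df2 > 0);  /* the reduced model has more residual df */
    assert(err2 > 0.0);            /* a perfect full fit would divide by zero */

    double numerator = (err1 - err2) / (double)(df1 - df2);
    double denominator = err2 / (double)df2;
    return numerator / denominator;
}
```

For example, nested_f_stat(4.0, 18, 2.0, 17) evaluates (2/1) / (2/17). A clearly negative result would mean err2 > err1, which should not happen for properly nested least-squares fits.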
I would expect a uniform distribution between 0 and 1, but the
distribution is skewed and shows far more small values than big ones
(see attached file), suggesting that the full model is "better" in most
cases. Obviously, something is wrong, so I have a couple of questions:
- is it correct that constant values (a0) do not count as parameters?
- is my calculation of the degrees of freedom correct?
- did I calculate F correctly?
- did I insert the right parameters in gsl_cdf_fdist_Q?
- is my assumption that I expect a uniform distribution of p-values
correct?
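One way the uniformity expectation could be checked numerically, rather than by eye from a density plot, is to bin the 10000 p-values and compute a Pearson chi-square statistic against a flat histogram. The sketch below is plain C; bin_p_values, uniform_chi2 and N_BINS are names invented for this illustration, and the choice of 10 bins is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define N_BINS 10  /* arbitrary choice for this sketch */

/* Count p-values into N_BINS equal-width bins on [0, 1]. */
void bin_p_values(const double *p, size_t n, size_t counts[N_BINS])
{
    for (int b = 0; b < N_BINS; b++) counts[b] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(p[i] >= 0.0 && p[i] <= 1.0);
        int b = (int)(p[i] * N_BINS);
        if (b == N_BINS) b = N_BINS - 1;  /* p == 1.0 goes in the last bin */
        counts[b]++;
    }
}

/* Pearson chi-square statistic of the bin counts against a flat
 * (uniform) expectation of n / N_BINS per bin. */
double uniform_chi2(const size_t counts[N_BINS], size_t n)
{
    double expected = (double)n / N_BINS;
    double chi2 = 0.0;
    for (int b = 0; b < N_BINS; b++) {
        double d = (double)counts[b] - expected;
        chi2 += d * d / expected;
    }
    return chi2;
}
```

If the p-values really are uniform, this statistic should behave roughly like a chi-square variable with N_BINS - 1 degrees of freedom, so gsl_cdf_chisq_Q(chi2, N_BINS - 1) could serve as a rough check. A pile-up in the lowest bins, as in the attached plot, would show up as a very large value.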
I would be very grateful for answers!
Best
Stephan
Attachment: density.pdf (Adobe PDF document)