Re: regression, and missing data
From: John Darrington
Subject: Re: regression, and missing data
Date: Tue, 6 Mar 2012 11:46:27 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
I would be very grateful if you would open a new bug report at
http://savannah.gnu.org/bugs/?group=pspp and copy in the information below
(and anything else which you think is relevant).
It sounds as if this is a rather serious problem, so please mark it as
Severity: Major
Thanks for reporting this problem.
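
For the bug report, a small independent check can make the expected numbers explicit. The following is a minimal sketch in Python, not PSPP code: it fits a bivariate regression after dropping every case with a missing value (listwise deletion) and confirms that this gives the same coefficients as fitting the already-filtered data, which is the comparison described in the quoted message. The data values, the function names, and the use of None alongside the -99999 sentinel are all made up for illustration.

```python
import math

SENTINEL = -99999  # hypothetical user-missing code, as in the quoted message


def is_missing(value):
    """Treat None, NaN and the sentinel code as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == SENTINEL


def complete_cases(xs, ys):
    """Listwise deletion: keep only pairs where both values are present."""
    return [(x, y) for x, y in zip(xs, ys)
            if not is_missing(x) and not is_missing(y)]


def ols_slope_intercept(pairs):
    """Ordinary least squares for y = a + b*x; returns (b, a)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a


# Made-up data with missing values in both variables.
x = [1.0, 2.0, None, 4.0, 5.0, SENTINEL]
y = [2.0, 4.1, 6.0, None, 9.9, 12.0]

# The same data with incomplete cases removed by hand.
prefiltered = [(1.0, 2.0), (2.0, 4.1), (5.0, 9.9)]

b_listwise, a_listwise = ols_slope_intercept(complete_cases(x, y))
b_filtered, a_filtered = ols_slope_intercept(prefiltered)

print("listwise deletion:", b_listwise, a_listwise)
print("pre-filtered data:", b_filtered, a_filtered)
print("match:", (b_listwise, a_listwise) == (b_filtered, a_filtered))
```

If a package reports different coefficients for the two files, the difference lies in how it handles the incomplete cases rather than in the fit itself.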
John
On Tue, Mar 06, 2012 at 12:06:44AM -0500, Renan Levine wrote:
> Hi-
>
> This appears to be a bug in the PSPP regression routine when the data
> have a large number of missing values!
>
> I recently noticed some small discrepancies between simple bivariate
> regression results between IBM SPSS, STATA and PSPP. Until Prof.
> Shackman's email, I hadn't realized that the discrepancies only occur
> when there are many missing values. I was just confused...
>
> Sadly, I also find problems when running linear regressions using PSPP
> on data with missing values. I wish I knew what was causing the problem.
>
> So, using Dropbox, I wanted to make available some data which seems to
> illustrate the issue.
>
> Using psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on
> LinuxMint LXDE, PSPP calculates descriptive statistics just like SPSS
> and STATA on the same dataset, but does not calculate identical b
> coefficients when running bivariate or multivariate regressions.
>
> I created the following public opinion survey data files consisting of
> three variables from the 2004 Canadian Election Study
> <http://www.queensu.ca/cora/ces.html>, which I recoded, declaring
> certain values to be missing:
> http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many
> observations with missing values.
> http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three
> variables, but I dropped all of the cases with missing values.
>
> This is the syntax file used to run descriptive statistics and three
> regression analyses.
> http://dl.dropbox.com/u/35198072/regression-tests.sps
>
> PSPP generates these regression results and descriptive statistics with
> missing values:
> http://dl.dropbox.com/u/35198072/regression-test-pspp1.html
> PSPP generates these regression results and descriptive statistics using
> the data without any missing values:
> http://dl.dropbox.com/u/35198072/regression-test-pspp2.html
>
> Here is the STATA output on the same data (.log is a text file - email
> me if you have a problem opening it). The first three regressions should
> match the output in regression-test-pspp1.html
> They are close, but not close enough... The bottom three regressions use
> the data with no missing values and these DO match PSPP's output (in
> regression-test-pspp2.html).
> http://dl.dropbox.com/u/35198072/regression-test-stata.log
>
> I also ran the data on SPSS and found results consistent with STATA.
> There did not seem to be any problems with Pearson's Chi-Square or
> Kendall's Tau-B when running a crosstab on the data with the missing
> values.
>
> I am sorry I don't know what has gone wrong, so I am making available
> this data in hopes someone might figure out where there is a mistake. I
> caution other users running regression on PSPP.
>
> Yours,
> Renan
>
> On 04-Mar-12 11:37 PM, Gene Shackman wrote:
>> Hi
>>
>> I'm using the windows version, psppire.exe 0.7.8-g997322, that I
>> downloaded from
>> http://www.gnu.org/software/pspp/get.html
>> I'm using windows vista, home version.
>>
>> My question is about linear regression. If I use data that has no
>> missing values, then PSPP regression seems to work fine. I compared
>> the results with other packages and got the same results, see
>> http://gsociology.icaap.org/methods/comparing_freestaprograms.html
>> However, if I use data that does have missing values, I get results
>> that are different from other programs. See the results from other
>> programs here
>> http://gsociology.icaap.org/methods/comparing_freestaprograms_missing.html
>> this also lists the data set I'm using, and attached below are the
>> results I get from PSPP (If you format this in Courier, it lines up
>> right.)
>>
>> So 2 questions
>> 1. How does pspp deal with missing? By the way, I tried coding blanks
>> as missing and also tried replacing all the missing values with -99999
>> and told pspp those were missing values, and got exactly the same
>> results.
>>
>> 2. There don't appear to be any options on how regression is done,
>> like forward, backward, forced, etc. I didn't see anything in the
>> documentation about it either. Is it just doing straight forced
>> regression? Will there be any options on how to do regression?
>>
>> Thanks very much.
>>
>> REGRESSION
>> /VARIABLES= c_arable climate North phone_kpop
>> /DEPENDENT= gini
>> /STATISTICS=COEFF R ANOVA.
>>
>> Model Summary
>> #====#========#=================#==========================#
>> #  R #R Square|Adjusted R Square|Std. Error of the Estimate#
>> ##===#========#=================#==========================#
>> #|.60#     .36|              .35|                      8.65#
>> ##===#========#=================#==========================#
>>
>> ANOVA
>> #===========#==============#===#===========#=====#============#
>> #           #Sum of Squares| df|Mean Square|  F  |Significance#
>> ##==========#==============#===#===========#=====#============#
>> #|Regression#       4548.35|  4|    1137.09|15.19|         .00#
>> #|Residual  #       7933.89|106|      74.85|     |            #
>> #|Total     #      12482.24|110|           |     |            #
>> ##==========#==============#===#===========#=====#============#
>>
>> Coefficients
>> #===========#=====#==========#====#=====#============#
>> #           #  B  |Std. Error|Beta|  t  |Significance#
>> ##==========#=====#==========#====#=====#============#
>> #|(Constant)#47.95|      2.06| .00|23.22|         .00#
>> #| c_arable # -.12|       .05|-.20|-2.28|         .02#
>> #| climate  #-1.24|      1.04|-.11|-1.20|         .23#
>> #|  North   # -.14|       .03|-.43|-4.96|         .00#
>> #|phone_kpop#  .00|       .00|-.07| -.81|         .42#
>> ##==========#=====#==========#====#=====#============#
>>
>>
>>
>>
>> Gene
>>
>>
>>
>> Gene Shackman, Ph.D.
>> The Global Social Change Research Project
>> http://gsociology.icaap.org
>> Free Resources for Methods in Evaluation and Social Research
>> http://gsociology.icaap.org/methods
>> ----------
>> Applied Sociologist
>> ----------
>>
>>
>>
>> _______________________________________________
>> Pspp-users mailing list
>> address@hidden
>> https://lists.gnu.org/mailman/listinfo/pspp-users
>
> --
> Renan Levine
> Department of Political Science
> University of Toronto - Scarborough
> address@hidden
> http://individual.utoronto.ca/renan
> (416) 208-2651
>
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.