From: | Renan Levine |
Subject: | Re: regression, and missing data |
Date: | Tue, 06 Mar 2012 00:06:44 -0500 |
User-agent: | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 |
Hi,

This appears to be a bug in the PSPP regression routine when the data contain a large number of missing values!

I recently noticed some small discrepancies in simple bivariate regression results among IBM SPSS, STATA and PSPP. Until Prof. Shackman's email, I hadn't realized that the discrepancies only occur when there are many missing values. I was just confused... Sadly, I also find problems when running linear regressions using PSPP on data with missing values. I wish I knew what was causing the problem.

So, using Dropbox, I wanted to make available some data that seem to illustrate the issue. Using psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on LinuxMint LXDE, PSPP calculates descriptive statistics just like SPSS and STATA on the same dataset, but does not calculate identical b coefficients when running bivariate or multivariate regressions.

I created the following public opinion survey data files, each consisting of three variables from the 2004 Canadian Election Study. I recoded the variables and declared certain values to be missing:

http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many observations with missing values.

http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three variables, but I dropped all of the cases with missing values.

This is the syntax file used to run descriptive statistics and three regression analyses:
http://dl.dropbox.com/u/35198072/regression-tests.sps

PSPP generates these regression results and descriptive statistics with missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp1.html

PSPP generates these regression results and descriptive statistics using the data without any missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp2.html

Here is the STATA output on the same data (.log is a text file - email me if you have a problem opening it):
http://dl.dropbox.com/u/35198072/regression-test-stata.log

The first three regressions should match the output in regression-test-pspp1.html. They are close, but not close enough... The bottom three regressions use the data with no missing values, and these DO match PSPP's output (in regression-test-pspp2.html).

I also ran the data on SPSS and found results consistent with STATA.

There did not seem to be any problems with Pearson's Chi-Square or Kendall's Tau-B when running a crosstab on the data with the missing values.

I am sorry I don't know what has gone wrong, so I am making this data available in the hope that someone might figure out where the mistake is. I caution other users running regression on PSPP.

Yours,
Renan

On 04-Mar-12 11:37 PM, Gene Shackman wrote:
--
Renan Levine
Department of Political Science
University of Toronto - Scarborough
address@hidden
http://individual.utoronto.ca/renan
(416) 208-2651