Re: Next step in covariance matrix
From: Jason Stover
Subject: Re: Next step in covariance matrix
Date: Tue, 27 Oct 2009 15:20:20 -0400
User-agent: Mutt/1.5.18 (2008-05-17)

On Tue, Oct 27, 2009 at 06:25:32PM +0000, John Darrington wrote:
> Will that be enough to allow a subset of GLM to be implemented?
Yes, except for the interactions.
>
> J'
>
> On Tue, Oct 27, 2009 at 11:47:23AM -0400, Jason Stover wrote:
> On Tue, Oct 27, 2009 at 06:38:19AM +0000, John Darrington wrote:
> > Just to make sure I understand things correctly, consider the
> > following example, where x and y are numeric variables and A and B
> > are categorical ones:
> >
> >  x  y  A  B
> >  ==========
> >  3  4  x  v
> >  5  6  y  v
> >  7  8  z  w
> >
> > We replace the categorical variables with bit_vectors:
> >
> >  x  y  A_0  A_1  A_2  B_0  B_1
> >  =============================
> >  3  4   1    0    0    1    0
> >  5  6   0    1    0    1    0
> >  7  8   0    0    1    0    1
> >
> > and arbitrarily drop the (say zeroth) subscript:
> >
> >  x  y  A_1  A_2  B_1
> >  ===================
> >  3  4   0    0    0
> >  5  6   1    0    0
> >  7  8   0    1    1
> >
> > That will produce a 5x5 matrix. 5 is calculated from n + m - p, where
> > n is the number of numeric variables, m is the total number of
> > categories, and p is the number of categorical variables.
>
> This is correct.
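
The encoding and the n + m - p column count discussed above can be
sketched in Python as follows. This is only an illustration, not PSPP's
code: the function name dummy_code, the dict-per-row input, and the
choice of "first level seen" as the dropped zeroth level are all
assumptions.

```python
def dummy_code(rows, numeric, categorical):
    """Replace each categorical variable with 0/1 indicator columns,
    dropping the indicator of its first-seen (zeroth) level."""
    # Collect the levels of each categorical variable in order of appearance.
    levels = {}
    for c in categorical:
        seen = []
        for r in rows:
            if r[c] not in seen:
                seen.append(r[c])
        levels[c] = seen

    # Numeric columns first, then one column per retained level.
    header = list(numeric)
    for c in categorical:
        header += [f"{c}_{i}" for i in range(1, len(levels[c]))]

    design = []
    for r in rows:
        out = [r[v] for v in numeric]
        for c in categorical:
            out += [1 if r[c] == lev else 0 for lev in levels[c][1:]]
        design.append(out)
    return header, design


rows = [
    {"x": 3, "y": 4, "A": "x", "B": "v"},
    {"x": 5, "y": 6, "A": "y", "B": "v"},
    {"x": 7, "y": 8, "A": "z", "B": "w"},
]
header, design = dummy_code(rows, ["x", "y"], ["A", "B"])

n = 2  # numeric variables: x, y
m = 5  # total categories: 3 for A, 2 for B
p = 2  # categorical variables: A, B
print(header)               # ['x', 'y', 'A_1', 'A_2', 'B_1']
print(len(header) == n + m - p)  # True
```

The covariance matrix of these columns is therefore 5x5 for this example.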
>
> > However I don't see how such a matrix can be very useful. A better one
> > would involve the products of the categorical and numeric variables:
> >
> >  x  y  x*A_1  x*A_2  y*A_1  y*A_2  x*B_1  y*B_1
> >  ==============================================
> >  3  4    0      0      0      0      0      0
> >  5  6    5      0      6      0      0      0
> >  7  8    0      7      0      8      7      8
> >
> > This makes an 8x8 matrix, where 8 is calculated from n + n * (m - p),
> > which happens to be identical to n * (1 + m - p). But this involves
> > a whole lot more calculations.
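
The product columns in the table above can be generated from the
dummy-coded columns, which makes the n * (1 + m - p) count easy to
check. A minimal sketch, assuming the numeric columns come first and
that the indicator columns of each categorical variable are listed in a
hypothetical groups mapping (neither the function name nor the mapping
comes from PSPP):

```python
# Dummy-coded columns from the example, zeroth levels already dropped.
header = ["x", "y", "A_1", "A_2", "B_1"]
design = [
    [3, 4, 0, 0, 0],
    [5, 6, 1, 0, 0],
    [7, 8, 0, 1, 1],
]
n = 2                             # numeric columns x and y come first
groups = {"A": [2, 3], "B": [4]}  # indicator column indices per variable


def with_interactions(header, design, n, groups):
    """Keep the numeric columns and append numeric * indicator products."""
    new_header = header[:n]
    picks = []
    # Same column order as the table: per categorical variable,
    # then per numeric variable, then per retained level.
    for cols in groups.values():
        for j in range(n):
            for k in cols:
                new_header.append(f"{header[j]}*{header[k]}")
                picks.append((j, k))
    new_design = [row[:n] + [row[j] * row[k] for j, k in picks]
                  for row in design]
    return new_header, new_design


new_header, new_design = with_interactions(header, design, n, groups)
m, p = 5, 2  # total categories, number of categorical variables
print(new_header)
print(len(new_header) == n * (1 + m - p))  # True: 8 columns
```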
>
> This second choice would give you the covariance of x and y, and the
> covariances of the *interactions* between x and A, x and B, y and A,
> and y and B, but not the covariance between (say) x and A. The
> covariance between x and A would be stored in the first matrix you
> mentioned, in elements (0,2), (0,3), (2,0) and (3,0) assuming we kept
> both upper and lower triangles.
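
To make the element positions concrete, here is a small sketch that
computes the full (both triangles) covariance matrix of the first
design, with columns ordered x, y, A_1, A_2, B_1. The N - 1 denominator
and the function name are assumptions made for illustration; the thread
does not say which normalization PSPP uses. Exact fractions are used
only to avoid rounding noise.

```python
from fractions import Fraction

# Columns: 0 = x, 1 = y, 2 = A_1, 3 = A_2, 4 = B_1
design = [
    [3, 4, 0, 0, 0],
    [5, 6, 1, 0, 0],
    [7, 8, 0, 1, 1],
]


def covariance_matrix(design):
    """Sample covariance matrix (denominator N - 1), both triangles stored."""
    n_obs = len(design)
    n_cols = len(design[0])
    means = [Fraction(sum(row[j] for row in design), n_obs)
             for j in range(n_cols)]
    cov = [[Fraction(0)] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        for j in range(n_cols):
            s = sum((row[i] - means[i]) * (row[j] - means[j])
                    for row in design)
            cov[i][j] = s / (n_obs - 1)
    return cov


cov = covariance_matrix(design)

# The covariance between x and A lives in these four elements.
print(cov[0][2], cov[2][0])  # x with A_1
print(cov[0][3], cov[3][0])  # x with A_2
```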
>
> You mention that matrix not being very useful, and in a sense it
> isn't: No human would care about the covariance between x and the
> column corresponding to the first bit vector of A. But in another
> sense, that matrix is absolutely necessary: It's used to solve the
> least squares problem, whose solution we use to tell us if A and our
> dependent variable are related. That relation is shown via analysis of
> variance, whose p-value is many computations away from the covariance
> matrix, but depends on it nevertheless.
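
As a rough illustration of how a covariance matrix feeds the least
squares problem: with an intercept in the model, the slope estimates
solve S b = s, where S holds the covariances among the predictor
columns and s the covariances between each predictor and the dependent
variable. The sketch below uses made-up data (one covariate x, one
indicator A_1, and a dependent variable built as 1 + 2*x + 3*A_1) and
hypothetical function names; it shows the textbook identity, not PSPP's
implementation.

```python
from fractions import Fraction

# Hypothetical data. Columns: x, A_1, dep, with dep = 1 + 2*x + 3*A_1.
data = [
    [1, 0, 3],
    [2, 1, 8],
    [3, 0, 7],
    [4, 1, 12],
]


def covariance_matrix(rows):
    """Sample covariance matrix (denominator N - 1) and column means."""
    n_obs, n_cols = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n_obs) for j in range(n_cols)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            / (n_obs - 1) for j in range(n_cols)] for i in range(n_cols)]
    return cov, means


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination.
    Assumes a is square and nonsingular (raises StopIteration otherwise)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


cov, means = covariance_matrix(data)
k = 2                                    # number of predictor columns
S = [row[:k] for row in cov[:k]]         # predictor-by-predictor block
s = [cov[i][k] for i in range(k)]        # predictor-by-dependent column
slopes = solve(S, s)
intercept = means[k] - sum(b * mu for b, mu in zip(slopes, means[:k]))
print(slopes, intercept)
```

Because the dependent variable here is an exact linear function of the
predictors, the solve recovers the coefficients used to build it. The
ANOVA tests mentioned above need further sums of squares computed on
top of this solution.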
>
> This matrix is unnecessary for a one-way ANOVA, whose computations from
> the matrix above can be simplified into the simple sums used in
> oneway.q. But for a bigger model, with many factors and interactions
> and covariates, we need that first matrix because we can't reduce the
> problem to a few easy-to-read summations.