Re: Next step in covariance matrix
From: John Darrington
Subject: Re: Next step in covariance matrix
Date: Tue, 27 Oct 2009 06:38:19 +0000
User-agent: Mutt/1.5.18 (2008-05-17)
On Sun, Oct 25, 2009 at 11:54:16PM -0400, Jason Stover wrote:
> I haven't tested it yet, but it looks like the computation of the
> dimension might be wrong when categorical variables are involved.
> If a categorical variable has k categories, its contribution to the
> dimension should be k-1. But this line in covariance.c:
>
>   cov->dim = cov->n_vars + categoricals_total (cov->categoricals);
>
> ...suggests the contribution to the dimension would be k.
>
> The contribution to the dimension is k-1 because the range of possible
> values of k categories is spanned by k-1 basis vectors. The kth
> vector is the origin, which corresponds to exactly one of the
> categories. Which is chosen as the origin is arbitrary (some software
> chooses the first category seen, some the last).
So I've probably done it wrong.
Just to make sure I understand things correctly, consider the following
example, where x and y are numeric variables and A and B are categorical
ones:

x  y  A  B
==========
3  4  x  v
5  6  y  v
7  8  z  w
We replace the categorical variables with bit_vectors:
x    y    A_0  A_1  A_2  B_0  B_1
=================================
3    4    1    0    0    1    0
5    6    0    1    0    1    0
7    8    0    0    1    0    1
and arbitrarily drop one indicator per categorical variable (say, the
zeroth):

x    y    A_1  A_2  B_1
=======================
3    4    0    0    0
5    6    1    0    0
7    8    0    1    1
That will produce a 5x5 matrix. 5 is calculated from n + m - p, where
n is the number of numeric variables, m is the total number of categories,
and p is the number of categorical variables (here 2 + 5 - 2 = 5).
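
The n + m - p count and the reduced rows above can be sketched in C as
follows. This is only an illustration under stated assumptions: the
functions reference_coded_dim and encode_case are hypothetical names and
are not part of covariance.c, categories are assumed to be already mapped
to zero-based indices, and category 0 is assumed to be the one dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in covariance.c): dimension when each
   categorical variable with k categories contributes k - 1 columns.
   Summed over all variables this equals n + m - p. */
size_t
reference_coded_dim (size_t n_numeric, const size_t *n_cats,
                     size_t n_categorical)
{
  size_t dim = n_numeric;
  for (size_t i = 0; i < n_categorical; i++)
    {
      assert (n_cats[i] >= 1);
      dim += n_cats[i] - 1;
    }
  return dim;
}

/* Hypothetical helper: encode one case into OUT, which must hold
   reference_coded_dim () doubles.  cat_index[i] is the zero-based
   category of the i-th categorical variable; category 0 is the
   dropped reference and encodes as all zeros. */
void
encode_case (const double *numeric, size_t n_numeric,
             const size_t *cat_index, const size_t *n_cats,
             size_t n_categorical, double *out)
{
  size_t pos = 0;

  /* Numeric variables are copied through unchanged. */
  for (size_t i = 0; i < n_numeric; i++)
    out[pos++] = numeric[i];

  /* One indicator per non-reference category. */
  for (size_t i = 0; i < n_categorical; i++)
    {
      assert (cat_index[i] < n_cats[i]);
      for (size_t c = 1; c < n_cats[i]; c++)
        out[pos++] = (cat_index[i] == c) ? 1.0 : 0.0;
    }
}
```

With n_cats = {3, 2} for A and B, the third case (7 8 z w) would be passed
as numeric = {7, 8} and cat_index = {2, 1}, filling the five columns
x, y, A_1, A_2, B_1.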
However, I don't see how such a matrix can be very useful. A better one
would involve the products of the categorical and numeric variables:
x  y  x*A_1  x*A_2  y*A_1  y*A_2  x*B_1  y*B_1
==============================================
3  4  0      0      0      0      0      0
5  6  5      0      6      0      0      0
7  8  0      7      0      8      7      8
This makes an 8x8 matrix, where 8 is calculated from n + n * (m - p),
which factors as n * (1 + m - p); here 2 * (1 + 5 - 2) = 8. But this
involves a whole lot more calculations.
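
For comparison, the product scheme can be sketched the same way. Again
this is a hedged illustration only: interaction_dim and
encode_interactions are hypothetical names, not functions from
covariance.c, and the column order (numeric variables first, then for each
categorical variable the products with each numeric variable) is assumed
from the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of columns in the product scheme,
   n * (1 + m - p), where m is the total number of categories and
   p the number of categorical variables. */
size_t
interaction_dim (size_t n_numeric, size_t total_categories,
                 size_t n_categorical)
{
  assert (total_categories >= n_categorical);
  return n_numeric * (1 + total_categories - n_categorical);
}

/* Hypothetical helper: encode one case into OUT, which must hold
   interaction_dim () doubles.  Category 0 of each categorical
   variable is the dropped reference. */
void
encode_interactions (const double *numeric, size_t n_numeric,
                     const size_t *cat_index, const size_t *n_cats,
                     size_t n_categorical, double *out)
{
  size_t pos = 0;

  /* The numeric variables themselves. */
  for (size_t i = 0; i < n_numeric; i++)
    out[pos++] = numeric[i];

  /* numeric[i] * indicator (v, c) for every non-reference category c
     of every categorical variable v. */
  for (size_t v = 0; v < n_categorical; v++)
    {
      assert (cat_index[v] < n_cats[v]);
      for (size_t i = 0; i < n_numeric; i++)
        for (size_t c = 1; c < n_cats[v]; c++)
          out[pos++] = (cat_index[v] == c) ? numeric[i] : 0.0;
    }
}
```

Each case costs n * (1 + m - p) stores instead of n + m - p, and the
resulting matrix grows with the square of that count, which is where the
extra calculation comes from.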
Which of these schemes is correct, if either?
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.