From: anonymous
Subject: [Octave-patch-tracker] [patch #10104] [octave forge] (statistics) Add new function pca
Date: Mon, 30 Aug 2021 12:46:03 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36
URL:
<https://savannah.gnu.org/patch/?10104>
Summary: [octave forge] (statistics) Add new function pca
Project: GNU Octave
Submitted by: None
Submitted on: Mon 30 Aug 2021 04:46:02 PM UTC
Category: Forge : new function
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email: s.guidoni@virgilio.it
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
This patch adds the function 'pca' to the statistics package.
It is essentially the old function 'princomp' under a new name, but I tried to
improve its compatibility by adding new features.
This patch requires patch #10102 (ismissing) and patch #10103 (weighted
standard deviation).
The 'als' algorithm and its options are not implemented. There is also an
inconsistency with the MATLAB counterpart when computing Hotelling's T-squared
statistic with weights: I am not sure whether it is just numerical conditioning
or whether MATLAB applies some unbiasing (or something similar) to the data.
This still needs testing, although I have used it for some time and I can say
that many things work.
Example usage:
octave:3> x
x =
7 4 3
4 1 8
6 3 5
8 6 1
8 5 7
7 2 9
5 3 3
9 5 8
7 4 5
8 2 2
octave:4> [c, s, l, t, e, m] = pca (x, "VariableWeights", "variance")
c =
0.9783 0.5862 -1.0107
1.0852 0.1536 1.1396
-0.9590 2.5764 0.5660
s =
0.514813 -0.630836 0.033512
-2.660010 0.062809 0.330893
-0.584039 -0.290606 0.156589
2.047758 -0.909635 0.366273
0.883274 0.991202 0.341538
-1.083764 1.208571 -0.447062
-0.761870 -1.197124 0.448103
1.182837 1.570676 -0.021826
0.271349 0.023254 0.177213
0.189653 -0.828313 -1.385233
l =
1.7688
0.9271
0.3041
t =
0.5828
4.3646
0.3646
3.7044
1.8844
2.8967
2.5342
3.4536
0.1455
7.0694
e =
58.959
30.903
10.138
m =
6.9000 3.5000 5.1000
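As a quick sanity check on the output above, the last outputs can be recomputed from s and l. This is only a sketch and is not taken from the patch: it assumes, as the printed numbers suggest, that e is l expressed as a percentage of its sum and that t is the sum of the squared scores divided by the corresponding variances. The names e_check and t_check are made up for illustration, and only the first two rows of s are copied.

```octave
% Values copied from the session above (first two rows of s only).
l = [1.7688; 0.9271; 0.3041];
s = [ 0.514813, -0.630836, 0.033512;
     -2.660010,  0.062809, 0.330893];

% Percentage of the total variance carried by each component.
e_check = 100 * l / sum (l);

% Hotelling's T-squared: squared scores scaled by the component variances,
% summed over the components (one value per observation).
t_check = sum (s .^ 2 ./ l', 2);

disp (e_check');
disp (t_check');
```

Up to the rounding of the printed values, e_check should agree with e and t_check with the first two entries of t.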
New help with a description of the new features:
octave:5> help pca
-- Function File: [COEFF] = pca(X)
-- Function File: [COEFF] = pca(X, Name, Value)
-- Function File: [COEFF,SCORE,LATENT] = pca(...)
-- Function File: [COEFF,SCORE,LATENT,TSQUARED] = pca(...)
-- Function File: [COEFF,SCORE,LATENT,TSQUARED,EXPLAINED,MU] = pca(...)
Performs a principal component analysis on a data matrix X.
A principal component analysis of a data matrix of 'n' observations
in a 'p'-dimensional space returns a 'p'-by-'p' transformation
matrix, to perform a change of basis on the data. The first
component of the new basis is the direction that maximizes the
variance of the projected data.
Input argument:
* X : an 'n'-by-'p' data matrix
Pair arguments:
* 'Algorithm' : the algorithm to use, it can be either 'eig',
for eigenvalue decomposition, or 'svd' (default), for singular
value decomposition
* 'Centered' : boolean indicator for centering the observation
data, it is 'true' by default
* 'Economy' : boolean indicator for the economy size output, it
is 'true' by default; 'pca' returns only the elements of
LATENT that are not necessarily zero, and the corresponding
columns of COEFF and SCORE, that is, when 'n <= p', only the
first 'n - 1'
* 'NumComponents' : the number of components 'k' to return, if
'k < p', then only the first 'k' columns of COEFF and SCORE
are returned
* 'Rows' : action to take with missing values, it can be either
'complete' (default), missing values are removed before
computation, 'pairwise' (only with algorithm 'eig'), the
covariance of rows with missing data is computed using the
available data, but the covariance matrix might not be
positive definite, which triggers the termination of 'pca',
or 'all', missing values are not allowed, 'pca' terminates
with an error if there are any
* 'Weights' : observation weights, it is a vector of positive
values of length 'n'
* 'VariableWeights' : variable weights, it can be either a
vector of positive values of length 'p' or the string
'variance' to use the sample variance as weights
Return values:
* COEFF : the principal component coefficients, a 'p'-by-'p'
transformation matrix
* SCORE : the principal component scores, the representation of
X in the principal component space
* LATENT : the principal component variances, i.e., the
eigenvalues of the covariance matrix of X
* TSQUARED : Hotelling's T-squared Statistic for each
observation in X
* EXPLAINED : the percentage of the variance explained by each
principal component
* MU : the estimated mean of each variable of X, it is zero if
the data are not centered
References
----------
1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
Springer, 2002
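For readers who want to see what the return values described above amount to, here is a minimal sketch of a centered, unweighted principal component analysis via the singular value decomposition, using only core Octave functions. It is not the code from pca.diff: it assumes no missing values and no weights, so its numbers will differ from the "VariableWeights" example earlier, and the sign of each column of coeff is arbitrary.

```octave
% Data matrix from the example session (n = 10 observations, p = 3 variables).
x = [7 4 3; 4 1 8; 6 3 5; 8 6 1; 8 5 7; 7 2 9; 5 3 3; 9 5 8; 7 4 5; 8 2 2];
n = rows (x);

mu = mean (x);     % estimated mean of each variable
xc = x - mu;       % centered data (uses broadcasting)

% Economy-size SVD of the centered data: xc = U * S * V'.
[U, S, V] = svd (xc, "econ");

coeff     = V;                           % principal component coefficients
score     = xc * V;                      % data in the principal component space
latent    = diag (S) .^ 2 / (n - 1);     % variance of each component
explained = 100 * latent / sum (latent); % percentage of variance explained
```

In this sketch the columns of coeff are orthonormal, var (score) reproduces latent, and sum (latent) equals the total sample variance sum (var (x)).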
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Mon 30 Aug 2021 04:46:02 PM UTC Name: pca.diff Size: 19KiB By: None
hg export -r tip
<http://savannah.gnu.org/patch/download.php?file_id=51840>