octave-patch-tracker

From: anonymous
Subject: [Octave-patch-tracker] [patch #10104] [octave forge] (statistics) Add new function pca
Date: Mon, 30 Aug 2021 12:46:03 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36

URL:
  <https://savannah.gnu.org/patch/?10104>

                 Summary: [octave forge] (statistics) Add new function pca
                 Project: GNU Octave
            Submitted by: None
            Submitted on: Mon 30 Aug 2021 04:46:02 PM UTC
                Category: Forge : new function
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: s.guidoni@virgilio.it
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

This is a patch to add function 'pca' to the statistics package.

This is essentially the old function 'princomp' under a new name; however, I
tried to improve its MATLAB compatibility by adding new features.

This patch requires patch #10102 (ismissing) and patch #10103 (weighted
standard deviation).

This patch does not implement the 'als' algorithm or its options. There is
also an inconsistency with the MATLAB counterpart when computing Hotelling's
T-squared statistic with weights: I am not sure whether it is just numerical
conditioning or whether MATLAB applies some unbiasing (or something similar)
to the data.
In any case this needs testing, although I have used it for some time and can
say that many things work.

Example usage:

octave:3> x
x =

   7   4   3
   4   1   8
   6   3   5
   8   6   1
   8   5   7
   7   2   9
   5   3   3
   9   5   8
   7   4   5
   8   2   2

octave:4> [c, s, l, t, e, m] = pca (x, "VariableWeights", "variance")         
 
c =

   0.9783   0.5862  -1.0107
   1.0852   0.1536   1.1396
  -0.9590   2.5764   0.5660

s =

   0.514813  -0.630836   0.033512
  -2.660010   0.062809   0.330893
  -0.584039  -0.290606   0.156589
   2.047758  -0.909635   0.366273
   0.883274   0.991202   0.341538
  -1.083764   1.208571  -0.447062
  -0.761870  -1.197124   0.448103
   1.182837   1.570676  -0.021826
   0.271349   0.023254   0.177213
   0.189653  -0.828313  -1.385233

l =

   1.7688
   0.9271
   0.3041

t =

   0.5828
   4.3646
   0.3646
   3.7044
   1.8844
   2.8967
   2.5342
   3.4536
   0.1455
   7.0694

e =

   58.959
   30.903
   10.138

m =

   6.9000   3.5000   5.1000
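
As a rough cross-check of the output above, the sketch below recomputes the
same quantities with core Octave functions only. It is not the code from the
patch. It assumes that "VariableWeights", "variance" weights each variable by
the inverse of its sample variance (the MATLAB convention), which amounts to
standardizing the columns before the decomposition; the numbers printed above
appear consistent with that reading. All variable names are illustrative.

```octave
% Data matrix from the example above (10 observations, 3 variables).
x = [7 4 3; 4 1 8; 6 3 5; 8 6 1; 8 5 7; 7 2 9; 5 3 3; 9 5 8; 7 4 5; 8 2 2];

n  = rows (x);
mu = mean (x);          % column means, corresponds to MU
sd = std (x);           % sample standard deviations (n - 1 normalization)

% Assumption: weighting by 1/variance is the same as standardizing columns.
z = (x - mu) ./ sd;

% SVD of the standardized data; the columns of v are orthonormal directions.
[~, s, v] = svd (z, "econ");

latent    = diag (s) .^ 2 / (n - 1);         % principal component variances
score     = z * v;                           % data in the new basis
coeff     = diag (sd) * v;                   % rescaled to original units
tsquared  = sum (score .^ 2 ./ latent', 2);  % Hotelling's T-squared
explained = 100 * latent / sum (latent);     % percentage of variance
```

The sign of each principal direction is arbitrary, so columns of coeff and
score may come out negated relative to the listing above. Note also that
coeff is not orthonormal here, because the orthonormal directions are
rescaled by the standard deviations.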


New help with a description of the new features:

octave:5> help pca

 -- Function File: [COEFF] = pca(X)
 -- Function File: [COEFF] = pca(X, Name, Value)
 -- Function File: [COEFF,SCORE,LATENT] = pca(...)
 -- Function File: [COEFF,SCORE,LATENT,TSQUARED] = pca(...)
 -- Function File: [COEFF,SCORE,LATENT,TSQUARED,EXPLAINED,MU] = pca(...)
     Performs a principal component analysis on a data matrix X

     A principal component analysis of a data matrix of 'n' observations
     in a 'p'-dimensional space returns a 'p'-by-'p' transformation
     matrix, to perform a change of basis on the data.  The first
     component of the new basis is the direction that maximizes the
     variance of the projected data.

     Input argument:
        * X : an 'n'-by-'p' data matrix

     Pair arguments:
        * 'Algorithm' : the algorithm to use, it can be either 'eig',
          for eigenvalue decomposition, or 'svd' (default), for singular
          value decomposition
        * 'Centered' : boolean indicator for centering the observation
          data, it is 'true' by default
        * 'Economy' : boolean indicator for the economy size output, it
          is 'true' by default; 'pca' returns only the elements of
          LATENT that are not necessarily zero, and the corresponding
          columns of COEFF and SCORE, that is, when 'n <= p', only the
          first 'n - 1'
        * 'NumComponents' : the number of components 'k' to return, if
          'k < p', then only the first 'k' columns of COEFF and SCORE
          are returned
        * 'Rows' : action to take with missing values, it can be either
          'complete' (default), missing values are removed before
          computation, 'pairwise' (only with algorithm 'eig'), the
          covariance of rows with missing data is computed using the
          available data, but the covariance matrix might not be
          positive definite, which triggers the termination of 'pca',
          or 'all', missing values are not allowed, 'pca' terminates
          with an error if there are any
        * 'Weights' : observation weights, it is a vector of positive
          values of length 'n'
        * 'VariableWeights' : variable weights, it can be either a
          vector of positive values of length 'p' or the string
          'variance' to use the sample variance as weights

     Return values:
        * COEFF : the principal component coefficients, a 'p'-by-'p'
          transformation matrix
        * SCORE : the principal component scores, the representation of
          X in the principal component space
        * LATENT : the principal component variances, i.e., the
          eigenvalues of the covariance matrix of X
        * TSQUARED : Hotelling's T-squared Statistic for each
          observation in X
        * EXPLAINED : the percentage of the variance explained by each
          principal component
        * MU : the estimated mean of each variable of X, it is zero if
          the data are not centered

     References
     ----------

       1. Jolliffe, I. T., Principal Component Analysis, 2nd Edition,
          Springer, 2002
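
The 'Algorithm' option in the help text above selects between an eigenvalue
decomposition and a singular value decomposition. The sketch below, which
uses only core Octave functions and is not taken from the patch, illustrates
why the two routes should agree for centered, unweighted data without missing
values: the eigenvalues of the sample covariance matrix equal the squared
singular values of the centered data divided by 'n - 1'. Variable names are
illustrative.

```octave
% Same data matrix as in the usage example.
x = [7 4 3; 4 1 8; 6 3 5; 8 6 1; 8 5 7; 7 2 9; 5 3 3; 9 5 8; 7 4 5; 8 2 2];

n  = rows (x);
xc = x - mean (x);                  % centered data ('Centered' is true)

% 'eig' route: eigen-decomposition of the sample covariance matrix.
[v_eig, d] = eig (cov (x));
[latent_eig, idx] = sort (diag (d), "descend");  % largest variance first
v_eig = v_eig(:, idx);

% 'svd' route: singular values of the centered data.
[~, s, v_svd] = svd (xc, "econ");
latent_svd = diag (s) .^ 2 / (n - 1);

% The variances agree, and the directions agree up to sign.
max_latent_diff = max (abs (latent_eig - latent_svd));
max_coeff_diff  = max (max (abs (abs (v_eig) - abs (v_svd))));
```

Both differences should be at the level of rounding error. The two routes
only coincide this simply in the plain case; with 'Rows' set to 'pairwise'
the help text restricts the computation to the 'eig' algorithm.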




    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Mon 30 Aug 2021 04:46:02 PM UTC  Name: pca.diff  Size: 19KiB   By: None
hg export -r tip
<http://savannah.gnu.org/patch/download.php?file_id=51840>

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/patch/?10104>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



