[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-patch-tracker] [patch #8559] rewrite of cmdscale for improved ma
From: |
JD Walsh |
Subject: |
[Octave-patch-tracker] [patch #8559] rewrite of cmdscale for improved matlab compatibility, mathematical accuracy |
Date: |
Sun, 26 Oct 2014 03:27:01 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0 Iceweasel/33.0 |
URL:
<http://savannah.gnu.org/patch/?8559>
Summary: rewrite of cmdscale for improved matlab
compatibility, mathematical accuracy
Project: GNU Octave
Submitted by: jdwalsh
Submitted on: Sun 26 Oct 2014 03:27:00 AM GMT
Category: None
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
I'm a PhD candidate in Mathematics at Georgia Tech, and right now that
involves a LOT of Octave programming. I came across a problem with the
cmdscale function in the Octave-Forge Statistics package, and I'd like to
offer this replacement. As time permits, I hope to help write / improve other
parts of Octave.
--JD
TL;DR `cmdscale' needed a complete rewrite. The attached patch is a
replacement I wrote.
-----
The current `cmdscale' function does NOT perform classical multidimensional
scaling (MDS). Its implementation is nothing like the standard equations for
MDS, and the resulting matrix Y is completely different that that returned by
MatLab's `cmdscale'. Comparing loss functions such as stress and strain give
completely different results when using Octave and MatLab, so the current
function does not create equivalent solutions. As a result, code written using
MatLab's `cmdscale' gives dramatically different results when run in Octave.
MatLab's function also supports non-Euclidean matrices (and difference
matrices and similarity matrices). The current `cmdscale' does not.
These issues may be happening because the current `cmdscale' function squares
the distance matrix and re-centers it on the first row/column. Centering the
distance matrix on an arbitrarily-chosen row/column is not the same as the
double centering operation which is key to classical MDS. But even with
Euclidean matrices, the current `cmdscale' has a tendency to fail its own test
assertion when the given matrix sizes are changed. For instance:
for i = 1:400 % eventually fails, though 50 is generally sufficient
m=4; n=2; X=rand(m,n); D=pdist(X); assert(pdist(cmdscale(D)), D, m*n*eps);
end
In fairness, I suspect the test failure is due to expecting O(eps) accuracy
for matrix operations, since O(sqrt(eps)) accuracy is often the best one can
achieve when calling `eig'. Thus, any testing errors may not be relevant to
the discrepancies above.
I believe one cause of all these issues is made clear in the current
documentation, where it states that classical MDS is ``also known as principal
coordinate analysis.'' This is not correct, though the confusion is
understandable: classical MDS was called principal coordinate analysis (PCO)
at one time (Gower 1966). However, there is a subtle yet important difference
between the two techniques: when performing the centering operation in PCO the
first principal component (representing a baseline/mean) is usually removed
(Heiser & Meulman 1983). This does not take place in the double centering
operation used for classical MDS. Among other things, this means the
coordinates resulting from classical MDS are invariant up to rotation when
permuting the distance matrix, while those of PCO are not. (The coordinates of
classical MDS are also nested, and they tend to cluster around their
centroid.) These differences certainly explain the discrepancy with the MatLab
output, and could possibly be connected to any
issues related to the test assertion.
The replacement I have provided follows the standard MDS implementation (Borg
& Groenen 2005), and its results match those of MatLab up to column reflection
(i.e., a specific column in the output matrix may equal -1 * the same column
in MatLab's implementation). Since MDS is only intended to approximate
distance, not direction, my results are indistinguishable from MatLab's up to
at least O(sqrt(eps)) (when used appropriately for MDS), and so should be
compatible with programs dependent on MatLab's `cmdscale' function.
Classical MDS removes all eigenvalues which are not positive real values. The
current `cmdscale' checks for and removes negative eigenvalues, but not
complex eigenvalues. Truly symmetric matrices never have complex eigenvalues,
but their machine approximations in Octave occasionally generate complex
near-zero eigenvalues. I added code to test for and remove these.
I also added error checking (the full form of the input matrix must be
symmetric and have all nonnegative entries, with zeros on the main diagonal)
and I added the option, present in MatLab's code, to calculate MDS from a
similarity matrix (in which case the full form of the input matrix must be
symmetric and have all nonnegative entries less than or equal to one, with
ones on the main diagonal). Sparse and non-Euclidean matrices work just fine
fine.
I greatly expanded and (hopefully) clarified the documentation, and I also
changed the reference cited in the documention to something more current and
-- for my implementation -- more relevant. The book I cited (Borg & Groenen
2005) describes the formulation of classical MDS and was instrumental in my
revision of `cmdscale'.
-----
References:
Borg, I. & Groenen, P.J.F. (2005) Modern Multidimensional Scaling, 2nd
edition, Springer.
Gower, J.C. (1966) Some distance properties of latent root and vector methods
used in multivariate analysis. Biometrika, 53, 325--338.
Heiser, W.J. & Meulman, J. (1983) Analyzing rectangular tables by joint and
constrained multidimensional scaling, J. Econometrics, 22, 139–167.
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Sun 26 Oct 2014 03:27:00 AM GMT Name: cmdscale.patch Size: 8kB By:
jdwalsh
<http://savannah.gnu.org/patch/download.php?file_id=32320>
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/patch/?8559>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
- [Octave-patch-tracker] [patch #8559] rewrite of cmdscale for improved matlab compatibility, mathematical accuracy,
JD Walsh <=