Classical Multidimensional Scaling
This example shows how to use cmdscale
to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.
cmdscale
takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by cmdscale
provides a visual representation of the original distances.
As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.
rng default; % For reproducibility X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)]; D = pdist(X,'euclidean');
Next, use cmdscale
to find a configuration with those inter-point distances. cmdscale
accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by pdist
.
[Y,eigvals] = cmdscale(D);
cmdscale
produces two outputs. The first output, Y
, is a matrix containing the reconstructed points. The second output, eigvals
, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to Y*Y'
. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of Y
in reproducing the original distance matrix D
with the reconstructed points.
format short g [eigvals eigvals/max(abs(eigvals))]
ans = 10×2
35.41 1
11.158 0.31511
1.6894 0.04771
0.1436 0.0040553
2.9678e-15 8.3812e-17
1.7158e-15 4.8454e-17
1.5224e-15 4.2995e-17
-1.4626e-15 -4.1303e-17
-1.7759e-15 -5.0153e-17
-8.4529e-15 -2.3871e-16
If eigvals
contains only positive and zero (within round-off error) eigenvalues, the columns of Y
corresponding to the positive eigenvalues provide an exact reconstruction of D
, in the sense that their inter-point Euclidean distances, computed using pdist
, for example, are identical (within round-off) to the values in D
.
maxerr4 = max(abs(D - pdist(Y))) % Exact reconstruction
maxerr4 = 5.107e-15
If two or three of the eigenvalues in eigvals
are much larger than the rest, then the distance matrix based on the corresponding columns of Y
nearly reproduces the original distance matrix D
. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.
maxerr3 = max(abs(D - pdist(Y(:,1:3)))) % Good reconstruction in 3D
maxerr3 = 0.043142
maxerr2 = max(abs(D - pdist(Y(:,1:2)))) % Poor reconstruction in 2D
maxerr2 = 0.98315
The reconstruction in three dimensions reproduces D
very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in D
.
max(max(D))
ans = 5.8974
Often, eigvals
contains some negative eigenvalues, indicating that the distances in D
cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by D
. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by cmdscale
might still reproduce D
well.