Main Content

This example shows how to use `cmdscale`

to perform classical (metric) multidimensional scaling, also known as principal coordinates analysis.

`cmdscale`

takes as an input a matrix of inter-point distances and creates a configuration of points. Ideally, those points are in two or three dimensions, and the Euclidean distances between them reproduce the original distance matrix. Thus, a scatter plot of the points created by `cmdscale`

provides a visual representation of the original distances.

As a very simple example, you can reconstruct a set of points from only their inter-point distances. First, create some four dimensional points with a small component in their fourth coordinate, and reduce them to distances.

rng default; % For reproducibility X = [normrnd(0,1,10,3),normrnd(0,.1,10,1)]; D = pdist(X,'euclidean');

Next, use `cmdscale`

to find a configuration with those inter-point distances. `cmdscale`

accepts distances as either a square matrix, or, as in this example, in the vector upper-triangular form produced by `pdist`

.

[Y,eigvals] = cmdscale(D);

`cmdscale`

produces two outputs. The first output, `Y`

, is a matrix containing the reconstructed points. The second output, `eigvals`

, is a vector containing the sorted eigenvalues of what is often referred to as the "scalar product matrix," which, in the simplest case, is equal to `Y*Y'`

. The relative magnitudes of those eigenvalues indicate the relative contribution of the corresponding columns of `Y`

in reproducing the original distance matrix `D`

with the reconstructed points.

format short g [eigvals eigvals/max(abs(eigvals))]

`ans = `*10×2*
35.41 1
11.158 0.31511
1.6894 0.04771
0.1436 0.0040553
3.1014e-15 8.7586e-17
2.3149e-15 6.5375e-17
7.5173e-16 2.1229e-17
-5.4911e-17 -1.5507e-18
-1.9507e-15 -5.509e-17
-3.9794e-15 -1.1238e-16

If `eigvals`

contains only positive and zero (within round-off error) eigenvalues, the columns of `Y`

corresponding to the positive eigenvalues provide an exact reconstruction of `D`

, in the sense that their inter-point Euclidean distances, computed using `pdist`

, for example, are identical (within round-off) to the values in `D`

.

`maxerr4 = max(abs(D - pdist(Y))) % Exact reconstruction`

maxerr4 = 2.6645e-15

If two or three of the eigenvalues in `eigvals`

are much larger than the rest, then the distance matrix based on the corresponding columns of `Y`

nearly reproduces the original distance matrix `D`

. In this sense, those columns form a lower-dimensional representation that adequately describes the data. However it is not always possible to find a good low-dimensional reconstruction.

`maxerr3 = max(abs(D - pdist(Y(:,1:3)))) % Good reconstruction in 3D`

maxerr3 = 0.043142

`maxerr2 = max(abs(D - pdist(Y(:,1:2)))) % Poor reconstruction in 2D`

maxerr2 = 0.98315

The reconstruction in three dimensions reproduces `D`

very well, but the reconstruction in two dimensions has errors that are of the same order of magnitude as the largest values in `D`

.

max(max(D))

ans = 5.8974

Often, `eigvals`

contains some negative eigenvalues, indicating that the distances in `D`

cannot be reproduced exactly. That is, there might not be any configuration of points whose inter-point Euclidean distances are given by `D`

. If the largest negative eigenvalue is small in magnitude relative to the largest positive eigenvalues, then the configuration returned by `cmdscale`

might still reproduce `D`

well.