How to reduce the number of unique values in a matrix?

I would like to reduce the number of unique values in my matrix to a fixed number. If I just round my values, I still get too many unique values. For instance, I would like to be able to group the matrix values into maybe 10 groups (= 10 unique values). The value of each group should relate to the original values, for instance as the mean of all the values in the group. My original idea was to do something like k-means clustering, but I don't think this can be done with data in a matrix.
Is there a way to do this?

Accepted Answer

Stephen23 on 27 Apr 2017
Edited: Stephen23 on 27 Apr 2017
Although your data is arranged in a matrix, the matrix is a red herring: what you actually want is a simple 1D clustering of the values themselves, irrespective of their position in the matrix. This is simple, as k-means clustering can be done on any number of dimensions, including on 1D data. So convert your matrix to a vector, apply kmeans, and then use the indices to allocate the values to the clusters. Then simply reshape to get back the matrix shape.
Here is a complete working example, with just two clusters for clarity:
>> inp = [1,9,8,8;9,8,8,1;1,8,1,9;7,8,2,1]
inp =
1 9 8 8
9 8 8 1
1 8 1 9
7 8 2 1
>> [idx,vec] = kmeans(inp(:),2);
>> out = reshape(vec(idx),size(inp))
out =
1.1667 8.2000 8.2000 8.2000
8.2000 8.2000 8.2000 1.1667
1.1667 8.2000 1.1667 8.2000
8.2000 8.2000 1.1667 1.1667
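The two-cluster example above extends directly to the ten groups asked about in the question. The following is only a minimal sketch: the random test matrix `X`, the seed, and the variable names are made up for illustration, and the `'Replicates'` option is an optional addition (not part of the original answer) that reruns the randomly initialised clustering several times and keeps the best result. Note that `kmeans` requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical test data: a 20x20 matrix with 400 (almost surely) distinct values
rng(0);                      % fix the seed so the run is repeatable
X = rand(20, 20);

k = 10;                      % desired number of unique values

% Cluster the values as a single column; each centroid is the mean of its cluster
[idx, centers] = kmeans(X(:), k, 'Replicates', 5);

% Replace every value by its cluster centroid and restore the matrix shape
out = reshape(centers(idx), size(X));

fprintf('Unique values before: %d, after: %d\n', ...
        numel(unique(X)), numel(unique(out)));
```

Because k-means starts from random initial centroids, the exact centroid values can differ between runs unless the random seed is fixed.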

More Answers (1)

Adam on 27 Apr 2017
Edited: Adam on 27 Apr 2017
vals = ceil( 10 * vals / max( vals(:) ) );
  3 Comments
Adam on 27 Apr 2017
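Well, once you have your 10 unique labels you can use them as indices into the original values and replace the labels with the average of those values, e.g.:
labels = ceil( 10 * vals / max( vals(:) ) ); % bin labels 1..10 (assumes positive values)
newVals = vals;
for n = 1:10
mask = labels == n; % labels are kept separate so a group mean is never mistaken for a label
newVals( mask ) = mean( vals( mask ) );
end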
Stephen23 on 27 Apr 2017
Edited: Stephen23 on 27 Apr 2017
I also considered rounding as per Adam's answer, but this has the disadvantage that the group boundaries are linearly spaced, which might not represent the actual clusters well. Consider clusters centered around 0, 3, and 10: rounding to a linear grid (say, multiples of 5) would split the cluster around 3 between 0 and 5, which might not be the desired effect.
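The difference between linearly spaced bins and clustering can be seen on a tiny example. This is a sketch with made-up sample values (three tight groups near 0, 3, and 10); it applies the ten-bin formula from Adam's answer and then a 1D k-means with three clusters. The `'Replicates'` option is an added assumption to make a poor random start unlikely, and `kmeans` again needs the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical sample: three tight groups near 0, 3 and 10
vals = [0.2 0.4 2.7 3.3 9.6 10];

% Equal-width binning: ten bins across (0, max]
labels = ceil(10 * vals / max(vals(:)));
disp(labels)   % 1 1 3 4 10 10: the group near 3 falls into two different bins

% 1D k-means with three clusters, keeping the best of several random starts
[idx, centers] = kmeans(vals(:), 3, 'Replicates', 10);
clustered = reshape(centers(idx), size(vals));
disp(clustered)
```

With these values, k-means is expected to keep 2.7 and 3.3 in the same cluster and replace both with their mean, whereas the equal-width bins separate them.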
