Compare values of a matrix
Abhishek Singh
on 6 Jul 2019
Commented: Abhishek Singh
on 18 Jul 2019
I have an n*n matrix and want to group its indices into classes. I will explain with a 5*5 example. The matrix is symmetric, (1,2) = (2,1), and the diagonal is always 1, because it is a correlation matrix of pairs. I have a threshold of 0.9: whenever a value is greater than 0.9, the two indices of that pair are clustered into the same group.
I start with row 1 and select the values greater than 0.9, which in this case is only the 4th. So I put (1,4) in group A. In the same iteration I also find the smallest value in the row, which should be less than 0.9. Here it is the 3rd entry, 0.75. Because that is column 3, I move to row 3 for the second iteration.
Now I repeat the same process, but I only group the columns that are not yet grouped, which are 2, 3 and 5. In this iteration only the 5th is greater than 0.9, so (3,5) is the pair. The smallest value in row 3 is (3,1), but since I only check 2, 3 and 5 (1 and 4 are already grouped in A), I look for the minimum among the remaining columns 2 and 5, which is column 2. So I move to row 2.
At this point 2 is the only remaining factor, so I look for the maximum value in the second row, which is (2,5), and group 2 with 5, giving (2,3,5). I repeat this process until all items are grouped in groups A, B, C, and so on. To place the last remaining row number, I just group it with the index it has the best value with.
There would not be any case where a factor is common, i.e. belongs to two groups. For example, after (1,4) is grouped, when we do the iteration for the 3rd row, the 4th column would (most likely) not have a value greater than 0.9. The factors come from image features that are associated with each other in such a way that, since 1 is not well associated with 3 but is well associated with 4, 3 would also not be well associated with 4.
Please let me know if you have any questions, as I think this may be pretty confusing. Please let me know if you can suggest a simple algorithm that works for this. Thanks in advance.
My final output group will be (1,4),(2,3,5)
1.00 0.88 0.75 0.91 0.79
0.88 1.00 0.76 0.74 0.97
0.75 0.76 1.00 0.76 0.99
0.91 0.74 0.76 1.00 0.80
0.79 0.97 0.99 0.80 1.00
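For reference, the first iteration described above can be reproduced directly in MATLAB. This is only a sketch of that single step, assuming the diagonal entries are 1 as stated in the question; the variable names (m, threshold, groupA, nextRow) are illustrative and not part of the original post.

```matlab
% Correlation matrix from the question (symmetric, ones on the diagonal)
m = [1.00 0.88 0.75 0.91 0.79
     0.88 1.00 0.76 0.74 0.97
     0.75 0.76 1.00 0.76 0.99
     0.91 0.74 0.76 1.00 0.80
     0.79 0.97 0.99 0.80 1.00];
threshold = 0.9;

% Iteration 1, row 1: columns above the threshold form the first group.
% The diagonal entry is 1, so column 1 itself is always included.
groupA = find(m(1, :) > threshold);   % [1 4]

% The smallest value in row 1 selects the row for the next iteration.
[~, nextRow] = min(m(1, :));          % 3, because m(1,3) = 0.75
```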
3 Comments
Guillaume
on 6 Jul 2019
I didn't understand the meaning of "So for the column 3 I move to row 3 in the second iteration". I think you need to explain what happens in the 2nd step as well.
It would also help if you showed what the final result is for the above matrix.
Accepted Answer
Guillaume
on 8 Jul 2019
I'm still not entirely clear on your whole algorithm, but I think the following does what you want:
function clusters = cluster(m, threshold)
    columns = 1:size(m, 2);  %keep track of column indices remaining. clustered columns are removed from m when they are clustered
    clusters = {};
    currentrow = 1;  %start on 1st row
    while ~isempty(m)
        %cluster columns whose value is above threshold on current row.
        %Note that since m(currentrow, currentrow) is 1, it's always included in the cluster
        tocluster = m(currentrow, :) > threshold;
        if sum(~tocluster) == 1  %if there's only one column left to cluster afterward
            tocluster = true(size(tocluster));  %then include it in the current cluster
        end
        clusters{end + 1} = columns(tocluster);  %#ok<AGROW> %get actual index of columns to cluster
        [~, newrow] = min(m(currentrow, :));  %find next row to start clustering from
        currentrow = columns(newrow);
        m = m(:, ~tocluster);  %get rid of columns that have been clustered.
        columns = columns(~tocluster);
    end
end
I certainly didn't understand what to do when there's only one column/row left to cluster. In the above it's included in the last cluster.
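To make the last-column rule concrete, here is a hand-worked sketch of the second iteration on the matrix from the question, written without the function so it runs on its own. It assumes a threshold of 0.9 and that columns 1 and 4 were already clustered in the first iteration; the names row and lastCluster are illustrative only.

```matlab
% Correlation matrix from the question
m = [1.00 0.88 0.75 0.91 0.79
     0.88 1.00 0.76 0.74 0.97
     0.75 0.76 1.00 0.76 0.99
     0.91 0.74 0.76 1.00 0.80
     0.79 0.97 0.99 0.80 1.00];
threshold = 0.9;

columns = [2 3 5];            % columns left after (1,4) went into the first cluster
row = m(3, columns);          % row 3 restricted to those columns: [0.76 1.00 0.99]
tocluster = row > threshold;  % [false true true], i.e. columns 3 and 5

% Only one column (2) would be left over, so it is absorbed into this cluster.
if sum(~tocluster) == 1
    tocluster = true(size(tocluster));
end
lastCluster = columns(tocluster);   % [2 3 5]
```

Without the if block, column 2 would end up in a cluster of its own on the next pass.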
14 Comments
Guillaume
on 18 Jul 2019
The %#ok<AGROW> suppresses a warning in the editor that the array clusters grows inside the loop. It is typically a valid warning, and you can usually preallocate the array. In this case, however, you don't know in advance how many clusters there will be, so you do have to grow the array in the loop. It is a cell array, though, and the number of clusters is not likely to be big, so the impact on performance will be negligible.
If there is going to be a significant number of clusters, such that growing the array has an impact, then the alternative is to preallocate an array that is large enough (the maximum size it can be is the number of rows) and trim it at the end:
function clusters = cluster(m, threshold)
    columns = 1:size(m, 2);  %keep track of column indices remaining. clustered columns are removed from m when they are clustered
    clusters = cell(1, size(m, 1));  %preallocate cell array. Probably too big.
    currentrow = 1;  %start on 1st row
    clusteridx = 0;
    while ~isempty(m)
        clusteridx = clusteridx + 1;
        %cluster columns whose value is above threshold on current row.
        %Note that since m(currentrow, currentrow) is 1, it's always included in the cluster
        tocluster = m(currentrow, :) > threshold;
        if sum(~tocluster) == 1  %same last-column rule as in the first version
            tocluster = true(size(tocluster));
        end
        clusters{clusteridx} = columns(tocluster);  %get actual index of columns to cluster
        [~, newrow] = min(m(currentrow, :));  %find next row to start clustering from
        currentrow = columns(newrow);
        m = m(:, ~tocluster);  %get rid of columns that have been clustered.
        columns = columns(~tocluster);
    end
    clusters = clusters(1:clusteridx);  %trim cell array to portion used
end
This will temporarily use more memory for a possible marginal speed gain.
More Answers (1)
Bruno Luong
on 6 Jul 2019
It seems like you want to cluster indices according to correlation. In that case, why not threshold your matrix, which gives a sort of connectivity graph, and then use some graph technique to get the connected components?
There are a bunch of such functions on the File Exchange.
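As a rough sketch of that idea, MATLAB's built-in graph and conncomp functions (available since R2015b) can be used instead of a File Exchange submission. This assumes a threshold of 0.9 and the 5*5 matrix from the question; the names A, G, bins and groups are illustrative. Note that this is not the same as the row-by-row procedure in the question: connected components follow chains of above-threshold links, and an index with no link above the threshold stays in a group of its own.

```matlab
% Correlation matrix from the question
m = [1.00 0.88 0.75 0.91 0.79
     0.88 1.00 0.76 0.74 0.97
     0.75 0.76 1.00 0.76 0.99
     0.91 0.74 0.76 1.00 0.80
     0.79 0.97 0.99 0.80 1.00];
threshold = 0.9;

% Adjacency matrix: link i and j when their correlation exceeds the threshold.
% The diagonal is cleared so that no self-loops are created.
A = (m > threshold) & ~eye(size(m));

G = graph(A);          % undirected graph; A must be symmetric
bins = conncomp(G);    % bins(k) is the component number of index k

% Collect the indices belonging to each component.
% Here 1 and 4 share one component, and 2, 3 and 5 share the other.
groups = arrayfun(@(k) find(bins == k), 1:max(bins), 'UniformOutput', false);
```

For this matrix the components happen to match the grouping (1,4), (2,3,5) requested in the question, because 2 and 3 are both linked to 5.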