clusterization of data in 1-D vector

I have a large logical vector that looks like V = [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 ..............]
I need to find the position of each group of 1s (let's say, the center of each group), but if two groups of 1s are too close to each other (say, fewer than 3 zeros in between), I need to treat those groups as a single group. That is, at the first stage I need to find the groups (bold-underlined elements), and then find the center element of each group (a shift of +/-1 element does not matter).
1st stage (clusterization): [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 ..............]
2nd stage (find a center of each cluster): [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ..............]
The way I have implemented it now is as follows: I smooth the entire vector (it has a couple of million elements). The span is chosen to equal the maximum expected length of a group, and then I look for local maxima (islocalmax) with 'MinSeparation' set to the minimum distance between groups. It works, but it is really slow (I have 360x180 = 64800 vectors; yes, it is a LAT/LONG grid with ~10M elements in each vector).
Is there any way to speed this up? I believe there should be some "textbook" examples of this!

Accepted Answer

Adam Danz on 28 Oct 2020
Edited: Adam Danz on 28 Oct 2020
There are lots of alternatives.
  • Input A is a vector of 1s and 0s.
  • n is the minimum number of 0s that must lie between two groups of 1s for them to count as separate groups.
  • T is a table showing the start index, stop index, and length of each group of 1s, where groups separated by fewer than n zeros are merged into one.
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
% Length of each group of consecutive 1s
T = table();
T.OnesLength = diff(find([0;A(:);0]==0))-1;
T(T.OnesLength==0,:) = [];
% Index of 1st '1' in each group of consecutive 1s
T.OnesStart = find(diff([0;A(:)])==1);
% Index of last '1' in each group of consecutive 1s
T.OnesStop = T.OnesStart + T.OnesLength - 1;
% Number of 0s between each group of 1s and the next one (NaN for the last group)
ZerosBetween = [T.OnesStart(2:end) - T.OnesStop(1:end-1); NaN]-1;
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________
         3             4            6
         3             9           11
         6            18           23
         2            29           30
         1            32           32
         2            34           35
         1            37           37
         4            42           45
% join groups of consecutive 1s with less than n zeros between.
n = 3;
joinGroups = ZerosBetween < n;
t = find(diff([0;joinGroups])==1);
f = find(diff([0;joinGroups])==-1);
T.remove = false(height(T),1);
for i = 1:numel(t)
    T.OnesStop(t(i)) = T.OnesStop(f(i));
    T.OnesLength(t(i)) = sum(T.OnesLength(t(i):f(i))) + sum(ZerosBetween(t(i):f(i)-1));
    T.remove(t(i)+1:f(i)) = true;
end
T(T.remove,:) = [];
T.remove = [];
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________
         8             4           11
         6            18           23
         9            29           37
         4            42           45
Now you can use the segment length and the start/stop indices to compute the segment centers.
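For what it's worth, that last step could look something like the sketch below. It is not the code from the answer above: it recomputes the merged groups without the loop or the table, and then takes the midpoint of each merged group. The variable names (starts, stops, gaps, keep, clusterStart, clusterStop, centers, C) are made up for illustration, and rounding the midpoint down with floor is an arbitrary choice, which should be fine given that a shift of +/-1 element does not matter.

```matlab
% Same example vector and threshold as in the answer above
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
n = 3;  % groups separated by fewer than n zeros are merged

% Pad with zeros so that runs touching either end are detected too
d = diff([0; A(:); 0]);
starts = find(d == 1);        % index of the first 1 in each run
stops  = find(d == -1) - 1;   % index of the last 1 in each run

if isempty(starts)
    % No 1s at all: nothing to merge
    clusterStart = zeros(0, 1);
    clusterStop  = zeros(0, 1);
else
    % Number of zeros between run k and run k+1
    gaps = starts(2:end) - stops(1:end-1) - 1;
    % A gap of at least n zeros really separates two clusters
    keep = gaps >= n;
    % A cluster starts at the first run and after every kept gap,
    % and ends before every kept gap and at the last run
    clusterStart = starts([true; keep]);
    clusterStop  = stops([keep; true]);
end

% Midpoint of each merged cluster, rounded down (arbitrary choice)
centers = floor((clusterStart + clusterStop) / 2);

% Logical vector marking only the cluster centers ("2nd stage" in the question)
C = false(size(A));
C(centers) = true;

disp([clusterStart clusterStop centers])
```

For the example vector, clusterStart and clusterStop should agree with the OnesStart and OnesStop columns of the second table. Since this only uses diff, find, and logical indexing, it may also be worth timing against the loop version on the full-size vectors, although no speedup is claimed here.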
  1 Comment
paganelle on 28 Oct 2020
Perfect, thank you!
It is ~5 times faster than the method I used previously.
