Find equally (almost equal) distanced elements in an array
9 views (last 30 days)
Show older comments
Assume we have an array Y1 as follows:
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
If you plot it, you can see that elements located at 127, 152, 178, 203, 229 are almost equally located in the array, with an almost equal distance of 25 between these elements. Given signals similar to Y1, how can we find such elements? Please note that the Y1 here is the simplest possible case, where only one set of equally distributed elements are given. In reality, there might be two or three sets of elements with different spacing and different locations, For example:
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
If you plot Y2, you can see that elements located at 20, 25, 30, 35, 40 are equally distributed with the distance of 5 in between these elements (Set 1), and elements located at 127, 152, 178, 203, 229 are almost equally located with an almost equal distance of 25 between these elements (Set 2).
To sum up, given Y1, it is desired to obtain [127, 152, 178, 203, 229], and given Y2, it is expected to obtain {[20, 25, 30, 35, 40] and [127, 152, 178, 203, 229]}.
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
figure; plot(Y1); axis tight; grid;
ind1 = find(Y1)
20 32 38 45 48 51 76 127 152 178 203 229
diff(ind1)
12 6 7 3 3 25 51 25 26 25 26
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
figure; plot(Y2); axis tight; grid;
ind2 = find(Y2)
20 25 30 35 40 47 50 54 79 130 155 181 206 232
diff(ind2)
5 5 5 5 7 3 4 25 51 25 26 25 26
0 Comments
Answers (2)
John D'Errico
on 16 Aug 2022
Edited: John D'Errico
on 16 Aug 2022
You don't really say enough about your question to make it easy to solve. How long will these vectors be? How many non-zeros? Do you know, in advance the strides that you want to search for? Or are you just looking for ANY (approximately) common strides? The latter is of course fairly difficult to do with any degree of intelligence. But you might be able to make something work using clustering in 1-dimension. For example...
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
What matters are not what the elements are but WHERE they lie. So start by extracting them in a virtually sparse form.
ind1 = find(Y1)
As you can see, this is a pretty small set. But now, what matters is how far apart are they? So compute the interpoint distances. A simple way to do this is to use pdist.
D1 = squareform(pdist(ind1'))
That allows us to visualize the distance between any pair of non-zero elements. It also helps us to think about how we might look for clusters of points that lie at a common distance. Such clusters will lie near the main diagonal of the matrix D1, but not always perfectly so. Hmm. Anyway, suppose we decided to look for a set of equally distanced points, at roughly a distance of 25 as a stride? First, can we easily do that?
% find points that are at a stride of 25 +/- 3, converting the distance
% matrix into a graph.
G1 = graph(abs(D1 - 25) <= 3)
Now, can we find a set of nodes that are connected?
conncomp(G1)
This gives us a set of nodes that were connected in the graph. Do you see that the last 4 nodes were all connected, thus at a stride of 25+/- 3?
G1.Edges
If you can see how to read that table, you would see that nodes 8, 9, 10, 11, and 12 all lie in that common stride.
Similarly, look at Y2.
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
ind2 = find(Y2);
D2 = squareform(pdist(ind2'))
Again, we might use some tools to decide to look for the common cluster at a stride of 5.
G2 = graph(abs(D2 - 5) <= 1)
conncomp(G2)
As you can see there, we have a cluster of the first 5 nodes, all with a stride of roughly 5.
G2 = graph(abs(D2 - 25) <= 1)
conncomp(G2)
And again, a connected cluster of the ast 5 nodes, each roughly at a distance of 25.
With some effort, you might even be able to look for the common stride intelligently. Perhaps a way might exist to use clustering to search for those (NEARLY) common strides.
At the same time, be careful, since if your vector is terribly long, then things will go to hell, as that interpoint distance matrix will get larger. So don't expect this to work nearly as well if you have many thousands of nodes. Since there are now several questions left up in the air and unresolved, I'll stop here. Regardless, the graph theoretic tools In MATLAB will prove to be a terribly useful way to approach the problem.
2 Comments
John D'Errico
on 16 Aug 2022
Since the vectors will not be too long, the matrices will not grow immensely large. The sparser they are, the better of course. Would the above ideas be workable on a microcontroller? God only knows. Certainly not me. :) It would be worth testing on a larger problem.
But really, it looks like your main problem is one of automatically searching for a common stride. Once you have identified the stride, or at least a good candidate, then you have a graph-theoretic solution available.
One simple option would be to use brute force. That is, perform the above analysis, looking for groups of elements with a common stride of 2 +/- 1. Next, search for groups with a common stride of 3 +/- 1, continue the search, increasing the stride, until you find the set of maximal length. Could you do that? Of course. Whether it would be managable, I cannot say. It is very likely that with some thought a better solution exists.
John D'Errico
on 16 Aug 2022
Edited: John D'Errico
on 16 Aug 2022
As a subtly different answer, now that I understand your question a little better, consider this idea. My goal in this answer will be to see if an approximately common stride can be identified. The problem is, it need not be EXACT, so you would want to have a tolerance.
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
ind2 = find(Y2);
Now, assume we don't know what stride to search for. But it must happen between nodes that are close to each other.
conv(ind2,[1 -1],'valid')
conv(ind2,[1 0 -1],'valid')
conv(ind2,[1 0 0 -1],'valid')
conv(ind2,[1 0 0 0 -1],'valid')
conv(ind2,[1 0 0 0 0-1],'valid')
Now, I might scan through those differences, looking for some common, or nearly common strides. If I did do that, I would notice that a stride of something like 5-7 might be of interest. Then a common stride of 10 seems to pop out, as do 15 and 25.
Next, do the connected component analysis for each candidate, as I showed in my other answer.
G2 = graph(squareform(abs(pdist(ind2')' - 6) <= 1));
connections = conncomp(G2)
ind2(connections == 1) % 1 the number that appears the most frequently in connections
So we have found one sub-sequence of length 7.
I could now repeat the above analysis, looking for subsequences of length 10.
G2 = graph(squareform(abs(pdist(ind2')' - 10) <= 1));
connections = conncomp(G2)
ind2(connections == 1) % again, 1 indicates the longest subsequence
Next, consider subsequences of length approximately 25...
G2 = graph(squareform(abs(pdist(ind2')' - 25) <= 1));
connections = conncomp(G2)
ind2(connections == 8)
All of this was just a little playing around. You could automate something that works, but first, you need to find something that works.
See Also
Categories
Find more on Linear Algebra in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!