How to choose 60 unique coordinates ('A') from 600 coordinates('B'), nearest to 60 other coordinates('C')

3 views (last 30 days)
Hello,
I have a question. Imagine we have an array 'C' containing 60*2 elements and another vector 'B' containing 600*2 elements. Each of these 2 arrays represent 'x' and 'y' coordinates of our points. I wish to choose 60 nearest coordinates from 'B' in a way that these coordinates( array 'A') be unique and nearest to coordinates in 'C'.
I want to do this with highest speed because I have to implement it multiple times. Definitely using 'for loop' would increase complexity so I want to avoid it.
Any help would be appreciated.
Thank you

Answers (3)

John D'Errico
John D'Errico on 21 Oct 2024
Edited: John D'Errico on 21 Oct 2024
As I pointed out to @Voss and @Star Strider, if the points in B mst be unique, then there is an issue. If you simply choose the closest points (where it might be best to use knnsearch anyway) then one of the points in B may be simultaneously closest to different points in C.
The simple answer is to test to see if there are any replicate points chosen, where point k from B is simultaneously closest to two (or more!) points in C. But then you need to decide which other point to add to the set. I would probably do this by a resampling scheme, where all of the points already chosen are effectively removed from B. Now find a new point, which is the next closest to the point from C which was impacted. The code would not be too complex, and it would be reasonably efficient.
I'll give an example. Here, I'll use a smaller set of points to make thigns easier to visualize, and also to make it more likely there will be a conflict, I have made the size of B smaller. Larger sets will of course reduce the probability of a failure.
C = rand(10,2);
B = rand(15,2);
[idxB,D] = knnsearch(B,C,K=1)
idxB = 10×1
6 1 1 13 12 7 1 1 13 10
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
D = 10×1
0.1258 0.2244 0.1678 0.1289 0.0267 0.2911 0.1258 0.1877 0.2728 0.0600
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
In this set, you can see that point # 1 from B was simultaneously closest to 4 different points in C. And point 13 was also duplicated in that list. We could test to see if there were any replicates simply enough. For example,
if numel(unique(idxB)) < size(C,1)
% stuff
end
But now there is an ambiguity. Point #1 from B was closest to point 7 in C. Do we now choose a second best point from B for points [2 3 8] in C? What if there is again an issue? Is there some optimal set which is "best" in some unspecified way? And this is where your problem specification fails, because it does not suggest how these issues must be addressed.
Will this be a frequent problem? SURE AS HELL YES! Consider this simulation.
failcount = 0;
for i = 1:100
C = rand(60,2);
B = rand(600,2);
[idxB,D] = knnsearch(B,C,k=1);
if numel(unique(idxB)) ~= size(C,1)
failcount = failcount + 1;
end
end
failcount
failcount =
99
So out of 100 random sets generated, 99 of them had a conflict. If uniqueness is an issue, then you WILL need to write code that worries about these conflicts.
Anyway, COULD you resolve all conflicts using a loop? Well, yes.
C = rand(60,2);
B = rand(600,2);
idxB = NaN(size(C,1),1);
Btemp = B;
for i = 1:size(C,1)
idxB(i) = knnsearch(Btemp,C(i,:),k=1);
Btemp(idxB(i),:) = inf;
end
numel(unique(idxB))
ans =
60
And clearly this will result in a unique set, that has no conflicts.

Voss
Voss on 21 Oct 2024
Edited: Voss on 21 Oct 2024
C = rand(60,2);
B = rand(600,2);
[~,idx] = min(sum((permute(C,[1 3 2])-permute(B,[3 1 2])).^2,3),[],2);
A = B(idx,:);
figure
hold on
plot(B(:,1),B(:,2),'.b')
plot(C(:,1),C(:,2),'xr')
plot(A(:,1),A(:,2),'og')
A contains the nearest point in B to each point in C. This doesn't guarantee unique rows of A.

Star Strider
Star Strider on 21 Oct 2024
Using pdist2 is an option —
C = rand(60,2);
B = rand(600,2);
[A,I] = pdist2(B,C,'euclidean','smallest',1)
A = 1×60
0.0344 0.0214 0.0052 0.0389 0.0349 0.0287 0.0120 0.0206 0.0079 0.0323 0.0089 0.0251 0.0155 0.0172 0.0109 0.0129 0.0429 0.0451 0.0286 0.0266 0.0215 0.0253 0.0099 0.0054 0.0148 0.0050 0.0092 0.0271 0.0256 0.0420
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
I = 1×60
235 357 66 122 92 122 221 334 332 421 21 492 384 135 246 391 138 257 600 294 554 102 36 329 216 116 542 412 167 246
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
figure
scatter(C(:,1), C(:,2), '.b', 'DisplayName','C')
hold on
scatter(B(:,1), B(:,2), '.g', 'DisplayName','B')
scatter(B(I,1), B(I,2), 'or', 'DisplayName','Closest ‘B’ To ‘C’')
hold off
axis('equal')
legend('Location','northoutside', 'Orientation','horizontal')
.
  3 Comments
Star Strider
Star Strider on 21 Oct 2024
To ‘guarantee uniqueness’ implies to me looping through all the points in both matrices in order to determine if any points in ‘C’ are equidistant from more than one point in ‘B’. It would certainly be possible to do that, however it would require a loop and a significant amount of time.
Another option would be to return two distances for each ‘C’ point to determine if any distances are the same. If none of them are, that is likeley as close as you can get to ‘guarantee uniqueness’.
Asking the find function to return the indices of any columns where the distances are the same, in this instance returns an empty vector. So the top row of ‘I’ returns the indices of the unique rows of ‘C’ that are closest to elements of ‘B’, and the top row of ‘D’ returns their distances.
C = rand(60,2);
B = rand(600,2);
[D,I] = pdist2(B,C,'euclidean','smallest',2)
D = 2×60
0.0124 0.0167 0.0154 0.0294 0.0060 0.0190 0.0451 0.0099 0.0127 0.0140 0.0223 0.0459 0.0127 0.0660 0.0296 0.0069 0.0124 0.0197 0.0214 0.0162 0.0321 0.0236 0.0269 0.0350 0.0245 0.0314 0.0136 0.0346 0.0198 0.0091 0.0189 0.0474 0.0162 0.0391 0.0146 0.0250 0.0458 0.0200 0.0192 0.0424 0.0363 0.0471 0.0211 0.0674 0.0395 0.0154 0.0260 0.0423 0.0341 0.0265 0.0324 0.0253 0.0288 0.0477 0.0311 0.0448 0.0340 0.0355 0.0208 0.0172
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
I = 2×60
435 300 56 464 198 49 568 380 41 568 552 70 172 312 415 593 233 213 242 242 233 142 506 371 427 96 71 369 3 480 463 94 466 117 597 325 297 141 189 311 120 320 268 508 55 13 490 298 531 531 490 127 389 568 435 229 60 258 441 347
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
A = B(I(1,:),:);
% disp(A) % Optional
notunique = find(any(diff(D)==0,1))
notunique = 1x0 empty double row vector
Check = D(:,notunique)
Check = 2x0 empty double matrix
figure
scatter(C(:,1), C(:,2), '.b', 'DisplayName','C')
hold on
scatter(B(:,1), B(:,2), '.g', 'DisplayName','B')
scatter(A(:,1), A(:,2), 'or', 'DisplayName','‘A’ Closest ‘B’ To ‘C’')
hold off
axis('equal')
legend('Location','northoutside', 'Orientation','horizontal')
figure
stem3(A(:,1), A(:,2), D(1,:), Marker='.')
grid on
xlabel('A(:,1)')
ylabel('A(:,2)')
zlabel('D')
Quod erat demonstrandum!
.
John D'Errico
John D'Errico on 21 Oct 2024
Edited: John D'Errico on 21 Oct 2024
Seems simple enough.
C = rand(60,2);
B = rand(600,2);
idxB = NaN(size(C,1),1);
Btemp = B;
for i = 1:size(C,1)
idxB(i) = knnsearch(Btemp,C(i,:),k=1);
Btemp(idxB(i),:) = inf;
end
numel(unique(idxB))
ans = 60
Just use a tool like pdist2 (or knnsearch) to find the closest element of B to each element of C, one at a time. Then zap out the element of B that you just found, effectively removing it from the search.
But since uniqueness was stated to be important, AND since almost always, a call to pdist2 or knnsearch will almost always experience at least a few conflicts, you need to do something.

Sign in to comment.

Categories

Find more on Multidimensional Arrays in Help Center and File Exchange

Products


Release

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!