How to calculate cosine similarity on a codistributed array?
I need to calculate the cosine similarity between the rows of an array. This works in serial execution with pdist, but pdist does not work with codistributed arrays on MDCS. In the parallel setup, 4 compute nodes are used and the (large) array is distributed row-wise over the 4 nodes. I wrote a naive function to calculate the cosine similarity, but it takes ages; even with a small array it takes too long.
This is the test I currently use. I generate a random array and pass it to the function:
% 100-by-100 random array, distributed by rows (dimension 1)
r = rand(100, codistributor('1d', 1))
q = cosineSimilarityNaive(r)
The code of the function:
function res = cosineSimilarityNaive(data)
    % Get the number of rows.
    n_row = size(data, 1);
    % Calculate the Euclidean norm of each row.
    norm_r = sqrt(sum(abs(data).^2, 2));
    % Loop over the upper triangle and mirror each value into the lower triangle.
    for i = 1:n_row
        for j = i:n_row
            res(i,j) = dot(data(i,:), data(j,:)) / (norm_r(i) * norm_r(j));
            res(j,i) = res(i,j);
        end
    end
end
Currently I have no idea how to make it run faster. Codistributed arrays on different nodes are necessary, since the array is so large that it does not fit on one compute node. I did some testing with svd on a distributed array over 4 nodes, and this works fine. I think I am missing something in my code, but currently I have no clue. Any tips?
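One direction that may be worth trying, offered as a sketch and not as a verified fix: the double loop indexes single rows of the array on every iteration, whereas all pairwise dot products can be formed in one matrix product. The function name cosineSimilarityVectorized is hypothetical. The sketch assumes real-valued data, and it assumes that sqrt, sum, the matrix product, the transpose, and element-wise division are all available for codistributed arrays in the release in use. Note that the n-by-n result must itself fit in the combined memory of the workers.

```matlab
function res = cosineSimilarityVectorized(data)
    % Hypothetical vectorized variant of cosineSimilarityNaive.
    % Assumes real-valued input; for complex input data * data' gives the
    % complex conjugate of what dot(data(i,:), data(j,:)) returns.

    % Euclidean norm of each row, computed as in the naive version.
    norm_r = sqrt(sum(abs(data).^2, 2));

    % All pairwise dot products between rows in a single matrix product.
    gram = data * data';

    % Entry (i,j) of the outer product is norm_r(i) * norm_r(j).
    res = gram ./ (norm_r * norm_r');
end
```

On the cluster it would presumably be called the same way as the naive function, for example inside an spmd block (untested on MDCS):

```matlab
spmd
    r = rand(100, codistributor('1d', 1));
    q = cosineSimilarityVectorized(r);
end
```

Rows whose norm is zero still produce NaN, exactly as in the loop version.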