Comparing columns of two sparse matrices

Hello. I want to do the following... Input: two sparse matrices A,B. Output: sparse matrix C, with C(i,j)=1 exactly when the ith column of A is equal to the jth column of B. The quickest code I have so far is this one:
r=[];
c=[];
for i=1:size(A,2)
for j=1:size(B,2)
if isequal(A(:,i),B(:,j))
r=[r,i];
c=[c,j];
end
end
end
C=sparse(r,c,ones(1,length(r)));
But for big matrices it gets very slow. Is there a way to make it run faster?
cheers

 Accepted Answer

Stephen23
Stephen23 on 22 Jan 2016
Edited: Stephen23 on 22 Jan 2016
Solution One: Simple
Here is a simple solution using ndgrid and arrayfun (but slow):
>> A = [1,1,0,0; 1,0,1,0]
A =
1 1 0 0
1 0 1 0
>> B = [0,1,0,1; 0,0,1,1]
B =
0 1 0 1
0 0 1 1
>> [X,Y] = ndgrid(1:size(A,2),1:size(B,2));
>> arrayfun(@(a,b)isequal(A(:,a),B(:,b)), X, Y)
ans =
0 0 0 1
0 1 0 0
0 0 1 0
1 0 0 0
Solution Two: Fastest
Using two nested loops is usually going to be much faster, as long as the output array is preallocated before the loops. Not preallocating arrays is the number one cause of beginners writing slow code. When arrays are not preallocated, such as in your code, on every loop iteration MATLAB has to check the array size and move the array to a new memory location if it has grown larger. This effect gets even more significant as the arrays get larger, as you have discovered
To solve this I simply preallocated the array before the loop, and so it does not grow inside the loops and the code is much much faster.
out = nan(size(A,2),size(B,2));
for a = 1:size(A,2)
for b = 1:size(B,2)
out(a,b) = isequal(A(:,a),B(:,b));
end
end
Do NOT expand arrays inside loops! Every time you write code like this, by concatenating values onto an array:
Vec = []
for ...
Vec = [Vec,new_value]
end
you are writing slow code. Learn to preallocate arrays!

6 Comments

maybe my example was misleading. Input and output should be sparse. In this case
[X,Y] = ndgrid(1:size(A,2),1:size(B,2));
arrayfun(@(a,b)isequal(A(:,a),B(:,b)), X, Y)
is slower (around factor 5) than the simple code I posted at the beginning.
Stephen23
Stephen23 on 22 Jan 2016
Edited: Stephen23 on 22 Jan 2016
Yes, that I why I noted that that solution as being slow but simple. You will see I have marked the fast solution too.
to preallocate a sparse matrix makes the code, for big matrices, very slow too. E.g.
out = sparse(size(A,2),size(B,2));
for a = 1:size(A,2)
for b = 1:size(B,2)
out(a,b) = isequal(A(:,a),B(:,b));
end
end
would not be a good solution.
So my intention in the first code example was to collect all column indices in vectors r and c, and then use the sparse function to create the matrix. But I don't know the size of r and c in advance. That is why preallocation was not an option.
Stephen23
Stephen23 on 22 Jan 2016
Edited: Stephen23 on 22 Jan 2016
Why does the output array need to be sparse? It contains totally different information to A and B, and there is no guarantee that it is sparse. How many columns do A and B have? How sparse are they? What proportion of them are equal?
The marix has to be sparse because memory is a big issue. The output of dimension mxn will be around m=n=2^10-2^20. I cannot estimate the proportion of equal columns. But from my experience the output is "quite" sparse. I tried an example. The output is 2^12x2^12 and has about 2^10 non-zeros.
Stephen23
Stephen23 on 22 Jan 2016
Edited: Stephen23 on 22 Jan 2016
That is reasonably large.
One possible solution would be to use my second method (nested loops with preallocation) on subsets of the matrices. The nested loops will be fast enough on these subsets (a million elements or so), and the results (after converting to sparse or indices) can be concatenated inside the loop (but with far fewer concatenations that what you currently use). It would then be a matter of balancing the size of the loop out variable against the speed of the concatenation. Run some tests to check it.

Sign in to comment.

More Answers (1)

[Arows, Acols] = find(A);
[Brows, Bcols] = find(B);
AUcols = unique(Acols);
BUcols = unique(Bcols);
matches = bsxfun(@(I,J) isequal(A(:,I), B(:,J)), AUcols(:), BUcols(:).');
[matches_rows, matches_cols] = find(matches);
C = sparse(AUcols(matches_rows), BUcols(matches_cols), 1);

3 Comments

thank you for your help.
but when I run this code I get: "Error using bsxfun Invalid output dimensions".
[Arows, Acols] = find(A);
[Brows, Bcols] = find(B);
AUcols = unique(Acols);
BUcols = unique(Bcols);
[GridA, GridB] = ndgrid(AUcols, BUcols);
matches = arrayfun(@(I,J) isequal(A(:,I), B(:,J)), GridA, GridB);
[matches_rows, matches_cols] = find(matches);
C = sparse(AUcols(matches_rows), BUcols(matches_cols), 1);
I tried this code for several examples. For this input: A=[1 1 0 0;1 0 1 0], B=[0 1 0 1;0 0 1 1] the output is:
ans =
0 0 0 1
0 1 0 0
0 0 1 0
But its supposed to be a 4x4 matrix. The zero column of A is not treated. Can this be fixed?

Sign in to comment.

Categories

Asked:

on 21 Jan 2016

Edited:

on 22 Jan 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!