How can I make this code run faster with a sparse matrix

20 views (last 30 days)
Assuming I have the following code
x=[1:30164675]
y=x
N=length(x)
z=sparse(N,N)
for i=1:N
for j=1:N
if x(i)~=y(j)
z(i,j)=1
end
end
end
z
For my real 'x' data, I have 30164675 values, alot of which a equal, so I should have more zero entries in the matrix except from those on the diagonal. But when I run this code, it takes too long. Am I using the sparse matrix correctly or is there any way to use vectorization and a sparse matrix to make this code faster?

Accepted Answer

John D'Errico
John D'Errico on 2 Mar 2019
Edited: John D'Errico on 2 Mar 2019
How can I say this? You are using sparse in the very worst way possible. Well, you are.
Not only that, but your code will dump HUGE amounts of stuff into the command window at every iteration. So you very, very much need to learn when to terminate your lines of code with a semi-colon.
A simple rule about sparse is you never build such a matrix one element at a time. That forces MATLAB to stuff one new element into the matrix each time through the loop. It is not so inefficient if the matrix is a full one, since MATLAB does not need to shuffle elements around in memory at ach iteration. But a sparse matrix is stored differently, not storing the zeros. So single element insertion into a sparse matrix is terribly slow. NEVER do it that way.
One caveat is that if you used spalloc instead of sparse, and inserted the elements into the array in the proper way, you could reduce the time somewhat. But that would requires you to have a very good knowledge of sparse storage, and it would still be inefficient.
Instead, you need to create a list of the non-zero elements of the sparse matrix in advance., THEN call sparse ONCE with that entire list. (Read the help for sparse. Then read it again. I recall being confused the first time I read it.)
As far as how to find all of the elements in your list that are duplicates, I'd probably start with unique or ismember. The simplest way of course to create the matrix you have is this:
X = randi(10000,[1,20000]); % a random vector
Xs = sparse(X);
A = Xs' == Xs;
nnz(A)
ans =
60114
whos A
Name Size Bytes Class Attributes
A 20000x20000 880062 logical sparse
So I started with a set of 20000 integers, no larger than 10000. That tells me there will be 2 duplicates on average for each. As you can see, it creates a sparse logical matrix of the desired size.
min(sum(A,1))
ans =
(1,1) 1
max(sum(A,1))
ans =
(1,1) 9
The above code will work as long as you have release R2016b or later. Earlier releases would do it like this:
A = bsxfun(@eq,Xs',Xs);
The trick is to make the vector X sparse, so then the result will also be sparse. This works even though the vector is not truly a sparse one at all, but the final result needs to be so.
Could I have built that matrix using unique or ismember? Probably. But the direct test Xs'==Xs is the solution I would use. And I would NEVER, EVER solve the problem as you tried.
  2 Comments
John D'Errico
John D'Errico on 2 Mar 2019
Using sparse matrices forces you to think a different way about how to build them. The first time I did (in fortran, circa 1990, I think) I had a tough time. It took too long to build them, or so I thought, because I was not building them correctly.
So you need to use tools like spdiags and sparse to build them in one call. Once you get used to doing so, you will find sparse matrices can be hugely valuable.

Sign in to comment.

More Answers (0)

Categories

Find more on Sparse Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!