Without resorting to FEX/MEX, can the overhead of repeated calls to sparse() be avoided with operations on I, J, V directly?

3 views (last 30 days)
I have a time-marching problem bottlenecked by matrix multiplication , where the sparse matrix is the sum of many time-independent sparse matrices with time-dependent scalar coefficients:
I'm aware of FEX solutions that would enable a 3D to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:
  1. Generate a cell array of sparse matrices along index K, with multiplied with bsxfun, similar to this. (Probably the slowest option)
  2. Sparse matrix construction with B=sparse(I, J, V.*beta(K), N, N)*X
  3. Avoid sparse matrix entirely with B=accumarray(I, V.*beta(K).*X(J), N, N)
I haven't tested option 1, but option 3 with accumarray runs about 50x slower than option 2 (with or without issparse option). This makes sense, as the multiplication with X in accumarray is done before accumulation, hence on a much larger array.
However, option 2 still appears to be wasting performance. For example, sparse(I, J, V, N, N)*X runs about 50x slower than basic A*X. My initial interpretation of this is that much of the time in sparse is spent on unavoidable summation of common terms in I,J,V. I cant profile sparse internally, but tested this by accumulating V prior to sparse(I, J, V, N, N)*X, which comes out to ~7x slower than A*X.
My conclusion is that there is still some significant & (potentially) avoidable overhead from calling sparse many times unrelated to summation of duplicate indices, and the larger piece of summation that is unavoidable due to the summation of time-dependent terms co-located in .
To boil this down, I have two questions I cannot answer:
  1. Considering the pattern of summation of duplicate indices is always the same (static I/J), can the summation overhead in sparse be improved?
  2. What is the source of the remaining overhead in sparse and can it be avoided?
Bruno Luong
Bruno Luong on 7 Aug 2022
@James Tursa yes you get it right. However
  1. In the range of density/size and N_k (m in your notation) your method is still 50% slower than the accumarray solution we come up with,
  2. The runtime scale with N_k and is not bounded by the assumption of limited "bandwidth" (density of A0) stated by @Joel Lynch. It is probably an important requirement for his application.

Sign in to comment.

Accepted Answer

Bruno Luong
Bruno Luong on 7 Aug 2022
Edited: Bruno Luong on 7 Aug 2022
EDIT: Here is the final code. Method used in B2 is the fatest;
M = 10000;
N = 10000;
N_k = 100;
A_k = cell(1,N_k);
for k=1:N_k
A_k{k} = sprand(M,N,0.0005);
N_k = length(A_k);
[M,N] = size(A_k{1});
beta = rand(N_k,1);
X = rand(N,1);
I = cell(1,N_k);
J = cell(1,N_k);
V = cell(1,N_k);
K = cell(1,N_k);
for k=1:N_k
[I{k},J{k},V{k}] = find(A_k{k});
K{k} = k + zeros(size(I{k}));
I = cat(1,I{:});
J = cat(1,J{:});
K = cat(1,K{:});
V = cat(1,V{:});
[IJ,~,ic] = unique([I,J],'rows');
I2 = IJ(:,1);
J2 = IJ(:,2);
Y = sparse(ic, K, V, size(IJ,1), N_k);
B1=sparse(I, J, V.*beta(K), M, N)*X ;
Elapsed time is 0.252217 seconds.
B3 = sparse(I2,J2,Y*beta,M,N)*X;
Elapsed time is 0.195761 seconds.
B2 = accumarray(I2,(Y*beta).*X(J2),[M,1]);
Elapsed time is 0.096573 seconds.
Joel Lynch
Joel Lynch on 7 Aug 2022
Yeah, at this point I'm just documenting for future viewers. Feel free to post this up top, I cleaned some things up and added comments. Also added integer type casting to reduced memory.
M = 10000; % A_0/A_k rows
N = 10000; % A_0/A_k columns
N_k = 100; % Number of A_k matricies
% Random beta/X
beta = rand(N_k,1);
X = rand(N,1);
% Pick smallest integer type for indices
int_type = 'uint64';
if max(N,M) < intmax("uint16")
int_type = 'uint16';
elseif max(N,M) < intmax("uint32")
int_type = 'uint32';
% Generate A_k matricies in single stream
I = []; J=[]; K=[]; V=[];
bandwith = round(M*0.25); % Controls final density of A_0
A_k_ref_density = 1e-4; % density of each A_k
for k=1:N_k
A_k = sprand(M, N, A_k_ref_density);
[Inew, Jnew] = find(A_k);
ind = abs(Inew-Jnew)<bandwith; % Keep i/j's within band
I = [I; cast(Inew(ind), int_type)];
J = [J; cast(Jnew(ind), int_type)];
V = [V; A_k(ind)];
K = [K; repmat(cast(k, int_type), sum(ind), 1)];
% spy(sparse(I,J,K, M,N)) % Spy A_k
%spy(sparse(I,J,K, M,N)) % Spy Final A_0 for non-zero Beta's
% Smaller vector of any non-zero i/j's in all A_k's
% numel(ic) = size([I, J], 1)
% ic points to the location in the reduced/unique array of [I, J] pairs
% (i.e. ic holds the row index where each nonzero I/J in A_0 resides)
[IJ, ~, ic] = unique([I, J], 'rows', 'stable');
Iu = IJ(:,1); % unique I in [I, J]
Ju = IJ(:,2); % unique J in [I, J]
Nrow_u = size(IJ, 1);
% Time-Independent Y matrix, such that Y*beta is a column vector of
% non-zero elements in A_0, with I/J defined by Iu, Ju
Y = sparse(ic, double(K), V, Nrow_u, N_k);
% Native approach, construct A_0 within time loop, then A*x
t1 = timeit(@() sparse(I, J, V.*beta(K), M, N)*X)
% Construct sparse, but with the reduced Y
t2 = timeit(@() sparse(Iu, Ju, Y*beta, M, N)*X)
% Optimal Approach, accumarray which avoids sparse()
t3 = timeit(@() accumarray(Iu, (Y*beta).*X(Ju), [M,1]))

Sign in to comment.

More Answers (0)




Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!