Vectorizing or otherwise accelerating nested loops with large multidimensional arrays
I have a dataset consisting of large files of neural recordings. Each file contains two 3-dimensional arrays. The first array, A, contains raw numerical data and is organized as follows: 1st dimension: N rows (corresponding to time-varying voltage; typically hundreds of thousands to millions of rows); 2nd dimension: 32 columns (corresponding to electrode channels); and 3rd dimension: 50 elements ("pages," as it were, corresponding to different frequency bands).
The second array, B, has the same dimensions as A. It is organized as follows: 1st dimension: categorical labels for each data point (integer values in the range 1-20); 2nd dimension: 32 columns (corresponding to electrode channels); 3rd dimension: 50 pages (corresponding to frequency bands).
I need to index and operate over both the 2nd and the 3rd dimensions of the array A. Specifically, for each "page" of array B (i.e., each of the 50 pages, or frequencies, in the 3rd dimension), I need to do the following:
(i) locate the indices of each unique value (1-20) along the first dimension,
(ii) apply these indices (separately for each value) to every page (3rd dim) of array A, and
(iii) compute the column-wise means along dim 1 of array A over the indices generated from B.
In other words, I need to find the indices of the first numerical category in dimension 1 of array B (i.e., all the 1's), apply those indices to each page of array A's third dimension, and then take the column-wise mean of the indexed values on each page. This operation results in a 1x32x50 array. The operation is then repeated for categories 2-20, still using the first page of array B's third dimension. The whole process is then repeated in the same manner for all 50 pages of B, so the final result has dimensions 20x32x50x50.
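To make the indexing concrete, here is a tiny hypothetical example (4 samples, 2 channels, 2 pages; values chosen only for illustration) that computes one slice of the output, meanAmp(c,:,:,l), with plain loops. The mask comes from page l of B, each channel uses its own column of that mask, and the same mask is reused on every page of A.

```matlab
% Toy data, chosen only for illustration: 4 samples x 2 channels x 2 pages
A = cat(3, [1 10; 2 20; 3 30; 4 40], [5 50; 6 60; 7 70; 8 80]);
B = cat(3, [1 2; 2 2; 1 1; 2 1], [2 2; 1 2; 1 1; 1 2]);
c = 1;                      % category of interest
l = 1;                      % page of B that supplies the mask
mask = B(:,:,l) == c;       % 4x2 logical mask, one column per channel
oneSlice = zeros(1, size(A,2), size(A,3));   % corresponds to meanAmp(c,:,:,l)
for k = 1:size(A,3)         % every page of A uses the same mask
    for j = 1:size(A,2)     % each channel uses its own column of the mask
        oneSlice(1,j,k) = mean(A(mask(:,j), j, k));
    end
end
oneSlice                    % oneSlice(:,:,1) = [2 35], oneSlice(:,:,2) = [6 75]
```

Repeating this for every c in 1:20 and every l in 1:50 fills the 20x32x50x50 result.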
An inelegant solution to this problem (example code below) uses nested 50- and 20-iteration loops, but it is extremely slow on these large datasets. Is there a way to vectorize or otherwise accelerate these operations, e.g., with parallelization or cluster computing? Thanks!
A = rawDat;         % for example, 600000x32x50
B = categoricalDat; % for example, 600000x32x50, integer categories 1-20
meanAmp = nan(20, size(A,2), size(A,3), size(B,3)); % preallocate 20x32x50x50
%
for idx = 1:50 % page (dimension 3) of B that supplies the categories
    %
    for idx2 = 1:20 % 'categories' in dimension 1 of array B
        %
        % create an index array, C, marking where the current page of B
        % equals the current category:
        C = B(:,:,idx) == idx2;
        %
        % expand C to dimensions consistent with A:
        C = repmat(C,[1,1,50]);
        %
        % work on a copy so that NaNs from earlier categories do not
        % accumulate in A; set unwanted elements to NaN (rather than 0)
        % so as to preserve dimensions and enable appropriate averaging:
        Atmp = A;
        Atmp(~C) = NaN;
        %
        % compute column-wise means, ignoring NaNs
        meanAmp(idx2,:,:,idx) = nanmean(Atmp);
        %
        clear C
    end
    %
end
Jan on 19 Aug 2022 (edited)
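@Jacob McPherson: In your original code, A(~C) = NaN overwrites A itself, so the NaNs accumulate from one iteration to the next; I assumed that was intentional. If not, setting values of A to NaN only so that nanmean() can skip them is not efficient. Compare it with zeroing them and dividing by the number of kept elements: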
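X = rand(1e3, 1e3);
M = rand(size(X)) < 0.3; % A logical mask: true = element to drop
% Variant 1: mark dropped elements as NaN, then use nanmean
tic;
for k = 1:1e2
Y = X;
Y(M) = NaN;
Z1 = nanmean(Y);
end
toc
% Variant 2: zero dropped elements, divide the sum by the number kept per column
tic;
for k = 1:1e2
Y = X;
Y(M) = 0;
Z2 = sum(Y, 1) ./ (size(M, 1) - sum(M, 1));
end
toc
isequal(Z1, Z2)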
(size(M, 1)-sum(M, 1)) is slightly faster than sum(~M, 1) for large M.
nanmean is deprecated, but the replacement mean(X, 'omitnan') is slower.
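Jan's idea (zero out the unwanted samples and divide by a count instead of writing NaNs) also removes the need for repmat in the question's code. The sketch below is an illustration under stated assumptions, not code from the thread: it uses a hypothetical smaller m = 1000 so it runs quickly, relies on implicit expansion (R2016b or later) to apply the m-by-n mask from one page of B to all pages of A at once, and zeroes by multiplication, which matches Jan's assignment approach only when A contains no NaN or Inf values.

```matlab
% Hypothetical smaller test sizes; the real data would be 600000x32x50
A = rand(1000, 32, 50);
B = randi(20, 1000, 32, 50);
[m, n, p] = size(A);
q = max(B, [], 'all');             % number of categories (R2018b or later)
meanAmp = nan([q, n, p, p]);
for l = 1:p                        % page of B that supplies the mask
    Bl = B(:,:,l);
    for c = 1:q                    % category
        C = double(Bl == c);       % m-by-n weights: 1 = keep, 0 = drop
        cnt = sum(C, 1);           % 1-by-n number of kept samples per channel
        % C expands implicitly across the p pages of A: no repmat, no NaN.
        % Empty categories give 0/0 = NaN, as with nanmean.
        meanAmp(c,:,:,l) = sum(A .* C, 1) ./ cnt;
    end
end
```

The iterations over l are independent, so that outer loop is the natural candidate for parfor if Parallel Computing Toolbox is available; parfor's slicing rules may require building each page in a temporary q-by-n-by-p array and assigning meanAmp(:,:,:,l) once per iteration.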
Answers (1)
Bruno Luong on 18 Aug 2022
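% Generate dummy data
B=randi(4,10,3,5);
A=rand(10,3,5);
% Reference result: loop version of the question's code
meanAmp = nan([max(B,[],'all'),size(A,2),size(A,3),size(A,3)]);
for idx = 1:size(A,3)
for idx2= 1:max(B,[],'all')
C = B(:,:,idx) == idx2;
C = repmat(C,[1,1,size(A,3)]);
Atmp = A;
Atmp(~C) = NaN;
meanAmp(idx2,:,:,idx) = mean(Atmp,'omitnan');
end
end
% Vectorized version: one subscript tuple (category, channel, page of A,
% page of B) for every element of the m x n x p x p problem
[m,n,p] = size(B);
I = repmat(reshape(B,[m,n,1,p]),[1,1,p,1]);   % category, from page l of B
J = repmat(1:n,[m,1,p,p]);                    % channel
K = repmat(reshape(1:p,[1,1,p,1]),[m,n,1,p]); % page of A
L = repmat(reshape(1:p,[1,1,1,p]),[m,n,p,1]); % page of B
AA = repmat(A,[1 1 1 p]);                     % A replicated for each page of B
IDX = [I(:),J(:),K(:),L(:)];
S = accumarray(IDX,AA(:)); % sum of A in each group
N = accumarray(IDX,1);     % number of elements in each group
mA = S ./ N;               % group means (NaN where a category is absent)
% Check correctness
b = isfinite(meanAmp);
norm(meanAmp(b)-mA(b),'Inf')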
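The vectorized version rests on accumarray, which sums all values that share the same subscript row; dividing by a second accumarray of ones turns the sums into group means. A minimal toy illustration (values chosen only for this example):

```matlab
% 1-D grouping: each value is tagged with a group label
subs = [1; 2; 1; 3];
vals = [10; 20; 30; 40];
S = accumarray(subs, vals);   % per-group sums:   [40; 20; 40]
N = accumarray(subs, 1);      % per-group counts: [2; 1; 1]
M = S ./ N;                   % per-group means:  [20; 20; 40]

% Multi-column subscripts: each row is a (category, channel) pair
subs2 = [1 1; 2 1; 1 2; 1 1];
S2 = accumarray(subs2, [1; 2; 3; 4]);   % [5 3; 2 0]
```

In the answer's code each row of IDX is a 4-element subscript (category, channel, page of A, page of B), so the same two calls produce every group mean in one pass. The index arrays grow with p^2, though: for 600000x32x50 inputs each of them has 600000*32*50*50 = 4.8e10 elements.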
Bruno Luong on 18 Aug 2022 (edited)
You can always change the method slightly and loop over the 4th dimension, so that only one page of B is processed at a time:
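[m,n,p] = size(B);
q = max(B,[],'all');
mA = zeros([q,n,p,p]);
% Channel and page-of-A subscripts, listed in the same order as A(:)
[J,K] = ndgrid(uint16(1:n),uint16(1:p));
JK = repelem([J(:) K(:)],m,1);
for l = 1:p
% Category subscripts from page l of B, replicated for every page of A
I = repmat(uint16(B(:,:,l)),[1,1,p]);
IDX = [I(:),JK];
% Pass the output size explicitly so that S and N are q x n x p even
% when the largest category does not occur on page l of B
S = accumarray(IDX,A(:),[q,n,p]);
N = accumarray(IDX,1,[q,n,p]);
mA(:,:,:,l) = S ./ N;
end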
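Another way to get the same group sums, not used in this thread and offered only as a hedged sketch, is a sparse one-hot matrix product: for each channel j and page l of B, a q-by-m sparse matrix G with G(c,i) = 1 where B(i,j,l) == c turns the sums for all pages of A into a single product with A(:,j,:). The sizes below are hypothetical small test values; whether this beats accumarray on the real 600000x32x50 data would have to be measured.

```matlab
% Hypothetical small test sizes
A = rand(1000, 4, 5);
B = randi(3, 1000, 4, 5);
[m, n, p] = size(A);
q = max(B, [], 'all');
mA2 = nan([q, n, p, p]);
for l = 1:p                               % page of B supplying the categories
    for j = 1:n                           % channel
        % G(c,i) = 1 when sample i of channel j on page l has category c
        G = sparse(B(:,j,l), (1:m)', 1, q, m);
        cnt = full(sum(G, 2));            % q-by-1 samples per category
        S = G * reshape(A(:,j,:), m, p);  % q-by-p sums, all pages of A at once
        % Empty categories give 0/0 = NaN
        mA2(:,j,:,l) = reshape(S ./ cnt, [q, 1, p]);
    end
end
```

Each product handles all 50 pages of A for one channel, so the loop runs n*p times instead of building the m-by-n-by-p subscript array for every page of B.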