Vectorized solution to replace for loops in nested combinations

I currently have a Matlab program which uses for loops and I would like to vectorize as much of it as possible to increase performance (the actual calcultion in the innermost loop could benefit from vectorization, I hope... it involves multiple arithmetic and elementwise operations aggregated by a sum of sum of sums...). The loops are, in the specific case N=2:
for t=1:T, for t1=0:t, for m1=0:t1, for m2=0:t-t1
. In other words, there is a variable t which goes from 1 to T and I am interested, for each value of t, in getting a partition t1+t2=t and selecting all possible 0<=m1<=t1 and 0<=m2<=t2. I was wondering if there was a way to generate all these combinations of t1, t2, m1 and m2, but in general for N>2, say N=5 for example?
One approach would be to use ndgrid followed by a selection of those variables which actually fit the criteria; unfortunately the size of resulting matrices quickly becomes unwieldy. Also, this is overkill since most portions of those matrices don't fit the criteria. So although the brute force approach is highly combinatorial, the final result is not so highly combinatorial that it becomes impossible to manage in my situation (say T about 50 and N between 5 and 10, depending on what the actual numbers are I might have to adapt so that this all fits in memory...).
Here is what the non-vectorized code could look like for N=2 and N=3:
function [ combisN2, combisN3 ] = testCombis()
T = 7; % just so this goes faster, but ideally I want to go to T=50, or perhaps more...
%% loops for N=2
N = 2;
nb2 = zeros( T, 1 );
for t = 1:T
% parfor t = 1:T % toggle comments with previous line to use parallel loops
for t1 = 0:t
t2 = t - t1;
for m1 = 0:t1
for m2 = 0:t2
nb2( t ) = nb2( t ) + 1;
end
end
end
end
data2 = cell( T, 1 );
for t = 1:T
% parfor t = 1:T % toggle comments with previous line to use parallel loops
counter2 = 0;
data2{ t, 1 } = uint8( nan( nb2( t ), N * 2 ) );
for t1 = 0:t
t2 = t - t1;
for m1 = 0:t1
for m2 = 0:t2
counter2 = counter2 + 1;
data2{ t, 1 }( counter2, : ) = [ t1, m1, t2, m2 ];
end
end
end
end
combisN2 = cell2mat( data2 );
save( 'combisN2.mat', 'combisN2' );
%% loops for N=3
N = 3;
nb3 = zeros( T, 1 );
for t = 1:T
% parfor t = 1:T % toggle comments with previous line to use parallel loops
for t1 = 0:t
for t2 = 0:t-t1
t3 = t - t1 - t2;
for m1 = 0:t1
for m2 = 0:t2
for m3 = 0:t3
nb3( t ) = nb3( t ) + 1;
end
end
end
end
end
end
data3 = cell( T, 1 );
for t = 1:T
% parfor t = 1:T % toggle comments with previous line to use parallel loops
data3{ t, 1 } = uint8( nan( nb3( t ), N * 2 ) );
counter3 = 0;
for t1 = 0:t
for t2 = 0:t-t1
t3 = t - t1 - t2;
for m1 = 0:t1
for m2 = 0:t2
for m3 = 0:t3
counter3 = counter3 + 1;
data3{ t, 1 }( counter3, : ) = [ t1, m1, t2, m2, t3, m3 ];
end
end
end
end
end
end
combisN3 = cell2mat( data3 );
save( 'combisN3.mat', 'combisN3' );
end
The code returns the list of possible values taken by t and m for all indices 1:N in columns. The limitation here will likely be on the number of rows such a matrix can hold in memory.
(Side note: the code above assumes parallel for loops in its design, although the line itself is commented to enable execution on your end if you do not have the parallel toolbox; if it were only for non-parallel calculation, the structure I would have chosen for the data would likely have been different, for example avoiding cell arrays as they currently are... Also the code was originally meant for calculating the size of the resulting matrix, and I decided to simply add the actual desired result as well. I am not so much looking for comments on how to improve the for loops as I am to avoid them completely and wish to obtain a vectorized solution which provides the same results.)

1 Comment

I forgot to mention why such a program would be useful, so I am adding this bit of info here in case someone looks it up: these combinations represent the universe, for each t=0:T, of a multinomial distribution with t trials and N categories.

Sign in to comment.

 Accepted Answer

Well, you can at least reduce it to 2 loops:
T=50;
nb2 = zeros( T, 1 );
tic
for t = 1:T
t1=0:t;
for i=1:numel(t1)
[M1,M2]=ndgrid(0:t1(i), 0:(t-t1(i)) );%all M1,M2 combinations for fixed t,t1
nb2( t ) = nb2( t ) + numel(M1);
end
end
toc
Elapsed time is 0.025991 seconds.
nb2
nb2 = 50×1
4 10 20 35 56 84 120 165 220 286

14 Comments

And then 1 loop:
T=50;
nb2 = zeros( T, 1 );
[I,J]=find(triu(ones(T+1,T),-1));
I=I-1;
tic;
for k=1:numel(I)
t1=I(k); t=J(k);
[M1,M2]=ndgrid(0:t1, 0:(t-t1) );%all M1,M2 combinations for fixed t,t1
nb2( t ) = nb2( t ) + numel(M1);
end
toc
Elapsed time is 0.030924 seconds.
nb2,
nb2 = 50×1
4 10 20 35 56 84 120 165 220 286
Thanks. Looking at this from my cellphone so please pardon me if I am missing something, but at a glimpse I have a feeling this does not generalize for N? I will have use cases for N in the range from 3 to (hopefully if it fits in memory) 10… I can see that the ndgrid will generalize for m, but how about t?
I don't know how much memory your output will consume but the loop scheme can be generalized to N>2 using this FEX download
T=50;
N=3;
tic
for t = 1:T
P=allVL1(N,t); %partitions of t
np=size(P,1);
for i=1:np
mranges=arrayfun(@(i)0:i, P(i,:),'un',0);
[M{1:N}]=ndgrid(mranges{:});%all M1,M2,...MN combinations for current partition
end
end
toc
Elapsed time is 0.798821 seconds.
In fact, I suspect that
combos=allVL1(N,T,"<=");
will generate all the combinations you are considering in just one line.
I am looking at allVL1: at first glimpse, the doc says it is recursive, which is pretty cool (I was struggling with a recursive form), so yes it seems that would answer => I will do some tests and keep you posted. I also saw you wrote something about a diophantine function which seemed promising for the generation of the t's, and two loops (one for t and one for N) would also be satisfactory. I cannot find a diophantine function and am not sure what that does, was that a false start or does it actually lead somewhere?
I am looking at allVL1: at first glimpse, the doc says it is recursive,
It only uses a recursive algorithm if you request a subset of all combinations. Otherwise, it uses a non-recursive, but apparently loop-free, method.
I notice now that it will also return the total number of combinations if you set the 4th input argument MaxNbSol=NaN. So that will be a way for you to pre-determine whether the result will be RAM-feasible. It looks like N=10, T=50 will be quite infeasible.
>> allVL1(10,50,"<=",nan)
ans =
7.5394e+10
allVL1 actually returns the t's (t1, t2, ...), so that's already something. Then perhaps use your inner for loop for the m's... so similar to your second comment I guess... I do not see it solving the problem in 1 go (which is no problem)
regarding your point about RAM, I am not sure how that would work adding the m's since so far it only does the t's, but note that for the range I am interested in we can divide the amount of RAM used by 8 (I think) using uint8.
regarding your point about RAM, I am not sure how that would work adding the m's since so far it only does the t's
It will be even worse. Since all combinations of the t_i's alone consume 750 GB when N=10,T=50, computing and saving the m's on top of that will multiply the RAM requirements still further.
It will be even worse.
Obviously :) what I was pointing at is that there is a way to make the t's fit for T=50 and N=10 using that trick. The main problem I will hit is for the m's, which are bigger. If I reevaluate my needs to go up to N=6 then that would be about 18x1E9 for m's and 32x1E6 for t's, so well under 20Gb if I use uint8. That being said, it is unclear what would be an efficient way to combine the m's with the t's into a matrix as what is done in my original for loops... i.e. in order to generate the desired combiN result from my code (the one saved to the .mat file).
It probably needs to be re-evaluated whether you really need all these combinations simultaneously in memory, especially if you are just using them for summations as you allude to in your OP.
putting aside the memory constraints for now, it is still unclear how to combine the independently created matrices for t's and m's.
Something like this, I would guess.
T=7;
N=2;
tcombos=num2cell(allVL1(N,T,'<='));
nn=size(tcombos,1);
result=cell(nn,1);
for i=1:nn
mranges=cellfun(@(j)0:j, tcombos(i,:),'un',0);
arg=[tcombos(i,:);mranges];
[d{1:2*N}]=ndgrid(arg{:});
result{i}=reshape( cat(2*N+1,d{:}) ,[],2*N);
end
result=array2table( vertcat(result{:}),'Var', ["t";"m"]+(1:N))
result = 330×4 table
t1 m1 t2 m2 __ __ __ __ 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 2 0 0 0 2 1 0 0 2 2 1 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 2 0 0 0 2 1 0 0 2 2 0 0 0 0 3 0
This works (though I do not know as of now if it improves performance vs for loops, it definitely enables generalizing in a way my loops did not). Thanks!

Sign in to comment.

More Answers (0)

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Asked:

on 9 Feb 2023

Commented:

on 10 Feb 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!