# How to I find X unique combinations (order doesn't matter) of X numbers within an array of X numbers.

14 views (last 30 days)
Dc215905 on 29 Aug 2020
Edited: Dc215905 on 30 Aug 2020
The question is in the title.
Thanks!
Rik on 29 Aug 2020
@Bruno, we seem to have a different vision on flags. Can you weigh in on this thread?

Bruno Luong on 29 Aug 2020
You might just miss the feature of NCHOOSEK that can accept the vector array as input
>> X=rand(1,5)
X =
0.3566 0.3205 0.8137 0.1779 0.9431
>> nchoosek(X,3)
ans =
0.3566 0.3205 0.8137
0.3566 0.3205 0.1779
0.3566 0.3205 0.9431
0.3566 0.8137 0.1779
0.3566 0.8137 0.9431
0.3566 0.1779 0.9431
0.3205 0.8137 0.1779
0.3205 0.8137 0.9431
0.3205 0.1779 0.9431
0.8137 0.1779 0.9431
##### 2 CommentsShowHide 1 older comment
Dc215905 on 30 Aug 2020
One additional follow-up question. Suppose I don't want to find every combinations, but maybe 10,000 of the possible 3 million. My computer can't often handle stroing every combination so I just want a subset. Any thoughts on how to do this? There's a comment below that I think might help with this (use rand and unique sort), but I wasn't sure if there is something I'm missing again with nchoosek.
After learning more terminology - it appears I'm trying to perform a pseudo delete-d jacknife, in that I don't intend on analyzing every combination, just a portion of it.

the cyclist on 29 Aug 2020
Edited: the cyclist on 29 Aug 2020
There's probably a more efficient way, since I effectively first find all combinations, but I think this does what you want, in an admittedly obfuscated method.
M = 10;
N = 5;
% Get *all* combinations of whether each element should be selected,
% where 1 means yes, and 0 means no.
allComboIndex = dec2bin(0:2^M-1)' - '0';
% Reduce the above set to only columns with N elements selected
fiveComboIndex = allComboIndex(:,sum(allComboIndex)==N);
% Find where the 1's are located.
[r,~] = find(fiveComboIndex);
% Reshape to get the N values of each column
fiveCombos = reshape(r,N,[]);
the cyclist on 29 Aug 2020
On my machine, this worked for M up to 24, taking a few seconds. For M larger than that, the intermediate array is too big, and MATLAB bogs down.

John D'Errico on 29 Aug 2020
Youtr title is still far too vague to provide an answer, but the comment you made may be sufficient.
You have lets say 10 numbers, 1:10. Apparently your real problem may be much larger. (EXPLAIN THESE THINGS! Don't feel you need to put it all in the title, and be as short and sweet as you can. The question can be quite long, and that is why you have more than jjust a title given to you to use.)
Now you wish to select some subset of those 10 numbers, here in blocks of 5. The numbers themselves are references to another array, and will be used to find the average of the corresponding elements. So a simple increasing order of each subset is sufficient. Your original question was to find 1000 such subsets.
>> nchoosek(10,5)
ans =
252
And of course, nchoosek tells us this is impossible. THere are exactly 252 subsets. In fact, nchoosek gives us the complete set.
>> nchoosek(1:10,5)
ans =
1 2 3 4 5
1 2 3 4 6
1 2 3 4 7
1 2 3 4 8
1 2 3 4 9
1 2 3 4 10
1 2 3 5 6
1 2 3 5 7
1 2 3 5 8
1 2 3 5 9
1 2 3 5 10
1 2 3 6 7
...
I've just pasted in the first 11 such subsets, but there will be 252 of them as generated.
Now, suppose I wanted to use these to find averages of some arrays of numbers? First, get used to storing your array in multi-dimensional arrays.
Data = rand(2,3,10);
So here, I might have 10 arrays, each of size 2x3. Now I will decide to select all 252 possible subsets of tose arrays, where the subsets are of size 5, and form the average of those arrays.
subsets = nchoosek(1:10,5);
Datasubs = reshape(Data(:,:,subsets),[2,3,nchoosek(10,5),5]);
size(Datasubs)
ans =
2 3 252 5
Dataave = mean(Datasubs,4);
size(Dataave)
ans =
2 3 252
So I took all 252 of the possible subsets. I created the set of arrays you wanted. And then I formed the averages. The result is now 252 arrays. Here is one of them:
Dataave(:,:,1)
ans =
0.606133417842019 0.633298374281431 0.348760616827367
0.605002524439109 0.570316578527122 0.392097473426195
So you can easily do what you wanted, within limits. But we still don't know the real problem. Because apparently you are trying to do this with much larger sets of numbers. And for that, of course things will fail if you go too large.
For example, suppose you ask to find all unique sets of numbers of size 50 in subsets of size 25, of course things will fail miserably!
nchoosek(50,25)
ans =
126410606437752
You need to understand the limits of your computer and the memory you have, and the time you are willing to invest in allowing your computer to perform some huge thing. Even a smaller problem will become pretty large, pretty fast.
nchoosek(100,5)
ans =
75287520
Suppose for example, I had 20 such arrays to average, and I wish to find the average of blocks of 5 elements. Again, your question is far ttoo vague. But now your goal will be to find 1000 such sets, chosen randomly. But you wish to avoid replication.
subs = randi(20,[2000 5]);
subs = unique(sort(subs,2),'rows');
size(subs)
ans =
1950 5
subs = subs(randperm(size(subs,1),1000),:);
So I selected integers from 1:20. 2000 sets of 5 of them. Then I sorted them, and used unique to toss the replicates.
Finally, I selected 1000 of those sets randomly.
Essentially, you just need to learn to use MATLAB. At the same time, you need to understand the limits of your computer.
Dc215905 on 30 Aug 2020
Thank you! This is all very good. I will certainly be using a lot of this in the future. I especially like the bit about the unique sort - that will likely help with my nchoosek memory issue. I don't need every possible combination - just 10,000 of the the possible 3 million for example.
After learning more terminology - it appears I'm trying to perform a pseudo delete-d jacknife, in that I don't intend on analyzing every combination, just a portion of it.

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!