Print all possible matrix combinations for a given matrix
Show older comments
Hi everybody, i am going mad at trying to solve this problem:
For a given input vector P, for example in my case
P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5]
i create a matrix that HAS TO HAVE 3 columns and contain all P elements (order isn't important) and zero in the free places, for example in my case
G =
4.5000 4.5000 5.2500
5.2500 2.2500 0.6000
0.5000 0.6000 0.5000
0.3000 1.0000 2.0000
0.5000 0.5000 0
my goal is to swap matrix elements between columns to reach the disposition that satisfies this property: say that the sum of all elements in vector P is S.
The disposition should satisfy the property that the sum of all elements in column 1 must be as close as possible to S/3 and the same for column 2 and column 3.
my first attempt was trying to generate all possible matrix disposition or permutation an find this memorizing the matrix that get closer to my goal by while cycle with this condition
the problem is that using perms(G) for generating it unfortunately I get out of memory because perms generates ALL possible permutation but i need only combination of n element i 3 groups that is much less than all possible combination but i don't know how to create it
Example of one possible output found by hand (order in elements of the same column doesn't matter):
G =
4.5000 5.2500 5.2500
4.5000 2.2500 0.6000
0.5000 0.6000 1.0000
0.3000 0.5000 2.0000
0 0.5000 0.5000
sum_colon1=9.8
sum_colon2=9.1
sum_colon3=9.35
eps_colon1=0.383
eps_colon2=0.316
eps_colon3=0.066
minimum=0.766
6 Comments
Guillaume
on 8 Apr 2019
I don't get why eps_colon1 is 0.134 in your example. S/3 is 9.4167 which gives an eps of 0.38333
Mattia Deriu
on 8 Apr 2019
Mattia Deriu
on 8 Apr 2019
So basically, you're doing a montecarlo simulation. I'm not sure why you're only swapping two elements at a time. You may as well use a random permutation each time.
Simplification of your code, which probably doesn't affect its slowness since it's all down to the number of iterations:
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
phase = 3;
itermax = 1e9;
P = [P; zeros(-mod(numel(P), phase), 1)];
Gmin = reshape(P, [], phase);
S = sum(P);
ERROR1 = sum(abs(sum(Gmin, 1) - S/3);
for i = 1:itermax
G = reshape(P(randperm(numel(P)), [], phase));
ERROR2 = sum(abs(sum(G, 1) - S/3));
if ERROR2 < ERROR1
ERROR2 = ERROR1
Gmin = G;
end
end
What sort of speed are you expecting. And also what sort of problem size are you expecting to solve? My answer below works for your size but will quickly become unmanageable for larger problems.
Mattia Deriu
on 8 Apr 2019
Mattia Deriu
on 10 Apr 2019
Accepted Answer
More Answers (1)
John D'Errico
on 8 Apr 2019
Edited: John D'Errico
on 8 Apr 2019
You are going mad because ...
- Your problem is poorly formulated. i.e., how close to S/3 do you need to be?
- For long vectors, your problem really is large, especially as you are trying to solve it.
For a vector of length n, there are 2^n subsets of elements of the vector. This is well known, but 2^n grows exponentially. Note that perms is not appropriate here, so at least you don't have factorial growth, but it is still big.
P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
numel(P)
ans =
14
Given the vector P of length 14, there are 2^14 subsets of elements to consider. We can think of them as the bits:
B14 = dec2bin(0:2^14-1);
>> B14(1:10,:)
ans =
10×14 char array
'00000000000000'
'00000000000001'
'00000000000010'
'00000000000011'
'00000000000100'
'00000000000101'
'00000000000110'
'00000000000111'
'00000000001000'
'00000000001001'
That was for the first 10 subsets we might choose. So each element of P is either included or excluded in a sum, as indicated by the corresponding bit.
(I'll discuss later the issue that it looks like you want at MOST 3 terms in the final sum.)
Now, your goal is essentially to find all such sums. In the end, you probably want to rank the sums, sorting them such that those with a sum closes to S/3 are at the top. But since you never say how close to S/3 you want to allow the sums to go, you effectively need to compute them all. This is not even remotely difficult for a vector of length only 14 though.
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
n = numel(P);
S = sum(P);
binP = dec2bin(0:2^n-1) - '0';
allsums = binP*P;
[err,tags] = sort(abs(allsums - S/3),'ascend');
allsums = allsums(tags);
Here, the target sum is:
S/3
ans =
9.41666666666667
It turns out there are 57 distinct ways to form a sum of 9.4, but we can clearly never acheive 9.41666666666667.
allsums(1:100)'
ans =
Columns 1 through 11
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 12 through 22
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 23 through 33
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 34 through 44
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 45 through 55
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 56 through 66
9.4 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 67 through 77
9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 78 through 88
9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 89 through 99
9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35
Column 100
9.35
What were the 10 first such sums that were found? Represented bya 1 here where a given term was included in the sum, we see:
binP(tags(1:10),:)
ans =
0 0 0 1 1 0 0 1 0 1 0 0 1 1
0 0 0 1 1 0 0 1 0 1 1 0 0 0
0 0 0 1 1 0 0 1 1 1 0 0 0 1
0 0 0 1 1 0 0 1 1 1 0 0 1 0
0 0 0 1 1 0 1 1 0 1 0 0 0 1
0 0 0 1 1 0 1 1 0 1 0 0 1 0
0 0 0 1 1 0 1 1 1 1 0 0 0 0
0 0 0 1 1 1 0 0 0 1 0 0 1 1
0 0 0 1 1 1 0 0 0 1 1 0 0 0
0 0 0 1 1 1 0 0 1 1 0 0 0 1
We can extract the specific terms in one such subset sum as:
binP(tags(1),:).*P'
ans =
Columns 1 through 11
0 0 0 5.25 2.25 0 0 0.6 0 0.3 0
Columns 12 through 14
0 0.5 0.5
So, was this all easy? Well, yes. Trivially so, at least for a vector of length only 14. I could have succeeded up to around 20 elements in the vector P. If you increase the length of P by a significant amount, things will still go to hell. So you can effectively never generate all such sums for a vector of length 50. (At least not with the computers we have available today.) Even that operation for partial sums of 25 terms or so will be brutal.
Remember that we computed ALL such partial sums, because we really don't know when to stop looking at all possible subsets. How far away from S/3 are we allowed to go?
allsums(tags(end))
ans =
28.25
>> S
S =
28.25
>> binP(tags(end),:)
ans =
1 1 1 1 1 1 1 1 1 1 1 1 1 1
So one of those candidate sums was the sum of all elements, and since none of the elements of P were negative, that sum will lie way out in the weeds.
For large, long vectors, is there a better way? Well yes. You could write an algorithm that recursively generated subsets of P, then used a tolerance to decide if a sum was too far away. But even this can be very difficult to perform, because you may not be able to know how to generate that tolerance. This could get difficult. Doable. But sometimes difficult.
Now, suppose we want to solve the problem of finding all subsets of at most 3 terms in any candidate sum? This is most easily done using nchoosek. So, I might split the problem into finding all partial sums of 1, 2, or 3 terms, then combining them at the end.
S = sum(P);
Starget = S/3;
Stol = 0.2; % a tolerance on the sum error
T3 = nchoosek(P,3);
err = abs(sum(T3,2) - Starget);
T3(err > Stol,:) = [];
err(err> Stol) = [];
[err,tags] = sort(err,'ascend');
T3 = T3(tags,:);
This is now trivially easy to write, as you see. And it is efficient, since we only cared about at most 3 terms in the partial sums. In fact, there were only 9 partial sums of 3 terms that were within a tolerance of 0.2 of the target sum.
T3
T3 =
4.5 4.5 0.5
4.5 4.5 0.5
4.5 4.5 0.5
4.5 4.5 0.5
5.25 2.25 2
5.25 2.25 2
4.5 4.5 0.3
4.5 4.5 0.6
4.5 4.5 0.6
Now you could do the same set of operations for exactly 2 terms in the sum, then look at 1 term. Each time, keep only the partial sets that were within the designated tolerance on the sum.
The nice thing is, even for significantly longer vectors P, if you are never interested of partial sums of never more than 3 terms, this is still doable. Thus, for vectors of length 50 or 100,
nchoosek(50,3)
ans =
19600
>> nchoosek(100,3)
ans =
161700
We still only need to consider a few tens of thousands of combinations, or even 100K such partial sets. It will still get nasty for vectors with many thousands of elements to be considered, but you can always overwhelm any computational scheme if you try hard enough..
So don't go mad. Just use the proper tools.
8 Comments
Mattia Deriu
on 8 Apr 2019
Guillaume
on 8 Apr 2019
I understood the problem as partitioning P into 3 subsets of equal length (with one subset having a 0 element added), then finding which of these partition has minimal distance to S/3, with distance being defined as the sum of the absolute difference of the sum of each subset with S/3).
Mattia Deriu
on 8 Apr 2019
Guillaume
on 8 Apr 2019
I'm not sure what No is referring to. I understood your problem as:
- find all distinct partitions of P into 3 subsets (columns) of length 5
- for each partition get the subset (column) sum
- find the partition for which the sum of the absolute difference of the subset (column) sum with S/3 is minimum
If not, I'm completely confused.
Mattia Deriu
on 8 Apr 2019
John D'Errico
on 8 Apr 2019
Edited: John D'Errico
on 8 Apr 2019
That is eactly what I gave you. In the first part of my answer, I showed how to generate ALL subsets. Thus every possible subset.
Then I showed how to compute ALL possible subsets of only 3 elements. Then I computed the sum of each, sorting them to be in increasing distance from the target. If you want to compute ALL possible subsets, then just don't use a tolerance at all.
Asfar as your comment that each sum needs to be as close as possible to the target sum, that is indeed achieved by my use of a sort and a tolerance. Surely that is what you mean by "as close as possible".'
Let me say it differently:
- The word "all" implies every single subset.
- The phrase "as close as possible" implies there are some subsets that are deemed to be unacceptable, because they are not sufficiently close. That is, they exceed some mininum deviation from the target.
However, you cannot have both all and some at the same time. All is all, and some is not all.In fact, I showed you how to do either of those choices. So what is your problem, if you are not happy with both the "all" solution and the "some" solution?
If you have some other problem, then you need to explain your problem far more clearly, because you are making no sense in these comments.
Guillaume
on 8 Apr 2019
@John, the way I understood the problem, the purpose is not to find a single subset with a given number of elements. Instead it's to partition P into N subsets of equal size, then find which such partition has the minimal absolute difference from S/3 for the N sums of the subsets. So for the bins you're not looking for binary bins:
binP = [
0 0 0 1 1 0 0 1 0 1 0 0 1 1
0 0 0 1 1 0 0 1 0 1 1 0 0 0
...
but for bins of the form
binS = [
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 2 1 2 2 2 2 3 3 3 3 3
1 1 1 1 2 2 1 2 2 2 3 3 3 3 3
... unique permutations of repelem(1:3, 5)
Once you have binS, it is trivial to compute the subset sums:
N = 3;
P = [P;zeros(mod(-numel(P), N), 1)]; %pad to multiple of N
[~, order] = sort(binS);
matrices = reshape(P(order)', [], N, size(binS, 1));
colsum = sum(matrices, 1);
distance = sum(abs(colsum - sum(P)/3), 2);
[~, chosenmatrix] = min(distance);
Mattia Deriu
on 9 Apr 2019
Categories
Find more on Logical in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!