Are there speedier alternatives to the mode function?

I'm trying to sort through a very large matrix (12x4^12). It's actually a matrix generated by combinator of all possible permutations with repetitions of choosing 12 from 4 , though that's not critical to this question. What I need to do is identify all columns in which each number is repeated no more than x times (say x = 3 here). Here's simplified sample code:
x = 3;
poss = randi(4,12,4^12); % in my code: poss = combinator(4,12,'p','r')';
[M,F] = mode(poss,1);
evenposs = poss(:, F<=x );
My problem is that the third line above (with mode) is very slow, over 400 seconds on my computer, so it seems that doing it through the mode function is highly inefficient. Obviously I could do this with a for loop, but I've been trained to avoid that in Matlab - I thought this mode function might have been a crafty way to avoid the loop, but now I'm wondering if there are still craftier ways of achieving this.
Bonus points (i.e. my eternal gratitude) if anybody can suggest a way of using combinator or another package to generate permutations with limited repetitions, so that I can avoid this filtering step entirely.

 Accepted Answer

You could attempt to take advantage of the fact that elements in 'poss' are positive integers and can be used as indices. However, I tend to doubt that in a span of only 12 elements in each column of 'poss' the following code could outperform Mathworks' 'mode', but you can try it and see.
subs = [poss(:),reshape(repmat(1:4^12,12,1),[],1)];
F = max(accumarray(subs,1,[4,4^12]),[],1);
evenposs = poss(:,F==x); % <-- Or do you mean F<=x?

7 Comments

Thanks for that idea. You're correct that I should have F<=x, but in this case it doesn't matter since F cannot be less than 3 (12/4). Also, I think I may be confusing things, as I'm not really choosing 4 from 12, but rather choosing 12 from 4, allowing repetitions. I've edited my question to reflect this.
As for the idea itself, it's better than you think. The mode function step took 421 seconds. Your replacement step took 94 seconds. Pretty massive savings. Makes you wonder what's going on under the hood in that mode function.
I still think there might be faster ways out there. And since I'll be using this repeatedly, and in loops, these savings are multiplied.
I think I'll leave the question open for a few more hours for other suggestions before accepting yours - I hope that's not bad etiquette. Again, thanks!
It should be noted that in the particular case you give as an example, there is a way to generate all possible "permutations" of 12 out of 4 with repetitions limited to x = 3 without having to eliminate from the much higher 4^12 count. It depends on the equality 4*x==12. It can be shown that the total number of such permutations in this case is:
12!/(3!*3!*3!*3!) = 369300
which is considerably less than 4^12 = 16777216.
In Answers #69232 and #84215 I worked out a method which could be extended to this case which makes calls on 'nchoosek'. It would utilize the fact that
12!/(3!*3!*3!*3!) = (12!/3!/9!) * (9!/3!/6!) * (6!/3!/3!)
and make calls on nchoosek(1:12,3), nchoosek(1:9,3), and nchoosek(1:6,3). Unfortunately, I don't have the time right now to work out the details of this extension for you but perhaps a little later I can. I'm not sure how valuable this would be for you in general though, since it only works for the smallest possible value of x. For larger values of x things get much more complicated.
I've not tried it at all and so have no klew if it would be better or even worse, but have you considered histc on bins of the unique values?
@Roger - That does sound desirable - for most of the work I'll be doing, having x at its minimum will suffice. I've been referring to those as even solutions to the assignment problem.
While I'm aware of literature on permutations with repeated literature, I haven't attempted to code it up until this point. It sounds like you've essentially done so for this particular case, so I will refer to the answers you suggest in hopes of applying it in my code. I'll update this question if I do work it out. Thank you again very much for your help.
@dpb - That's an interesting idea. I hadn't thought of histc, though I agree it's worth a look. I'll give it a go and post the results here. Thanks!
Here is one possible way to code that method I mentioned to you. The array 'x' plays the role of 'evenposs' except that it is transposed. The quantities 'r1', 'r2', 'r3', and 'r4' are the desired numbers of copies of 1, 2, 3, and 4, respectively. There may be a way to vectorize this code instead of the nested for-loops but it gets rather involved and may not be efficient.
r1 = 3; r2 = 3; r3 = 3; r4 = 3;
c1 = nchoosek(1:r1+r2,r1);
n1 = size(c1,1);
c2 = nchoosek(1:r1+r2+r3,r1+r2);
n2 = size(c2,1);
c3 = nchoosek(1:r1+r2+r3+r4,r1+r2+r3);
n3 = size(c3,1);
x = repmat(4,n1*n2*n3,r1+r2+r3+r4);
ic = 0;
for i1 = 1:n1
j1 = c1(i1,:);
for i2 = 1:n2
j2 = c2(i2,:);
j3 = j2(j1);
for i3 = 1:n3
ic = ic + 1;
j4 = c3(i3,:);
x(ic,j4) = 3;
x(ic,j4(j2)) = 2;
x(ic,j4(j3)) = 1;
end
end
end
Brilliant - thanks Roger. I may add further comments later describing timings once I have more time to implement this next week.
A quick update - I spoke with Matt Fig (the guy who wrote combinator.m on MEX) to see if he had any ideas of how to do permutations with limited reputations more generally, and he said he'd think about it. He also, however, proposed an alternative to my use of the mode function (which stemmed this question) and Roger's suggestion using accumarray (see above). It's actually quite simple:
x = 3;
poss = combinator(4,12,'p','r')';
% MY METHOD
tic
[M,F] = mode(poss,1);
evenposs = poss(:,F==x);
toc
% ROGER'S METHOD
tic
subs = [poss(:),reshape(repmat(1:4^12,12,1),[],1)];
F = max(accumarray(subs,1,[4,4^12]),[],1);
evenposs = poss(:,F==x);
toc
% MATT'S METHOD
tic
idx = true(1,size(poss,2));
for ii = 1:4
idx = idx & sum(poss==ii)<=x;
end
evenposs = poss(:,idx);
toc
All three methods achieve the same result. Timings are:
  • My Method: 406 seconds
  • Roger's Method: 114 seconds
  • Matt's Method: 7 seconds
As you can see, while implementing Roger's replacement saved me a lot of time, implementing Matt's saved me even more. Since this filtering of the combinator output now doesn't take too long, it's less important for me to directly generate the subset of combinator's output with limited repetitions, though obviously that would still save further time. Roger's suggestion above gives me a pretty good start on how to do that if I want to speed up the code further, but I think it's fast enough for now. Thanks again to both Roger and Matt.

Sign in to comment.

More Answers (0)

Categories

Products

Asked:

on 8 Jul 2014

Edited:

on 5 Aug 2014

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!