Speed performance: Find all y-vector entries that have the same value in an x-vector of equal length.
Hi Guys,
imagine you have an x-vector
x = [1 1 2 2 1 1];
as well as an y-vector of equal length.
y = [1 2 3 4 5 6]
And you want to find out, which values the y-vector has at those indices where the x-vector has the value 1, as well as the y-vector values at those indices where the x-vector has the value 2. So that the solution for this example would be
solution = {[1 2 5 6], [3 4]}
That is the first cell of 'solution' contains all y-values, that belong to the first unique x-value (1) & the second cell of it contains all y-values, that belong to the second unique x-value (2).
My ideas on how to implement this for vectors of any length are given below.
Does anybody has an idea, how to avoid the for-loop in approach A???
x = randi(10, 1, 100000); y = randi(5, 1, 100000);
% Approach A: tic xVals = unique(x); % get unique x values tf = x == xVals'; % get true-or-false logical array (each row specifies where the n-th value of xVals occurs in x). yVals = cell(1, size(tf,1)); % preallocate yVals for i = 1:size(tf,1) yVals(i) = {y(tf(i,:))}; % Assign all y-values that belong to one xVal-entry to one cell. end toc
% Approach B: tic [xs, id] = sort(x); % xsorted ys = y(id); % sort y the same way x was sorted [xVals, ia] = unique(xs); % Divide the ys-vector into sections whose corresponding xs-indices have the same value in the xs vector. yVals = mat2cell(ys, [1], [diff(ia)', length(ys)-sum(diff(ia))]); toc
% Runtimes: % x = randi(10, 1, 1000000); % y = randi(5, 1, 1000000); % A: Elapsed time is 0.100635 seconds. % B: Elapsed time is 0.149284 seconds.
% x = randi(100, 1, 1000000); % y = randi(5, 1, 1000000); % A: Elapsed time is 0.886396 seconds. % B: Elapsed time is 0.139918 seconds.
% Comparison % A is faster than B if there are few different x-values (about 10-20). % The number of different x-values has high impact on speed of A. % A is faster than B if x & y-vector are extremely long (> 100000). % The vector length has low impact on speed of B.
7 Comments
My pleasure!
I was surprised that accumarray was slower than the loop, even though it makes for neater code.
I didn’t test ‘Approach B’, since ‘Approach A’ (chosen randomly) was significantly faster than my accumarray call.
Answers (1)
If you're trying to avoid a FOR loop you can use arrayfun, although I'm not sure it improves speed.
It would look something like this:
x = [1 1 2 2 1 1]; y = [1 2 3 4 5 6]; solution = arrayfun(@(z) y(z==x),unique(x),'uniformoutput',false);
Paul Shoemaker
See Also
Categories
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!