Unique elements in matrix efficiently
Show older comments
Given a matrix A, what is an efficient way to obtain a matrix B that consists of only the unique elements in A (where by unique, I mean row-wise, therefore the 'ismember' function is not suitable).
For example:
A = [1 2 3 1; 4 1 6 7; 8 9 8 11]
A =
1 2 3 1
4 1 6 7
8 9 8 11
Since each row contains different number of unique elements than other rows, the matrix of unique elements would have rows each of possibly different dimensions, which is not possible. Therefore, I can fill the repeated/non-unique elements with some dumb filler values (since I am working with only positives I can replace them with a negative number (eg -1), or if it had negative numbers too, they could be replaced by NaN values.
The result therefore can be:
B =
1 2 3 -1
4 1 6 7
8 9 -1 11
(where the negative -1s could be replaced by NaNs alternatively).
Notice that although A(2, 2) has a value (1) that already exists in the previous column (A(1, 1)), it is still unique in its own row, therefore the 'ismember' function cannot be applied.
I have created a solution, but I can imagine there are more elegant and more efficient solutions using vectorization and avoiding for loops, for when matrix A is very large which happens to be the case:
B = A(:, 1);
for i = 2:size(A, 2)
NEWMEMBERS = !sum(bsxfun(@eq, A(:, i), B), 2);
NEWCOL = NEWMEMBERS .* A(:, i);
FILLER = -1 * ~NEWMEMBERS;
NEWCOL = NEWCOL + FILLER;
B = [B NEWCOL];
end
(FILLER can be more generally replaced by a vector of 0 and NaNs instead of 0s and -1s)
3 Comments
George Aipal
on 26 Feb 2016
Edited: George Aipal
on 26 Feb 2016
John D'Errico
on 26 Feb 2016
Edited: John D'Errico
on 26 Feb 2016
Note that this question does not actually contain validly executable MATLAB code, having constructs like +=, and ! in the code.
George Aipal
on 26 Feb 2016
Answers (3)
Titus Edelhofer
on 26 Feb 2016
Hi,
a rather simple version would be this:
B = -ones(size(A));
for row=1:size(A, 1)
val = unique(A(row,:));
[~,idx] = ismember(val, A(row,:));
B(row,idx) = val;
end
I haven't tried though what happens if A is large ...
Titus
Without any loops (and can be easily adapted to use a tolerance):
A = [1 2 3 1; 4 1 6 7; 8 9 8 11]
S = size(A);
[B,C] = sort(A,2);
D = [false(S(1),1),diff(B,1,2)==0];
R = (1:S(1))'*ones(1,S(2));
X = sub2ind(S,R(D),C(D));
A(X) = NaN
displays this in the command window:
A =
1 2 3 NaN
4 1 6 7
8 9 NaN 11
1 Comment
George Aipal
on 26 Feb 2016
George Aipal
on 26 Feb 2016
0 votes
5 Comments
George Aipal
on 26 Feb 2016
Edited: George Aipal
on 26 Feb 2016
Titus Edelhofer
on 26 Feb 2016
Hi George,
your welcome. Moving the filler to the end would simplify it in fact considerably:
for row=1:size(A, 1)
val = unique(A(row,:));
B(row, 1:length(val)) = val;
end
It's often the case that the vectorized version is faster, but indeed not always: sometimes the vectorized version leads to large intermediate variables which can lead to performance problems. Vectorizing just for the sake of avoiding all loops is not always the best choice.
Titus
George Aipal
on 26 Feb 2016
Edited: George Aipal
on 26 Feb 2016
Titus Edelhofer
on 26 Feb 2016
Hi George,
the filler comes before by setting
B = nan(size(A));
or
B = -ones(size(B));
or whatever.
My advice to customers is usually: write the code in a way that is simple to write and simple to read. When it's progressed and you identify bottlenecks, then start investigating by tic/toc or profiler. Don't get me wrong, a good deal of my work is teaching vectorization (one of my favorite underused functions is bsxfun). But writing unreadable vectorized code without need I try to avoid. And if I do, I add as comment the simpler/loop version so that someone else (or myself) understand what's happening.
Titus
George Aipal
on 26 Feb 2016
Edited: George Aipal
on 26 Feb 2016
Categories
Find more on Creating and Concatenating Matrices in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!