Removing duplicate rows (not "unique")

I have a matrix with many (1e5+) rows and I want to remove both copies of all duplicate rows. Is there a fast way to do this? (This function needs to be run many times.)
Michael Siebold on 4 May 2016
Perfect and thanks a million! I kept messing with ia and ic, but just wasn't thinking histogram... Would you mind submitting this as an answer so I can accept it?
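For reference, here is one way to implement the idea mentioned in this comment: use the `ic` output of `unique` and take a histogram of the group sizes. This is a minimal sketch, not necessarily the exact suggestion from the comments, which is not preserved on this page. The counts are computed with `accumarray`, and the example matrix is made up for illustration.

```matlab
% Example data (made up): rows [1 1 1] and [2 2 2] occur more than once
A = [1 1 1; 1 1 1; 1 1 0; 2 2 2; 3 3 3; 2 2 2];

% ic maps every row of A to the index of its group among the unique rows
[~, ~, ic] = unique(A, 'rows');
ic = ic(:);                 % make sure the subscripts form a column

% Histogram of group sizes: counts(k) = number of rows of A in group k
counts = accumarray(ic, 1);

% Delete every row whose group occurs more than once; the rest keep their order
A(counts(ic) > 1, :) = [];
```

With this example, only the rows `[1 1 0]` and `[3 3 3]` remain, in their original order.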

Accepted Answer

Roger Stafford on 5 May 2016
Edited: Roger Stafford on 5 May 2016
Let A be your matrix.
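[B,ix] = sortrows(A);   % sort so identical rows are adjacent; ix maps rows of B back to A
f = find(diff([false;all(diff(B,1,1)==0,2);false])~=0);   % start/end markers of each run of equal rows in B
s = ones(length(f)/2,1);   % one +1/-1 pair per run
f1 = f(1:2:end-1); f2 = f(2:2:end);   % first and last row (in B) of each run
t = cumsum(accumarray([f1;f2+1],[s;-s],[size(B,1)+1,1]));   % t > 0 for every row inside a run
A(ix(t(1:end-1)>0),:) = [];   % delete all copies of each duplicated row; remaining rows keep their order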
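The question says this has to run many times, so it helps to check the steps above on a small case first. The sketch below applies the same steps to a made-up matrix, with the logical vector split out into a variable named `isDup` (an illustrative name, not from the answer). After sorting, the example places two groups of repeated rows next to each other. This is the case where the -1 that ends one run and the +1 that starts the next land on the same `accumarray` index.

```matlab
% Example data (made up): [1 1 1] appears twice and [2 2 2] three times
A = [1 1 1; 2 2 2; 1 1 0; 2 2 2; 1 1 1; 3 3 3; 2 2 2];

[B, ix] = sortrows(A);                   % identical rows become adjacent
isDup = all(diff(B, 1, 1) == 0, 2);      % true where row k of B equals row k+1
f = find(diff([false; isDup; false]) ~= 0);
f1 = f(1:2:end-1);                       % first sorted row of each run of equal rows
f2 = f(2:2:end);                         % last sorted row of each run
s = ones(numel(f1), 1);
t = cumsum(accumarray([f1; f2+1], [s; -s], [size(B,1)+1, 1]));
A(ix(t(1:end-1) > 0), :) = [];           % remove every copy of each repeated row
```

`A` ends up as `[1 1 0; 3 3 3]`. The rows stay in their original order because they are deleted from `A` itself, not read back from the sorted copy `B`.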
Michael Siebold on 5 May 2016
Edited: Michael Siebold on 5 May 2016
And this solution is even faster than the first suggestion in the comments! Thanks for all the help!

More Answers (2)

Azzi Abdelmalek on 4 May 2016
Edited: Azzi Abdelmalek on 4 May 2016
A=randi(5,10^5,3);
tic
A=unique(A,'rows');
toc
The result
Elapsed time is 0.171778 seconds.
Mitsu on 3 Aug 2021
I reckon your answer does not address the OP's question, because running the following:
A=[1 1 1;1 1 1;1 1 0];
tic
A=unique(A,'rows');
toc
yields:
A =
     1     1     0
     1     1     1
Therefore, A still contains one instance of each duplicated row. I believe Michael wanted every instance of a row that appears multiple times to be removed.


GeeTwo on 16 Aug 2022
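% Here's a much cleaner way to do it with R2019a or later!
[B,BG]=groupcounts(A);   % B: how often each unique row occurs; for a matrix A, BG is a cell array with one column vector per column of A
G=[BG{:}];               % reassemble the unique rows (in sorted order) into a matrix
A_reduced=G(B==1,:);     % rows that occur exactly once; or assign to A if you want the results in the same variable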
