Deleting duplicates based on conditions of multiple columns
31 views (last 30 days)
Show older comments
Hi,
I have a large dataset (100m rows x 40 columns ) and I would like to delete any row that has duplicates on a few specific columns. See example below:
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];
I would like to delete all rows where the three columns have duplicate within each specific column. So in this example, row 2, 4 and 9 would be deleted because e.g.
row 1 and 2 have duplicates in each of the three columns and so I'd want to delete one of the two (doesn't matter which one).
I suspect the answer is somewhere along the use of unique and logical indexing but haven't managed to figure it out. Any help would be much appreciated. (I'm using Matlab 2018b)
Thanks
3 Comments
Accepted Answer
More Answers (1)
Akash kumar
on 31 Jul 2022
% With Index Number:- Shows the which index or Row value is extract from
% the A Matrix. I thinks, It can help you.
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24]';
[B index]=unique(AA(1:3,:).','rows', 'stable')
0 Comments
See Also
Categories
Find more on Cell Arrays in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!