Deleting duplicates based on conditions of multiple columns

Question

Nick on 28 Dec 2020


https://in.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns

Answered: Akash kumar on 31 Jul 2022

Hi,

I have a large dataset (100m rows x 40 columns) and I would like to delete any row that duplicates another row on a few specific columns. See the example below:

A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

I would like to delete every row whose values in the three columns all match those of another row. In this example, rows 2, 4 and 9 would be deleted because, e.g., rows 1 and 2 have identical values in each of the three columns, so I'd want to delete one of the two (it doesn't matter which one).

I suspect the answer is somewhere along the use of unique and logical indexing but haven't managed to figure it out. Any help would be much appreciated. (I'm using Matlab 2018b)

Thanks

3 Comments

Nick on 28 Dec 2020

Thanks for this, but unfortunately I think it would work for this sample only. The actual dataset has 40 columns and I'd like to remove rows based on duplicates in 3 columns only, rather than all of them.

Nick on 28 Dec 2020


Just found the answer. This way you can find the unique rows amongst a number of columns (in this case, columns 1, 2 and 3) and then produce the original table without the duplicate values.

[C,ia] = unique(A(:,1:3),'rows')
A_new = A(ia,:)
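One caveat about the snippet above: without the 'stable' option, unique returns its results in sorted order, so A_new comes back ordered by the key columns rather than in the original row order. The following is a minimal sketch of two ways to keep the original order, reusing the sample matrix from the question. The variable names A_sorted, A_kept and ib are made up for illustration.

```matlab
% Sample matrix from the question (13 rows, 3 columns).
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% Default behaviour: ia follows the sorted order of the unique key rows.
[~, ia] = unique(A(:, 1:3), 'rows');
A_sorted = A(ia, :);        % rows reordered by the key columns

% Option 1: sort the indices to restore the original row order.
A_kept = A(sort(ia), :);

% Option 2: ask unique for first-occurrence order directly.
[~, ib] = unique(A(:, 1:3), 'rows', 'stable');
A_stable = A(ib, :);        % same rows, same order as A_kept
```

In this sample only the last two rows swap places in A_sorted, but on unsorted data the reordering can be substantial.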


Answer 1

Nick on 28 Dec 2020


https://in.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns#answer_586042

[C,ia] = unique(A(:,1:3),'rows')

A_new = A(ia,:)
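The question mentions logical indexing, and the same index output can be turned into a mask. This is a sketch only, assuming the key columns are 1 to 3 as in the sample; keyCols and isDup are hypothetical names. It deletes the repeated rows in place, which leaves the surviving rows in their original order.

```matlab
% Sample matrix from the question (13 rows, 3 columns).
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

keyCols = 1:3;                       % assumption: columns that define a duplicate

% ia holds the first occurrence of every distinct key combination.
[~, ia] = unique(A(:, keyCols), 'rows');

% Mark every row that is NOT a first occurrence as a duplicate.
isDup = true(size(A, 1), 1);
isDup(ia) = false;

dupRows = find(isDup);               % row numbers that will be removed
A(isDup, :) = [];                    % delete the duplicates in place
```

Because unique reports the first occurrence of each key, this removes rows 2, 4 and 10 rather than 2, 4 and 9. The question states that either row of a duplicate pair may go, so the result is equivalent.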


Answer 2

Akash kumar on 31 Jul 2022


https://in.mathworks.com/matlabcentral/answers/703957-deleting-duplicates-based-on-conditions-of-multiple-columns#answer_1018540


% With the index output: shows which row of the original (untransposed)
% matrix each unique row was taken from. I think it can help you.
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24]';
[B, index] = unique(A(1:3,:).', 'rows', 'stable')
B = 10×3
     1    10     4
     1    11     5
     1    12     6
     1    12     7
     1    13     8
     2     4    25
     2    10    28
     3     5    33
     4    25    23
     4    23    24
index = 10×1
     1
     3
     5
     6
     7
     8
     9
    11
    12
    13
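The index output matters most when the matrix has more columns than the ones used for comparison, as in the 40-column dataset described in the question. Below is a small sketch under that assumption: a fourth column of row numbers is appended to the sample so that no two full rows are identical. The names wide, keyCols, idx and dedup are illustrative and not from the thread.

```matlab
% Sample matrix from the question, kept untransposed here (13 rows, 3 columns).
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% Hypothetical wider dataset: append a row-number column as extra data.
wide = [A, (1:size(A, 1)).'];

% Comparing whole rows finds no duplicates, since column 4 differs everywhere.
nAll = size(unique(wide, 'rows'), 1);

% Comparing only the key columns finds them; idx then selects full rows.
keyCols = [1 2 3];
[~, idx] = unique(wide(:, keyCols), 'rows', 'stable');
dedup = wide(idx, :);                % all 4 columns, duplicates removed
```

Here the last column of dedup reproduces the index list shown above, which confirms that the retained rows are the first occurrences in their original order.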
