Identify Duplicate values in an array and replace with Nan
20 views (last 30 days)
Show older comments
Poulomi Ganguli
on 7 Sep 2023
Edited: Dyuman Joshi
on 7 Sep 2023
Hello: I have a array as below of dimension 40,000x3. Third columns often contain duplicate values. I need to identify duplicate values and replace with Nan. Kindly help.
2.39300000000000 0 6.16800000000000
2.38720000000000 0 6.16800000000000
2.38480000000000 0 6.16800000000000
2.37380000000000 0 6.16800000000000
2.37410000000000 0 6.16800000000000
2.37020000000000 0 6.16800000000000
2.36880000000000 0 6.16800000000000
2.36350000000000 0 6.16800000000000
0 Comments
Accepted Answer
Mrutyunjaya Hiremath
on 7 Sep 2023
If you want to replace only the duplicates with 'NaN' and keep one occurrence of each value intact, here is the code:
% Sample array (Replace this with your 40,000x3 array)
array = [
2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780;
];
% Extract the third column
third_col = array(:, 3);
% Find unique values and their first occurrence index
[unique_vals, ~, ic] = unique(third_col);
% Count the occurrence of each unique value
counts = accumarray(ic, 1);
% Identify values that occur more than once (duplicates)
duplicate_vals = unique_vals(counts > 1);
% Replace only duplicates with NaN, keep one occurrence of each value
for val = duplicate_vals'
idx = find(third_col == val);
third_col(idx(2:end)) = NaN; % Keep the first occurrence, replace the rest with NaN
end
% Update the third column in the original array
array(:, 3) = third_col;
% Display the updated array
disp(array);
In this code, for each duplicate value, find its indices in the third column using 'find'. Then, keep the first occurrence (index idx(1)) and replace the rest (idx(2:end)) with NaN. This will leave one instance of each value in the third column and replace only the duplicates with 'NaN'.
0 Comments
More Answers (1)
Dyuman Joshi
on 7 Sep 2023
Edited: Dyuman Joshi
on 7 Sep 2023
Here's a much faster and simpler approach -
array = [2.3930, 0, 6.1680;
2.3872, 0, 6.1680;
2.3848, 0, 6.1780;
2.3738, 0, 6.1680;
2.3741, 0, 6.1690;
2.3702, 0, 6.1780;
2.3688, 0, 6.1690;
2.3635, 0, 6.1780];
%Get the unique values and the indices corresponding to their 1st occurence
%in order they appear in the array
[val,first_idx] = unique(array(:,3),'stable')
%Convert all the values of column 3 to NaN
array(:,3) = NaN;
%Re-assign the values according to the indices
array(first_idx,3) = val
0 Comments
See Also
Categories
Find more on Matrices and Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!