Comparing and removing rows of an array that are within 5% of each other
I have an array that is roughly 30 million x 14, sorted in ascending order by the first element of each row. I want to compare each row with the previous row and remove it if all 14 of its values are within 5% (or less) of the previous row's 14 values. The idea is that a row within 5% of the previous row can be treated as a duplicate, and I don't want it in my final data set. Because the array is large, I would prefer to use logical indexing if possible, but I am also willing to use a for loop if necessary.
Answers (1)
Image Analyst
on 26 Aug 2021
Try this:
data = 10 + rand(6, 4)   % Sample data
% Relative difference between each element and the one above it,
% measured against the previous row's value.
relDiff = abs(diff(data, 1, 1)) ./ abs(data(1:end-1, :));
% Flag rows whose values are ALL within 5% of the previous row.
% Row 1 has no previous row, so it is always kept (false).
% Note: each row is compared with the ORIGINAL previous row, so a slow drift
% of under 5% per row can remove a long run of consecutive rows.
rowsToDelete = [false; all(relDiff <= 0.05, 2)];
% Do the deletions.
data(rowsToDelete, :) = []
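If the intent is instead to compare each row with the most recently kept row (rather than with the original previous row), the decision for one row depends on the decision for the row before it, so it cannot be done in a single logical-indexing step. Below is a minimal loop-based sketch under that assumption. The sample matrix is made up and has 2 columns instead of 14 for readability. Every row in it is within 5% of the row directly above it, so the vectorized version above deletes every row after the first, while this loop keeps row 3 because it is 6% away from the last kept row (row 1).

```matlab
% Hypothetical sample data: each row is within 5% of the row directly above it.
data = [100 50; 103 51; 106 52; 109 53];
tol = 0.05;                       % 5% relative tolerance
n = size(data, 1);
keep = false(n, 1);
keep(1) = true;                   % the first row is always kept
ref = data(1, :);                 % most recently KEPT row
for k = 2:n
    % Keep row k if any value differs from the reference row by more than tol.
    if ~all(abs(data(k, :) - ref) ./ abs(ref) <= tol)
        keep(k) = true;
        ref = data(k, :);         % later rows are compared against this one
    end
end
data = data(keep, :)              % rows 1 and 3 remain
```

Because the loop looks at one row at a time, it also avoids allocating a second roughly 30 million x 14 array for the relative differences, at the cost of likely running slower than the vectorized version.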