How to improve efficiency when checking for unique combinations of table values?
Show older comments
Hello, I am trying to check each row in a sheet of data to see if it contains a combination of 2 values. For example, there might be a table like
Fruit | Color | related value
apple | blue | 3
apple | red | 8
banana | blue | 5
banana | yellow | 6
apple | blue | 16
pear | yellow | 7
I want to create a table with the first column as the top row and the second column as the first column, and with the values indicated by a corrdinate pair being the number of times that that particular pair appears.
For example:
apple | banana | pear
blue 2 1 0
red 1 0 1
yellow 0 1 1
I'm currently using code that looks like this
fileContents = zeros(size((colorNames,1),size(fruitNames,1));
for p = 1:size(uniqueFruitNames,1)
for q = 1:size(uniqueColorNames,1)
for k = 1:size(rawData,1)
if ((uniqueFruitNames(p) == rawData(k,1)) && (all((uniqueColorNames(q,:) == rawData(k,2))))
fileContents(q,p) = fileContents(q,p) + 1;
end
end
end
end
My problem is that the table that is referenced by my code contains upwards of 10 thousand values, and I have found that the above code take approximately 15 minutes to complete. The rawData sheet contains around 12 thousand rows and 70 columns. The uniqueFruitNames array is 8 x 1, and the uniqueColorNames array is aproximately 2000 x 1.
What can I do to make this more efficient?
Accepted Answer
More Answers (1)
Walter Roberson
on 14 Jun 2021
0 votes
findgroups the two columns individually. Concatenate the groups into one array of two columns. Use those as the first parameter to accumarray and use 1 as the second parameter. The result will be a count array.
Categories
Find more on Structural Mechanics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!