Using "unique" to identify unique values AND number of occurrences of each unique value


Below are the first few rows of a table:
head(hits)

          ID          res1     score 
    _____________    ____    _______

    AGAP001076-RD    282     0.67229
    AGAP001076-RD    285     0.75292
    AGAP001076-RD    286     0.66957
    AGAP001076-RD    296     0.51694
    AGAP001076-RD    298     0.51655
    AGAP001076-RD    310     0.54564
    AGAP001076-RD    314     0.74495
    AGAP010077-RA    349     0.52136
Using "unique", I can obtain the unique IDs. I would also like to obtain the number of occurrences of each unique ID, e.g. AGAP001076-RD 7 for the rows shown above.
Thank you for your attention

Accepted Answer

Steven Lord on 19 Sep 2024
Use the groupcounts function.
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
[counts, groupID] = groupcounts(A(:, 1))
counts = 2×1

     7
     1
groupID = 2×1 cell array

    {'AGAP001076-RD'}
    {'AGAP010077-RA'}
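As a minimal sketch of applying this to the question's own table, the snippet below rebuilds the rows shown in the question as a table named hits (an assumption: the real table's construction is not shown, and ID is assumed here to be a cell array of character vectors) and prints one "ID count" line per unique ID, the format the question asks for.

```matlab
% Stand-in for the question's 'hits' table, rebuilt from the rows shown
% (assumption: ID is a cell array of character vectors)
ID = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; ...
      'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'};
res1 = [282; 285; 286; 296; 298; 310; 314; 349];
score = [0.67229; 0.75292; 0.66957; 0.51694; 0.51655; 0.54564; 0.74495; 0.52136];
hits = table(ID, res1, score);

% counts(k) is the number of rows whose ID equals groupID(k)
[counts, groupID] = groupcounts(hits.ID);

% Print "ID count" lines; char() converts a cell, string, or categorical
% element to a character vector, so this also works for other ID types
for k = 1:numel(groupID)
    fprintf('%s %d\n', char(groupID(k)), counts(k));
end
```

Because the counts and the group IDs come back in the same order, pairing them by index is all that is needed to print or tabulate them.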
Steven Lord on 19 Sep 2024
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
T = cell2table(A)
T = 8×3 table

           A1            A2       A3  
    _________________    ___    _______

    {'AGAP001076-RD'}    282    0.67229
    {'AGAP001076-RD'}    285    0.75292
    {'AGAP001076-RD'}    286    0.66957
    {'AGAP001076-RD'}    296    0.51694
    {'AGAP001076-RD'}    298    0.51655
    {'AGAP001076-RD'}    310    0.54564
    {'AGAP001076-RD'}    314    0.74495
    {'AGAP010077-RA'}    349    0.52136
If your data is in a table array like the one I created above, you just have to tell groupcounts which variable(s) in the table is/are the grouping variable(s).
countsAndID = groupcounts(T, 'A1')
countsAndID = 2×3 table

           A1            GroupCount    Percent
    _________________    __________    _______

    {'AGAP001076-RD'}        7          87.5  
    {'AGAP010077-RA'}        1          12.5  
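The Percent column can be reproduced as each group's share of all rows, 100 * GroupCount / height(T): here 7/8 gives 87.5 and 1/8 gives 12.5. A minimal check of that reading, rebuilding T from the same cell array so the snippet runs on its own (this assumes the table has no missing IDs, so every row falls in some group):

```matlab
A = {'AGAP001076-RD' 282 0.67229
     'AGAP001076-RD' 285 0.75292
     'AGAP001076-RD' 286 0.66957
     'AGAP001076-RD' 296 0.51694
     'AGAP001076-RD' 298 0.51655
     'AGAP001076-RD' 310 0.54564
     'AGAP001076-RD' 314 0.74495
     'AGAP010077-RA' 349 0.52136};
T = cell2table(A);
countsAndID = groupcounts(T, 'A1');

% Recompute each group's percentage of the total number of rows
expectedPercent = 100 * countsAndID.GroupCount / height(T);

% Compare with a small tolerance to allow for floating-point rounding
percentMatches = all(abs(countsAndID.Percent - expectedPercent) < 1e-10)
```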
You can use multiple grouping variables as well. To demonstrate, let's build a 20-row table by sampling rows of T at random (so some rows repeat), then replace A2 with random integers from 1 to 5 so that combinations of A1 and A2 are more likely to repeat.
T2 = T(randi(height(T), 20, 1), :);
T2.A2 = randi(5, 20, 1)
T2 = 20×3 table

           A1            A2     A3   
    _________________    __    _______

    {'AGAP001076-RD'}    2     0.51655
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    5     0.54564
    {'AGAP001076-RD'}    5     0.66957
    {'AGAP001076-RD'}    5     0.51694
    {'AGAP010077-RA'}    2     0.52136
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    3     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    2     0.67229
            ⋮
countsAndID = groupcounts(T2, ["A1", "A2"])
countsAndID = 6×4 table

           A1            A2    GroupCount    Percent
    _________________    __    __________    _______

    {'AGAP001076-RD'}    1         4           20   
    {'AGAP001076-RD'}    2         2           10   
    {'AGAP001076-RD'}    3         1            5   
    {'AGAP001076-RD'}    4         8           40   
    {'AGAP001076-RD'}    5         4           20   
    {'AGAP010077-RA'}    2         1            5   
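For readers who want to stay with unique, as the question title suggests, here is a minimal sketch of the same multi-variable count. Calling unique on a table returns its unique rows, and the third output maps every row to one of them, so accumarray can tally the rows per combination. The small table T3 is hypothetical (not the random T2 above) so the result is reproducible, and the comparison assumes both functions list the combinations in the same sorted order.

```matlab
% Hypothetical small table with repeated (A1, A2) combinations
A1 = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'; 'AGAP001076-RD'; 'AGAP001076-RD'};
A2 = [4; 1; 2; 4; 4];
A3 = [0.74495; 0.67229; 0.52136; 0.51655; 0.51694];
T3 = table(A1, A2, A3);

% unique on a table finds its unique rows; ic(k) says which of them row k is
[combos, ~, ic] = unique(T3(:, {'A1', 'A2'}));

% Add 1 per row of T3 into the slot of its combination
combos.GroupCount = accumarray(ic, 1)

% groupcounts on the same grouping variables should give the same counts
viaGroupcounts = groupcounts(T3, {'A1', 'A2'});
sameCounts = isequal(combos.GroupCount, viaGroupcounts.GroupCount)
```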
Let's check. How many rows of T2 have the same A1 and A2 values as the first row of the countsAndID table?
matchesForFirstRowA1 = matches(T2.A1, countsAndID{1, "A1"});
matchesForFirstRowA2 = T2.A2 == countsAndID{1, "A2"};
result = T2(matchesForFirstRowA1 & matchesForFirstRowA2, :)
result = 4×3 table

           A1            A2     A3   
    _________________    __    _______

    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.51655
Does that match the count that groupcounts returned in the first row of countsAndID?
isequal(height(result), countsAndID{1, "GroupCount"})
ans = logical
1
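The check above covers only the first group. A minimal sketch extending it to every group: for each row of the groupcounts result, count the matching rows and compare against GroupCount, then confirm that the counts add up to the table height. The table T3 and the result name groupTable are hypothetical, chosen so the snippet runs on its own without overwriting T2 or countsAndID.

```matlab
% Hypothetical deterministic stand-in for the random T2 above
A1 = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'; ...
      'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'};
A2 = [1; 4; 2; 1; 4; 4];
A3 = [0.75292; 0.74495; 0.52136; 0.67229; 0.51655; 0.51694];
T3 = table(A1, A2, A3);
groupTable = groupcounts(T3, {'A1', 'A2'});

allGroupsMatch = true;
for k = 1:height(groupTable)
    % Rows of T3 whose A1 and A2 both equal those of group k
    inGroup = strcmp(T3.A1, groupTable.A1{k}) & T3.A2 == groupTable.A2(k);
    allGroupsMatch = allGroupsMatch && nnz(inGroup) == groupTable.GroupCount(k);
end

% Every row of T3 belongs to exactly one group, so the counts sum to its height
totalsMatch = sum(groupTable.GroupCount) == height(T3);
disp([allGroupsMatch, totalsMatch])
```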


More Answers (1)

Animesh on 19 Sep 2024
In MATLAB, you can use the "unique" function along with the "histcounts" function to find the number of occurrences of each unique ID in your table. Here's how you can do it:
% Assume 'hits' is your table
% Extract the 'ID' column from the table
ids = hits.ID;
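% Find the unique IDs; idx(k) is the position of ids(k) within uniqueIDs
[uniqueIDs, ~, idx] = unique(ids);
% Count the occurrences of each unique ID: edges 1, 2, ..., max(idx)+1
% give exactly one bin per value of idx
occurrences = histcounts(idx, 1:max(idx)+1);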
% Display the results
for i = 1:length(uniqueIDs)
fprintf('%s %d\n', uniqueIDs{i}, occurrences(i));
end
For more information, see the MathWorks documentation for the "histcounts" function.
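A minimal variant of the same idea swaps histcounts for accumarray: since idx already numbers each row's ID from 1 to numel(uniqueIDs), accumarray(idx, 1) adds 1 per row into the slot for its ID, so no bin edges are needed. The ids list below is a stand-in for hits.ID (an assumption). The question's display shows the IDs without quotes or braces, which suggests hits.ID may be categorical; brace indexing such as uniqueIDs{i} does not work on categorical arrays, so this sketch prints with char(uniqueIDs(i)), which works for cell, string, and categorical IDs.

```matlab
% Stand-in for hits.ID (assumed here to be a cell array of character vectors)
ids = {'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; ...
       'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP001076-RD'; 'AGAP010077-RA'};

% idx(k) is the position of ids(k) within uniqueIDs
[uniqueIDs, ~, idx] = unique(ids);

% Add 1 into slot idx(k) for every row k; slot i ends up holding the
% number of occurrences of uniqueIDs(i)
occurrences = accumarray(idx(:), 1);

% char() converts a cell, string, or categorical element to a character vector
for i = 1:numel(uniqueIDs)
    fprintf('%s %d\n', char(uniqueIDs(i)), occurrences(i));
end
```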

Release: R2024a
