How do I assign rows of a variable to categories?

Hello,
I have a table ("data") with 4 variables and 688 rows. The first 6 rows look like this:
Pseudonym     Indication   Study-name   Sequence
Patient_001   1            1            1
Patient_002   2            2            2
Patient_003   3            3            1
Patient_004   3            1            1
Patient_005   4            2            2
Patient_006   4            5            2
I want to find all groups defined by "Indication" "Study-name" "Sequence".
I created a new table: data1 = data(:,{'indication' 'study_name' 'sequence'}) and then used
[p,v] = findgroups(data1) to find all possible groups.
Now I want to assign each row in "Pseudonym" to one of these groups.
My goal is to create a new variable for every group, containing all Pseudonyms that belong to that group.
In the next step I want to randomly pick pseudonyms from each group.
Furthermore, I would like to take the group size (i.e. the number of pseudonyms in a group) into consideration.
That means that if I want to randomly pick 20 patients from all categories and one group contains 50% of the data, then 10 patients should be picked from this group.
Could you please help me set up the code?
Thank you so much!
Max

Answers (1)

Vatsal on 29 Sep 2023
I understand that you have a table “data” which consists of four columns, and you want to find the groups based on the columns "Indication", "Study-name" and "Sequence". After finding the groups you want to assign each row in “Pseudonym” to one of these groups.
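To illustrate the grouping step, here is a minimal sketch built from the six rows shown in the question. The variable name StudyName is an assumption (used in place of "Study-name" so that it is a valid MATLAB identifier), and data.Group is a hypothetical helper column that is not part of the original table.

```matlab
% Rebuild the six sample rows from the question (StudyName is an assumed name)
Pseudonym  = "Patient_00" + (1:6).';
Indication = [1; 2; 3; 3; 4; 4];
StudyName  = [1; 2; 3; 1; 2; 5];
Sequence   = [1; 2; 1; 1; 2; 2];
data = table(Pseudonym, Indication, StudyName, Sequence);

% g: group number for every row; combos: one row per unique combination
[g, combos] = findgroups(data(:, {'Indication', 'StudyName', 'Sequence'}));

% Hypothetical helper column: store each row's group number in the table
data.Group = g;

% One cell per group, holding the pseudonyms that belong to it
members = splitapply(@(x) {x}, data.Pseudonym, g);
```

members{k} then holds the pseudonyms of group k, and row k of combos lists the combination that defines that group. Keeping the groups in one cell array is usually easier to work with than creating a separate variable per group.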
You then want to randomly pick a given number of pseudonyms across all groups, with each group contributing in proportion to its size.
The code below picks the pseudonyms from all groups while taking the group size into account:
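% Keep only the grouping variables. The names must match the variable names
% in your table exactly (check data.Properties.VariableNames).
data1 = data(:, {'Indication', 'Study-name', 'Sequence'});
[p, v] = findgroups(data1);   % p: group number per row, v: one row per unique combination
groups = splitapply(@(x) {x}, data.Pseudonym, p);   % one cell per group

numPicks = 20;   % Number of pseudonyms to pick in total
pickedPseudonyms = [];
groupSizes = cellfun(@numel, groups);
scalingFactor = numPicks / sum(groupSizes);

% Visit the largest groups first
[~, sortedIndices] = sort(groupSizes, 'descend');
sortedGroups = groups(sortedIndices);

for i = 1:numel(sortedGroups)
    groupSize = numel(sortedGroups{i});
    % Proportional share, capped so the total never exceeds numPicks.
    % Because of rounding, the total can still fall short of numPicks.
    picksFromGroup = min(round(groupSize * scalingFactor), numPicks - numel(pickedPseudonyms));
    if picksFromGroup > 0
        randomIndices = randperm(groupSize, min(groupSize, picksFromGroup));
        % Pseudonym is a column, so concatenate vertically
        pickedPseudonyms = [pickedPseudonyms; sortedGroups{i}(randomIndices)];
    end
    % Stop once numPicks pseudonyms are selected
    if numel(pickedPseudonyms) >= numPicks
        break;
    end
end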
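Rounding each group's share independently does not guarantee that the shares add up to exactly numPicks. One possible way around this is a largest-remainder allocation, sketched below. This is a swapped-in technique, not part of the code above, and the group sizes are hypothetical numbers standing in for cellfun(@numel, groups). It assumes numPicks is not larger than the total number of rows.

```matlab
% Hypothetical group sizes (they sum to 688, like the table in the question)
groupSizes = [344; 200; 100; 44];
numPicks   = 20;

quota     = groupSizes * numPicks / sum(groupSizes);  % exact fractional share
picks     = floor(quota);                             % whole part of each share
shortfall = numPicks - sum(picks);                    % picks still to hand out

% Give the leftover picks to the groups with the largest fractional parts
[~, order] = sort(quota - picks, 'descend');
picks(order(1:shortfall)) = picks(order(1:shortfall)) + 1;

disp(picks.')
```

With these sizes the first group holds 50% of the rows and receives 10 of the 20 picks. The resulting picks(k) can then be passed to randperm as the number of pseudonyms to draw from group k.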
You can also refer to the MATLAB documentation for "randperm" for more information on its usage and syntax.
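As a small illustration of the two-argument form, randperm(n, k) returns k distinct integers between 1 and n, which can be used directly as indices. The group members below are hypothetical.

```matlab
% Hypothetical members of one group
members = ["Patient_001"; "Patient_004"; "Patient_009"; "Patient_012"];
k = 2;                                  % number of pseudonyms to draw

idx    = randperm(numel(members), k);   % k distinct indices, no repeats
sample = members(idx);                  % k randomly chosen pseudonyms
```

Because the indices are distinct, no pseudonym is picked twice. Calling rng with a fixed seed beforehand makes the draw reproducible.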
I hope this helps!
