resample data based on a particular variable
I have a large dataset like the one below. I want to randomly resample it by 'id' to produce a dataset of the same size. Since the data has 5 ids, I would like to draw 5 ids with replacement and build the new dataset from all the rows belonging to each drawn id.
id value var1 var2 …
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
Given this data, the desired output could look like the following (because I want to sample ids with replacement, the same id can appear more than once):
id value var1 var2 …
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
Answers (1)
KSSV
on 4 May 2018
A = [1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16 ];
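id = A(:,1) ;
uid = unique(id) ;            % distinct ids (need not be 1..N)
N = numel(uid) ;
idx = uid(randi(N,N,1)) ;     % draw N ids WITH replacement (randperm only shuffles, so no id would repeat)
iwant = cell(N,1) ;
for i = 1:N
iwant{i} = A(id==idx(i),:) ;  % all rows belonging to the i-th drawn id
end
iwant = cell2mat(iwant)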
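For reference, here is a minimal loop-free sketch of resampling by id with replacement, as the question asks. The variable names (uid, draw, rows, resampled) are illustrative and not from the original post. It assumes the id is in the first column of a numeric matrix.

```matlab
% Resample whole groups of rows: draw ids with replacement,
% then stack the rows of each drawn id in draw order.
A = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
     4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

id   = A(:,1);
uid  = unique(id);              % distinct ids (column vector)
N    = numel(uid);
draw = uid(randi(N, N, 1));     % N ids drawn with replacement

% Row indices of every drawn id; each cell holds a column of indices
rows = arrayfun(@(k) find(id == k), draw, 'UniformOutput', false);

resampled = A(vertcat(rows{:}), :);
```

Because groups can have different sizes (id 1 has four rows here, the others three), the number of rows in resampled can vary from draw to draw. Calling rng with a fixed seed before randi makes a particular draw reproducible.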