Distribution sampling
I have 2 million samples of three parameters (a, b, c). The parameters are correlated with each other, and each has a different distribution (neither Gaussian nor logarithmic). I now need to draw 60,000 samples from this data that have the same correlation and the same distributions. Can anyone suggest a particular method?
Doug Eastman
on 8 Jul 2011
I'm not a statistics expert, but I believe that taking a random subset of the data (sampling without replacement) should come close to preserving the distribution and correlation of the original data. Here's a way to take a random subset of n elements from an array A:
idx = randperm(numel(A));   % random ordering of all element indices of A
subset = A(idx(1:n));       % keep the first n of them
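Newer MATLAB releases (and GNU Octave) also accept a second argument, randperm(N,k), which returns k unique random integers from 1 to N directly, so the separate idx(1:n) step is not needed. A minimal sketch, with made-up stand-in data in place of the real A:

```matlab
% Hypothetical stand-in data: 2 million values
A = rand(2e6, 1);
n = 60000;                  % number of samples to keep

% randperm(N, k): k unique indices drawn at random from 1..N
idx = randperm(numel(A), n);
subset = A(idx);            % n values drawn without replacement
```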
Here's an example showing the preserved distribution:
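N = 100000;                 % number of values in each population
n = 10000;                  % size of the random subset
x = randn(N,1)*3+12;        % normal, mean 12, standard deviation 3
y = randn(N,1)*2+2;         % normal, mean 2, standard deviation 2
A = [x;y];                  % stacked into one column: a bimodal distribution
idx = randperm(numel(A));
subset = A(idx(1:n));       % n values drawn without replacement
hist(A,100);                % histogram of the full data (100 bins)
figure
hist(subset,100);           % histogram of the subset: same shape, smaller counts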
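The histograms are a visual check. A minimal numerical check is to compare a few quantiles of the full data with those of the subset; if the subset is representative they should agree closely. The quantile levels p and the simple sort-based quantile estimate below are my own choices (used to avoid depending on any toolbox function), and the data are regenerated the same way as in the example so the snippet runs on its own:

```matlab
% Same bimodal test data as in the example
N = 100000;
n = 10000;
A = [randn(N,1)*3+12; randn(N,1)*2+2];
idx = randperm(numel(A));
subset = A(idx(1:n));

% Empirical quantiles by sorting (no toolbox needed)
p = [0.05 0.25 0.5 0.75 0.95];
sortedA = sort(A);
sortedS = sort(subset);
qA = sortedA(max(1, round(p*numel(sortedA))));
qS = sortedS(max(1, round(p*numel(sortedS))));

% Columns: probability level, full-data quantile, subset quantile
disp([p(:) qA(:) qS(:)]);
```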
Doug Eastman
on 11 Jul 2011
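Sorry, I fixed a typo above. Yes, this works for an A of any size because it uses linear indexing (a single index per element). Note, though, that linear indexing picks individual elements, so on a matrix it does not keep the values in each row together.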
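If you have something like a 1000x3 matrix and want a 100x3 subset (100 of the 1000 original samples, with each row kept intact), you would do:
idx = randperm(size(A,1));   % random ordering of the row indices
subset = A(idx(1:n),:);      % first n rows, all columns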
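To check that row sampling also keeps the correlation, here is a minimal sketch on made-up data: the columns a, b and c are hypothetical correlated, non-Gaussian parameters standing in for the real 2 million samples, and N is reduced to 200,000 to keep it quick. It draws 60,000 rows and compares the correlation matrices returned by corrcoef:

```matlab
% Hypothetical stand-in data: three correlated, non-Gaussian parameters
N = 200000;                           % use 2e6 for the full-size problem
n = 60000;                            % number of samples to keep
a = -log(rand(N,1));                  % exponential with mean 1
b = a + 2*rand(N,1);                  % a plus uniform noise: positively correlated with a
c = 1 ./ (1 + a) + 0.2*rand(N,1);     % bounded, negatively correlated with a
A = [a b c];                          % one sample per row

% Pick n distinct rows at random so each sample's (a,b,c) stay together
idx = randperm(N);
subset = A(idx(1:n),:);

% Correlation matrices of the full data and of the subset should be close
disp(corrcoef(A));
disp(corrcoef(subset));
maxDiff = max(max(abs(corrcoef(A) - corrcoef(subset))));
disp(maxDiff);
```

Because whole rows are selected, every sample in the subset still satisfies the relationships that define it (for example, b lies between a and a + 2), which is what keeps the correlation intact.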