Sampling with replacement from a matrix with clustered data
Hello,
I'm looking for a clever (computationally efficient) way to sample from a data matrix in clusters.
Given a matrix A of size n-by-I, where n is large and I is small, it's easy and fast to resample the rows of A randomly (with replacement) by:
B = A(ceil(rand(n,1)*n),:);
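As a side note, randi draws the integer row indices directly and is equivalent in distribution to ceil(rand(n,1)*n), because rand never returns exactly 0 or 1. A minimal sketch on toy data (the sizes n and I below are hypothetical):

```matlab
% Toy data standing in for A (hypothetical sizes).
n = 1000;
I = 5;
A = rand(n, I);

% randi(n, n, 1) returns n integers drawn uniformly from 1..n, with replacement.
B = A(randi(n, n, 1), :);
```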
But let's say that A(:,10) and A(:,11) are indices for two levels of clustering: A(:,10) identifies the top-level cluster, and within each top-level cluster, A(:,11) identifies a second-level cluster. To give specifics, A(:,10) defines four clusters, and within each of these there are many smaller clusters (each containing 4 rows). Because n is large (about 200,000), each top-level cluster contains many small clusters.
In practice, what I want to do is create a matrix with n rows, built by drawing with replacement first from the top-level clusters and then from the second-level clusters within the chosen one.
I have done this in steps inside a for loop: draw a value of the first variable, then a value of the second, pull out the rows in that set (which turns out to be 4 rows), and append them to the matrix built so far, repeating this many times. It takes about 5 minutes to create one of these resampled matrices, which is too long.
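A vectorized sketch of the two-stage draw described above, under these assumptions: each draw picks a top-level cluster (column 10) uniformly at random, then one of its second-level clusters (column 11) uniformly at random, and takes all rows of that second-level cluster. The speedup comes from computing the cluster structure once (unique, sort, accumarray), making all draws at once, and indexing A a single time instead of growing a matrix inside a loop. The toy data and all variable names are hypothetical, and repelem requires MATLAB R2015a or later.

```matlab
% --- Toy data (hypothetical): 4 top-level clusters in column 10, second-level
% clusters of 4 rows in column 11, with labels reused across top clusters.
n = 200000;
subGlobal = ceil((1:n)' / 4);                  % 50,000 clusters of 4 rows
top = ceil(subGlobal / (n / 16));              % 4 top clusters, 12,500 sub-clusters each
A = rand(n, 11);
A(:, 10) = top;
A(:, 11) = subGlobal - (top - 1) * (n / 16);   % label restarts at 1 in each top cluster
A = A(randperm(n), :);                         % rows need not be sorted

% --- Precompute the cluster structure once.
% One integer id per (top, sub) pair; unique(...,'rows') sorts lexicographically,
% so ids sharing a top-level cluster are contiguous.
[pairs, ~, subId] = unique(A(:, [10 11]), 'rows');
subId = subId(:);
[~, order] = sort(subId);                      % row indices of A grouped by sub-cluster
subCount = accumarray(subId, 1);               % rows in each sub-cluster
subStart = cumsum([1; subCount(1:end-1)]);     % start of each sub-cluster within ORDER

[~, ~, topId] = unique(pairs(:, 1));
nSubPerTop = accumarray(topId(:), 1);          % sub-clusters in each top cluster
firstSubOfTop = cumsum([1; nSubPerTop(1:end-1)]);
K = numel(nSubPerTop);

% --- One resample: all draws at once instead of a loop.
m = ceil(n / min(subCount));                   % enough draws to reach n rows
t = randi(K, m, 1);                            % stage 1: top cluster, uniform
chosen = firstSubOfTop(t) + floor(rand(m, 1) .* nSubPerTop(t)); % stage 2: sub-cluster, uniform

len = subCount(chosen);
last = find(cumsum(len) >= n, 1);              % drop draws not needed to reach n rows
chosen = chosen(1:last);
len = len(1:last);

% Expand each chosen sub-cluster into its row indices without a loop.
before = cumsum(len) - len;                    % rows contributed by earlier draws
p = repelem(subStart(chosen) - before - 1, len) + (1:sum(len))';
B = A(order(p(1:n)), :);                       % trim if the last cluster overshoots n
```

To create many resampled matrices, the precomputation section runs once and only the "one resample" section goes inside the loop. With clusters of exactly 4 rows and n a multiple of 4, nothing is trimmed; if cluster sizes vary, the last drawn cluster may be cut to keep exactly n rows. If the intended scheme is instead to resample the top-level clusters themselves and then resample sub-clusters within each, only the two stage lines need to change.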
Can someone point me to a clever routine or approach to improving this?
Many thanks!