Removing specified data from a variable
I have a 100x2 dataset. I also have two random distributions of data.
I want to modify my original dataset in the following way:
- Randomly draw a number from random distribution 1 and keep that many rows of the data.
- Randomly draw a number from random distribution 2 and remove that many rows of the data.
I want to repeat this over the full length of the dataset.
Can anybody help me implement this?
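time = 1:100;                           % time stamps 1..100
var = rand(100,1);                      % random values
data = [time' var];                     % 100x2 dataset
dist1 = 1 + (20-1).*rand(100,1);        % random distribution 1 (uniform on [1,20])
dist2 = 10 + (30-10).*rand(100,1);      % random distribution 2 (uniform on [10,30])
position1 = randi(length(dist1));       % random index into dist1
card1 = round(dist1(position1));        % number of rows to keep (rounded to an integer)
position2 = randi(length(dist2));       % random index into dist2
card2 = round(dist2(position2));        % number of rows to remove (rounded to an integer)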
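Building on the snippet above, one way to apply the two draws repeatedly is to alternate "keep" and "remove" blocks until every row has been visited. This is a minimal sketch, assuming that the kept blocks should stay in their original order and that each draw is rounded to an integer row count of at least 1. The names `keepMask`, `nRows`, `row`, `stopRow` and `keeping` are illustrative, and `vals` replaces `var` only to avoid shadowing MATLAB's built-in `var` function.

```matlab
% Minimal sketch: alternate "keep" and "remove" blocks over the dataset.
% Assumes dist1/dist2 are the continuous distributions from the question;
% each draw is rounded (and forced to be at least 1) so it can be a row count.
time = 1:100;
vals = rand(100,1);
data = [time' vals];                    % 100x2 dataset
dist1 = 1 + (20-1).*rand(100,1);        % uniform on [1,20]
dist2 = 10 + (30-10).*rand(100,1);      % uniform on [10,30]

nRows = size(data,1);
keepMask = false(nRows,1);              % true for rows that survive
row = 1;                                % first row not yet processed
keeping = true;                         % start with a "keep" block
while row <= nRows
    if keeping
        n = max(1, round(dist1(randi(numel(dist1)))));   % rows to keep
    else
        n = max(1, round(dist2(randi(numel(dist2)))));   % rows to remove
    end
    stopRow = min(row + n - 1, nRows);  % do not run past the end of the data
    keepMask(row:stopRow) = keeping;    % mark the whole block at once
    row = stopRow + 1;
    keeping = ~keeping;                 % alternate keep/remove
end
datanew = data(keepMask,:);             % the reduced dataset
```

Marking rows in a logical mask and indexing once at the end keeps the original row order and avoids resizing `datanew` inside the loop.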
Answers (2)
Davide Masiello on 7 Nov 2022 (edited on 7 Nov 2022)
I think the following code is a simpler way of achieving your task.
At each iteration it pulls a random integer from one of two vectors, distribution1 on odd iterations (rows to keep) and distribution2 on even iterations (rows to remove), and uses it as the size of the next block of rows.
The code prints a description of the action taken at each iteration.
data = [(1:100)' rand(100,1)] % Dataset
datanew = [];
distribution1 = randi(100,100,1); % Array of random integers (to be replaced with gaussian distribution later)
distribution2 = randi(100,100,1); % Array of random integers (to be replaced with gaussian distribution later)
index = 0;
iter = 1;
while index < size(data,1)
    fprintf('This is iteration number %d.\n',iter)
    if mod(iter,2) == 1
        % Odd iteration: draw from distribution1 and keep that many rows
        increment = min(distribution1(randi(length(distribution1))),size(data,1)-index);
        fprintf('The random number is %d.\n',increment)
        fprintf('We keep the rows between %d and %d.\n',[index+1,index+increment])
        datanew = [datanew;data(index+1:index+increment,:)];
    else
        % Even iteration: draw from distribution2 and skip that many rows
        increment = min(distribution2(randi(length(distribution2))),size(data,1)-index);
        fprintf('The random number is %d.\n',increment)
        fprintf('The rows between %d and %d do not get added to the new dataset.\n',[index+1,index+increment])
    end
    iter = iter+1;
    index = index+increment;
end
size(data)
size(datanew)
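If the printed trace is not needed, the same keep/remove logic can be split into two steps: first draw alternating block lengths until they cover all rows, then expand them into a logical row mask with `repelem`. This is a sketch under the same assumptions as the answer's code (odd blocks kept, even blocks removed); `lengths`, `isKeep` and `keepMask` are illustrative names, and `repelem` requires a MATLAB release that includes it (R2015a or later).

```matlab
% Sketch: precompute alternating block lengths, then build a keep mask
% with repelem instead of appending to datanew inside the loop.
data = [(1:100)' rand(100,1)];
distribution1 = randi(100,100,1);
distribution2 = randi(100,100,1);
nRows = size(data,1);

lengths = [];                              % block lengths: keep, remove, keep, ...
while sum(lengths) < nRows
    if mod(numel(lengths),2) == 0          % even count so far -> next is a keep block
        lengths(end+1) = distribution1(randi(numel(distribution1)));
    else                                   % odd count so far -> next is a remove block
        lengths(end+1) = distribution2(randi(numel(distribution2)));
    end
end
lengths(end) = lengths(end) - (sum(lengths) - nRows);   % trim the last block to fit

isKeep = mod(0:numel(lengths)-1, 2) == 0;              % true for blocks 1, 3, 5, ...
keepMask = logical(repelem(isKeep, lengths))';         % one logical value per row
datanew = data(keepMask,:);
```

Because the last block is trimmed, the block lengths always add up to exactly the number of rows in `data`.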
Davide Masiello on 7 Nov 2022
But why do you first generate a random distribution and then randomly take a value from it?
How is this different from just generating a random number?
In other words, how is this
distribution1 = randi(10,100,1); % array of 100 random integers between 1 and 10
a = distribution1(randi(100,1,1)) % integer randomly pulled from distribution 1
different from this
a = randi(10,1,1) % random integer between 1 and 10
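For a long vector of `randi` values the two approaches give practically the same distribution, since each stored value is picked with equal probability. They differ when the stored vector is short, or when it holds a sample from some other distribution: the draws then follow the stored values (their empirical distribution) rather than a fresh discrete uniform one. A small sketch that makes this visible, assuming a deliberately short stored vector of 5 values; `nDraws`, `fromVector` and `direct` are illustrative names.

```matlab
% Sketch: drawing from a stored vector vs. drawing directly with randi.
distribution1 = randi(10,5,1);            % deliberately short: only 5 stored values
nDraws = 10000;
fromVector = distribution1(randi(numel(distribution1), nDraws, 1));  % resample stored values
direct = randi(10, nDraws, 1);            % fresh integers from 1 to 10

% The resampled draws can only repeat values stored in distribution1,
% while the direct draws can be any integer from 1 to 10.
disp(unique(fromVector)')
disp(unique(direct)')
```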
Davide Masiello on 7 Nov 2022
OK, I see now; sorry, I must have skipped that part.
I have modified my answer so that the number of rows to keep/remove is pulled randomly from the vectors I called distribution1 and distribution2.
These are random vectors; you can replace them with Gaussian distributions at your discretion.
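If `distribution1` and `distribution2` are replaced with Gaussian samples, the values have to be turned into positive integers before they can serve as row counts. Otherwise a zero increment would stop `index` from advancing (and a negative one would move it backwards), so the `while` loop in the answer would never finish. A minimal sketch, where the means and standard deviations `mu1`, `sigma1`, `mu2`, `sigma2` are hypothetical placeholders, not values from the question:

```matlab
% Sketch: Gaussian-distributed block lengths, rounded and clipped to >= 1.
% mu1, sigma1, mu2, sigma2 are hypothetical placeholder parameters.
mu1 = 10;  sigma1 = 3;                    % "keep" block lengths
mu2 = 20;  sigma2 = 5;                    % "remove" block lengths
distribution1 = max(1, round(mu1 + sigma1*randn(100,1)));
distribution2 = max(1, round(mu2 + sigma2*randn(100,1)));
```

Clipping with `max(1, ...)` piles a little extra probability onto the value 1; redrawing out-of-range values instead of clipping would avoid that distortion.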