I want to delete 5% random Selected Index from array and replace zero at the end MATLAB

Med Future on 1 Mar 2022

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/1661010-i-want-to-delete-5-random-selected-index-from-array-and-replace-zero-at-the-end-matlab

Commented: Walter Roberson on 2 Mar 2022

Accepted Answer: Matt J

dataset1.mat

Hello everyone, I hope you are doing well.

I have a dataset of shape 1x1000, and I have implemented the code below to delete 5% of the samples at random.

However, only the first value is saved in the output. How can I do this in MATLAB? Please help.

The output is 1x1000, but the values are not being saved:

output matrix = [200 0 0 0 .......]

load('dataset1')
N = numel(dataset1);
percentageMP=5;
size_MP=round(percentageMP/100*N);
MPV=zeros(size(dataset1));
for i=1:length(size_MP)
    MP = randsample(N,size_MP);
    sortvalue=sort(MP);
end
Temp_series1=zeros(size(dataset1));
index=1
totallength=length(dataset1)-length(MP)
for j=1:length(totallength)
    for k=1:length(MP)
        if j==MPV(k)
            index=index+1;
        end
    end
    Temp_series1(j)=dataset1(index)
end

Accepted Answer

Matt J on 1 Mar 2022

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/1661010-i-want-to-delete-5-random-selected-index-from-array-and-replace-zero-at-the-end-matlab#answer_906980

Edited: Matt J on 1 Mar 2022

load('dataset1')
N = numel(dataset1) ; 
percentageMP=5;
size_MP=round(percentageMP/100*N);
discard=randperm(N,size_MP);
dataset1(discard)=[];
dataset1(end+1:N)=0;
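
For what it's worth, the loop in the question stops early because `length(totallength)` is the length of a scalar, which is 1, so `for j=1:length(totallength)` makes a single pass. That is consistent with the `[200 0 0 0 ...]` output. The inner test also compares `j` against `MPV`, which is all zeros, rather than against the sampled indices. If a loop is really wanted, a minimal sketch might look like the following. The `1:1000` stand-in data is an assumption so the snippet runs without `dataset1.mat`, and `randperm` is swapped in for `randsample`.

```matlab
% Stand-in for dataset1 (assumption: the real data is a 1x1000 numeric row).
dataset1 = 1:1000;
N = numel(dataset1);
size_MP = round(5/100*N);              % 50 samples to drop
MP = sort(randperm(N, size_MP));       % sorted, distinct indices to drop

Temp_series1 = zeros(size(dataset1));  % output, already zero-padded
totallength = N - numel(MP);           % 950 samples survive
index = 0;                             % last source position consumed
for j = 1:totallength                  % 1:totallength, not 1:length(totallength)
    index = index + 1;
    while ismember(index, MP)          % skip every dropped position
        index = index + 1;
    end
    Temp_series1(j) = dataset1(index);
end
```

This should give the same result as the vectorized `dataset1(discard)=[]` version for the same set of dropped indices; the loop is only here to show where the original went wrong.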

Matt J on 1 Mar 2022

Edited: Matt J on 1 Mar 2022

datasetvalue.mat

I'm not seeing that error.

dataset=load('datasetvalue').dataset;
[M,N] = size(dataset);
percentageMP=5;
size_MP=round(percentageMP/100*N);
Discards=nan(M,size_MP);
for i=1:M
    row=dataset(i,:);
    discard=randperm(N,size_MP);
    row(discard)=[];
    row(:,end+1:N)=0;
    dataset(i,:)=row;
    Discards(i,:)=discard;
end

whos dataset Discards

  Name          Size          Bytes      Class     Attributes

  Discards      250x50        100000     double
  dataset       250x1000      2000000    double

spy(dataset)

Discards(1:10,1:20)

ans = 10×20

   263   538   503   893   687   752   115   823   130   587   350   617   367   189   626   971    45   480   870   750
   430   353    17   869   295   559   945   319   904    12   742   484   508   125   739   174   262   838   855   824
   363   161   985   154   901   264   755   634   374   600   962   940   297     6    57   585   471   146   423   350
   923   780   355   945   122    12   102   906   345   304   723    66    43   467   804   716   548   776   118   240
   719   925   172   464    41    51   525   606   437   608   410   733   479   885   379   507   692    32   990   385
   553   684     8    72   122   851   502   544   588   214    52    99    71   419   753   296   564   216   328   379
   879   812   273   797   416   545    54   564   246   855   493   613   904   440   459   873   915   573   197   342
   599    84   864   429   177   789   610   465   668   186    94   188   912   747   790   687   294   119   923   537
    90   265   628   598   233   922   382    12   774   584   786    87   717   345   990   133   833   961   699   491
   190   892   644    48   653   629   213   885   802   165   778   706   636   102   773   386   100   118   359   792

Walter Roberson on 1 Mar 2022

[~, colidx] = sort(rand(size(dataset)), 2) ;

Create an array of random numbers the same size as the dataset. Sort it along the rows, discarding the actual sorted values, but keeping the sort indices. So colidx will be an array the same size as dataset, in which each row is a list of indices into the row, with the indices reflecting the sorting order of a list of random numbers.

Why would you do that? Well, because each index now appears exactly once in each row, and the order of the indices is random. In other words, you have produced a random permutation of the column indices, and you have created a different such random permutation for each row.

If you were to try to use randperm() you would find that it is restricted to outputting a single vector, not a 2D array in which each row or column is different.
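
A quick way to convince yourself of this is to try it on a small matrix. The sketch below uses a hypothetical 4-by-10 size (not the 250-by-1000 dataset from this thread) and checks that every row of `colidx` contains each of 1..N exactly once.

```matlab
M = 4; N = 10;                          % small illustrative size (assumption)
[~, colidx] = sort(rand(M, N), 2);      % sort each row, keep only the indices

% A row is a permutation of 1:N if sorting it gives exactly 1:N.
rowsSorted = sort(colidx, 2);
isPerm = all(rowsSorted == repmat(1:N, M, 1), 2);

disp(colidx)                            % each row is a different shuffle of 1:10
disp(isPerm.')                          % one logical flag per row
```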

keep = floor(0.95*size(dataset, 2));

In other words, this calculates 950, the number of entries to leave untouched in each row.

colidx = sort(colidx(:, 1:keep), 2);

Take only the first 950 columns of that matrix of random permutations of indices, then sort along the second dimension. What you get for each row is an ordered vector 950 elements long, made up of the entries of 1:1000 with 50 random entries omitted, like [1, 2, 3, 5, 6, 8, ...]. These are the column indices to keep for that particular row: 950 out of the 1000 possible.

rowidx = repmat((1:size(dataset, 1)).', 1, keep) ;
newds = dataset(sub2ind(size(dataset), rowidx, colidx));

Would you believe... magic?

Not actually magic, but certainly arcane, in the sense of obscure "hidden" knowledge.

The rowidx line is constructing

1 1 1 1 ... 1 (950 times)
2 2 2 2 ... 2 (950 times)
3 3 3 3
4 4 4 4

up to the number of rows.

colidx is, remember, things like

2 3 5 6 8 .... 950 entries
3 4 6 8 9 .... 950 entries
13 14 15 18 ... 950 entries

After that, rowidx and colidx are arrays of the same size, and you can read off corresponding elements of the two as the row index and column index of an element you want to keep. In this example data, you want to keep (1,2), (1,3), (1,5), (1,6), (1,8), (2,3), (2,4), (2,6), (2,8), (2,9), (3,13), (3,14), (3,15), (3,18).

Now sub2ind() takes those pairs of row index and column index and calculates the linear indices those places correspond to in an array the size() of the dataset. MATLAB stores arrays column by column, so for the 250 x 1000 dataset the linear index of (r,c) is r + (c-1)*250, and sub2ind() would return an array of indices that looks like

251 501 1001 1251 1751
502 752 1252 1752 2002
3003 3253 3503 4253

These are linear indices into dataset.
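
To see the mapping concretely: for a 250-by-1000 array, MATLAB's column-major storage means (r,c) maps to r + (c-1)*250. The sketch below runs `sub2ind` on a few illustrative index pairs (the particular rows and columns are made up for the example) and compares the result with that formula.

```matlab
sz = [250 1000];                 % size of the dataset in this thread
rowidx = [1 1 1; 2 2 2];         % row of each element to keep
colidx = [2 3 5; 3 4 6];         % illustrative column picks

lin = sub2ind(sz, rowidx, colidx);
manual = rowidx + (colidx - 1) * sz(1);   % column-major formula

disp(lin)                        % [251 501 1001; 502 752 1252]
disp(isequal(lin, manual))       % 1
```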

Then indexing dataset() with those indices extracts those values, so you get an array like

  [d(1,2), d(1,3), d(1,5), d(1,6), d(1,8) ...
   d(2,3), d(2,4), d(2,6), d(2,8), d(2,9) ...
   ...]

Each row would have 950 elements, and each row has values extracted from exactly one row of input.

This is a somewhat obscure way to do mass extraction of data from an array when the data might not be regularly spaced.

newds is now 250 rows and 950 columns
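
Putting the steps so far together on a small matrix may make this easier to follow. This is only a sketch: the 5-by-20 size and the `reshape(1:M*N, ...)` test data are assumptions chosen so every element is distinct and easy to trace; the four main lines are the same as in the comment above.

```matlab
M = 5; N = 20;                               % illustrative size (assumption)
dataset = reshape(1:M*N, M, N);              % distinct values, easy to trace

[~, colidx] = sort(rand(M, N), 2);           % one random permutation per row
keep = floor(0.95*N);                        % columns kept per row
colidx = sort(colidx(:, 1:keep), 2);         % kept columns, in original order
rowidx = repmat((1:M).', 1, keep);
newds = dataset(sub2ind(size(dataset), rowidx, colidx));

% Each row of newds should be the same row of dataset with only the
% kept columns, still in their original left-to-right order.
rowsMatch = true;
for i = 1:M
    rowsMatch = rowsMatch && isequal(newds(i,:), dataset(i, colidx(i,:)));
end
disp(size(newds))
disp(rowsMatch)
```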

newds(end, size(dataset, 2)) = 0;

size(dataset,2) is the original number of columns in dataset, which is 1000. newds(250,1000) is beyond the end of newds as newds is 250 by 950, so by assigning a 0 at newds(250,1000) you are implicitly asking to expand the matrix to be 250 x 1000 by adding extra columns of zeros.

That line of code has a bug when none of the original data was dropped: if keep equals the number of columns, then newds(end, size(dataset,2)) is already the last entry of the matrix, so the assignment overwrites real data with 0. There are other ways of padding with 0 that are more robust for the case where the old and new matrices end up the same size.
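
One such way, and the one used in the row-by-row loop earlier in this thread, is to assign to the range `end+1:N`. When nothing was dropped that range is empty and the assignment does nothing. A small sketch with made-up sizes (3 rows, N = 5) shows the difference:

```matlab
N = 5;                        % original column count (hypothetical)
same  = ones(3, N);           % nothing was dropped: already 3-by-5
short = ones(3, N - 2);       % two columns were dropped: 3-by-3

bad = same;
bad(end, N) = 0;              % overwrites the real value at (3,5)

same(:, end+1:N)  = 0;        % end+1:N is empty, so nothing changes
short(:, end+1:N) = 0;        % appends two columns of zeros

disp(bad(end, N))             % 0: data was lost
disp(all(same(:) == 1))       % 1: data intact
disp(size(short))             % 3 5
```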

Med Future on 2 Mar 2022

@Matt J Can you please explain your latest code?

Walter Roberson on 2 Mar 2022

"what the effect of randomsample instead of randperm?"

The code in randsample() was designed before Mathworks upgraded the internal randperm algorithm for the two-input case. Because of that, it has an internal "optimization" for the case where less than 1/4 of the values are being selected, with the "optimization" being based on using randi() until enough distinct random values have been generated and then randomizing their order using randperm. This is guaranteed to require at least twice as many random number generations as would be used for a Fisher-Yates shuffle, which is what randperm would use for this configuration.

Also, randsample requires the Statistics and Machine Learning Toolbox, but randperm does not.

Products

MATLAB

Release

R2021b
