How to put zero or NaN instead of rejecting my data in a Chauvenet script

Question

Hello, my task is to detect outliers in a large dataset using Chauvenet's criterion. Chauvenet's test states: a reading may be rejected if the probability of obtaining its particular deviation is less than 1/(2n). In other words, it computes the probability of each data point's deviation from the mean and rejects the point from the list if that deviation is too large. My question is how to replace the bad data with 0 or NaN instead of rejecting it.
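As a rough illustration of the criterion described above (not part of the original script): a reading is flagged when its two-sided tail probability under a normal distribution falls below 1/(2n), which is the same as its absolute z-score exceeding a critical value `zc`. The sketch below computes `zc` with base-MATLAB `erfcinv`, which should give the same value as `norminv(1 - 1/(4*n), 0, 1)` from the Statistics Toolbox. The sample size `n = 10` is an assumed value chosen only for illustration.

```matlab
% Chauvenet critical z-score for an assumed sample size (illustration only)
n = 10;                          % assumed number of readings
p_two_sided = 1 / (2 * n);       % Chauvenet rejection probability, 1/(2n)

% Two-sided tail of the standard normal:
%   P(|Z| > zc) = erfc(zc / sqrt(2)) = p_two_sided
zc = sqrt(2) * erfcinv(p_two_sided);

fprintf('n = %d, zc = %.3f\n', n, zc);   % about 1.960 for n = 10
```

Any reading whose z-score satisfies `abs(z) > zc` would then count as an outlier under this criterion.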

I have the following script:

```matlab
function [ data_bio2, data_percent_rejected, data_cv ] = chauvenet( x )
% remove zero entries
data_zeros=find(x==0.0);
data_nonzeros=find(x>0.0);
data_bio2 = x(data_nonzeros);
% compute length, mean, std, min, max of non-zero data
data_length2=length(data_bio2);
data_mean2 =mean(data_bio2);
data_standard2 = std(data_bio2);
data_max2 = max(data_bio2);
data_min2 = min(data_bio2);
% Part three - Identify outliers using Chauvenet's criterion
% Z-score data and compute two-sided Z-score for Chauvenet's criterion
data_probability = 1/(2*length(data_nonzeros));
data_zscore = (data_bio2 - data_mean2)/(data_standard2);
data_ptest = 1 - data_probability/2;
zc=norminv(data_ptest, 0, 1);
% Hence, reject data whose deviation from the mean exceeds std*zc
data_limit = zc * data_standard2;
data_cv = data_bio2( data_zscore >= -zc & data_zscore <= zc );
data_cvlength = length(data_cv);
index_rejected = find(data_zscore > zc | data_zscore < -zc);
%!!! index_rejected: these are the indices of the rejected values in your data vector
data_rejected = data_bio2(data_zscore > zc | data_zscore < -zc)
index_rejected_original = data_nonzeros(index_rejected); %!!!FLAG THOSE LINES!!!
% look up the rejected values in the original input vector x
biomass_rejected_original = x(index_rejected_original);
%!!!index/biomass_rejected_original: these are the lines/biomasses
%of your original data file that need to be flagged
% percent of data rejected by Chauvenet's criterion
data_percent_rejected = (1- data_cvlength/length(data_bio2))* 100 
% compute histogram using linear bin-size
[M,Y]=hist(data_bio2,1000);
[M_cv]=hist(data_cv,Y);
end
```

So, how can I change the script to put zero or NaN in place of my bad data instead of removing it from the list? Thank you in advance!

Answer 1

Star Strider on 22 Dec 2014

If I understand your code correctly, this will replace your ‘data_rejected’ selections with NaN:

data_bio2(index_rejected) = NaN;

I would replace them with NaN instead of zero because zero could enter into your calculations and be considered a valid number. NaN will not be considered a valid number.
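Note that the line above marks the outliers inside `data_bio2`, which is the vector with the zero entries already removed. If the NaN values should instead land at the matching positions of the original input, one option is to map the indices back, much as the script does with `index_rejected_original`. The following is only a sketch under assumptions: the data in `x` is made up, the names `idx_nonzero`, `x_flagged`, and `mean_valid` are hypothetical, and `erfcinv` is used in place of `norminv` so that no toolbox is needed.

```matlab
% Sketch: keep the vector length and mark Chauvenet outliers as NaN.
% x is made-up example data; 0 marks an empty entry, as in the question.
x = [10.1 9.8 0 10.3 9.9 10.0 55.0 10.2 0 9.7 10.1 10.4];

idx_nonzero = find(x > 0);             % positions of usable readings in x
vals = x(idx_nonzero);
n = numel(vals);

z  = (vals - mean(vals)) / std(vals);  % z-scores of the non-zero readings
zc = sqrt(2) * erfcinv(1 / (2 * n));   % equivalent to norminv(1 - 1/(4*n), 0, 1)

is_outlier = abs(z) > zc;              % logical mask within vals
x_flagged = x;                         % same length as the input
x_flagged(idx_nonzero(is_outlier)) = NaN;   % map back to positions in x

% Later statistics have to skip NaN explicitly, because mean(x_flagged)
% would itself return NaN.
valid = ~isnan(x_flagged) & x_flagged > 0;
mean_valid = mean(x_flagged(valid));
```

In this made-up example only the reading 55.0 exceeds the critical z-score, so `x_flagged` keeps all 12 entries and has a single NaN at position 7.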

panik772 illza on 22 Dec 2014

Thank you very much, guys! Yes, you are right, it would be better to put NaN instead of zeros! The goal is to keep the size of the dataset unchanged and replace the erroneous data with a more suitable value, such as the mean or a linear interpolation across the NaN entries.
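The two fill strategies mentioned here could be sketched roughly as follows. This is an illustration, not code from the thread: the vector `y` is made-up data, and the names `good`, `y_mean`, and `y_interp` are hypothetical. It uses `interp1` from base MATLAB.

```matlab
% Sketch: fill NaN placeholders by the mean or by linear interpolation.
y = [10.1 9.8 NaN 10.3 9.9 NaN 10.2];   % made-up data with NaN outliers
good = ~isnan(y);                        % mask of valid readings
t = 1:numel(y);                          % sample positions

% Option 1: replace NaN with the mean of the valid readings
y_mean = y;
y_mean(~good) = mean(y(good));

% Option 2: linear interpolation between the neighbouring valid readings
y_interp = y;
y_interp(~good) = interp1(t(good), y(good), t(~good), 'linear');
```

With the `'linear'` method as called here, a NaN at the very start or end of the vector has no valid neighbour on one side and would stay NaN, so those cases would need separate handling.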

Star Strider on 22 Dec 2014

My pleasure!
