How to exclude data when fitting an exponential distribution

I am trying to fit an exponential distribution of lifetimes. I want to exclude all lifetimes <= 1 because those represent unreliable data points. However, the fitter will then include the fact that there are zero data points in that region, rather than ignoring it. This becomes clear when I simulate a basically perfect exponential distribution:
rng('default')
x = round(exprnd(4,1e6,1)); % exponential distribution with mean 4
pd = fitdist(x,'exponential');
disp(pd.mu) % the fitted mean
x1 = x(x>1); % remove all values <= 1
pd1 = fitdist(x1,'exponential');
disp(pd1.mu) % the new fitted mean
Output:
3.9834
5.5111
The two fits are clearly different even though they are fitting the same data. How can I make the fitter ignore that range of values?
  3 Comments
J. Alex Lee on 22 Oct 2020
I don't think this is surprising at all. You aren't fitting a distribution to a histogram and ignoring some of the probability densities; you are fitting to the actual data. So if you alter the data, of course you will alter the fit. It's like being surprised at the difference between
x = randn(1000,1);
mean(x)
x1 = x(x>1);
mean(x1)
Sjoerd Nooteboom on 22 Oct 2020
@J. Alex Lee thanks, I understand. I suppose a possible solution would be to create the histogram data first and fit an exponential function to that in a separate line. I was just wondering if there was a simpler way...
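The histogram idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the bin edges are an arbitrary choice, the data are simulated without rounding, and the fit is an unweighted straight line through the log of the bin counts (for an exponential, the log count falls off linearly with slope -1/mu). The names edges, centers, keep, and muHist are illustrative.

```matlab
rng('default')
x  = exprnd(4,1e6,1);            % simulated lifetimes with true mean 4
x1 = x(x>1);                     % keep only the reliable lifetimes

edges   = 1:0.5:20;              % bins start at the cutoff (arbitrary choice)
counts  = histcounts(x1,edges);  % raw counts per bin
centers = edges(1:end-1) + diff(edges)/2;

keep = counts > 0;               % log(0) is undefined, so skip empty bins
p = polyfit(centers(keep),log(counts(keep)),1);  % straight-line fit to log counts
muHist = -1/p(1);                % the slope estimates -1/mu
disp(muHist)
```

Because only the slope is used, the empty region below the cutoff never enters the fit. An unweighted log-linear fit gives noisy tail bins the same weight as well-populated ones, so it is likely less efficient than a likelihood-based fit.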


Accepted Answer

Jeff Miller on 22 Oct 2020
You can make use of the memoryless property of the exponential here: the mean remaining time is independent of how much time has already passed, so just reset the clock to 0 after excluding the times of 1 or less. Rounding messes that up, though, and I'm not sure why.
rng('default')
% x = round(exprnd(4,1e6,1)); % exponential distribution with mean 4
x = exprnd(4,1e6,1); % exponential distribution with mean 4
pd = fitdist(x,'exponential');
disp(pd.mu) % the fitted mean
x1 = x(x>1); % remove all values <= 1
x1 = x1 - 1; % ADJUST THE SCORES TO "RESTART THE CLOCK" AT TIME 1
pd1 = fitdist(x1,'exponential');
disp(pd1.mu) % the new fitted mean
% output:
% 3.994
% 3.9924
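An alternative that does not rely on memorylessness is to fit the truncated density directly, so that the likelihood only covers the region above the cutoff. The sketch below assumes the mle function with a custom pdf handle from the Statistics and Machine Learning Toolbox; the names c, truncPdf, and muTrunc are illustrative, and the sample size is reduced to 1e5 only to keep the numerical optimization quick.

```matlab
rng('default')
x  = exprnd(4,1e5,1);            % simulated lifetimes with true mean 4 (no rounding)
c  = 1;                          % cutoff: values <= c are discarded
x1 = x(x>c);

% Exponential density renormalized to the region t > c
truncPdf = @(t,mu) exppdf(t,mu) ./ (1 - expcdf(c,mu));

% Maximum likelihood fit of the truncated density, started at the naive mean
muTrunc = mle(x1,'pdf',truncPdf,'Start',mean(x1),'LowerBound',0);
disp(muTrunc)
```

For the exponential, the maximizer of this truncated likelihood is mean(x1) - c, so it should agree with the clock-resetting result up to optimizer tolerance. The same pattern may be useful for distributions where shifting the data is not valid.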
  1 Comment
Sjoerd Nooteboom on 23 Oct 2020
That's a nice solution, thanks! I think the rounding issue occurs because there are more values between 0.5-1.5 than between 0.0-0.5. Interestingly, the resulting mean from this method is always exactly 1 less than if one removes the 'clock resetting' line, no matter whether you round the data or not, and also for my experimental data. Not sure why this happens.
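A possible explanation for both observations, sketched under the assumption that fitdist uses the maximum likelihood estimate for the exponential, which is simply the sample mean: subtracting 1 from every value lowers the sample mean, and therefore the fitted mu, by exactly 1. For the rounded data, the condition round(x) > 1 keeps raw values of 1.5 and above, so the effective cutoff appears to be 1.5 rather than 1. The variable names below are illustrative.

```matlab
rng('default')
x  = exprnd(4,1e5,1);            % simulated lifetimes with true mean 4
x1 = x(x>1);

pdRaw   = fitdist(x1,'exponential');
pdShift = fitdist(x1-1,'exponential');
disp(pdRaw.mu - mean(x1))        % fitted mu coincides with the sample mean
disp(pdRaw.mu - pdShift.mu)      % differs from 1 only by floating-point error

xr  = round(x);
xr1 = xr(xr>1);                  % smallest surviving value is 2, i.e. raw x >= 1.5
disp(min(xr1))
disp(mean(xr1) - 1)              % shifted by the nominal cutoff
disp(mean(xr1) - 1.5)            % shifted by the effective cutoff
```

Shifting by 1.5 should land much closer to the true mean than shifting by 1, although a small residual offset from the discretization itself can remain.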
