Hello,

I have found the amazing allfitdist script, which fits all valid parametric distributions to the data and sorts them using a metric (e.g. BIC or AIC). In a blog post there is an example for a normal distribution:

 % Create a normally distributed (mu: 5, sigma: 3) random data set
 x = normrnd(5, 3, 1e4, 1);
 % Compute and plot results. The results are sorted by "Bayesian information
 % criterion".
 [D, PD] = allfitdist(x, 'PDF');

If I do this, I get the Rayleigh distribution as the winning distribution. In the blog post they got the normal distribution (as expected). The plot I get is in the attachment.

Does anybody see where this problem comes from? I have the newest version of MATLAB. Is this the problem?

It is probably a result of changes in default behavior across MATLAB versions. You are getting that because the log likelihood for the Rayleigh distribution comes back as a complex number, which shouldn't happen. You can go to line 212 and append this there:

 if ~isreal(NLL)
     error('non-finite NLL');
 end

Now the program will catch the complex log likelihood and disregard the distribution like it should.
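
As a minimal sketch of the same guard, extended to also reject infinite or NaN values: the helper name isValidNLL is made up for illustration and is not part of allfitdist. It only uses the built-in functions isscalar, isreal and isfinite.

```matlab
% Hypothetical helper (not part of allfitdist): true only for a real,
% finite, scalar negative log likelihood.
isValidNLL = @(nll) isscalar(nll) && isreal(nll) && isfinite(nll);

okReal    = isValidNLL(1234.5);   % ordinary value
okComplex = isValidNLL(3 + 2i);   % complex value, as in the Rayleigh case
okInf     = isValidNLL(Inf);      % infinite value
okNaN     = isValidNLL(NaN);      % NaN is not finite either
```

Inside the script, the check would then read something like if ~isValidNLL(NLL), error('non-finite NLL'); end, assuming NLL is the variable name used there.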

Fitting probability distributions to the data (allfitdist)

Sepp on 3 Mar 2016

Edited: Sepp on 3 Mar 2016

plot.jpg

Thank you so much, now it works. :) In the blog post they got the logistic distribution as the third and generalized extreme value as the fourth best distribution. I'm getting the generalized extreme value as third and logistic as fourth best distribution, please see my attached plot.

Do you see why this is?

Edit: There is a second problem. As you can see in my plot, the normal distribution is not shown at all. I think I have found the cause: the tlocationscale distribution seems to be visually almost identical to the normal distribution, so it is overlaid on top of it. How can I adapt the script so that the better fitting distributions (e.g. normal) are drawn on top of the others (e.g. tlocationscale)?

Ahmet Cecen on 4 Mar 2016


The first problem is simply randomness. Every call to

x = normrnd(5, 3, 1e4, 1);

draws a new sample, so two distributions with nearly equal scores can swap places from run to run. Run the demo 20 times and you will catch an instance where logistic beats generalized extreme value.
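
A rough way to see this without the plots, sketched under some assumptions: it calls fitdist directly instead of allfitdist, and it computes BIC with the standard formula 2*NLL + k*log(n), which may differ in detail from what the script does internally. The variable names are illustrative.

```matlab
% Count how often each of the two distributions has the lower BIC
% over repeated random samples from the same normal distribution.
nRuns = 20;
n = 1e4;
logisticWins = 0;
gevWins = 0;
for r = 1:nRuns
    x = normrnd(5, 3, n, 1);                       % fresh sample each run
    pdL = fitdist(x, 'Logistic');
    pdG = fitdist(x, 'GeneralizedExtremeValue');
    % BIC = 2*NLL + k*log(n), with k the number of fitted parameters
    bicL = 2*negloglik(pdL) + numel(pdL.ParameterValues)*log(n);
    bicG = 2*negloglik(pdG) + numel(pdG.ParameterValues)*log(n);
    if bicL < bicG
        logisticWins = logisticWins + 1;
    else
        gevWins = gevWins + 1;
    end
end
fprintf('logistic: %d wins, GEV: %d wins\n', logisticWins, gevWins);
```

The counts will change from one execution to the next, which is the point.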

The second is due to plotting order; if we reverse it, the legend will reverse as well. If you don't care about that, add this around line 334 (approximate, since we changed the file a bit), right below where it says:

max_num_dist=4;

Add:

 Dindex = min([length(D),max_num_dist]);
 D = fliplr(D(1:Dindex));
 PD = fliplr(PD(1:Dindex));
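
If the legend order does matter, one possible workaround is to plot in reverse but hand the line handles to legend in the original order. This is only a sketch outside of allfitdist: the three curves are stand-ins drawn with normpdf rather than real fits, and the ranking in names is hypothetical.

```matlab
% Stand-in data: three similar curves, listed best first (hypothetical ranking).
x = linspace(-5, 15, 200);
names = {'normal', 'tlocationscale', 'logistic'};
curves = [normpdf(x, 5, 3); normpdf(x, 5, 3.05); normpdf(x, 5, 3.4)];

figure;
hold on;
nCurves = numel(names);
h = gobjects(1, nCurves);
for k = nCurves:-1:1                  % worst first, so the best is drawn last (on top)
    h(k) = plot(x, curves(k, :), 'LineWidth', 2);
end
legend(h, names);                     % handles in best-first order keep the legend order
hold off;
```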

Sepp on 4 Mar 2016

Edited: Sepp on 4 Mar 2016


Thank you again for the great help. Now everything works very well. I only have a final small question:

In the script there is example 2 and example 3:

% Example 2
  data = nbinrnd(20,.3,1e4,1);
  values=unique(data); freq=histc(data,values);
  [D PD] = allfitdist(values,'NLogL','frequency',freq,'PDF','DISCRETE');
% Example 3
  data=geornd(.7,1e4,1); %Random from Geometric
  [D PD]= allfitdist(data,'PDF','DISCRETE');

In Example 2 the frequencies are given, while in Example 3 they are not. When I omit the frequencies in Example 2, it does not work. When I give the frequencies in Example 3, it still works.

When do I have to give the frequencies for discrete distributions? Or should I always give them?

Ahmet Cecen on 4 Mar 2016

Let's break it down:

In both cases you have 10000 data points.

In the third example you are passing the data itself, all 10000 points, and the function fits a distribution based on how many times each value is observed.

In the second example, you find the unique values of the 10000 points instead and pass those as the argument. So even if, say, 56 is observed 147 times, you only pass it once. Your otherwise 10000-long input is now, say, 103 long. To account for that, you pass a second argument called frequency to tell the function how many times each unique value is observed. For example, instead of passing [56 56 56 56 56 ...], you tell the function "56 -> 147 times". If you don't pass a frequency argument while using the unique values, the function will just assume every value occurs only once, hence the different output.
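
The equivalence of the two input forms can be checked with a short sketch. It assumes fitdist with its 'Frequency' option behaves like the corresponding option in allfitdist, and it fits only the negative binomial distribution rather than all candidates.

```matlab
% Same data, two ways of passing it to the fit.
data = nbinrnd(20, 0.3, 1e4, 1);
values = unique(data);              % each distinct value once
freq = histc(data, values);         % how often each distinct value occurs

% Expanding the (value, frequency) pairs gives back the sorted raw data.
expanded = repelem(values, freq);
sameData = isequal(sort(data), expanded);

pdRaw  = fitdist(data, 'NegativeBinomial');                       % raw data
pdFreq = fitdist(values, 'NegativeBinomial', 'Frequency', freq);  % values + counts

% Relative difference between the two sets of parameter estimates
relDiff = max(abs(pdRaw.ParameterValues - pdFreq.ParameterValues) ...
              ./ abs(pdRaw.ParameterValues));
```

Leaving out 'Frequency' in the second call would fit the distribution to the short list of distinct values alone, which is a different data set.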

Sepp on 5 Mar 2016

Edited: Sepp on 5 Mar 2016

Thanks a lot for the detailed explanation. So both examples are just the same (the same result) but the notation is different (i.e. the way to get the result)?

Ahmet Cecen on 5 Mar 2016


No, the original examples use different data (negative binomial in Example 2, geometric in Example 3). Here is a modified version that uses the same data to get the same results using two different notations:

 % Example 2
  data = nbinrnd(20,.3,1e4,1);
  values=unique(data); freq=histc(data,values);
  [D PD] = allfitdist(values,'NLogL','frequency',freq,'PDF','DISCRETE'); % unique values plus their frequencies
 % Example 3
  [D PD] = allfitdist(data,'NLogL','PDF','DISCRETE'); % raw data

Sepp on 6 Mar 2016

Edited: Sepp on 6 Mar 2016

Thank you again. I have just encountered the following warning message: "Maximum likelihood estimation did not converge. Iteration limit exceeded." This appeared e.g. for tlocationscale. Of course such a fit is bad. How can I exclude it? I mean, if this warning appears, I just want to throw an error.

Second, I'm currently checking whether the negative log likelihood is complex or infinite, and if so, I'm throwing an error. Are there other checks which I should do?
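
One possible approach to the warning question, sketched under assumptions: clear the last warning with lastwarn before the fit and inspect it afterwards. The example calls fitdist directly, because the fitting code inside allfitdist is not shown in this thread, and it treats any warning raised during the fit as a failure rather than matching a specific warning identifier.

```matlab
x = normrnd(5, 3, 1e4, 1);

lastwarn('');                           % forget any earlier warning
pd = fitdist(x, 'tLocationScale');      % may or may not warn for this sample
[warnMsg, warnId] = lastwarn;           % empty if the fit raised no warning

NLL = negloglik(pd);
fitOk = isempty(warnMsg) && isreal(NLL) && isfinite(NLL);

if ~fitOk
    % Inside the script one would call error(...) here so the
    % distribution gets discarded; this sketch only reports it.
    fprintf('Fit rejected. Last warning: %s\n', warnMsg);
end
```

Whether the convergence warning shows up depends on the random sample, so fitOk can be true on one run and false on the next.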
