3 views (last 30 days)

Hello!

I am trying to understand why I am getting the following inconsistency with a particular result. Let's say we want to generate 5000 RVs for the Generalized Pareto Distribution. We could use the following snippet:

for j=1:5000

V(j) = random('GeneralizedPareto',1.5,2,0);

end

Using the following, snippet we can them estimate its parameters:

pd = fitdist(transpose(V),'GeneralizedPareto');

In this case, we get the results:

Generalized Pareto distribution

k = 1.52002 [1.45041, 1.58962]

sigma = 1.97559 [1.85692, 2.10184]

theta = 0

All seems right with the world so far. However, let's change the location parameter to theta = 5. Now we obtain:

Generalized Pareto distribution

k = 0.635262 [0.601225, 0.669299]

sigma = 10.9004 [10.4708, 11.3476]

theta = 0

From the looks of it, it seems as if fitdist ignores the location parameter theta. Since the Matlab documentation includes all three parameters (k, sigma, and theta) in the GeneralizedPareto distribution, then I would expect that it would also address theta in the fitdist function.

You may say that simply shifting the data "pre fitdist" should be sufficient to address this limitation. However, this is not always sufficient. The next item is a little heavy on mathematical statistics so bear with me.

Let X and A be a random variable that is independent and identically distributed (iid) such that:

X ~ exponential(A) A ~ exponential(B) B ~ Pareto(1,1)

Then

p(X=x) = (-x + (1+x) ln(1+x))/(x^2 (1+x)) , x > 0

If we try to estimate this with a Pareto distribution, then we get

Generalized Pareto distribution

k = 1.18503 [1.13854, 1.23153]

sigma = 0.82043 [0.784495, 0.858012]

theta = 0

which is in agreement with the analytical results, however if we try the same for a slightly different hierarchical model such as:

X ~ InvGaussian(A,B) A ~ Pareto(1,1) B ~ Pareto(1,1)

Then the results no longer agree. We obtain:

Generalized Pareto distribution

k = 0.480286 [0.456964, 0.503608]

sigma = 1.86124 [1.80809, 1.91595]

theta = 0

Using another program to validate my concerns, I used a program called EasyFit for the same data. I obtained: k = 0.93914, sigma = 0.85407, and theta = 0.45094.

I think the problem is with how Matlab does not calculate theta but I could be wrong.

I would greatly appreciate any insight you can provide and if you need any further information, please let me know.

Jeff Miller
on 16 Jul 2018

A couple of comments, though I don't think they will help you much.

First, check the Generalized Pareto Distribution object. It has a read-only ParameterIsFixed property which is set to true for the theta parameter. So, even though MATLAB will fit this distribution for you, it is apparently unwilling to adjust theta when it does the fit (I have no idea why they would place this restriction).

Second, theta is not just an additive shift to the distribution (your comment about subtracting theta pre-shift suggests maybe you think it is). You can see this in the pdf of the distribution, and also (more concretely) by comparing various percentiles of the random numbers generated with theta = 0 or 5. The lowest percentiles differ by 5, but the difference increases as you go to higher percentiles.

Sign in to comment.

Sign in to answer this question.

Opportunities for recent engineering grads.

Apply Today
## 0 Comments

Sign in to comment.