pearsrnd and pearspdf are not coherent for the Pearson IV distribution?
    7 views (last 30 days)
  
       Show older comments
    
We are having some trouble using the Pearson 4 implementation of the functions pearsrnd, pearspdf and pearscdf.
We think that pearsrnd does not generate samples from the distribution defined in pearspdf. In particular, we've found that pearsrnd over generates observations in the tails of the distribution.
In order to see this, we've implemented two simple tests:
1. We've compared the PDF of the Pearson 4 distribution (using pearspdf) to the PDF obtained from a simulation using pearsrnd, but we've found that the two are very different (even with a very long sample). This problem doesn't happen for other distributions. In particular, we've implemented the same test with the Pearson 6, and the results are satisfactory.
2. In addition, to test if the generatated sample is compatible with its CDF, we've applied the function pearscdf to the generated sample. We should have got a uniform distribution, because if X is a random variable and F its CDF, then F(X) is uniform in [0,1]. This is indeed what we've got with the Pearson 6, as expected, but not what we've got using the Pearson 4.
We've conducted a similar experiment using Mathematica, and we get the correct result.
We've attached the code we've used.
Any idea why we're getting this result?
Thank you very much.
%% Seed
rng(123)
%% Parameters
%Type IV
mu4 = 5;
sigma4 = 1;
skew4 = 1;
kurt4 = 10;
% Check the distribution type
[ign, type4] = pearsrnd(mu4, sigma4, skew4, kurt4, 1);
disp("This Pearson is of type " + type4)
% Type VI
mu6 = 2;
sigma6 = 1;
skew6 = 2;
kurt6  = 10;
% Check the distribution type
[ign, type6] = pearsrnd(mu6, sigma6, skew6, kurt6, 1);
disp("This Pearson is of type " + type6)
%% Simulate n observations according to both Pearson distributions defined above
n = 100000;
samples4 = pearsrnd(mu4, sigma4, skew4, kurt4, 1, n);
samples6 = pearsrnd(mu6, sigma6, skew6, kurt6, 1, n);
%% First test: we compare the simulated pdf (i.e. histogram of the sample) to the closed form PDF
pdf4 = @(x) pearspdf(x, mu4, sigma4, skew4, kurt4);
pdf6 = @(x) pearspdf(x, mu6, sigma6, skew6, kurt6);
x_ = linspace(-10, 10, 10000);
y4 = pdf4(x_);
y6 = pdf6(x_);
figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]
subplot(1, 2, 1);
plot(x_, y4, Color='red', LineWidth=2)
title('Simulated and closed form PDF for Pearson 4') 
hold on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold off
subplot(1, 2, 2);
plot(x_, y6, Color='red', LineWidth=2)
title('Simulated and closed form PDF for Pearson 6') 
hold on
histogram(samples6, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold off
sgtitle('Test 1: Comparison between Pearson 4 and 6');
%% Second test: Let X be a random variable and F its CDF, then F(X) is uniform in [0,1].
% We apply the Pearson CDF to the simulated observations and plot its
% histogram. We should obtain something resembling a uniform distribution. 
figure('Position', [100, 100, 1000, 400]); % [left, bottom, width, height]
subplot(1, 2, 1);
u4 = pearscdf(samples4, mu4, sigma4, skew4, kurt4);
histogram(u4)
title('F(X) for Pearson 4') 
subplot(1, 2, 2);
u6 = pearscdf(samples6, mu6, sigma6, skew6, kurt6);
histogram(u6)
title('F(X) for Pearson 6')  
sgtitle('Test 2: Comparison between Pearson 4 and 6');
%% Conclusion
%We get the expected results using the Pearson 6, but we cannot say the
%same for the Pearson 4. Why?
1 Comment
Accepted Answer
  David Goodmanson
      
      
 on 15 Feb 2024
        
      Edited: David Goodmanson
      
      
 on 17 Feb 2024
  
      Hello Marco,
It appears that you are implicating the pearson rand function here, but I believe the real culprit is the pearson pdf.
I have the rand function but not the pdf since the latter does not appear until Matlab 2023b.  I wrote my own pearson4 pdf function, code is below,  and it agrees pretty well with the Matlab pearson rand function (that plot is not shown).
After obtaining a new_cdf by numerical integration, a histogram of  new_cdf(pearson rand) for comparison to the ones you did is shown below.  And it makes sense. 
mu = 5;
sdev = 1;
skew = 1;
kurt = 10;
n = 100000;
samples4 = pearsrnd(mu,sdev,skew,kurt, 1,n);
figure(1); grid on
histogram(samples4, 1000, 'Normalization', 'pdf', 'FaceColor', 'blue')
hold on
x = -20:.01:20;
f = pears4pdf(x,mu,sdev,skew,kurt);
plot(x,f)
hold off
% --------
function f = pears4pdf(x,mu,sdev,skew,kurt)
% pearson type 4 pdf
m = (5*kurt-6*skew^2-9)/(2*kurt-3*skew^2-6);
b = 2*(m-1)/sqrt((4*(2*m-3)/((m-2)^2*skew^2)) -1)*(-sign(skew));
J0 = J(m,0,b);
a = sdev*sqrt(J0/J(m,2,b));
z = (x-mu)/a;
z0 = -b/(2*(m-1));
arg = z+z0;
% f is normalized
f = (1/(J0*a))*(1+arg.^2).^(-m).*exp(-b*atan(arg));
end
% --------
function u = J(m,n,b)
% u = Int{-inf,inf} (z-z0)^n (1+z^2)^(-m) exp(-b*atan(z)) dz
% m > (n+1)/2
z0 = -b/(2*(m-1));
if m <= (n+1)/2
    u = inf;
elseif n == -1 
    u = 0;
elseif n == 0
    u = real((pi*gamma(2*m-1)) ...
        /(2^(2*m-2)*exp(gammalni(m+b*i/2))*exp(gammalni(m-b*i/2))));
else
    C = 2*(m-1)-(n-1);
    u = ((n-1)/C)*(2*z0*J(m,n-1,b) + (1+z0^2)*J(m,n-2,b)) ;  
end
end
% ---------
function w = gammalni(z)
% natural log of gamma function for input vector z with
% constant real part, for example z = 2 + i*(0:.01:6),
% by Stirling's approximation.
% see Whittaker and Watson, 4th edition, 12.33.  note that
% error term for gamma(x+iy) is always less than that for gamma(x)
% David Goodmanson
%
% w = gammalni(z)
if any(diff(real(z))); error('Re(z) must be constant'); end
% these affect the precision of the calculation.
% higher values use more terms but in theory should be better. 
zreal = 6;  nterm = 5;
if real(z(1)) <(1/4)       % use reflection property to keep x >0
end
  reflect = 'true';
  z = 1-z;
else
  reflect = 'false';
end
m=0;                       % make Re(z) >= zreal so that |z| is large enough
while real(z(1)) < zreal 
  z = z+1;
  m = m+1;
end
% c(i) = (-)^(i-1) B(i)/(2i)(2i-1), B are Bernoulli numbers
 c(1) = 1/12;             c(2) = -1/360;         c(3) = 1/1260;
 c(4) = -1/1680;          c(5) = 1/1188;         c(6) = -691/360360; 
 c(7) = 1/156;            c(8) = -3617/122400;   c(9) = 43867/244188;
c(10) = -174611/125400;  c(11) = 77683/5796;
w = log(z).*(z-1/2) -z + log(sqrt(2*pi));     % log gamma for large z
zn  = z;                           
for j=1:nterm;
  zn = zn./(z.^2);           %  correction terms to log gamma are    
  w = w + c(j)*zn;           %  c(1) 1/z + c(2) 1/z^3 + ...
end
for n =1:m;                  % bring Re(z) back to its actual value.
  z = z-1;                   % gamma(z-1) = gamma(z)/(z-1)
  w = w -log(z);
end
% gamma(z)gamma(1-z) = pi/sin(pi z), but don't let argument
% of sine get too large and generate infinity
if strcmp(reflect, 'true')
  absy = abs(imag(z));
  w = log(pi) -w + -pi*absy ...
      -log( ( exp(pi*(i*z-absy)) -exp(-pi*(i*z+absy)) )/(2*i) );
end
Histogram of new_cdf(pearsrnd).  new_cdf is obtained by numerical integration of pears4pdf.  

10 Comments
  Drew
    
 on 23 Feb 2024
				
      Edited: Drew
    
 on 7 Jun 2024
  
			MathWorks has posted a bug report which includes a workaround for affected versions of 23b. See https://www.mathworks.com/support/bugreports/3219165  
This has been fixed in R2024a and in R2023b Update 8. The bug report also includes a workaround for affected versions of 23b.
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!






