Is there a function to create the P-P plot in Matlab, to compare two cumulative distribution functions against each other?
From Wikipedia: "In statistics, a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how closely two data sets agree, or for assessing how closely a dataset fits a particular model. It works by plotting the two cumulative distribution functions against each other; if they are similar, the data will appear to be nearly a straight line. This behavior is similar to that of the more widely used Q–Q plot, with which it is often confused."
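In symbols, the quoted definition can be sketched as follows. The notation is an assumption made here for illustration (it is not from the question): $F$ and $G$ denote the two cumulative distribution functions, $z$ a data value, and $p$ a probability level. The Q–Q plot is shown alongside because the two are often confused.

```latex
\[
\text{P--P plot: } \{\, (F(z),\; G(z)) : z \in \mathbb{R} \,\},
\qquad
\text{Q--Q plot: } \{\, (F^{-1}(p),\; G^{-1}(p)) : p \in (0,1) \,\}
\]
```

If $F = G$, every P–P point has equal coordinates and therefore lies on the diagonal from $(0,0)$ to $(1,1)$.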
Accepted Answer
Torsten on 6 Sep 2024 (moved by Torsten on 6 Sep 2024)
rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normally distributed random datasets, "x" and "y"
% (any other randomly distributed data would work as well)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
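% Empirical CDF of x; ecdf repeats the first x value, so keep unique points only
[F,t1] = ecdf(x);
[t1,ia] = unique(t1,'stable');
F = F(ia);
% Same for y
[G,t2] = ecdf(y);
[t2,ia] = unique(t2,'stable');
G = G(ia);
% Evaluate both CDFs on the pooled sample points; interp1 returns NaN
% outside each dataset's range, and plot skips those points
teval = unique([t1;t2]);
Feval = interp1(t1,F,teval);
Geval = interp1(t2,G,teval);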
hold on
plot(Feval,Geval,'o')
plot(0:1,0:1,'-','color','k')
hold off
grid on
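Because an empirical CDF is a step function, the interpolation step can also be avoided by counting directly. The following is a minimal sketch, not part of the original answer: it evaluates both ECDFs at the pooled sample points using only base MATLAB (it assumes implicit expansion, i.e. R2016b or later). The toy vectors xs and ys and the names z, Fz and Gz are made up for illustration.

```matlab
% Hypothetical toy data (illustrative only); replace with your own column vectors
xs = [1; 2; 2; 5];
ys = [2; 3; 4; 4; 6];

% Pooled, sorted, unique evaluation points
z = unique([xs; ys]);

% ECDF as a right-continuous step function: fraction of samples <= z
Fz = mean(xs.' <= z, 2);
Gz = mean(ys.' <= z, 2);

% P-P plot with the 45-degree reference line
figure
plot(Fz, Gz, 'o')
hold on
plot([0 1], [0 1], 'k-')
hold off
grid on
xlabel('ECDF of xs')
ylabel('ECDF of ys')
```

With this approach every point is defined over the whole pooled range, so no NaN values appear at the ends.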
Torsten on 6 Sep 2024 (edited by Torsten on 6 Sep 2024)
% Modified Rahul solution
% inputs
rng default;
a = 0;
b = 100;
nb = 50;
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
[f1, x1] = ecdf(x);
[f2, x2] = ecdf(y);
[x1_unique, ia1, ~] = unique(x1);
f1_unique = f1(ia1);
[x2_unique, ia2, ~] = unique(x2);
f2_unique = f2(ia2);
f1_interp = interp1(x1_unique, f1_unique, union(x1_unique,x2_unique));
f2_interp = interp1(x2_unique, f2_unique, union(x1_unique,x2_unique));
hold on
plot(f1_interp, f2_interp, 'o');
plot(0:1,0:1,'-','color','k')
hold off
grid on
More Answers (1)
Rahul on 6 Sep 2024 (edited by Rahul on 7 Sep 2024)
Hi Sim,
I understand that you’re trying to generate a P-P (probability-probability) plot of two datasets. A P-P plot is made by plotting the CDF (the "fraction failing" in reliability terms) of one distribution against the CDF of another distribution.
To generate this plot, we simply plot the CDF of one distribution against the CDF of the other, both evaluated at the same points. If the distributions are very similar, the points will lie on the 45-degree diagonal. Any deviation from this diagonal indicates that one distribution is leading or lagging the other.
Below is the reference code for your understanding:
1. Define Your Data
Assume two datasets drawn from a standard normal distribution, ‘data1’ and ‘data2’, which you want to compare using a P–P plot.
data1 = randn(100, 1); % Example data set 1
data2 = randn(100, 1); % Example data set 2
2. Compute the Cumulative Distribution Functions (CDFs)
You need to calculate the empirical CDFs of both datasets, for which you can use the ‘ecdf’ function, and then interpolate both CDFs onto a common set of points so that each plotted pair refers to the same data value.
% Compute CDFs for data1
[f1, x1] = ecdf(data1);
% Compute CDFs for data2
[f2, x2] = ecdf(data2);
% Ensure x1 and x2 are unique and sorted
[x1_unique, ia1, ~] = unique(x1);
f1_unique = f1(ia1);
[x2_unique, ia2, ~] = unique(x2);
f2_unique = f2(ia2);
% Interpolate both CDFs onto the pooled sample points; values outside a
% dataset's range become NaN and are skipped by plot
x_common = union(x1_unique, x2_unique);
f1_interp = interp1(x1_unique, f1_unique, x_common);
f2_interp = interp1(x2_unique, f2_unique, x_common);
3. Create the P–P Plot
After aligning CDF values from both datasets, you can plot them against each other.
figure;
plot(f1_interp, f2_interp, 'o');
xlabel('CDF of data1');
ylabel('CDF of data2');
title('P–P Plot');
hold on;
% Plot the 45-degree reference line (the variable is named ref_line to
% avoid shadowing the built-in xline and yline functions)
ref_line = [min(f1_interp), max(f1_interp)];
plot(ref_line, ref_line, 'r--', 'LineWidth', 2);
axis equal;
grid on;
- Normalization: Make sure your datasets are appropriately scaled or normalized if they are not in the same range.
- Handling NaNs or Infinities: Ensure your data does not contain NaNs or infinities, which can affect interpolation and plotting.
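As a small illustration of the second note, non-finite entries can be dropped before the CDFs are computed. This is only a sketch; the vector raw and the name clean are hypothetical and stand in for your own data.

```matlab
% Hypothetical input containing non-finite entries (illustrative only)
raw = [0.3; NaN; 1.2; Inf; -0.7; -Inf; 2.5];

% Keep only finite values (drops NaN, Inf and -Inf)
clean = raw(isfinite(raw));
```

The cleaned vector can then be used in place of ‘data1’ or ‘data2’.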
For more information on the ‘cdf’ function, see the documentation linked below:
https://www.mathworks.com/help/stats/prob.normaldistribution.cdf.html
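The deviation from the diagonal described earlier can also be summarized as a single number: the largest vertical distance between the P–P points and the 45-degree line, which is the two-sample Kolmogorov-Smirnov statistic. The sketch below computes it with base MATLAB on made-up toy data (d1, d2 and the other names are hypothetical, and implicit expansion from R2016b or later is assumed). If the Statistics and Machine Learning Toolbox is available, kstest2 should report the same statistic as its third output.

```matlab
% Hypothetical toy data (illustrative only); replace with your own column vectors
d1 = [1; 2; 2; 5];
d2 = [2; 3; 4; 4; 6];

% Evaluate both empirical CDFs at the pooled sample points
z  = unique([d1; d2]);
F1 = mean(d1.' <= z, 2);
F2 = mean(d2.' <= z, 2);

% Largest vertical distance between the P-P points and the diagonal
maxDev = max(abs(F1 - F2));
fprintf('Maximum deviation from the diagonal: %.2f\n', maxDev);
```

A value near 0 means the points hug the diagonal; a value near 1 means the two samples barely overlap.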