How can I use histfit function?

9 views (last 30 days)
Behrooz Daneshian
Behrooz Daneshian on 13 Nov 2023
Commented: Star Strider on 13 Nov 2023
Hello all.
I have an array called "AVG_Diameter" including the measured diameters of the porous structure of soil. I also have another array called "Rounded_N" in which the existing numbers correspondig to the diameters in the fromer array are stored. In other words, the latter array would be a frequency array for the diameters stored in the "AVG_Diameter". I tried to use histfit function in order to fit a histogram on the data formed by these two arrays, but I got an an error massage expressing "Requested array exceeds the maximum possible variable size." Would you please guide me where I am doing wrong and how I can fix it? Here is my code.
clc
clear
close all
load("AVG_Diameter.mat");
load("Rounded_N.mat")
AVG_Diameter=flip(AVG_Diameter)';
Rounded_N=flip(Rounded_N)';
% Create a data array with repeated values based on frequencies
data = repelem(AVG_Diameter, Rounded_N);
% Use histfit to fit a distribution
histfit(data, max(AVG_Diameter), 'lognormal');
title('Histogram with Fitted Distribution');
xlabel('Diameter of Pores');
ylabel('Frequency');
% Add a legend
legend('Histogram', 'Fitted Distribution');

Answers (1)

Star Strider
Star Strider on 13 Nov 2023
Your data are frequency data (probably the result of performing a histogram on an original data set), not actual histogram data. Rather than attempting to re-create the histogram, likely the best option is to fit the data as they exist. (I don’t understand the reason for flipping the vectors.)
The distribution that you choose must have positive support, since the diameters are by definition all greater than zero. I chose lognpdf here, however there is likely a better choice, since it does not appear to fit the data. Choose the distribution that best describes your data.
load("AVG_Diameter.mat")
load("Rounded_N.mat")
format shortE
Data = [AVG_Diameter Rounded_N]
Data = 77×2
1.0e+00 * 2.8075e-01 1.1500e+02 2.0803e-01 2.4800e+02 1.6366e-01 3.4000e+02 1.3501e-01 4.3500e+02 1.1341e-01 6.3100e+02 9.6017e-02 8.9600e+02 8.2264e-02 1.1190e+03 7.1687e-02 1.3090e+03 6.3259e-02 1.6310e+03 5.5454e-02 3.1870e+03
figure
semilogx(Rounded_N, AVG_Diameter)
grid
xlabel('Rounded_N')
ylabel('AVG_Diameter')
lognpdf_fcn = @(b,x) lognpdf(x,b(1),b(2));
[b,fv] = fminsearch(@(b) norm(AVG_Diameter - lognpdf_fcn(b,Rounded_N)), rand(2,1)*100)
b = 2×1
4.7409e+00 1.1645e-02
fv =
3.7818e-01
mdl = fitnlm(Rounded_N, AVG_Diameter, lognpdf_fcn, b)
Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.
mdl =
Nonlinear regression model: y ~ lognpdf(x,b1,b2) Estimated Coefficients: Estimate SE tStat pValue __________ __________ __________ ___________ b1 4.7409e+00 6.9258e-04 6.8453e+03 8.7363e-222 b2 1.1641e-02 1.7717e-03 6.5707e+00 5.5873e-09 Number of observations: 77, Error degrees of freedom: 76 Root Mean Squared Error: 0.0434 R-Squared: 0.208, Adjusted R-Squared 0.208 F-statistic vs. zero model: 41.9, p-value = 8.53e-09
R_N_v = linspace(1E-4, max(Rounded_N), 150).';
[y,yci] = predict(mdl, R_N_v);
[y1,y2] = bounds(y)
y1 =
0
y2 =
0
figure
hp{1} = semilogx(Rounded_N, AVG_Diameter, 'sb', 'DisplayName','Data');
hold on
hp{2} = plot(R_N_v, y, '-r', 'DisplayName','Regression');
hp{3} = plot(R_N_v, yci, '--r', 'DisplayName','95% Confidence Limits');
hold off
grid
legend([hp{1} hp{2} hp{3}(1)], 'Location','best')
xlabel('Rounded_N')
ylabel('AVG_Diameter')
.
  2 Comments
Behrooz Daneshian
Behrooz Daneshian on 13 Nov 2023
Edited: Behrooz Daneshian on 13 Nov 2023
Thank you for your answer. But it seems that you changed the position of "Rounded_N" and "AVG_Diameter". Can you explain why? shouldn't the frequency which is expressed here as "Rounded_N" array be on the vertical axis? I mean, "AVG_Diameter" and "Rounded_N" play as data and frequency, respectively.
Star Strider
Star Strider on 13 Nov 2023
I obviously read it incorrectly.
Reversing them from my previous code —
load("AVG_Diameter.mat")
load("Rounded_N.mat")
format shortE
Data = [AVG_Diameter Rounded_N]
Data = 77×2
1.0e+00 * 2.8075e-01 1.1500e+02 2.0803e-01 2.4800e+02 1.6366e-01 3.4000e+02 1.3501e-01 4.3500e+02 1.1341e-01 6.3100e+02 9.6017e-02 8.9600e+02 8.2264e-02 1.1190e+03 7.1687e-02 1.3090e+03 6.3259e-02 1.6310e+03 5.5454e-02 3.1870e+03
figure
semilogx(AVG_Diameter, Rounded_N, '.-')
grid
xlabel('Rounded\_N')
ylabel('AVG\_Diameter')
lognpdf_fcn = @(b,x) lognpdf(x,b(1),b(2));
[b,fv] = fminsearch(@(b) norm(Rounded_N - lognpdf_fcn(b,AVG_Diameter)), rand(2,1)*1E+3)
Exiting: Maximum number of function evaluations has been exceeded - increase MaxFunEvals option. Current function value: 847525920280383.000000
b = 2×1
-1.1039e+01 7.2101e-11
fv =
8.4753e+14
mdl = fitnlm(AVG_Diameter, Rounded_N, lognpdf_fcn, b)
Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.
mdl =
Nonlinear regression model: y ~ lognpdf(x,b1,b2) Estimated Coefficients: Estimate SE tStat pValue ___________ __________ ___________ __________ b1 -1.1039e+01 8.5088e-17 -1.2973e+17 0.0000e+00 b2 7.2101e-11 4.4659e-11 1.6145e+00 1.1056e-01 Number of observations: 77, Error degrees of freedom: 76 Root Mean Squared Error: 9.72e+13 R-Squared: -0.285, Adjusted R-Squared -0.285 F-statistic vs. zero model: 8.13, p-value = 0.0056
A_D_v = linspace(min(AVG_Diameter), max(AVG_Diameter), 150).';
[y,yci] = predict(mdl, A_D_v);
[y1,y2] = bounds(y)
y1 =
0
y2 =
0
figure
hp{1} = semilogx(AVG_Diameter, Rounded_N, 'sb', 'DisplayName','Data');
hold on
hp{2} = plot(A_D_v, y, '-r', 'DisplayName','Regression');
hp{3} = plot(A_D_v, yci, '--r', 'DisplayName','95% Confidence Limits');
hold off
grid
legend([hp{1} hp{2} hp{3}(1)], 'Location','best')
xlabel('AVG\_Diameter')
ylabel('Rounded\_N')
I still cannot get it to fit, however this should provide a starting point for your explorations. Again, choose a distribution with positive support, depending on whatever process underlies your data. (I thought a lognormal distribution would work, however it does not appear to be the correct choice.)
If you wnat to use histfit, give the function the original data — not previously-binned data — and see what it comes up with.
.

Sign in to comment.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!