Clear Filters
Clear Filters

Why does my fit to a PDF generated using a histogram not add up to 1 or give me correct expected value?

5 views (last 30 days)
Hello,
I generate a pdf from a histogram which describes the distribution of values about an average. For example say I have values {x1,x2,....xn}, my PDF is to see how x'={x1,x2,....xn}/mean({x1,x2,....xn}) is distributed.
The PDF of x' (blue curve) i generated is shown below where I fit a yellow quadratic function to whatever is below the mean <1 and with an exponential fit to whatever is >1.If I sum the blue curve:
(sum(prob))
Unrecognized function or variable 'prob'.
it is 1 which makes sense since probabilities should add up to 1. In addition, if I do:
sum(x'.*prob)
for the distribution, I get 1 which makes sense because my expected value should be 1. However, If I try to do the same with my curve fits, using trapezoidal integration, I do not get 1! I don't understand what is missing.. am I not normalizing a quantity?
The full code snippet I used to generate the above figure is attached below. I have also attached the data to this post if you might be interested in taking a look. My expected value from the discrete distribution given by the variable 'discrete' is nearly 1. but the 'continuous' variable when computed is like 0.18 which is unexpected. Any help is much appreciated!
data = [data_tot_fn_4_0p05/mean(data_tot_fn_4_0p05)]; %# Sample data
xRange = 0:0.2:max(data); %# Range of integers to compute a probability for
N = hist(data,xRange); %# Bin the data
prob=N./numel(data);
semilogy(xRange,prob); %# Plot the probabilities for each integer
xlabel('Integer value');
ylabel('Probability');
greater1=find(xRange>=1);
xgreater1=xRange(greater1)
ygreater1=prob(greater1);
less1=find(xRange<=1);
xless1=xRange(less1)
yless1=prob(less1);
model = @(a, x) exp(-a*(x));
% Initial guess for the parameter 'a'
initial_guess = 0.5;
% Perform the curve fitting
a_fitgreat = lsqcurvefit(model, initial_guess, xgreater1, ygreater1);
model2=@(p,x)p(1)+p(2)./(p(3).*sqrt(pi/2)).*exp(-2*(x-p(4)).^2./p(3).^2)
model2=@(p,x)p(1)+p(2).*x+p(3).*x.^2;
initial_guess = [0.5,0.5,0.5,0.5];
initial_guess=[0.5,0.5,0.5];
% Perform the curve fitting
a_fitless = lsqcurvefit(model2, initial_guess, xless1, yless1);
% Calculate the fitted curve using the optimized 'a' value
y_fit_great = model(a_fitgreat, xgreater1);
y_fit_less = model2(a_fitless, xless1);
hold on;
plot(xgreater1,y_fit_great);
hold on;
plot(xless1,y_fit_less)
discrete=sum(xRange.*prob)
continuous=trapz(fofaveless,(fofaveless).*yfofaveless)+trapz(fofavegreat,(fofavegreat).*yfofavegreat);

Answers (1)

the cyclist
the cyclist on 3 May 2024
I can't run your code to completion, because fofaveless is not defined, so I can't calculate the variable continuous.
But, it looks to me that you cut off that calculation -- the yellow line -- at x=1, but the full range looks to extend to at least 3. Do you get closer to what you expect if you use
less1=find(xRange<=3);
instead of
less1=find(xRange<=1);
?

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!