I wanted to apply the chi-squared test and get the p-value, but MATLAB's chi2cdf function only returns zero.

Question

PEDRO ALEXANDRE Fernandes on 2 Nov 2023

https://in.mathworks.com/matlabcentral/answers/2042031-i-wanted-to-apply-the-chi-squared-function-with-the-return-of-the-p-value-but-matlab-s-chi2cdf-func

Edited: dpb on 3 Nov 2023

% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);
    
    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);
    
    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;
    
    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);
    
    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
unique(p_values)
ans = 0

The problem is that the p-value, 1 - chi2cdf(chi_squared, df), always comes out as 0.

1 Comment

PEDRO ALEXANDRE Fernandes on 2 Nov 2023

Yes that is the problem

Answer 1

dpb on 2 Nov 2023

https://in.mathworks.com/matlabcentral/answers/2042031-i-wanted-to-apply-the-chi-squared-function-with-the-return-of-the-p-value-but-matlab-s-chi2cdf-func#answer_1345501

Edited: dpb on 2 Nov 2023

% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);

    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);

    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;

    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);

    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
histogram(chi_squared_results)
%unique(p_values)
[min(chi_squared_results) max(chi_squared_results)]
ans = 1×2
  321.3518  421.4471
chi2cdf(ans, df)
ans = 1×2
     1     1

What would you expect when comparing a random vector drawn from 0:10 against an expected cumulative distribution frequency of 10:10:90?
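For a rough answer, here is a short sketch that computes the expected value of the statistic, assuming each entry is uniform on 0:10 (which is what randi([0, 10], ...) draws) and using E[(X - e)^2] = Var(X) + (E[X] - e)^2 for each bin. It comes out near 370, squarely inside the 321 to 421 range from the run above, whereas a chi-square variable with 8 degrees of freedom has mean 8. Part of the mismatch is scale: each row has an expected sum of 9*5 = 45, while the expected frequencies sum to 450.

```matlab
% Expected size of the question's chi-square statistic when each entry is
% uniform on 0:10 (what randi([0, 10], ...) draws) and the expected values
% are 10:10:90.
e  = 10:10:90;                  % expected values from the question
x  = 0:10;                      % possible values of each observation
mu = mean(x);                   % E[X] = 5
v  = mean((x - mu).^2);         % Var(X) = 10 for the uniform on 0:10

% By linearity, E[stat] = sum over bins of E[(X - e)^2] / e,
% and E[(X - e)^2] = Var(X) + (E[X] - e)^2.
expected_stat = sum((v + (mu - e).^2) ./ e);

df = numel(e) - 1;              % 8 degrees of freedom, as in the question
fprintf('expected statistic: %.1f; mean of chi2(%d): %d\n', expected_stat, df, df);
```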

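As the [min max] output above shows, the smallest chi-square statistic calculated was about 321. For a chi-square distribution with df = 8 (whose mean is 8), that is so far outside the range of a realistic test statistic that the amount by which the cdf falls short of 1 is far below what a double can resolve near 1 (eps(1) is about 2.2e-16), so chi2cdf returns exactly 1 and 1 - chi2cdf(...) is identically 0. Try something more like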

row_i=randi([0, 100], 1, 9)  % test vector between 0-100 instead of 0-10
row_i = 1×9
    97    27     9    47    15    34    43    54    38
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 859.9504
p = 1 - chi2cdf(chi_squared, df)
p = 0

That's still far out of range. By chance, for this vector the largest value (97, essentially the full cdf value) landed in the first element, so it's not surprising that the p-value again comes out identically zero.
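The zeros come from the subtraction in 1 - chi2cdf: once the cdf rounds to exactly 1, nothing is left. A minimal sketch of computing the upper tail directly instead, assuming base MATLAB's gammainc with its 'upper' option: for a chi-square variable with df degrees of freedom, P(X > x) is the regularized upper incomplete gamma function evaluated at (x/2, df/2). In the Statistics and Machine Learning Toolbox, chi2cdf(x, df, 'upper') is documented as the accurate way to get this complement. The two statistics used are the ones reported above.

```matlab
% Upper-tail chi-square probabilities computed directly instead of 1 - cdf.
% Uses P(X > x) = Q(df/2, x/2), the regularized upper incomplete gamma
% function, which base MATLAB exposes as gammainc(x/2, df/2, 'upper').
df = 8;                                      % 9 bins - 1, as in the question
stats = [321.3518 859.9504];                 % statistics reported above

p_naive = 1 - gammainc(stats/2, df/2);       % 1 - cdf: cdf rounds to 1, so numerically 0
p_upper = gammainc(stats/2, df/2, 'upper');  % tiny but nonzero upper tails

fprintf('x = %8.4f   1 - cdf = %g   upper tail = %g\n', [stats; p_naive; p_upper]);
```

The conclusion does not change (these data are nowhere near the expected values), but the direct upper tail reports a usable number instead of 0, and it stays accurate for p-values far below eps.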

Now, keep the same vector but sort it to get what could be an approximation to a cdf...

row_i=sort(row_i)
row_i = 1×9
     9    15    27    34    38    43    47    54    97
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 26.7983
p = 1 - chi2cdf(chi_squared, df)
p = 7.6598e-04

Now, the random vector above starts out not too badly compared with the expected values, but the 5th through 8th values (38 43 47 54) fall well below the expected 50:10:80, so it doesn't fit all that well. At least the p-value is now computable.

figure
plot(empirical_frequency,sort(row_i))
xlabel('Expected'), ylabel('Observed')   % xlabel takes a single label; the y-axis needs ylabel
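Splitting the statistic into its per-bin terms shows where it comes from. In this short sketch, using the random vector [97 27 9 47 15 34 43 54 38] from the run above, the first element alone contributes 756.9 of the unsorted total of 859.95, while after sorting the 5th through 8th bins contribute about 23.7 of the 26.8 total.

```matlab
% Per-bin contributions (O - E).^2 ./ E to the chi-square statistic for the
% random vector from the run above, before and after sorting.
e = 10:10:90;                               % expected values
row_i = [97 27 9 47 15 34 43 54 38];        % random vector from the run above

contrib_unsorted = (row_i - e).^2 ./ e;
contrib_sorted   = (sort(row_i) - e).^2 ./ e;

% Row 1: unsorted, row 2: sorted (rounded to 2 decimals for display)
disp(round([contrib_unsorted; contrib_sorted] * 100) / 100)
fprintf('totals: %.4f (unsorted), %.4f (sorted)\n', ...
        sum(contrib_unsorted), sum(contrib_sorted));
```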

1 Comment

dpb on 2 Nov 2023

Edited: dpb on 3 Nov 2023

I didn't want to lose the random vector from the last run above, which is great for illustration, so instead of rerunning the code, here it is hard-coded to plot observed versus expected:

row_i=[97 27 9 47 15 34 43 54 38];
empirical_frequency=[10:10:90];

subplot(3,1,1)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'b*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','random','location','north')

subplot(3,1,2)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,sort(row_i),'r*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','sorted','location','northwest')

subplot(3,1,3)
hold on
row_i=[9 15 27 34 53 57 78 78 97 ];
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 4.3887
df=numel(row_i)-1;
p = 1 - chi2cdf(chi_squared, df)
p = 0.8205
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'g*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','adjusted','location','northwest')

Now, if in the end we take a set of data that actually does roughly follow the path of the empirical cdf, then, by golly, we get a chi-square statistic (4.39, p = 0.82) indicating that this set of observations can't be ruled out as having come from the parent distribution. The "corrections" made to the sorted random vector were to raise the 5th through 8th values (38 43 47 54 became 53 57 78 78) to values roughly in line with the expected ones; after that, the deviations from the expected values aren't nearly as large.
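Applied to the question's full 2000-by-9 setup, a loop-free sketch (assuming implicit expansion, MATLAB R2016b or later) computes all the row statistics at once and takes the upper tail directly, so the p-values are tiny but no longer identically 0. As in the question, the data are placeholders; the same code works unchanged on real data.

```matlab
% Loop-free version of the question's computation. Relies on implicit
% expansion (MATLAB R2016b or later). The data are placeholders, as in the
% question.
data_matrix = randi([0, 10], 2000, 9);               % replace with actual data
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90];
df = size(data_matrix, 2) - 1;                       % 9 bins - 1 = 8

% One statistic per row: sum over columns of (O - E).^2 ./ E
chi_squared_results = sum((data_matrix - empirical_frequency).^2 ...
                          ./ empirical_frequency, 2);

% Upper-tail p-values computed directly (no 1 - cdf cancellation).
% With the Statistics and Machine Learning Toolbox, the equivalent is
% chi2cdf(chi_squared_results, df, 'upper').
p_values = gammainc(chi_squared_results / 2, df / 2, 'upper');

[min(p_values) max(p_values)]
```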
