MATLAB returns zeros and NaN after a regression

Hello all,
I want to run a fitlm regression on my dataset. The dataset contains 80,000 rows and 30-35 columns. Running the regression on the first 2750 rows returns a value for all the variables, but running it on 2900 rows or more returns a '0' for all the estimates and a NaN for the tStat and the pValue. Does anyone have a clue what I am doing wrong?
Thanks in advance
This is the outcome for the first 2700 rows:
Linear regression model:
    rel_spr ~ 1 + post_dumm + isXBRL + size + leverage + Earnings_per_share + turnover

Estimated Coefficients:
                          Estimate       SE            tStat      pValue
                          ___________    __________    _______    __________
    (Intercept)           0.026918       0.0020461     13.155     4.0497e-38
    post_dumm             -0.0016953     0.00094519    -1.7936    0.073016
    isXBRL                0.0025597      0.0011885     2.1537     0.03137
    size                  -0.0017311     0.0001673     -10.348    1.5309e-24
    leverage              -0.0037944     0.0025795     -1.471     0.14144
    Earnings_per_share    6.3255e-09     1.2627e-08    0.50095    0.61645
    turnover              -0.00026829    0.00022523    -1.1912    0.23372

Number of observations: 2230, Error degrees of freedom: 2223
Root Mean Squared Error: 0.0216
R-squared: 0.0596, Adjusted R-Squared: 0.0571
F-statistic vs. constant model: 23.5, p-value = 4.93e-27
This is the outcome for 2900 rows or more:
Linear regression model:
    rel_spr ~ 1 + post_dumm + isXBRL + size + leverage + Earnings_per_share + turnover

Estimated Coefficients:
                          Estimate    SE    tStat    pValue
                          ________    __    _____    ______
    (Intercept)           0           0     NaN      NaN
    post_dumm             0           0     NaN      NaN
    isXBRL                0           0     NaN      NaN
    size                  0           0     NaN      NaN
    leverage              0           0     NaN      NaN
    Earnings_per_share    0           0     NaN      NaN
    turnover              0           0     NaN      NaN

Number of observations: 2278, Error degrees of freedom: 2278
Root Mean Squared Error: 0.0247
R-squared: NaN, Adjusted R-Squared: NaN
F-statistic vs. constant model: NaN, p-value = NaN
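
In the second output the error degrees of freedom equal the number of observations (2278), which means no coefficient was actually estimated. The thread never establishes why. One possibility worth ruling out, offered here only as an assumption, is that rows beyond the first few thousand contain Inf or other non-finite values. A minimal sketch of such a check follows; the table `tbl` and its contents are hypothetical stand-ins, since the real data is not available:

```matlab
% Hypothetical stand-in for the table passed to fitlm (not the real data).
tbl = table([0.02; 0.03; 0.01; 0.04], [1; Inf; 3; 4], [5; 6; NaN; 8], ...
    'VariableNames', {'rel_spr', 'leverage', 'turnover'});

X = tbl{:, :};                 % numeric matrix holding every column
nInf = sum(isinf(X), 1);       % number of +/-Inf entries per column
nNaN = sum(isnan(X), 1);       % number of NaN entries per column

% One row per variable, listing its count of non-finite entries
report = table(tbl.Properties.VariableNames', nInf', nNaN', ...
    'VariableNames', {'Variable', 'NumInf', 'NumNaN'});
disp(report)

% Keep only the rows in which every value is finite before refitting
cleanTbl = tbl(all(isfinite(X), 2), :);
```

If the report shows only zeros, this explanation does not apply and the cause lies elsewhere.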

12 Comments

Your fit is already horrible in the first output. Can you attach the data (or a subsection) and the code you're using?
Please find the files in the new comment.
Edit Rik, attached files and text below moved from an answer posted as comment:
Hi Rik,
Thanks for your reply. Attached are the first 1000 rows of the file and the MATLAB code that I am using.
Your code needs two csv files, but you included a single Excel file. Try to make an MWE so we can run your code without any other dependencies and can reproduce your issue.
Sorry, forgot those files. Got an MWE for the CAPM file containing the first 1000 rows again.
Would you suggest that I include more rows, so that the code no longer works?
If you want us to track down the issue, that would be better, yes.
Again, xlsx instead of csv. Details probably matter.
Thomas, I uploaded the files, and I get the error:
Undefined operator '/' for input arguments of type 'cell'.
when MATLAB tries to execute the line
big_tableX.PRC = big_tableX.PRC/100000;
because big_tableX.PRC is apparently a cell array. (I think it is possible that we have different default settings for reading in the table?) Simple conversion didn't work for me.
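
For reference, a minimal sketch of one conversion that could be tried, assuming the cells hold numbers stored as text. The commenter reports that a simple conversion did not work, so this may not match the actual file; the sample values below are hypothetical:

```matlab
% Hypothetical table mimicking a PRC column that was imported as a
% cell array of character vectors instead of as numbers.
big_tableX = table({'1250000'; '980000'; 'N/A'}, 'VariableNames', {'PRC'});

if iscell(big_tableX.PRC)
    % str2double returns NaN for entries that cannot be read as a number
    big_tableX.PRC = str2double(big_tableX.PRC);
end

big_tableX.PRC = big_tableX.PRC / 100000;   % the scaling step from the thread
disp(big_tableX)
```

Entries that turn into NaN here point to rows that need a closer look in the source file.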
Possibly the easiest thing, which would spare us from worrying about the preprocessing steps, would be for you to upload a MAT file of your workspace as it exists just before fitlm is executed. Then all we need to do is load that MAT file and run fitlm.
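
The save-and-load round trip suggested above could look roughly like this. The variable name `tbl`, the file name `before_fitlm.mat`, and the toy data are all hypothetical, and fitlm requires the Statistics and Machine Learning Toolbox:

```matlab
% Asker's side: tbl is a hypothetical name for the table passed to fitlm.
tbl = table(randn(50, 1), randn(50, 1), 'VariableNames', {'x', 'y'});
save('before_fitlm.mat', 'tbl');      % hypothetical file name

% Helper's side: load the saved table and rerun the same fit.
clear tbl
S = load('before_fitlm.mat');
mdl = fitlm(S.tbl, 'y ~ 1 + x');
disp(mdl)
```

Saving only the variables that fitlm uses keeps the file small and avoids sharing unrelated data.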
Agreed. Maybe even the release is causing a difference that makes it difficult to reproduce the issue.
Nick on 18 Oct 2021
Edited: Nick on 18 Oct 2021
Hi Thomas, did you ever find the reason for this outcome? I'm having the same issue too. Regression works with a smaller (clean) dataset but not with the (clean) entire set.
@Nick, I'd guess so, but he never came back here with the answer, so you'll have better luck if you post your question and your data in a new question.

Answers (0)

Asked: on 3 Oct 2019
Commented: on 18 Oct 2021