Removing NaN in Linear Regression Problem. Error in line 66.

Hello guys,
I am trying to fit a multivariable linear regression. The predictors (X) are stored in a table sized 52824x9.
The regress function contains this piece of code, which removes all the NaN values:
% Remove missing values, if any
wasnan = (isnan(y) | any(isnan(X),2)); %line 66
havenans = any(wasnan);
if havenans
y(wasnan) = []; %line 69
X(wasnan,:) = [];
n = length(y);
end
At first, I got an error stating:
Undefined function 'isnan' for input arguments of type 'table'.
Error in regress (line 66)
wasnan = (isnan(y) | any(isnan(X),2));
I searched for solutions and found one explaining that the isnan function cannot operate on tables; the suggested fix was to change the line to the following:
wasnan = (isnan(y{:,:}) | any(isnan(X{:,:}),2));
Now I get an error in line 69 saying the following:
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not
supported. Use a row subscript and a variable subscript.
If anyone knows how to solve this problem, or can suggest another way of accessing the data with the isnan function, it would be very much appreciated. I have been trying to solve this for several days now.
Many thanks,
Natalia

Accepted Answer

dpb on 18 Mar 2020
Edited: dpb on 20 Mar 2020
You don't pass the table to regress; you pass the variables to be used in the regression. Then you won't run into the issue inside regress.
And you DEFINITELY DO NOT WANT TO BE MUCKING INSIDE THE SUPPLIED REGRESS FUNCTION!!!!
We don't know the model you're trying to fit nor the variable names in your table, but assuming
Y ~ 1 + A*X1 + B*X2 + ...
for variables X and Y in the table and a linear model plus intercept, then the syntax for regress (response first, then the predictors with a column of ones for the intercept) would be
b=regress(t.Y,[ones(height(t),1) t.X]);
where the table variable is t. Use your table variable name and variable names within the table, of course.
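With several predictor columns the same call might look like the sketch below. The table and the names X1, X2 and Y are made up for illustration, since the real variable names were not posted. Per the regress documentation, NaN values in y or X are treated as missing and those rows are ignored, so no edits inside the function are needed.

```matlab
% Hypothetical data standing in for the poster's table; the names
% X1, X2 and Y are assumptions, not the real column names.
rng(0);                                   % reproducible example data
n  = 100;
X1 = randn(n,1);
X2 = randn(n,1);
Y  = 2 + 3*X1 - 1.5*X2 + 0.1*randn(n,1);
Y(5)   = NaN;                             % a few missing values
X1(17) = NaN;
t = table(X1, X2, Y);

% Extract numeric arrays from the table: response first, then the
% design matrix with an explicit column of ones for the intercept.
y = t.Y;
X = [ones(height(t),1) t{:, {'X1','X2'}}];

% regress (Statistics and Machine Learning Toolbox) ignores rows that
% contain NaN in y or X.
b = regress(y, X);                        % [intercept; X1 coeff; X2 coeff]
```

Brace indexing with a list of names (t{:, {'X1','X2'}}) is one way to pull several columns out as a single numeric matrix; concatenating t.X1 and t.X2 would do the same.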
If you have the Curve Fitting Toolbox besides Statistics, I would suggest that the fit function in it is a little more user friendly than the core regress function, although fit only handles one or two predictors, so it will not cover a nine-predictor model. Otherwise, see the Alternative Functionality section of the documentation for regress, which suggests using LinearModel (created with fitlm) instead for similar reasons/purposes.
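A rough sketch of the LinearModel route, assuming a table with made-up variable names X1, X2 and Y (the real names were not posted). fitlm takes the table directly, and rows with missing values are left out of the fit.

```matlab
% Hypothetical data; the variable names are assumptions for illustration.
rng(1);                                   % reproducible example data
n  = 200;
X1 = randn(n,1);
X2 = randn(n,1);
Y  = 1 + 0.5*X1 + 2*X2 + 0.1*randn(n,1);
Y(3)   = NaN;                             % missing response
X2(10) = NaN;                             % missing predictor
t = table(X1, X2, Y);

% fitlm (Statistics and Machine Learning Toolbox) accepts the table and
% a Wilkinson-notation formula; the intercept is included by default.
mdl = fitlm(t, 'Y ~ X1 + X2');

coeffs = mdl.Coefficients.Estimate;       % [intercept; X1; X2]
nUsed  = mdl.NumObservations;             % rows used after dropping missing ones
```

Displaying mdl at the command line prints the coefficient table with standard errors and p-values, which regress leaves you to assemble yourself.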
Read the section in the documentation for table on how to address data within a table; it explains which forms of addressing return the variables as their native type and which return tables. In particular, note that addressing a table with parentheses returns another table containing the addressed rows and columns, which is probably the root cause of your troubles:
x=t(:,1); % returns x as a table: all rows of table t, column 1
while
x=t.X; % presuming X is the first column in table t, returns X as an array
% or
x=t{:,1}; % returns x as an array -- NB: the "curlies" {} instead of ()
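The difference can be checked on a small table. This is only a sketch with invented values and the names X and Y. It also shows why a deletion like y(wasnan) = [] fails when y is a table: tables need both a row and a variable subscript.

```matlab
% Small hypothetical table with some missing values.
t = table([1; NaN; 3], [4; 5; NaN], 'VariableNames', {'X', 'Y'});

a = t(:,1);      % parentheses: still a table (3x1)
b = t.X;         % dot notation: a numeric column vector
c = t{:,1};      % braces: the same numeric column vector

aIsTable  = isa(a, 'table');     % true
bIsDouble = isa(b, 'double');    % true
sameData  = isequaln(b, c);      % true; isequaln treats NaN as equal to NaN

% isnan works on the extracted arrays, not on the table itself.
bad = isnan(t.Y) | any(isnan(t{:, {'X'}}), 2);

% Deleting rows of a table needs a row AND a variable subscript;
% t(bad) = [] would raise the linear-indexing error from the question.
t(bad, :) = [];
rowsLeft = height(t);
```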
  1 Comment
NATALIA ARREGUI GONZALEZ on 19 Mar 2020
Many thanks, your insight was very helpful. I was able to solve it using LinearModel, following your advice.
Best,
Natalia


More Answers (1)

Cris LaPierre on 18 Mar 2020
Try using rmmissing. It accepts vectors, matrices, cell arrays, tables, and timetables as input.
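A minimal sketch of that cleanup step, using an invented table with the assumed names X1, X2 and Y. rmmissing drops every row that contains a missing value, and its second output flags which rows were removed.

```matlab
% Hypothetical table; the values and variable names are made up.
t = table([1; NaN; 3; 4], [10; 20; NaN; 40], [5; 6; 7; 8], ...
          'VariableNames', {'X1', 'X2', 'Y'});

% Remove every row with at least one NaN. The second output is a
% logical vector that is true for each removed row.
[tClean, removed] = rmmissing(t);

% Numeric arrays taken from the cleaned table; these are what would be
% handed to a function such as regress, which does not accept tables.
y = tClean.Y;
X = [ones(height(tClean),1) tClean.X1 tClean.X2];
```

Keeping the removed vector makes it easy to report how many of the original rows were dropped, for example with nnz(removed).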
  5 Comments
Cris LaPierre on 18 Mar 2020
Ah, I didn't realize that code snippet was from the regress function. Yes, don't go changing code inside the function. Use rmmissing to clean up your table before passing its variables to regress.
And yes, regress does not support tables as inputs. Use dot notation (for example, t.Y) to pass in the variables.
NATALIA ARREGUI GONZALEZ on 19 Mar 2020
Many thanks for your answer as well, I will keep this in mind from now on.
Best,
Natalia

Release

R2019a
