Fixed Effects Design Matrix Must be of full column rank with multiple categorical predictors

Question

qfn on 28 Jan 2022

https://in.mathworks.com/matlabcentral/answers/1638475-fixed-effects-design-matrix-must-be-of-full-column-rank-with-multiple-categorical-predictors

Commented: Laurie König on 28 Nov 2024

I am probably doing something very dumb, but I cannot figure out my mistake.

I am trying to regress out some predictors from a data set -- I have two categorical predictors, A1 and A2 in a table, something like this:

It seems obvious to me that A1 and A2 are linearly independent. They are also linearly independent from the intercept, which I believe should be a categorical variable that looks like ones(1,11)? But regardless, I don't want the global mean removed from everything, so I don't include an intercept in the model.

Then, if I run something like this:

lme = fitlme(tbl, 'values ~ A1 + A2 - 1', 'DummyVarCoding', 'full')  % tbl: the table holding values, A1 and A2

I always get the same error:

Error using classreg.regr.lmeutils.StandardLinearLikeMixedModel/validateInputs (line 229)

Fixed Effects design matrix X must be of full column rank.

I don't understand why this is happening -- and probably this shows that I have a pretty big misunderstanding of what the dummy variables actually are.

However, if I run two fitlme's -- one on the subset A1==1 and one on A1==0, they both work, which just super confuses me.


Answer 1

Ive J on 29 Jan 2022

https://in.mathworks.com/matlabcentral/answers/1638475-fixed-effects-design-matrix-must-be-of-full-column-rank-with-multiple-categorical-predictors#answer_884350

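The error is self-explanatory: the cause is the full dummy-variable coding you're using (why?). With 'full' coding, every level of every categorical predictor gets its own indicator column, so the columns for A1 sum to a column of ones and so do the columns for A2. See https://mathworks.com/help/stats/dummy-indicator-variables.html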

Note that the error has nothing to do with mixed-model design. Consider this example:

n = 100; % sample size
tab = table(randn(n,1), categorical(randi([0 1], n, 1)), ...
    categorical(randi([0, 1], n, 1)),...
    'VariableNames', {'value', 'A1', 'A2'});
mdl1 = fitlm(tab, 'value ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % design matrix is rank deficient
Warning: Regression design matrix is rank deficient to within machine precision.
mdl1 = 
Linear regression model:
    value ~ A1 + A2

Estimated Coefficients:
            Estimate       SE        tStat      pValue 
            _________    _______    ________    _______

    A1_0     -0.20234    0.20399    -0.99191    0.32373
    A1_1            0          0         NaN        NaN
    A2_0    -0.045804    0.17202    -0.26627     0.7906
    A2_1     0.097693    0.18145     0.53839    0.59155


Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 1.02
R-squared: 0.0145,  Adjusted R-Squared: -0.00585
F-statistic vs. constant model: 0.712, p-value = 0.493

So, what happened? Let's construct the design matrix:

X = [dummyvar(tab.A1), dummyvar(tab.A2)]; % DummyVarCoding -> full
disp(rank(X)) % 3 < size(X, 2) --> 3 < 4  --> rank deficient
     3
% what about when considering them alone?
disp(rank(X(:, 1:2))) % full rank
     2
disp(rank(X(:, 3:4))) % full rank
     2
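The rank of 3 comes from one specific dependence: under full coding, the indicator columns of each predictor sum to a column of ones, so the A1 block and the A2 block add up to the same vector. Here is a minimal sketch of that check on a small hand-made data set (the values are made up for illustration and are not the asker's data):

```matlab
% Hypothetical toy data: two binary categorical predictors, 8 observations
A1 = categorical([0 0 1 1 0 1 0 1]');
A2 = categorical([0 1 0 1 1 0 0 1]');

D1 = dummyvar(A1);   % 8x2 indicator columns: A1_0, A1_1
D2 = dummyvar(A2);   % 8x2 indicator columns: A2_0, A2_1
X  = [D1, D2];       % full dummy coding, no intercept

% Each block sums row-wise to a column of ones ...
rowSums1 = sum(D1, 2);
rowSums2 = sum(D2, 2);

% ... so A1_0 + A1_1 - A2_0 - A2_1 is the zero vector
c = [1; 1; -1; -1];
residual = X * c;

disp(rank(X))          % number of linearly independent columns
disp(norm(residual))   % a zero norm confirms the four columns are dependent
```

Any one of the four columns can be written in terms of the other three, which is why each predictor looks fine on its own but the pair does not.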

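We can roughly locate a problematic column from the diagonal of R in a QR decomposition: a near-zero entry marks a column that is, numerically, a linear combination of the columns before it: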

[~, R] = qr(X, 0);
find(abs(diag(R)) < 1e-6)
ans = 4

Therefore, don't set 'DummyVarCoding' to 'full' in such cases; leave it at the default, 'reference'.
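If the intercept has to stay out of the model, as in the question, one way to see that full column rank is still achievable is to build the design matrix by hand: keep all indicator columns for one predictor and drop a reference level only from the other. The following is a sketch on made-up data, not a statement about what the formula interface does internally with "- 1"; the response y and all values are assumptions for illustration.

```matlab
% Hypothetical toy data (values made up for illustration)
A1 = categorical([0 0 1 1 0 1 0 1]');
A2 = categorical([0 1 0 1 1 0 0 1]');
y  = [1.2 0.7 2.1 1.9 0.4 2.5 1.0 1.6]';

D1 = dummyvar(A1);        % A1_0, A1_1: both kept, together they stand in for the intercept
D2 = dummyvar(A2);        % A2_0, A2_1
X  = [D1, D2(:, 2:end)];  % drop A2_0 as the reference level

disp(rank(X) == size(X, 2))   % true when X has full column rank

% Fit on the numeric design matrix without adding an intercept column
mdl = fitlm(X, y, 'Intercept', false);
disp(mdl.Coefficients)
```

In this parameterization the first two coefficients are the fitted values for each A1 level when A2 is at its reference level, and the third is the shift associated with the other A2 level.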


Laurie König on 28 Nov 2024

Hi there, may I ask a follow-up question? I am running into a similar problem. I also have two categorical predictors, but each has three groups (0, 1, 2). I have included them as categorical variables in the equation, which leads to reference coding. My variables are called word_cat and attribute.

When I run the regression model, I see the following output. Could you give me a hint as to why parameters can be estimated for some of the interaction terms and not for others, even though all 3 groups are present in the data and the two predictors are not correlated?

word_cat_1                 -31.78     3.6778    -8.6411     3.0585e-17
word_cat_2                 -15.24     3.6778    -4.1438     3.7843e-05
attribute_1                -28.71     3.6778    -7.8063      1.866e-14
attribute_2                  1.49     3.1851    0.46781        0.64005
word_cat_1:attribute_1      50.81     5.2012      9.769     2.3292e-21
word_cat_2:attribute_1          0          0        NaN            NaN
word_cat_1:attribute_2          0          0        NaN            NaN
word_cat_2:attribute_2      30.46     4.8653     6.2607     6.2802e-10
