Fixed Effects Design Matrix Must be of full column rank with multiple categorical predictors
I am probably doing something very dumb, however I cannot figure out my mistake.
I am trying to regress out some predictors from a data set -- I have two categorical predictors, A1 and A2 in a table, something like this:

It seems obvious to me that A1 and A2 are linearly independent. They are also linearly independent from the intercept, which I believe should be a categorical variable that looks like ones(1,11)? Regardless, I don't want the global mean removed from everything, so I don't include an intercept in the model.
Then, if I run something like this:
lme = fitlme(tbl, 'values ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % tbl: placeholder name for the table holding values, A1 and A2
I always get the same error :
Error using classreg.regr.lmeutils.StandardLinearLikeMixedModel/validateInputs (line 229)
Fixed Effects design matrix X must be of full column rank.
I don't understand why this is happening -- and probably this shows that I have a pretty big misunderstanding of what the dummy variables actually are.
However, if I run two fitlme's -- one on the subset A1==1 and one on A1==0, they both work, which just super confuses me.
Answers (1)
Ive J
on 29 Jan 2022
The error is self-explanatory, and the reason is the full dummy variable scheme you're using (why?). See https://mathworks.com/help/stats/dummy-indicator-variables.html
Note that the error has nothing to do with the mixed-model design. Consider this example:
n = 100; % sample size
tab = table(randn(n,1), categorical(randi([0 1], n, 1)), ...
categorical(randi([0, 1], n, 1)),...
'VariableNames', {'value', 'A1', 'A2'});
mdl1 = fitlm(tab, 'value ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % design matrix is rank deficient
So, what happened? Let's construct the design matrix:
X = [dummyvar(tab.A1), dummyvar(tab.A2)]; % DummyVarCoding -> full
disp(rank(X)) % 3 < size(X, 2) --> 3 < 4 --> rank deficient
% what about when considering them alone?
disp(rank(X(:, 1:2))) % full rank
disp(rank(X(:, 3:4))) % full rank
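The rank of 3 can also be read off the columns directly. As a sketch, write the four columns of X as $d^{A1}_0, d^{A1}_1, d^{A2}_0, d^{A2}_1$ (notation introduced here only for illustration). Every observation belongs to exactly one level of each factor, so each pair of indicator columns sums to the all-ones vector:

```latex
\[
d^{A1}_0 + d^{A1}_1 \;=\; \mathbf{1} \;=\; d^{A2}_0 + d^{A2}_1
\quad\Longrightarrow\quad
d^{A1}_0 + d^{A1}_1 - d^{A2}_0 - d^{A2}_1 \;=\; \mathbf{0}.
\]
```

A nontrivial combination of the four columns vanishes, so X can have rank at most 3, even though each pair of columns is independent on its own.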
We can approximately find the problematic variable:
[~, R] = qr(X, 0);
find(abs(diag(R)) < 1e-6)
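The QR check flags a single column, but the dependency actually involves all four. A minimal sketch, assuming a small hand-made data set (the 8-row A1 and A2 below are hypothetical, not the data from the question), uses null to show the combination of columns that vanishes:

```matlab
% Hypothetical 8-row example with every A1/A2 combination present
A1 = categorical([0 0 0 0 1 1 1 1]');
A2 = categorical([0 1 0 1 0 1 0 1]');
X  = [dummyvar(A1), dummyvar(A2)];   % full dummy coding: 4 columns

N = null(X);        % orthonormal basis of the null space of X
v = N / N(1);       % rescale so the first weight equals 1

disp(size(N, 2))    % number of independent linear dependencies
disp(v')            % weights of the vanishing column combination
```

The weights come out proportional to [1 1 -1 -1], i.e. the two A1 columns minus the two A2 columns, so dropping any one of the four columns restores full column rank.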
Therefore, don't set 'DummyVarCoding' to 'full' in such cases; leave it at its default, 'reference'.
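If the goal is a model without an intercept, one possible workaround is to keep all levels of one factor and drop a reference level from the other. The sketch below builds the design matrices by hand on hypothetical data (A1, A2 and y are made up for illustration) and solves with backslash; it does not show how fitlm or fitlme construct their columns internally.

```matlab
% Hypothetical data, for illustration only
A1 = categorical([0 0 0 0 1 1 1 1]');
A2 = categorical([0 1 0 1 0 1 0 1]');
y  = [1.0 2.1 0.9 2.0 3.1 3.9 3.0 4.2]';

D1 = dummyvar(A1);                        % one indicator column per level of A1
D2 = dummyvar(A2);                        % one indicator column per level of A2

Xref   = [ones(8,1), D1(:,2), D2(:,2)];   % intercept + reference coding
XnoInt = [D1, D2(:,2)];                   % no intercept: all A1 levels, A2 without its first level

disp([rank(Xref), rank(XnoInt)])          % both equal the number of columns

bRef   = Xref \ y;                        % least-squares coefficients
bNoInt = XnoInt \ y;
disp(norm(Xref*bRef - XnoInt*bNoInt))     % fitted values agree up to rounding
```

Both matrices span the same column space, because the two A1 indicator columns add up to the column of ones, so the two parameterizations give the same fit and differ only in how the coefficients are read.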
1 Comment
Laurie König
on 28 Nov 2024
Hi there, may I ask a follow-up question? I am running into a similar problem. I also have two categorical predictors, but each has three groups (0, 1, 2). However, I have included them as categorical variables in the equation, which leads to reference coding. My variables are called word_cat and attribute.
When I run the regression model, I see the following output. Could you give me a hint as to why parameters can be estimated for one reference group and not the other, even though all 3 groups are present in the data and the two predictors are not correlated?
                          Estimate    SE        tStat      pValue
word_cat_1                -31.78      3.6778    -8.6411    3.0585e-17
word_cat_2                -15.24      3.6778    -4.1438    3.7843e-05
attribute_1               -28.71      3.6778    -7.8063    1.866e-14
attribute_2                 1.49      3.1851     0.46781   0.64005
word_cat_1:attribute_1     50.81      5.2012     9.769     2.3292e-21
word_cat_2:attribute_1      0         0          NaN       NaN
word_cat_1:attribute_2      0         0          NaN       NaN
word_cat_2:attribute_2     30.46      4.8653     6.2607    6.2802e-10