How to perform a linear model for effects of categorical variables on a numeric variable?
Hi all!
I have the following table (table1) with 68 columns and 100 rows containing numerical and categorical data. I'm only interested in the numeric variable 'Height' and the categorical variables 'Sex', 'Age' and 'Treatment'.
load table1
% Let's summarise the content of the variables I'm interested in:
% Sex variable: Female and Male
% Age variable: Prepuberal and Adult
% Treatment variable: Drug and Control
% Height variable: Randomly assigned numerical values
I want to perform a linear model to look for 'Sex', 'Age' and 'Treatment' effects on 'Height'. Note that 'Sex', 'Age' and 'Treatment' are categorical variables, and the response variable 'Height' is numeric.
% Linear model of sex, age and treatment effects on Height
mdl = fitlm(table1,'Height~Age+Treatment+Sex')
Then, I want to perform linear models to look for interaction effects between 'Sex*Age', 'Sex*Treatment', 'Age*Treatment', 'Sex*Age*Treatment' on 'Height'.
mdlsexandage = fitlm(table1,'Height~Sex*Age')
mdlsexandtreatment = fitlm(table1,'Height~Sex*Treatment')
mdlageandtreatment = fitlm(table1,'Height~Age*Treatment')
mdlsexandageandtreatment = fitlm(table1,'Height~Sex*Age*Treatment')
Am I doing this right? How do I interpret the resulting models?
Note that table1 was randomly generated, so I don't expect the p-values to make sense. I am only interested in learning how to code the linear models and how to interpret them :)
Thanks for sharing your knowledge, you all are always helpful!
More Answers (1)
Hitesh
on 14 Oct 2024
Yes, your approach to fitting linear models in MATLAB using the fitlm function is correct. You can use the "disp" function to display a summary of each model. The summary contains the coefficient table, with the columns Estimate, SE, tStat and pValue for each term, followed by overall fit statistics such as R-squared.
disp(mdlsexandage);
Interpretation:
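- Coefficients: The output includes an intercept plus one coefficient for each non-reference level of each categorical variable (for example, 'Sex_Male' when 'Female' is the reference level). By default the reference level is the first category of the variable. Each coefficient is the estimated change in 'Height' relative to that reference level, holding the other predictors fixed.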
- P-values: Indicate the statistical significance of each predictor. Since the data is randomly generated, these p-values may not be meaningful in your case.
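- R-squared: The proportion of the variability in 'Height' that the model explains, between 0 and 1. The adjusted R-squared penalises additional terms, so it is the better choice when comparing models of different sizes.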
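The quantities listed above can also be read directly from the fitted model object instead of from the printed summary. The following is only a sketch: since table1 is not available here, it builds a hypothetical stand-in table called demo with made-up random heights, and it assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical stand-in for table1 (assumption: the real table has
% categorical Sex, Age, Treatment and numeric Height columns).
rng(0);                                   % fixed seed for repeatability
n = 100;
Sex       = categorical(repmat({'Female'; 'Male'}, n/2, 1));
Age       = categorical([repmat({'Prepuberal'}, n/2, 1); repmat({'Adult'}, n/2, 1)]);
Treatment = categorical(repmat({'Control'; 'Control'; 'Drug'; 'Drug'}, n/4, 1));
Height    = 150 + 10*randn(n, 1);         % random values, no real effects
demo      = table(Sex, Age, Treatment, Height);

% Main-effects model, same formula style as in the question
mdl = fitlm(demo, 'Height ~ Sex + Age + Treatment');

% Coefficient table: one row per term, columns Estimate, SE, tStat, pValue
coefTable = mdl.Coefficients;
disp(coefTable)

% Fit statistics
r2    = mdl.Rsquared.Ordinary;            % proportion of variance explained
r2adj = mdl.Rsquared.Adjusted;            % penalised for the number of terms
fprintf('R^2 = %.4f, adjusted R^2 = %.4f\n', r2, r2adj);
```

With two levels per variable, the coefficient table has four rows: the intercept and one row for the non-reference level of each of the three predictors.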
You can compare models using metrics such as the adjusted R-squared to see which one fits the data best. Note that a formula like 'Height~Sex*Age' includes the main effects of 'Sex' and 'Age' as well as their interaction. If an interaction term is significant, it suggests that the relationship between the predictors and the response variable is not simply additive: the effect of one predictor depends on the level of another predictor.
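As a rough illustration of such a comparison, the sketch below fits an additive model and an interaction model to the same data and prints their adjusted R-squared values. The table demo and its random heights are hypothetical stand-ins for table1, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Hypothetical stand-in for table1 (made-up data, for illustration only)
rng(1);                                   % fixed seed for repeatability
n = 100;
Sex    = categorical(repmat({'Female'; 'Male'}, n/2, 1));
Age    = categorical([repmat({'Prepuberal'}, n/2, 1); repmat({'Adult'}, n/2, 1)]);
Height = 150 + 10*randn(n, 1);
demo   = table(Sex, Age, Height);

% Additive model versus model with interaction.
% 'Sex*Age' is shorthand for Sex + Age + Sex:Age.
mdlAdd = fitlm(demo, 'Height ~ Sex + Age');
mdlInt = fitlm(demo, 'Height ~ Sex*Age');

fprintf('Adjusted R^2, additive:    %.4f\n', mdlAdd.Rsquared.Adjusted);
fprintf('Adjusted R^2, interaction: %.4f\n', mdlInt.Rsquared.Adjusted);

% ANOVA table with one F-test per term (Sex, Age, Sex:Age)
disp(anova(mdlInt))

% With two levels per factor the interaction is a single coefficient,
% listed last in the coefficient table, so its t-test p-value can be
% read from the last row.
pInteraction = mdlInt.Coefficients.pValue(end);
fprintf('p-value of the interaction term: %.4f\n', pInteraction);
```

Because the heights here are pure noise, the interaction p-value will usually be large and the adjusted R-squared of both models close to zero, which matches what you should expect from randomly generated data.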
For more information about the "disp" function, refer to the MATLAB documentation.