
Using and Reading PCA Statistical Results

balsip on 8 Jun 2017
Answered: Shivansh on 26 May 2024
Using pca() with a 7627x17 matrix of variables (columns = variables, rows = observations), I am outputting the coeff and explained as the pca() help page details.
Question 1: Does the input matrix require the last column to be the response/dependent variable (as with stepwiselm() and fitlm())? If not, how are these results calculated without the response/dependent variable involved?
Question 2: Am I right to interpret the results below as saying that the third predictor variable accounts for ~0.995 of the influence on the first principal component, which explains 88.6% of the predictor variable variance? If that's correct, how is this information useful in explaining my response/dependent variable?
Somebody set me straight, please. Ultimately, my goal is to determine which original predictor variables explain the response variable best, and then quantify that. My thought was to take the most influential variables as determined by PCA and then build the model using stepwiselm() or fitlm().
coeff:
0.00129729002558727 0.289549385373420 0.956728609103309 -0.0212665364089570
-0.0953914970117933 0.952329812960381 -0.287992542335858 0.0187816335691187
0.995206207590380 0.0910468102726022 -0.0294065217015872 -0.0186854868976485
-0.000397082225762763 -0.00246282907057587 0.00447608503472252 -0.0110332572889799
0.00322274954398379 -0.0226606357739981 -0.00261825636192142 0.0575596465344526
-0.000101350632956080 -6.19964461638700e-06 -1.49483017598818e-05 -5.08226633593425e-05
0.00852263907323397 0.0165679978418983 0.00865470258457646 0.128426646955215
0.00104561882463731 -0.00110133731797852 0.00439471539229791 -0.000299508982820549
0.00121992923377658 -0.000898286153472998 0.00374785080734514 0.00364889452050408
0.00119226703173668 -0.00114898617310396 0.00212207512452650 0.00644443091048378
0.000748542020111456 -0.00201026790580145 -0.00109958052522231 0.00763259394674056
-0.000523073666889972 -0.00286022621479378 0.00406153294859059 0.00776832543962255
-0.000417086918391104 -0.00115684825783806 0.00631626569801059 0.00530098022903176
0.000918261002587542 0.00258292617487317 0.00785296732106662 0.00108255503976634
0.000864124551517556 0.00145714217693526 0.00281347075801307 -0.00354952384903065
-9.23014821423366e-05 0.000388571081226511 0.000173062207881905 -0.00110996732336648
0.0193283447001054 -0.0109382094403551 0.0244754518699291 0.989293153560380
explained:
88.6118904456048
8.77262546524695
2.07011007727990
0.437152292606406
0.0639436168558254
0.0223883103766897
0.0106035177147705
0.00808530454905423
0.00161881588630160
0.000820663786317213
0.000409453989136154
0.000267674592543745
7.93610602222986e-05
4.25607722914739e-06
5.80587204580441e-07
1.42789942911271e-07
2.09966934592139e-08

Answers (1)

Shivansh on 26 May 2024
Hi Balsip!
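PCA (Principal Component Analysis) does not require the last column to be the response or dependent variable, and no response variable is involved at all: every column of the input matrix is treated as a variable to be summarized. PCA is an unsupervised technique used primarily for dimensionality reduction or to identify patterns in data, and it is computed from the variance and covariance structure of the input columns alone.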
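To make the "no response variable" point concrete, here is a minimal sketch on synthetic data (the matrix below is made up for illustration, not your 7627x17 matrix). It assumes the Statistics and Machine Learning Toolbox is available. By default pca() centers each column but does not rescale it, so a variable measured on a much larger scale can dominate the first component; the 'VariableWeights','variance' option is one way to standardize.

```matlab
% Synthetic predictors only (hypothetical data); no response is passed to pca.
rng(0);                              % reproducible random numbers
n = 500;
X = randn(n, 3);                     % three independent predictors
X(:, 3) = 100 * X(:, 3);             % third predictor on a much larger scale

% Default pca: centers the columns, does not rescale them.
[coeff, ~, ~, ~, explained] = pca(X);

% Standardized pca: each variable weighted by the inverse of its variance.
[coeffStd, ~, ~, ~, explainedStd] = pca(X, 'VariableWeights', 'variance');

disp(coeff(:, 1));        % loading of each variable on PC1 (unscaled data)
disp(explained(1));       % percent of total variance explained by PC1
disp(explainedStd(1));    % same quantity after standardization
```

In this sketch the third variable dominates PC1 only because of its scale. Whether the same thing is happening in your data depends on the units of your 17 columns, which the post does not state.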
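For the second part, your interpretation is broadly correct: the third predictor has by far the largest coefficient (about 0.995) in the first principal component, and that component accounts for 88.6% of the total variance in the predictors. However, PCA does not relate the predictors to a response variable, so a large coefficient says nothing by itself about how well a variable explains the response. To understand which predictors best explain the response, use the variables that PCA flags as influential as candidates in a regression model such as "stepwiselm" or "fitlm", and judge them by the regression fit.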
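As a rough illustration of why the regression step matters, the sketch below uses made-up data in which the response depends on predictors 1 and 2, while predictor 3 merely has the largest variance. The variable names and the data are hypothetical. Since the columns of coeff have unit length by default, the squared coefficients in each column sum to 1, so in your output 0.995^2 ≈ 0.990 of the first component's direction comes from the third variable.

```matlab
% Hypothetical data: 4 predictors, response depends on predictors 1 and 2.
rng(1);
n = 300;
X = randn(n, 4);
X(:, 3) = 50 * X(:, 3);              % large-variance predictor, unrelated to y
y = 2 * X(:, 1) - 3 * X(:, 2) + 0.1 * randn(n, 1);

[coeff, ~, ~, ~, explained] = pca(X);

% Columns of coeff have unit length, so squared loadings sum to 1 per component.
sqLoad = coeff .^ 2;
[~, topVar] = max(sqLoad(:, 1));     % variable with the largest weight in PC1

% Model A: regress y on the variable PCA ranks highest.
mdlA = fitlm(X(:, topVar), y);

% Model B: regress y on all predictors and let the fit decide.
mdlB = fitlm(X, y);

fprintf('PC1 explains %.1f%% of predictor variance\n', explained(1));
fprintf('R^2 using PCA-top variable: %.3f\n', mdlA.Rsquared.Ordinary);
fprintf('R^2 using all predictors:   %.3f\n', mdlB.Rsquared.Ordinary);
```

Here the PCA-top variable gives a poor fit because PCA never saw y. With real data the two rankings may or may not agree, which is why the regression results should be the final check.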
You can refer to the following documentation for more information:
  1. PCA: https://www.mathworks.com/help/stats/pca.html
  2. stepwiselm: https://www.mathworks.com/help/stats/stepwiselm.html
  3. fitlm: https://www.mathworks.com/help/stats/fitlm.html
I hope it helps!
