What is the difference between observation and variable in case of pca() function?

I have a matrix with m rows and n columns, built from n individuals for person identification. So n is the number of persons and m is the number of feature values per person (for example, the pixel intensity values of the person's image).
Now I want to calculate the PCA using MATLAB's pca() function, but I am confused about observations and variables. What should I call n and m? Which one represents the observations and which one represents the variables?

 Accepted Answer

From the documentation:
  • coeff = pca(X) returns the principal component coefficients, also known as loadings, for the n-by-p data matrix X. Rows of X correspond to observations and columns correspond to variables.
So each row is an observation and each column is a variable.
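To make the convention concrete, here is a minimal sketch using random placeholder data (the sizes 10 and 3 are arbitrary assumptions, not the asker's data). It only checks the shapes that pca() returns when there are more observations than variables:

```matlab
% Placeholder data: n = 10 observations (rows), p = 3 variables (columns).
n = 10;
p = 3;
X = randn(n, p);

% coeff holds one column per principal component and one row per variable.
% score holds the observations expressed in principal component space.
[coeff, score] = pca(X);

disp(size(coeff));   % p-by-p when n > p
disp(size(score));   % n-by-p when n > p
```

The number of rows of coeff always matches the number of columns (variables) of X, which is a quick way to check that the matrix is oriented as intended.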

8 Comments

Thanks for your answer. I know that, but my confusion is: which one is the observation and which one is the variable for my data matrix? In my case, "I have a matrix with m rows and n columns that is built from n number of individuals for person identification. So, n is the number of person and m is the number of feature's value of the person or pixel's intensity values of the person's image." What is the observation and what is the variable: n or m?
To correctly use pca and other Statistics and Machine Learning Toolbox functions with your matrix, transpose it.
Thanks, but I want to know about observations and variables. Please see the following picture:
Now my question is: which one is the observation (column (n) or row (m)), and which one is the variable (column (n) or row (m))?
The variables are ‘m’ and the observations are ‘n’ in your scheme. To use them with the Statistics and Machine Learning Toolbox functions, you have to transpose your matrices.
I think so. Would you look at Image Analyst's answer, please? It confuses me even more!
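The transposition suggested here could look roughly like the following. The matrix A and its sizes (m = 5 feature values, n = 20 persons) are hypothetical stand-ins for the asker's data:

```matlab
% Hypothetical m-by-n matrix: each COLUMN is one person, each ROW is one feature.
m = 5;    % number of feature values per person (variables)
n = 20;   % number of persons (observations)
A = randn(m, n);

% pca() expects observations in rows, so transpose to n-by-m first.
X = A.';

[coeff, score] = pca(X);

disp(size(coeff));   % m-by-m: one row per feature
disp(size(score));   % n-by-m: one row per person
```

The non-conjugate transpose `.'` is used so the sketch would behave the same way if the data were complex, although for real-valued features `'` gives the same result.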
I think so as well.
I did, and it seems to be using an example that may or may not be appropriate to your data, since we do not know what your data are, or what the variables represent.
Thanks. I want to use PCA to reduce the dimensions of the data. After that, it will be used for classification.
My pleasure.
Principal components analysis is quite good for that. I actually once used it to design a filter when I did a linear discriminant analysis on short-time Fourier transforms of EEGs to classify what task the person was doing. The filter then isolated the relevant frequencies, making the classification much more efficient.
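For the dimension-reduction step before classification, one possible sketch is below. The data, the 95% variance threshold, and the variable names Xreduced and Xnew are illustrative assumptions, not something taken from the thread:

```matlab
% Placeholder data: 50 observations (rows), 8 variables (columns).
rng(0);                       % fixed seed so the sketch is repeatable
X = randn(50, 8);

% explained is the percentage of total variance per component,
% mu is the per-variable mean that pca() subtracts before the analysis.
[coeff, score, ~, ~, explained, mu] = pca(X);

% Keep the smallest number of components reaching 95% of the variance
% (the threshold is an arbitrary choice for illustration).
k = find(cumsum(explained) >= 95, 1);

% Reduced representation of the training data: one row per observation.
Xreduced = score(:, 1:k);

% New observations have to be centered with the same mu and projected
% with the same coefficients (implicit expansion needs R2016b or later).
Xnew = X(1:5, :);             % stand-in for unseen data
XnewReduced = (Xnew - mu) * coeff(:, 1:k);
```

Xreduced (and XnewReduced for unseen data) would then be passed to whatever classifier is used afterwards.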


More Answers (1)

The person is treated as a variable and the feature value is the observation. The feature values for each person are listed in a column of the table. For example, maybe you have two people in a family and want to see if there is a relationship between the weight of the two people over time. You might be taking measurements every day (or month or whatever). So if you had 4 time points and 2 people, the array would be
w11, w21
w12, w22
w13, w23
w14, w24
where w1* are the weights of person #1, and w2* are the weights of person #2. The coefficients would be
coefficients = pca(weightsMatrix);
where weightsMatrix is the 2-D array given above.
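Filled in with made-up numbers (the weights below are purely hypothetical), the 4-by-2 layout described in this answer would be built like this, with time points in rows and the two people in columns:

```matlab
% Hypothetical weights in kg: rows are 4 time points (observations),
% columns are person #1 and person #2 (variables in this layout).
weightsMatrix = [70.1, 82.3;
                 70.4, 82.0;
                 69.8, 81.7;
                 70.0, 81.9];

coefficients = pca(weightsMatrix);

disp(size(coefficients));   % 2-by-2: one row per person, one column per component
```

In this layout the people are the variables only because each person is measured repeatedly; that is a different setup from one feature vector per person.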

2 Comments

That confuses me even more. My understanding is that the people/persons are the observations and their feature values are the variables. Please see this link: https://en.wikipedia.org/wiki/Data_matrix_(multivariate_statistics)
Please describe the measurements you are making. If the ID of persons is your observation then you may need logistic regression instead of PCA.

