How to apply PCA to training/val/test data

Hello. I am working on a neural network that predicts anxiety using heart rate and heart rate variation (supervised machine learning). I have 232 samples and a total of 17 features. In the future, I want to be able to add samples and use this NN to predict anxiety for those. Obviously that is too many features, so I want to do feature reduction using PCA, but however much I read, I am still a little confused.
I applied PCA to the dataset and 94% of the variation is in 8 components, but that was for the whole dataset. I know that I need to calculate PCA for the training data and then do a thing for the validation and test data, but I am so confused as to what that thing is and how to do it. I would then need to redo that thing for new samples in the future. If someone could spoonfeed me an answer I would be so happy.

3 Comments

Greg Heath on 12 Jan 2018
Edited: Greg Heath on 12 Jan 2018
1. Nomenclature: Delete the word "cross".
Validation and cross-validation are two separate animals.
Greg
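One common reading of that distinction can be sketched in code. This is only an illustration under assumptions: the 20% hold-out fraction, the 5 folds, and all variable names are hypothetical, and `cvpartition` requires the Statistics and Machine Learning Toolbox. A validation set is a single fixed split, whereas cross-validation rotates the held-out part so that every sample is held out once.

```matlab
rng(0);                                  % reproducible partitions
n = 232;                                 % number of samples, as in the question

% Hold-out validation: one fixed split, here 20% held out (assumed fraction)
cHold    = cvpartition(n,'HoldOut',0.2);
trainIdx = training(cHold);              % logical index of training rows
valIdx   = test(cHold);                  % logical index of validation rows

% 5-fold cross-validation: every sample is held out exactly once
cFold = cvpartition(n,'KFold',5);
heldOutCount = zeros(n,1);
for i = 1:cFold.NumTestSets
    heldOutCount = heldOutCount + test(cFold,i);
end
```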
Ok. What then?
I think you need to apply the same PCA model, determined from your training data, to the validation and test data.
Let's say you used the following code:
% Fit the PCA model on the training data only (datatrain)
[pcs,scrs,~,~,explained,mu] = pca(datatrain);
% Here you can use your first 8 components in scrs (with regard to the
% explained variance in explained)
trainscrs = scrs(:,1:8);
% Then use the same model (mu and pcs) for your validation and test data
% (dataval is a placeholder name) and take again the first 8 components
valscrs = (dataval-mu)*pcs(:,1:8);
% Can anybody confirm this code??
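The idea in the comment above can be sketched end to end as follows. This is a minimal sketch rather than a confirmed answer: the synthetic matrix X, the 70/15/15 split, the 94% threshold, and all variable names (Xtrain, Xval, Xtest, xnew, k) are assumptions for illustration, and only `pca` and its outputs come from the thread. It fits PCA on the training rows only, then reuses the stored mean and coefficients for the validation set, the test set, and any future sample.

```matlab
rng(0);                                   % reproducible synthetic example
X = randn(232,17) * randn(17,17);         % stand-in for the 232-by-17 feature matrix

% Hypothetical 70/15/15 split into training, validation, and test rows
n      = size(X,1);
idx    = randperm(n);
nTrain = round(0.70*n);
nVal   = round(0.15*n);
Xtrain = X(idx(1:nTrain),:);
Xval   = X(idx(nTrain+1:nTrain+nVal),:);
Xtest  = X(idx(nTrain+nVal+1:end),:);

% Fit PCA on the training data only
[coeff,scoreTrain,~,~,explained,mu] = pca(Xtrain);

% Smallest number of components explaining at least 94% of training variance
k = find(cumsum(explained) >= 94, 1);

% Reduced training set, and the same transform applied to the other sets
% (subtracting the row vector mu relies on implicit expansion, R2016b or later)
Ztrain = scoreTrain(:,1:k);
Zval   = (Xval  - mu) * coeff(:,1:k);
Ztest  = (Xtest - mu) * coeff(:,1:k);

% A future sample (1-by-17 row vector) is projected the same way
xnew = randn(1,17);
znew = (xnew - mu) * coeff(:,1:k);
```

The network would then be trained on Ztrain and evaluated on Zval and Ztest. If the features are on very different scales, it may also help to standardize them with the training mean and standard deviation before calling `pca`, and to apply those same training statistics to the other sets.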

Answers (0)

Asked: 12 Jan 2018

Edited: 5 Feb 2018
