Effect of splitting data when training a neural network
Hi, I would like to know the effect of splitting the data when training a neural network.
When I use the 504x20000 training data x (20000 samples, each with 504 dimensions) to train a NN, what happens if I split it into, say, 100 parts and call the "train" function on each part in turn?
How would the result compare with training on all the data at once? I mean something like this:
for n = 1:100
    % columns of the n-th block of 200 samples
    dataind = (n-1)*200+1:n*200;
    net = train(net, x(:,dataind), y(:,dataind));
end
instead of training on all the data in a single call to "train", like this:
dataind = 1:20000;
net = train(net, x(:,dataind), y(:,dataind));
I'm asking because I ran into a memory problem when I attempted to train the NN using all the data. Thank you in advance.
Accepted Answer
More Answers (1)
Greg Heath
on 14 Sep 2016
Sorry, my first answer, though interesting, does not directly address your basic problem.
1. A distribution is well represented when the number of data points, N, is sufficiently larger than its dimension, I. A rule of thumb in statistics is N > 30*I. However, sometimes a factor of 10 or 15 is sufficient. Your data has a ratio of ~40 (20000/504), so there is no problem there. In fact, you might consider reducing N.
2. However, before reducing N, consider that I = 504 might be ridiculously large. So I think your first order of business is to try to reduce I. There are a variety of ways to do this. I do not favor PCA because it replaces inputs with linear combinations, which makes it difficult to ascertain which inputs are the most important.
3. The best inputs are easier to determine if BOTH INPUTS & TARGETS are standardized to zero-mean/unit-variance (ZSCORE or MAPSTD).
4. I would be satisfied with ranking inputs w.r.t. their weights in a LINEAR MODEL when BOTH input and output are standardized. MATLAB's PLSREGRESS might be a good choice.
5. Rather than trying to use all of your data to train one classifier, design an ensemble of nets and either average the outputs or design a linear classifier with the outputs.
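The arithmetic behind point 1 can be checked in a few lines. This is only a worked example using the sizes given in the question; the variable names are illustrative.

```matlab
% Sample-to-dimension ratio for the data described in the question
I = 504;                 % input dimension
N = 20000;               % number of samples
ratio  = N / I;          % roughly 39.7, i.e. the "~40" quoted in point 1
Nmin30 = 30 * I;         % samples needed under the N > 30*I rule of thumb
Nmin10 = 10 * I;         % samples needed under the looser factor of 10
fprintf('N/I = %.1f, 30*I = %d, 10*I = %d\n', ratio, Nmin30, Nmin10);
```

So under the 30*I rule roughly 15000 samples would already be enough, which is why reducing N is an option.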
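Points 3 and 4 might be sketched roughly as follows. This is a minimal illustration, not a tuned procedure: the synthetic data, the sizes, the number of PLS components (ncomp) and the "sum of absolute coefficients" ranking are all assumptions made for the example. ZSCORE and PLSREGRESS need the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: synthetic data stands in for the real 504x20000 inputs.
rng(0);                            % reproducible example
I = 10;  N = 500;                  % hypothetical small problem
x = randn(I, N);                   % inputs, one column per sample
t = 2*x(3,:) - 1.5*x(7,:) + 0.1*randn(1, N);   % target driven by inputs 3 and 7

% Standardize each variable (row) to zero mean and unit variance.
% zscore is applied along dimension 2; mapstd(x) is the toolbox alternative.
xn = zscore(x, 0, 2);
tn = zscore(t, 0, 2);

% plsregress expects observations in rows, so transpose both arrays.
ncomp = 3;                         % assumed number of PLS components
[~, ~, ~, ~, beta] = plsregress(xn', tn', ncomp);

% beta(1,:) is the intercept; the remaining rows are per-input coefficients.
importance = sum(abs(beta(2:end, :)), 2);
[~, ranked] = sort(importance, 'descend');
disp(ranked(1:5)');                % indices of the five highest-ranked inputs
```

Because inputs and targets are on the same scale, the coefficient magnitudes are directly comparable, and the lowest-ranked inputs are the candidates for removal.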
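One way to read point 5 is sketched below: each ensemble member is trained on its own subset of the data, and the member outputs are averaged. Training each member on a disjoint, interleaved subset is an assumption of this sketch (it keeps each "train" call small, which may help with the memory problem), and the synthetic data, ensemble size and hidden-layer size are placeholders.

```matlab
% Sketch only: synthetic two-class data; sizes and names are hypothetical.
rng(0);
N = 2000;
x = randn(2, N);                          % two inputs, one column per sample
labels = 1 + (x(1,:) + x(2,:) > 0);       % class index, 1 or 2
t = full(ind2vec(labels));                % one-hot targets, 2xN

numNets = 5;                              % assumed ensemble size
nets = cell(1, numNets);
for k = 1:numNets
    idx = k:numNets:N;                    % interleaved subset for member k
    net = patternnet(5);                  % assumed 5 hidden nodes
    net.trainParam.showWindow = false;    % suppress the training GUI
    nets{k} = train(net, x(:, idx), t(:, idx));
end

% Average the member outputs, then pick the class with the largest average.
ysum = zeros(size(t));
for k = 1:numNets
    ysum = ysum + nets{k}(x);
end
yavg = ysum / numNets;
predicted = vec2ind(yavg);
accuracy = mean(predicted == labels);
fprintf('Ensemble accuracy on all data: %.3f\n', accuracy);
```

Unlike calling "train" repeatedly on one net, each member here keeps what it learned from its own subset, and the combination step (averaging, or a linear classifier on the stacked outputs) merges them.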
Hope this helps.
Thank you for formally accepting my answer
Greg