how to solve overtrained nn with validation stop?
I have a dataset of 365 values corresponding to 365 days. I split the data into windows of 7 days as inputs, with the 8th day as the target, to train a feedforwardnet. I tested many different numbers of hidden neurons, and all of them overtrained; for example, training stops at epoch 16 with a validation stop. I know my network overtrains, but I am confused about how to overcome it. Any suggestion would be highly appreciated.
Answers (1)
Greg Heath on 26 Apr 2016 (edited 26 Apr 2016)
You cannot over-train unless the net is over-fit.
The net is over-fit if there are more unknown parameters than there are training equations.
If you accept all default parameters except the number of hidden nodes, H, then the number of unknown parameters for a single-hidden-layer net with I-H-O node topology is just the number of unknown weights, which, for I-dimensional "i"nput vectors and O-dimensional "o"utput target vectors, is
Nw = (I+1)*H+(H+1)*O
With N pairs of input/target training vectors, the default number used for training is
Ntrn = N - 2*round(0.15*N) % ~0.7*N
The corresponding number of training equations is
Ntrneq = Ntrn*O
Therefore, the condition for not over-fitting is
Ntrneq >= Nw = O + (I+O+1)*H
or equivalently,
H <= Hub = (Ntrneq-O)/(I+O+1)
where Hub is the lowest upper-bound.
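These formulas can be checked numerically. A minimal sketch for the asker's case, assuming the 365 daily values yield N = 358 sliding 7-day/1-day windows (I = 7, O = 1; the value of H is just an example):

```matlab
% Assumptions: N = 358 input/target pairs, I = 7 inputs, O = 1 output
N = 358; I = 7; O = 1;
Ntrn   = N - 2*round(0.15*N);       % default 70/15/15 split -> 250
Ntrneq = Ntrn*O;                    % training equations     -> 250
H      = 10;                        % candidate hidden layer size
Nw     = (I+1)*H + (H+1)*O;         % unknown weights        -> 91
Hub    = (Ntrneq - O)/(I + O + 1);  % upper bound on H       -> 27.67
overfit = Nw > Ntrneq               % false: H = 10 does not over-fit
```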
If you need more than Hub hidden nodes, there are two methods that mitigate over-fitting:
1. Validation stopping. The default data division yields
Nval = Ntst = round(0.15*N)
2. Bayesian regularization, to prevent weights from getting too large. This replaces the mse training performance function with the regularized combination
msereg = mse(error) + alpha*mse(weights)
where alpha is determined automatically by the training algorithm.
The two ways to implement this are
a. net.trainFcn = 'trainbr'
b. net.performFcn = 'msereg'
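A minimal sketch of option (a), assuming X is a 7-by-N matrix of 7-day input windows and T the 1-by-N row of next-day targets (both hypothetical names):

```matlab
% Bayesian-regularization sketch (X, T are assumed 7xN inputs / 1xN targets)
net = feedforwardnet(10);      % e.g. 10 hidden nodes
net.trainFcn = 'trainbr';      % Bayesian regularization (Levenberg-Marquardt)
net.divideFcn = 'dividetrain'; % trainbr does not use a validation set,
                               % so assign all data to training
[net, tr] = train(net, X, T);
```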
Now, to answer your question: I don't know whether you have over-fit the net. However, validation stopping has prevented you from improving the training performance at the expense of nontraining performance.
Don't despair.
My approach is to design multiple nets in a double loop: an outer loop over the number of hidden nodes (Hmin:dH:Hmax) and an inner loop over Ntrials random number states, which dictate the random data division and the random initial weight assignments. Typically Ntrials = 10 and numH <= 10.
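The double loop above can be sketched as follows. This is an illustrative outline, not Greg's exact posted code; X and T are the assumed 7xN input and 1xN target matrices, and the loop bounds are examples:

```matlab
% Double-loop search over hidden-layer size and random states (illustrative)
Hmin = 1; dH = 3; Hmax = 28; Ntrials = 10;
bestR2 = -Inf;
for H = Hmin:dH:Hmax
    for trial = 1:Ntrials
        rng(trial)                 % fixes data division and initial weights
        net = feedforwardnet(H);
        [net, tr] = train(net, X, T);
        t  = T(tr.testInd);        % held-out test targets
        y  = net(X(:, tr.testInd));
        R2 = 1 - mean((t - y).^2)/var(t, 1);   % test-set R^2
        if R2 > bestR2
            bestR2 = R2; bestNet = net; bestH = H;
        end
    end
end
```

Keeping the net with the best nontraining (test) R^2, rather than the best training performance, is what guards against picking an over-fit design.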
I have posted tens of examples. Searching
Hmin:dH:Hmax Ntrials
yields 70 hits.
Hope this helps.
Thank you for formally accepting my answer
Greg