Increasing the number of hidden layers in a function fitting neural network seems to improve its performance (apparently without overfitting)
Hello,
I am trying to solve a kinematic/dynamic mathematical problem involving two moving objects with the supervised function-fitting neural network fitnet.
The network takes 5 inputs and gives 1 output.
The initial step for me was to define the number of hidden layers and neurons, so I did some research on papers that tried to solve the same problem via a function-fitting neural network, and was surprised that none of them gave a rule for choosing the number of layers and neurons per layer. Everyone stated that they used "informal testing" or a trial-and-error method and experimented with the number of layers until they found the results good enough.
This made me curious, so I tried to do at least some kind of analysis of this problem.
My strategy was to test every architecture with 1 to 4 hidden layers and 1 to 20 neurons per layer. That means there are 20 + 20^2 + 20^3 + 20^4 = 168,420 different ways the network's layer/neuron architecture could look.
So I trained all 168,420 function-fitting networks, changing the number of neurons/layers for each test, and saved the RMSE from the test set.
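For context, here is a minimal sketch of how such a sweep can be scripted (the names X and T for the 5-by-N input matrix and 1-by-N target matrix are my own placeholders, and training all 168,420 networks with trainlm takes a very long time):
results = [];                                 % one row per tested architecture
for depth = 1:4
    args = repmat({1:20}, 1, depth);          % 1 to 20 neurons per layer
    combos = combvec(args{:})';               % every combination for this depth
    for k = 1:size(combos, 1)
        hidden = combos(k, :);                % e.g. [8 17 15 3]
        net = fitnet(hidden, 'trainlm');
        net.divideParam.trainRatio = 0.70;
        net.divideParam.valRatio   = 0.25;
        net.divideParam.testRatio  = 0.05;
        [net, tr] = train(net, X, T);
        Yts  = net(X(:, tr.testInd));         % predictions on the test split
        rmse = sqrt(mean((T(tr.testInd) - Yts).^2));
        results = [results; depth, k, rmse];  %#ok<AGROW>
    end
end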
Most important properties of the network:
fitnet:
adaptFcn: 'adaptwb'
adaptParam: (none)
derivFcn: 'defaultderiv'
divideFcn: 'dividerand'
divideParam: .trainRatio, .valRatio, .testRatio
divideMode: 'sample'
initFcn: 'initlay'
performFcn: 'mse'
performParam: .regularization, .normalization
plotFcns: {'plotperform', 'plottrainstate', 'ploterrhist',
'plotregression', 'plotfit'}
plotParams: {1x5 cell array of 5 params}
trainFcn: 'trainlm'
%% For each individual Layer:
initFcn: 'initnw'
netInputFcn: 'netsum'
transferFcn: 'tansig'
I trained the network with 8322 data samples, which were divided as follows:
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 25/100;
net.divideParam.testRatio = 5/100;
My first guess was that the network performance would decrease with an increasing number of layers and neurons per layer, but the opposite was the case: the more total neurons, the better the network.
The following plot shows the test-set RMSE (solid line) and the validation-set RMSE (dotted line) for networks with evenly distributed neurons per layer (e.g. 1, 3-3, 2-2-2, 16-16-16-16, etc.).
The most interesting part is that the network with the lowest test RMSE had 4 layers with [8 17 15 3] neurons.
Then I took some other examples with >10 neurons in the first layer, <15 neurons in layers two and three, and <5 neurons in layer four.
They all showed significantly better results than networks with evenly distributed neurons per layer.
I have not yet found an explanation for this phenomenon (few neurons in the first and last layers, many neurons in the in-between layers).
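For reference, that best architecture can be built directly by passing the hidden-layer sizes to fitnet as a vector:
net = fitnet([8 17 15 3], 'trainlm');  % 4 hidden layers with 8, 17, 15 and 3 neurons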
I am now very curious if anyone has an explanation or at least experienced the same phenomenon. Thanks in advance!
Answers (1)
Krishna
on 30 May 2024
Hey,
It seems like you're enhancing your model by adding more neurons, either by increasing the number of hidden layers or the number of neurons within those layers, and you've observed a continuous decrease in the training RMSE. It's important to remember that adding more hidden layers and neurons typically drives the training RMSE down. However, it's crucial to also consider the validation and test data to ensure your network isn't overfitting to the training data.
If you notice that the RMSE value for your test dataset continues to decrease, it indicates your network is still improving. However, there will come a point where the test RMSE value starts to plateau or saturate, showing no further significant changes, and eventually, it may begin to increase, which suggests your model has become overtrained.
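In MATLAB these values can be read straight from the training record tr returned by train; a minimal sketch (assuming [net, tr] = train(net, X, T) and the default 'mse' performance function, so the square root gives RMSE):
trainRMSE = sqrt(tr.best_perf);   % training MSE at the best epoch
valRMSE   = sqrt(tr.best_vperf);  % validation MSE at the best epoch
testRMSE  = sqrt(tr.best_tperf);  % test MSE at the best epoch
% A test RMSE far above the training RMSE points to overfitting.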
Determining the exact number of neurons and hidden layers required isn't straightforward. A good starting point could be 2 hidden layers with 2/3 * (inputs + outputs) neurons in each hidden layer and a 'tansig' (tanh) activation function, since such a network is capable of approximating any nonlinear function. From there, it's a matter of trial and error to see how adjustments affect your test RMSE results.
The aim of the trial and error should be to reach optimal performance with the least network complexity.
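Applied to this problem (5 inputs, 1 output), that rule of thumb gives the following starting point (a sketch, not a definitive recipe):
nHidden = round(2/3 * (5 + 1));        % = 4 neurons per hidden layer
net = fitnet([nHidden nHidden]);       % 2 hidden layers
net.layers{1}.transferFcn = 'tansig';  % tansig is the tanh-shaped default in fitnet
net.layers{2}.transferFcn = 'tansig';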
Please go through this article to learn more about how to select the number of neurons and hidden layers.
Hope this helps.