How to get better test error/accuracy with neural networks?
Hi,
I am new to neural networks and I'm not sure how to go about achieving a better test error on my dataset. I have a ~20,000x64 dataset X with ~20,000x1 targets Y, and I'm trying to train my neural network for binary classification (0 and 1) so that it performs as well as possible on another 19,000x64 dataset. I currently get an MSE of about 0.175 on the test set, but I want to do better. My input values range from about -22 to 10000.
I used the Neural Network Toolbox GUI to generate a script, and I've modified some of the parameters like so:
inputs = X';
targets = Y';
hiddenLayerSize = 5;
net = patternnet(hiddenLayerSize);
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
net.divideFcn = 'divideblock'; % Divide data into contiguous blocks
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
net.trainFcn = 'trainrp'; % Resilient backpropagation
net.performFcn='mse';
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
view(net)
I've read online and in the MATLAB documentation about ways to improve performance; they suggest things like setting a stricter error goal, reinitializing the weights with the init() function, and so on, but none of that has helped me get better results. Maybe I'm just not doing it correctly?
Anyway, can someone point me toward a way to achieve better accuracy? Also, could you please include some code in your answer? I find it hard to follow explanations without seeing code.
Accepted Answer
Greg Heath
on 20 Mar 2014
0. I-H-O = 64-5-1 node topology; N = 20,000 creation data pairs
1. Ntrneq = 0.7*N*O = 14,000 training equations but only
Nw = (I+1)*H + (H+1)*O = (64+1)*5 + (5+1)*1 = 331 unknown weights
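A quick sanity check of these counts in MATLAB, using the 64-column inputs stated in the question (the variable names here are just for illustration):

```
% Equations-vs-weights bookkeeping for an I-H-O = 64-5-1 patternnet
I = 64; H = 5; O = 1; N = 20000;
Ntrneq = 0.70*N*O          % training equations (70% of N targets) -> 14000
Nw = (I+1)*H + (H+1)*O     % unknown weights incl. biases          -> 331
ratio = Ntrneq/Nw          % equations per weight, ~42, so underfitting
                           % (too few hidden nodes) is the likelier problem
```

With ~42 equations per weight, the network is far from overfitting, which is why increasing H (point 2 below) is worth trying before anything else.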
2. You probably need more hidden nodes (H > 5). Why was the default H = 10 replaced by H = 5?
3. You probably don't need all I = 64 input dimensions or Ntrn = 0.7*N = 14,000 training examples.
4. 16 to 32 examples per input dimension is probably sufficient. Since [16 32]*64 ~ [1000 2000], I would start with ~10 subsets of ~2000 examples for the following tasks:
a. Standardize inputs to zero-mean/unit-variance
b. Reduce the input dimensionality to I < 64 via PLS (PCA and STEPWISEFIT are not optimal for classification)
c. Use the reduced inputs to determine the smallest acceptable value of H by trial and error
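Tasks a-c could be sketched roughly as below. This is only an outline under assumptions: `ncomp = 20` PLS components is a placeholder you would tune, the H search range is arbitrary, and `plsregress` requires the Statistics Toolbox.

```
% a. Standardize inputs to zero-mean/unit-variance
Xs = zscore(X);                           % X is ~20000x64, Y is ~20000x1

% b. Reduce dimensionality via PLS; XS holds the X-scores (reduced inputs)
ncomp = 20;                               % assumed component count - tune this
[~,~,XS] = plsregress(Xs, Y, ncomp);

% c. Trial-and-error search for the smallest acceptable H
for H = 2:2:20
    net = patternnet(H);
    net.divideParam.trainRatio = 0.70;
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [net, tr] = train(net, XS', Y');      % toolbox expects samples in columns
    fprintf('H = %2d  test perf = %.4f\n', H, tr.best_tperf);
end
```

Since weight initialization is random, repeating each H several times and comparing the median test performance gives a more reliable picture than a single run.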
Hope this helps
Thank you for formally accepting my answer
Greg