Does anybody ever use a neural network for prediction with regularization-based optimization instead of early stopping?
Hi everyone, I am trying to train a neural network (NN) for prediction. To prevent overfitting, I chose the regularization approach, so I use 'trainbr' as the training function and 'msereg' as the performance function. The input and output data are preprocessed to lie within [-1,1], and the data are divided randomly into two groups: one for training (70%) and one for testing (30%).
Below is part of my code; can anyone help me check it? I am a new learner of NNs and am not sure whether it is correct. The NN I am designing has one input layer (6 inputs), one hidden layer, and one output layer (1 output). I loop over 1 to 60 hidden neurons to find a good result, but so far the results are not good at all, so I suspect the code is not written properly. Thanks!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for k=1:num
%clear;
%clc;
% Reset the global random number stream to its initial settings; this causes
% rand, randi and randn to start over, as if in a new MATLAB session.
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
% Create a Fitting Network
hiddenLayerSize = k;
net = fitnet(hiddenLayerSize); % net = fitnet([hiddenLayerSize,nn]); nn is the number of hidden layer
net=init(net); % initialize the network
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.testInd = tstind;
% set the transfer function (activation function) for input and
% output layers
net.layers{1}.transferFcn = 'tansig'; % layer 1 corresponds to the hidden layer
net.layers{2}.transferFcn = 'purelin'; % layer 2 corresponds to the output layer
% For help on training function 'trainbr' type: help trainbr
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainbr'; % Levenberg-Marquardt optimization with Bayesian regularization
net.trainParam.goal=0.01.*var(targets); % usually set this to be 1% of the var(target)
net.trainParam.epochs=500;
net.trainParam.mu_dec=0.8;
net.trainParam.mu_inc=1.5;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'msereg'; % Mean squared error with regularization term
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training and Test Performance
trainTargets = targets .* tr.trainMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
%some statistics codes, not shown here
end
1 Comment
Greg Heath
on 19 Feb 2016
I have a feeling you did not look at any of the 154 trainbr postings in the NEWSGROUP or any of the 143 postings in ANSWERS.
Is that correct?
                   NEWSGROUP   ANSWERS
trainbr                  154       143
trainbr tutorial           7        11
trainbr greg             105       112
Accepted Answer
More Answers (2)
Greg Heath
on 19 Feb 2016
Compare your own code with numerical results from
a. A modification of the simpler code in the documentation
help fitnet
doc fitnet
b. Use of the regression data at
help nndatasets
doc nndatasets
Write back with any questions.
Greg
Greg Heath
on 24 Feb 2016
% Thank you so much, Greg. I should have posted my question earlier; it really helps a lot!
% I did not run multiple trials, as I did not know how to make the initial weights different each time; I will try to search for an answer to this.
Search greg Ntrials configure
% Secondly, for the data normalization, I did use 'zscore', and there were some outliers. As these outliers are very important for my purpose, I did not want to delete them. You mean just modify these outliers, but how? What I propose is to divide them by a factor to constrain them to [-1,1]; is that OK?
No. Clip them between the lines
xmax = (meanx + a*stdx)*ones(1,N)
xmin = (meanx - b*stdx)*ones(1,N)
for a and b of your choice.
train will automatically scale the data
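As a sketch, the clipping described above might look like this in MATLAB (assuming `x` is a variables-by-samples matrix; a = b = 3 is an illustrative choice, to be picked by inspecting your data):

```matlab
% Clip each variable (row) of x to [mean - b*std, mean + a*std]
a = 3; b = 3;                      % illustrative; choose from plots of the data
N = size(x, 2);                    % number of samples
meanx = mean(x, 2);                % per-variable mean
stdx  = std(x, 0, 2);              % per-variable standard deviation
xmax  = (meanx + a*stdx)*ones(1, N);
xmin  = (meanx - b*stdx)*ones(1, N);
xclipped = min(max(x, xmin), xmax);  % outliers are clipped, not removed
```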
% Thirdly, for the training function, now you have made me a bit confused: between early stopping and regularization, how do I identify which gives the optimal NN setting?
The default is early stopping. I use that 99.9% of the time.
% I have 20,000 data samples (70% for training, 30% for testing), so the problem of the number of weights exceeding the number of training equations does not exist in my case.
With that much data there is no good excuse for omitting a validation set.
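A minimal sketch of the default random division with an explicit validation set (the ratios and hidden-layer size here are illustrative, not from the original post):

```matlab
net = fitnet(10);                        % hidden-layer size is a placeholder
net.divideFcn = 'dividerand';            % random division (the default)
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;       % validation set enables early stopping
net.divideParam.testRatio  = 0.15;
[net, tr] = train(net, inputs, targets); % stops when validation error rises
```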
% I will search the trainbr designs for code using loops over the number of hidden neurons and random weights, as I am really confused about how to do this. If you have some code at hand, please share it with me; I am not sure whether I can find the correct one... Thanks!!!
Basically the same procedure I use with trainlm. However, you don't need trainbr BUT I highly recommend using valstop.
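A sketch of that procedure with trainlm and valstop (the candidate sizes `Hvec`, the number of random restarts `Ntrials`, and the R^2 goodness measure are illustrative assumptions, not code from this thread):

```matlab
Ntrials = 10;                            % random weight initializations per size
Hvec    = 1:2:21;                        % candidate hidden-layer sizes
R2      = zeros(numel(Hvec), Ntrials);   % fraction of target variance modeled
rng(0)                                   % reproducible stream of initializations
for i = 1:numel(Hvec)
    for j = 1:Ntrials
        net = fitnet(Hvec(i));           % defaults: trainlm, mse, valstop
        net = configure(net, x, t);      % size the net to the data
        net = init(net);                 % fresh random initial weights
        [net, tr] = train(net, x, t);
        y = net(x);
        R2(i, j) = 1 - mean((t - y).^2)/var(t, 1);
    end
end
% Pick the smallest H whose best trial gives acceptable R2.
```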
Hope this helps.
Greg
5 Comments
Cathy Chen
on 24 Feb 2016
Greg Heath
on 25 Feb 2016
Thank you very much, Greg.
% I did not quite understand the part about normalizing the outliers to [-1,1].
I did not say anything about [-1,1] normalization except NOT TO DO IT because it is a default.
% The equations you gave: xmax = (meanx + a*stdx)*ones(1,N) and xmin = (meanx - b*stdx)*ones(1,N). Could you explain in detail?
If you look at a plot or a tabulation and there appear to be measurements with absolute values that are too large to be uncontaminated data, you have 3 options: 1. Keep them as is 2. Remove them 3. Modify them before keeping them.
For option 3 look at the plots and decide how much you are going to reduce them. The most popular method is to choose a and b based on the rest of the data. a = b = 3 is a common choice. Of course you should compare option 2 and/or 3 with 1 to determine if it makes a significant difference.
% And could I just use the mapminmax function to do the normalization? In this way, there would be no such problem.
No. The proportional sizes of good data and outliers are the same. Therefore you still have the outlier problem.
% Will these two different normalization methods cause a big difference? I know mapminmax is the default.
I don't think you understand
1. The default is MAPMINMAX. This is used automatically.
2. An alternative is MAPSTD, which is equivalent to ZSCORE. This requires overwriting MAPMINMAX and not dealing with outliers.
3. My preference would be to
   a. use ZSCORE or MAPSTD before training so that I can deal with outliers
   b. use no further normalization
4. However, to use no further normalization requires the pain of adding code to remove the default MAPMINMAX.
5. Therefore I
   a. use ZSCORE before training so that I can deal with outliers
   b. accept the default MAPMINMAX
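A sketch of that last approach (mapstd is the toolbox equivalent of zscore; the hidden-layer size is a placeholder):

```matlab
[zx, xs] = mapstd(x);              % standardize inputs: zero mean, unit variance
[zt, ts] = mapstd(t);              % same for targets; xs/ts store the settings
% ... inspect and clip outliers in z-score units here if needed ...
net = fitnet(10);                  % default mapminmax is still applied; fine
[net, tr] = train(net, zx, zt);
y = mapstd('reverse', net(zx), ts);  % transform outputs back to original units
```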
% I will switch back to early stopping then.
You should have begun by accepting the default early stopping (valstop) because no effort is required.
If there were problems (e.g., overtraining an overfit (Nw>Ntrneq) net), then consider using msereg instead of mse or using TRAINBR.
I hope this is clear.
% After I modify my code, I will post it here and ask you for a correction.
No. After you modify your code you should test it on one or more of the MATLAB data examples obtained from
help nndatasets and doc nndatasets
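For example, a quick check on one of the built-in regression datasets might look like this (the hidden-layer size is illustrative):

```matlab
[x, t] = simplefit_dataset;        % built-in fitting example from nndatasets
net = fitnet(10);
[net, tr] = train(net, x, t);
y = net(x);
NMSE = perform(net, t, y)/var(t, 1)  % normalized MSE; small for a working setup
```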
Hope this helps.
%Thanks!!!
OK
PS I spent a lot of time formatting this post. Then either MATLAB or my computer hiccupped and I had to resubmit this unformatted version.
Cathy Chen
on 2 Mar 2016
Greg Heath
on 4 Mar 2016
Edited: Greg Heath
on 4 Mar 2016
1. Does the new data have the same summary statistics?
a. mean
b. variance
c. Significant correlation lags?
2. Are you using as few hidden nodes as possible?
3. If not, standardize both datasets (zscore or mapstd) and use a common set of lags that include significant lags of both series.
4. Use as few hidden nodes as possible.
5. Cross your fingers.
6. Let us know how it went.
Hope this helps.
Greg