Does anybody ever use a neural network for prediction with regularization optimization instead of early stopping?

Hi everyone, I am trying to train a neural network (NN) for prediction. To prevent overfitting, I chose the regularization method instead of early stopping, so I use 'trainbr' as the training function and 'msereg' as the performance function. The input and output data are preprocessed to lie within [-1,1], and the data are divided randomly into two groups: one for training (70%) and one for testing (30%).
Below is part of my code; can anyone help me check it? I am a new learner of NNs and not sure whether it is correct. The NN I am designing has one input layer (6 inputs), one hidden layer, and one output layer (one output). I loop through 1 to 60 hidden neurons to find a good result, but right now the results are not good at all, so I suspect the code is not written properly. Thanks!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for k=1:num
%clear; %clc;
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
% reset the global random number stream to its initial settings; this causes
% rand, randi and randn to start over, as if in a new MATLAB session
% Create a Fitting Network
hiddenLayerSize = k;
net = fitnet(hiddenLayerSize); % net = fitnet([hiddenLayerSize,nn]); nn is the number of hidden layer
net=init(net); % initialize the network
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.testInd = tstind;
% set the transfer function (activation function) for input and
% output layers
net.layers{1}.transferFcn = 'tansig'; % layer 1 corresponds to the hidden layer
net.layers{2}.transferFcn = 'purelin'; % layer 2 corresponds to the output layer
% For help on training function 'trainbr' type: help trainbr
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainbr'; % Levenberg-Marquardt optimization with Bayesian regularization
net.trainParam.goal=0.01.*var(targets); % usually set this to be 1% of the var(target)
net.trainParam.epochs=500;
net.trainParam.mu_dec=0.8;
net.trainParam.mu_inc=1.5;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'msereg'; % Mean squared error with regularization (weighted sum of MSE and mean squared weights)
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training and Test Performance
trainTargets = targets .* tr.trainMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
%some statistics codes, not shown here
end

1 Comment

I have a feeling you did not look at any of the 154 trainbr postings in the NEWSGROUP or any of the 143 postings in ANSWERS.
Is that correct?

                     NEWSGROUP   ANSWERS
trainbr                 154        143
trainbr tutorial          7         11
trainbr greg            105        112


 Accepted Answer

GEH1 = 'Predictions are performed with timeseries functions'
GEH2 = 'You are using FITNET which is used for REGRESSION and CURVEFITTING'
GEH3 = ['You should have included results from applying ', ...
        'your code to the MATLAB dataset in the help and doc examples']
close all, clear all, clc
for i = 1:2
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
if i == 1
[ x , t ] = simplefit_dataset ; % help & doc fitnet
else
x=[-1:.05:1];t=sin(2*pi*x)+0.1*randn(size(x));%doc trainbr
end
net = fitnet; % H = 10 default
net.trainFcn = 'trainbr';
perfratio = net.performParam.ratio % 0.95238
% minimization goal is sse(weights)+perfratio*sse(t -y)
trnratio = net.divideParam.trainratio % 0.7
valratio = net.divideParam.valratio % 0.15
tstratio = net.divideParam.testratio % 0.15
% trn/val/tst indices will be assigned during training % To obtain them, need to use training record tr:
[ net tr ] = train(net,x,t);
view(net);
y = net(x);
perf(i) = perform(net,y,t) % SAME AS MSE, NOT MINIMIZATION GOAL!
MSE(i) = mse(t-y)
end
MSE = perf % [1.6979e-11 0.0080867]
NMSE = MSE/var(t,1) %[ 3.5535e-11 0.016924 ]
Rsq = 1 - NMSE %[ 1 0.98308 ], Rsquare, See Wikipedia

6 Comments

1. The trn/val/tst ratios are obviously a BUG!
2. The minimization goal PERF should be the linear weight/MSE combination.
Hi, Greg, Thanks for your answers.
Yes, I did use this code for regression. Since I use trainbr, I divided the dataset randomly into two groups (one for NN design, one for testing), I saved the index into a mat file. Actually I did not see much difference between my code and your example code except the performance part. I just want to know is there any code that is not correct? And is this the way to do Baysian regularization? I did read some of the group questions and answers, but still not quite sure. Besides, for the data normalization, to normalize the input to [-1,1], does that mean if the data is within[-1,1], it would be ok? As in my normalization (minus mean and divided by standard deviation), there are still many data points are out of this range, in this case what should I do? Thanks!
The MATLAB training functions automatically normalize inputs and targets and denormalize outputs. So you can forget about that.
However, I recommend using zscore to standardize the data to zero-mean/unit-variance to check for outliers that have to be removed or modified.
Then you have the choice of inputting original data or standardized data. I use the former.
Hope this helps.
Greg
PS. I removed this part of your code
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.testInd = tstind;
GEH4 = 'trnind and tstind not defined'
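Greg's zscore suggestion can be sketched as follows (x here is a generic I-by-N input matrix, and the 3-sigma threshold is an assumed, common choice, not something stated in the thread):

```matlab
% Standardize each variable (row) to zero-mean/unit-variance, then flag
% candidate outliers for inspection before deciding to remove or modify them.
xn = zscore(x', 1)';            % zscore works down columns, hence the transposes
outliers = abs(xn) > 3;         % assumed threshold: 3 standard deviations
fprintf('%d candidate outlier values\n', nnz(outliers));
```

After inspecting the flagged values, either delete those samples or modify them, then train on the original (or standardized) data.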
% Thanks, Greg.
% As I mentioned, I divided my data into two parts and save their index as trnind and tstind, one for training, one for testing, so this is not the problem.
You didn't randomly divide them into trn/tst for multiple trials for each value of hidden nodes?
% for data normalization, I tried to standardize to zero mean and unit variance,
Tried? All you have to do is
xn = zscore(x',1)'; % multidimensional
or
xn = zscore(x,1); % 1-dimensional
%surely there are outliers,
So did you delete them or modify them?
%I modified all the standardized data by dividing by a certain value, so that the data are constrained to [-1,1].
NO! Normalization to [-1 1] is a default. All you have to do is delete or modify outliers.
% I tried to use early stopping optimization (cross validation), but the result is not satisfying at all.
Why? How many random data divisions and random initial weight trials did you use?
Just using a val subset is not considered crossvalidation. The latter consists of dividing the data into a fixed number of subsets which are systematically rotated into roles of training, validation and testing.
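The rotation Greg describes can be sketched manually (the fold count k and the fold-assignment scheme are illustrative assumptions, not from the thread):

```matlab
% k-fold rotation: each fold takes a turn as the test set, the next fold
% serves as validation, and the remaining folds are used for training.
N = 21790;                          % number of samples (from the thread)
k = 5;                              % assumed number of folds
fold = repmat(1:k, 1, ceil(N/k));
fold = fold(randperm(N));           % random fold label, one per sample
for f = 1:k
    tstind = find(fold == f);
    valind = find(fold == mod(f, k) + 1);
    trnind = setdiff(1:N, [tstind, valind]);
    % set net.divideFcn = 'divideind' with these indices, then train
end
```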
%So I tend to use regularization (trainbr as the training function), as shown in the code I posted. If the mu value reaches max_mu, the optimized result is obtained,
No! Just like validation stopping, it just stops the training from getting worse.
% through different settings of hidden neurons (1 to 60); in each loop it is hard to tell whether the iteration stopped due to reaching max_mu. Do you have any suggestions for this? Thanks!
60 sounds over the top. What is Hub, the number of hidden nodes above which the number of weights is greater than the number of training equations?
tr.stop yields the reason for stopping. However, it doesn't necessarily mean the training was successful.
Maybe you should search both NEWSGROUP and ANSWERS to see if I posted any trainbr designs using loops over number of hidden nodes h = Hmin:dH:Hmax and random weights and datadivision i = 1:Ntrials.
trainbr Ntrials
or
trainbr h = Hmin:dH:Hmax
Thank you so much, Greg. I should have posted my question earlier, it really helps a lot!
Firstly, I used 'dividerand' to divide the indices of my dataset into two parts (one for training, the other for testing), and then saved the indices into .mat files, so each time I run the program the training and testing indices are the same. For each value of hidden neurons I did not run multiple trials, as I did not know how to make the initial weights different each time; I will try to search for answers on this.
Secondly, for the data normalization, I did use 'zscore', and there were some outliers. These outliers are very important for my purpose, so I did not want to delete them. You mean just modify them, but how? What I propose is to divide them by a factor to constrain them to [-1,1]; is that OK?
Thirdly, for the training function, you have me a bit confused now: for both early stopping and regularization, how do I identify the optimal NN setting? I looped through 60 hidden neurons, and I have around 20,000 data samples (70% for training, 30% for testing), so the problem of the number of weights being greater than the number of training equations does not exist in my case.
I will search for trainbr designs with loops over the number of hidden neurons and random weights, as I am really confused about how to do this. If you have some code at hand, please share it with me; I am not sure whether I can find the right one... Thanks!!!


More Answers (2)

Compare your own code with numerical results from
a. A modification of the simpler code in the documentation
help fitnet
doc fitnet
b. Use of the regression data at
help nndatasets
doc nndatasets
Write back with any questions.
Greg
% Thank you so much, Greg. I should have posted my question earlier, it really helps a lot!
%I did not run several multiple trials, as I did not how to make the initial weights differently each time, I will try to search some answers for this.
Search greg Ntrials configure
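The Ntrials pattern Greg refers to can be sketched like this (net, inputs, and targets as elsewhere in the thread; the loop body is an assumed minimal version):

```matlab
% Each call to configure re-initializes the weights randomly, so every
% trial starts from a different point in weight space.
Ntrials = 10;
for i = 1:Ntrials
    net = configure(net, inputs, targets);  % fresh random initial weights
    [net, tr] = train(net, inputs, targets);
    % record tr.best_perf, tr.best_vperf, tr.best_tperf for this trial
end
```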
% Secondly, for the data normalization, I did use the 'zscore', and there was some outliers, as these outliers are very important for my purpose, so I did not want to delete them, you mean just modify these outliers, but how? what I propose is to divide them by a factor to constrain them into [-1,1], is that ok?
No. Clip them between the lines
xmax = (meanx + a*stdx)*ones(1,N)
xmin = (meanx - b*stdx)*ones(1,N)
for a and b of your choice.
train will automatically scale the data
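A sketch of that clipping, written with repmat in the style of Greg's expressions (a = b = 3 is the common choice he mentions later in the thread; x is the I-by-N input matrix):

```matlab
a = 3; b = 3;
meanx = mean(x, 2);                      % per-variable (row) means
stdx  = std(x, 0, 2);                    % per-variable standard deviations
xmax  = repmat(meanx + a*stdx, 1, N);
xmin  = repmat(meanx - b*stdx, 1, N);
xclip = min(max(x, xmin), xmax);         % clip every value into [xmin, xmax]
```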
% Thirdly, for the training function, now you make me feel a bit confused, for both early stopping and regularization, how to identify which is the optimal NN setting?
The default is early stopping. I use that 99.9% of the time
% 20,000 data samples(70% for training, 30% for testing), so the problem that the number of weight greater than the number of training equation did not exist in my case.
With that much data there is no good excuse for omitting a validation set.
% I will search for the trainbr designs for some codes using loops over number of hidden neurons and random weights, as I really get confused about how to do this, if you have some codes at hand, please share with me, I am not sure whether I can find the correct one... Thanks!!!
Basically the same procedure I use with trainlm. However, you don't need trainbr BUT I highly recommend using valstop.
Hope this helps.
Greg

5 Comments

Thank you very much, Greg. I did not quite understand normalizing the outliers to [-1,1]. The equations you gave were xmax = (meanx + a*stdx)*ones(1,N) and xmin = (meanx - b*stdx)*ones(1,N); could you explain them in detail? And could I just use the mapminmax function to do the normalization? That way there would be no such problem. Will these two normalization methods cause a big difference? I know mapminmax is the default.
I will switch back to early stopping then. After I modify my code, I will post it here, and ask you for a correction. Thanks!!!
Now my code is below, using early stopping with trainlm. Are there any potential errors in this code? I use two loops: one over the number of hidden neurons Hmin:dH:Hmax, and one over initial weights 1:Ntrials. Thanks! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clc; clear;
trnind=importdata('trnind_es.mat'); % index for training
valind=importdata('valind_es.mat'); % index for validation
tstind=importdata('tstind_es.mat'); % index for test
% normalization by using mapminmax for each variable
nsst=transpose(mapminmax(transpose(sst_sat),-1,1));
nrrs412=transpose(mapminmax(transpose(rrs412),-1,1));
nrrs443=transpose(mapminmax(transpose(rrs443),-1,1));
nrrs488=transpose(mapminmax(transpose(rrs488),-1,1));
nrrs555=transpose(mapminmax(transpose(rrs555),-1,1));
nrrs667=transpose(mapminmax(transpose(rrs667),-1,1));
nsss=transpose(mapminmax(transpose(sss),-1,1));
% inputs
x=[nsst,nrrs412,nrrs443,nrrs488,nrrs555,nrrs667];
y=nsss;
inputs = x';
targets = y';
[I N] = size(inputs); % [6 21790]
[O N] = size(targets); % [1 21790]
Ntrn = round(N*0.70);
Nval = round(N*0.15);
Ntst = round(N*0.15);
Ntrneq = Ntrn*O;
Hub = -1+ceil( (Ntrneq-O) / (I+O+1)) % 2042
Hmax=60; Hmin=1; dH=1;
Ntrials=10;
% arrays for statistics
MSEtrn=zeros(Hmax,Ntrials); % actually root mean square error (see sqrt(mse(...)) below)
MSEval=zeros(Hmax,Ntrials);
MSEtst=zeros(Hmax,Ntrials);
R2trn=zeros(Hmax,Ntrials);
R2val=zeros(Hmax,Ntrials);
R2tst=zeros(Hmax,Ntrials);
MB=zeros(Hmax,Ntrials); % mean bias
MR=zeros(Hmax,Ntrials); % mean ratio
rng(0);
j=0;
for H=Hmin:dH:Hmax
j=j+1;
% Create a Fitting Network
net=fitnet(H);
%net=init(net); % initialize the network
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'};
% set the transfer function (activation function) for input and
% output layers
net.layers{1}.transferFcn = 'tansig'; % layer 1 corresponds to the hidden layer
net.layers{2}.transferFcn = 'purelin'; % layer 2 corresponds to the output layer
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.valInd = valind;
net.divideParam.testInd = tstind;
% settings of training function
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
net.trainParam.goal=0.01.*var(targets); % usually set this to be 1% of the var(target)
net.trainParam.epochs=1000;
net.trainParam.mu_dec=0.8;
net.trainParam.mu_inc=1.5;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:Ntrials
net=configure(net,inputs,targets);
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
valPerformance = perform(net,valTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
% denormalize the output, sss is my target before
% normalization
ymax=max(sss);
ymin=min(sss);
xmin=-1;
xmax=1;
outputs_real=(ymax-ymin)*(outputs-xmin)/(xmax-xmin) + ymin;
errors_real=gsubtract(outputs_real,sss'); % predict-real
% calculate the R2 and RMSE
vartrn = mean(var(sss(trnind),1));
%meantrna = mean(var(targets(trnind),0));
varval = mean(var(sss(valind),1));
vartst = mean(var(sss(tstind),1));
Nw = (I+1)*H+(H+1)*O; % maximum is 481 when H=60 (Hmax)
Ndof = Ntrneq-Nw;
MSEtrn(H,i) = sqrt(mse(errors_real(trnind)));
%MSEtrna = Ntrneq*MSEtrn/Ndof
MSEval(H,i) = sqrt(mse(errors_real(valind)));
MSEtst(H,i) = sqrt(mse(errors_real(tstind)));
R2trn(H,i) = 1-MSEtrn(H,i).^2/vartrn;
%R2trna = 1-MSEtrna/meantrna
R2val(H,i) = 1-MSEval(H,i).^2/varval;
R2tst(H,i) = 1-MSEtst(H,i).^2/vartst;
MB(H,i) = mean(outputs_real'-sss);
MR(H,i) = mean(outputs_real'./sss);
end
end
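One note on the manual denormalization above: it silently assumes the earlier mapminmax call mapped the full [min(sss), max(sss)] range onto [-1,1]. A less error-prone sketch keeps the settings structure that mapminmax returns and reverses it (the name psSss is illustrative, not from the original code):

```matlab
% Keep the mapminmax settings so the exact mapping can be undone later:
[nsss, psSss] = mapminmax(sss', -1, 1);   % psSss records gain and offset
% ... train the network, then:
outputs      = net(inputs);
outputs_real = mapminmax('reverse', outputs, psSss);
```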
Thank you very much, Greg.
%I did not quite understand about normalizing the outliers to [-1,1],
I did not say anything about [-1,1] normalization except NOT TO DO IT because it is a default.
% the equations you gave:
%   xmax = (meanx + a*stdx)*ones(1,N)
%   xmin = (meanx - b*stdx)*ones(1,N)
% Could you explain in detail?
If you look at a plot or a tabulation and there appear to be measurements with absolute values that are too large to be uncontaminated data, you have 3 options:
1. Keep them as is
2. Remove them
3. Modify them before keeping them
For option 3 look at the plots and decide how much you are going to reduce them. The most popular method is to choose a and b based on the rest of the data. a = b = 3 is a common choice. Of course you should compare option 2 and/or 3 with 1 to determine if it makes a significant difference.
%And could I just use the mapminmax function to do the normalization? % In this way, there would be no such problem.
No. The proportional sizes of good data and outliers are the same. Therefore you still have the outlier problem.
%Will these two different normalization method cause big difference? %I know the mapminmax is the default.
I don't think you understand
1. The default is MAPMINMAX. This is used automatically.
2. An alternative is MAPSTD, which is equivalent to ZSCORE. This requires overwriting MAPMINMAX and not dealing with outliers.
3. My preference would be to
   a. use ZSCORE or MAPSTD before training so that I can deal with outliers
   b. use no further normalization
4. However, using no further normalization requires the pain of adding code to remove the default MAPMINMAX.
5. Therefore I
   a. use ZSCORE before training so that I can deal with outliers
   b. accept the default MAPMINMAX
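Point 2, overwriting the default, can be sketched as follows (standard Neural Network Toolbox property names; whether this is worth doing is exactly Greg's caveat in point 4):

```matlab
net = fitnet(10);
% Replace the default mapminmax with mapstd (the zscore equivalent):
net.inputs{1}.processFcns  = {'removeconstantrows', 'mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows', 'mapstd'};
```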
% I will switch back to early stopping then.
You should have begun by accepting the default early stopping (valstop) because no effort is required.
If there were problems (e.g., overtraining an overfit (Nw>Ntrneq) net), then consider using msereg instead of mse or using TRAINBR.
I hope this is clear.
% After I modify my code, I will post it here, and ask you for a correction.
No. After you modify your code you should test it on one or more of the MATLAB data examples obtained from
help nndatasets and doc nndatasets
Hope this helps.
%Thanks!!!
OK
PS I spent a lot of time formatting this post. Then either MATLAB or my computer hiccupped and I had to resubmit this unformatted version.
Thank you very much for your help, Greg. I really appreciate your time and effort. Recently I ran a lot of experiments with my program; the training and testing results in terms of R2 and RMSE are acceptable, but when I apply the network to a totally new dataset there can be a big difference between the NN predicted values and the true values, which means the NN does not generalize well. Do you have any suggestions on this? Thanks!
1. Does the new data have the same summary statistics?
a. mean
b. variance
c. Significant correlation lags?
2. Are you using as few hidden nodes as possible?
3. If not, standardize both datasets (zscore or mapstd) and use a common set of lags that include significant lags of both series.
4. Use as few hidden nodes as possible.
5. Cross your fingers.
6. Let us know how it went.
Hope this helps.
Greg


Asked: on 18 Feb 2016
Edited: on 4 Mar 2016
