Does anybody ever use a neural network for prediction with regularization optimization instead of early stopping?

Hi everyone, I am trying to train a neural network (NN) for prediction. To prevent overfitting, I chose the regularization method instead of early stopping, so I use 'trainbr' as the training function and 'msereg' as the performance function. The input and output data are preprocessed to lie within [-1,1], and the data are divided randomly into two groups: one for training (70%) and one for testing (30%).
Below is part of my code; can anyone help me check it? I am a new learner of NNs and not sure whether it is correct. The NN I am designing has one input layer (6 inputs), one hidden layer, and one output layer (one output). I loop through 1 to 60 hidden neurons to find a good result, but right now the results are not good at all, so I suspect the code is not written properly. Thanks!
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for k=1:num
%clear; %clc;
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
% reset the global random number stream to its initial settings; this causes
% rand, randi and randn to start over, as if in a new MATLAB session
% Create a Fitting Network
hiddenLayerSize = k;
net = fitnet(hiddenLayerSize); % net = fitnet([hiddenLayerSize,nn]); nn is the number of hidden layer
net=init(net); % initialize the network
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.testInd = tstind;
% set the transfer function (activation function) for input and
% output layers
net.layers{1}.transferFcn = 'tansig'; % layer 1 corresponds to the hidden layer
net.layers{2}.transferFcn = 'purelin'; % layer 2 corresponds to the output layer
% For help on training function 'trainbr' type: help trainbr
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainbr'; % Levenberg-Marquardt optimization with Bayesian regularization
net.trainParam.goal=0.01.*var(targets); % usually set this to be 1% of the var(target)
net.trainParam.epochs=500;
net.trainParam.mu_dec=0.8;
net.trainParam.mu_inc=1.5;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'msereg'; % Mean squared error with regularization (weighted sum of MSE and mean squared weights)
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training and Test Performance
trainTargets = targets .* tr.trainMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
%some statistics codes, not shown here
end

1 Comment

I have a feeling you did not look at any of the 154 trainbr postings in the NEWSGROUP or any of the 143 postings in ANSWERS.
Is that correct?

                     NEWSGROUP   ANSWERS
trainbr                 154        143
trainbr tutorial          7         11
trainbr greg            105        112


 Accepted Answer

GEH1 = 'Predictions are performed with timeseries functions'
GEH2 = 'You are using FITNET which is used for REGRESSION and CURVEFITTING'
GEH3 = ['You should have included results from applying ', ...
        'your code to the MATLAB dataset in the help and doc examples']
close all, clear all, clc
for i = 1:2
RandStream.setGlobalStream(RandStream('mt19937ar','seed',1));
if i == 1
[ x , t ] = simplefit_dataset ; % help & doc fitnet
else
x=[-1:.05:1];t=sin(2*pi*x)+0.1*randn(size(x));%doc trainbr
end
net = fitnet; % H = 10 default
net.trainFcn = 'trainbr';
perfratio = net.performParam.ratio % 0.95238
% minimization goal is sse(weights)+perfratio*sse(t -y)
trnratio = net.divideParam.trainratio % 0.7
valratio = net.divideParam.valratio % 0.15
tstratio = net.divideParam.testratio % 0.15
% trn/val/tst indices will be assigned during training % To obtain them, need to use training record tr:
[ net tr ] = train(net,x,t);
view(net);
y = net(x);
perf(i) = perform(net,y,t) % SAME AS MSE, NOT MINIMIZATION GOAL!
MSE(i) = mse(t-y)
end
MSE = perf % [1.6979e-11 0.0080867]
NMSE = MSE/var(t,1) %[ 3.5535e-11 0.016924 ]
Rsq = 1 - NMSE %[ 1 0.98308 ], Rsquare, See Wikipedia

6 Comments

1. The trn/val/tst ratios are obviously a BUG!
2. The minimization goal PERF should be the linear weight/MSE combination.
Hi, Greg, Thanks for your answers.
Yes, I did use this code for regression. Since I use trainbr, I divided the dataset randomly into two groups (one for NN design, one for testing), I saved the index into a mat file. Actually I did not see much difference between my code and your example code except the performance part. I just want to know is there any code that is not correct? And is this the way to do Baysian regularization? I did read some of the group questions and answers, but still not quite sure. Besides, for the data normalization, to normalize the input to [-1,1], does that mean if the data is within[-1,1], it would be ok? As in my normalization (minus mean and divided by standard deviation), there are still many data points are out of this range, in this case what should I do? Thanks!
The MATLAB training functions automatically normalize inputs and targets and denormalize outputs. So you can forget about that.
However, I recommend using zscore to standardize the data to zero-mean/unit-variance to check for outliers that have to be removed or modified.
Then you have the choice of inputting original data or standardized data. I use the former.
Hope this helps.
Greg
PS. I removed this part of your code
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.testInd = tstind;
GEH4 = 'trnind and tstind not defined'
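Greg's zscore suggestion can be sketched as follows (x here is a generic I-by-N input matrix, and the 3-sigma threshold is an assumed, common choice, not something stated in the thread):

```matlab
% Standardize each variable (row) to zero-mean/unit-variance, then flag
% candidate outliers for inspection before deciding to remove or modify them.
xn = zscore(x', 1)';            % zscore works down columns, hence the transposes
outliers = abs(xn) > 3;         % assumed threshold: 3 standard deviations
fprintf('%d candidate outlier values\n', nnz(outliers));
```

After inspecting the flagged values, either delete those samples or modify them, then train on the original (or standardized) data.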
% Thanks, Greg.
% As I mentioned, I divided my data into two parts and save their index as trnind and tstind, one for training, one for testing, so this is not the problem.
You didn't randomly divide them into trn/tst for multiple trials for each value of hidden nodes?
% for data normalization, I tried to standardize to zero mean and unit variance,
Tried? All you have to do is
xn = zscore(x',1)'; % multidimensional
or
xn = zscore(x,1); % 1-dimensional
%surely there are outliers,
So did you delete them or modify them?
%I modified all the standardized data by dividing by a certain value, so that the data are constrained to [-1,1].
NO! Normalization to [-1 1] is a default. All you have to do is delete or modify outliers.
% I tried to use early stopping optimization (cross validation), but the result is not satisfying at all.
Why? How many random data divisions and random initial weight trials did you use?
Just using a val subset is not considered crossvalidation. The latter consists of dividing the data into a fixed number of subsets which are systematically rotated into roles of training, validation and testing.
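The rotation Greg describes can be sketched manually (the fold count k and the fold-assignment scheme are illustrative assumptions, not from the thread):

```matlab
% k-fold rotation: each fold takes a turn as the test set, the next fold
% serves as validation, and the remaining folds are used for training.
N = 21790;                          % number of samples (from the thread)
k = 5;                              % assumed number of folds
fold = repmat(1:k, 1, ceil(N/k));
fold = fold(randperm(N));           % random fold label, one per sample
for f = 1:k
    tstind = find(fold == f);
    valind = find(fold == mod(f, k) + 1);
    trnind = setdiff(1:N, [tstind, valind]);
    % set net.divideFcn = 'divideind' with these indices, then train
end
```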
%So I tend to use regularization (trainbr as the training function), as shown in the code I posted. If the mu value reaches max_mu, the optimized result is obtained,
No! Just like validation stopping, it just stops the training from getting worse.
% through different settings of hidden neurons (1 to 60); in each loop it is hard to tell whether the iteration stopped due to reaching max_mu. Do you have any suggestions for this? Thanks!
60 sounds over the top. What is Hub, the number of hidden nodes above which the number of weights is greater than the number of training equations?
tr.stop yields the reason for stopping. However, it doesn't necessarily mean the training was successful.
Maybe you should search both NEWSGROUP and ANSWERS to see if I posted any trainbr designs using loops over number of hidden nodes h = Hmin:dH:Hmax and random weights and datadivision i = 1:Ntrials.
trainbr Ntrials
or
trainbr h = Hmin:dH:Hmax
Thank you so much, Greg. I should have posted my question earlier, it really helps a lot!
Firstly, I used 'dividerand' to divide the indices of my dataset into two parts (one for training, the other for testing), and then saved the indices into .mat files, so each time I run the program the training and testing indices are the same. For each value of hidden neurons I did not run multiple trials, as I did not know how to make the initial weights different each time; I will try to search for answers on this.
Secondly, for the data normalization, I did use 'zscore', and there were some outliers. These outliers are very important for my purpose, so I did not want to delete them. You mean just modify them, but how? What I propose is to divide them by a factor to constrain them to [-1,1]; is that OK?
Thirdly, for the training function, you have me a bit confused now: for both early stopping and regularization, how do I identify the optimal NN setting? I looped through 60 hidden neurons, and I have around 20,000 data samples (70% for training, 30% for testing), so the problem of the number of weights being greater than the number of training equations does not exist in my case.
I will search for trainbr designs with loops over the number of hidden neurons and random weights, as I am really confused about how to do this. If you have some code at hand, please share it with me; I am not sure whether I can find the right one... Thanks!!!


More Answers (2)

Compare your own code with numerical results from
a. A modification of the simpler code in the documentation
help fitnet
doc fitnet
b. Use of the regression data at
help nndatasets
doc nndatasets
Write back with any questions.
Greg
% Thank you so much, Greg. I should have posted my question earlier, it really helps a lot!
%I did not run several multiple trials, as I did not how to make the initial weights differently each time, I will try to search some answers for this.
Search greg Ntrials configure
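The Ntrials pattern Greg refers to can be sketched like this (net, inputs, and targets as elsewhere in the thread; the loop body is an assumed minimal version):

```matlab
% Each call to configure re-initializes the weights randomly, so every
% trial starts from a different point in weight space.
Ntrials = 10;
for i = 1:Ntrials
    net = configure(net, inputs, targets);  % fresh random initial weights
    [net, tr] = train(net, inputs, targets);
    % record tr.best_perf, tr.best_vperf, tr.best_tperf for this trial
end
```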
% Secondly, for the data normalization, I did use the 'zscore', and there was some outliers, as these outliers are very important for my purpose, so I did not want to delete them, you mean just modify these outliers, but how? what I propose is to divide them by a factor to constrain them into [-1,1], is that ok?
No. Clip them between the lines
xmax = (meanx + a*stdx)*ones(1,N)
xmin = (meanx - b*stdx)*ones(1,N)
for a and b of your choice.
train will automatically scale the data
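A sketch of that clipping, written with repmat in the style of Greg's expressions (a = b = 3 is the common choice he mentions later in the thread; x is the I-by-N input matrix):

```matlab
a = 3; b = 3;
meanx = mean(x, 2);                      % per-variable (row) means
stdx  = std(x, 0, 2);                    % per-variable standard deviations
xmax  = repmat(meanx + a*stdx, 1, N);
xmin  = repmat(meanx - b*stdx, 1, N);
xclip = min(max(x, xmin), xmax);         % clip every value into [xmin, xmax]
```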
% Thirdly, for the training function, now you make me feel a bit confused, for both early stopping and regularization, how to identify which is the optimal NN setting?
The default is early stopping. I use that 99.9% of the time
% 20,000 data samples(70% for training, 30% for testing), so the problem that the number of weight greater than the number of training equation did not exist in my case.
With that much data there is no good excuse for omitting a validation set.
% I will search for the trainbr designs for some codes using loops over number of hidden neurons and random weights, as I really get confused about how to do this, if you have some codes at hand, please share with me, I am not sure whether I can find the correct one... Thanks!!!
Basically the same procedure I use with trainlm. However, you don't need trainbr BUT I highly recommend using valstop.
Hope this helps.
Greg

5 Comments

Thank you very much, Greg. I did not quite understand normalizing the outliers to [-1,1]. The equations you gave were xmax = (meanx + a*stdx)*ones(1,N) and xmin = (meanx - b*stdx)*ones(1,N); could you explain them in detail? And could I just use the mapminmax function to do the normalization? That way there would be no such problem. Will these two normalization methods cause a big difference? I know mapminmax is the default.
I will switch back to early stopping then. After I modify my code, I will post it here, and ask you for a correction. Thanks!!!
Now my code is below, using early stopping with trainlm. Are there any potential errors in this code? I use two loops: one over the number of hidden neurons Hmin:dH:Hmax, and one over initial weights 1:Ntrials. Thanks! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clc; clear;
trnind=importdata('trnind_es.mat'); % index for training
valind=importdata('valind_es.mat'); % index for validation
tstind=importdata('tstind_es.mat'); % index for test
% normalization by using mapminmax for each variable
nsst=transpose(mapminmax(transpose(sst_sat),-1,1));
nrrs412=transpose(mapminmax(transpose(rrs412),-1,1));
nrrs443=transpose(mapminmax(transpose(rrs443),-1,1));
nrrs488=transpose(mapminmax(transpose(rrs488),-1,1));
nrrs555=transpose(mapminmax(transpose(rrs555),-1,1));
nrrs667=transpose(mapminmax(transpose(rrs667),-1,1));
nsss=transpose(mapminmax(transpose(sss),-1,1));
% inputs
x=[nsst,nrrs412,nrrs443,nrrs488,nrrs555,nrrs667];
y=nsss;
inputs = x';
targets = y';
[I N] = size(inputs); % [6 21790]
[O N] = size(targets); % [1 21790]
Ntrn = round(N*0.70);
Nval = round(N*0.15);
Ntst = round(N*0.15);
Ntrneq = Ntrn*O;
Hub = -1+ceil( (Ntrneq-O) / (I+O+1)) % 2042
Hmax=60; Hmin=1; dH=1;
Ntrials=10;
% arrays for statistics
MSEtrn=zeros(Hmax,Ntrials); % actually root mean square error (see sqrt(mse(...)) below)
MSEval=zeros(Hmax,Ntrials);
MSEtst=zeros(Hmax,Ntrials);
R2trn=zeros(Hmax,Ntrials);
R2val=zeros(Hmax,Ntrials);
R2tst=zeros(Hmax,Ntrials);
MB=zeros(Hmax,Ntrials); % mean bias
MR=zeros(Hmax,Ntrials); % mean ratio
rng(0);
j=0;
for H=Hmin:dH:Hmax
j=j+1;
% Create a Fitting Network
net=fitnet(H);
%net=init(net); % initialize the network
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows'};
net.outputs{2}.processFcns = {'removeconstantrows'};
% set the transfer function (activation function) for input and
% output layers
net.layers{1}.transferFcn = 'tansig'; % layer 1 corresponds to the hidden layer
net.layers{2}.transferFcn = 'purelin'; % layer 2 corresponds to the output layer
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind'; % Divide data by index
net.divideParam.trainInd = trnind;
net.divideParam.valInd = valind;
net.divideParam.testInd = tstind;
% settings of training function
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
net.trainParam.goal=0.01.*var(targets); % usually set this to be 1% of the var(target)
net.trainParam.epochs=1000;
net.trainParam.mu_dec=0.8;
net.trainParam.mu_inc=1.5;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:Ntrials
net=configure(net,inputs,targets);
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
valPerformance = perform(net,valTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
% denormalize the output, sss is my target before
% normalization
ymax=max(sss);
ymin=min(sss);
xmin=-1;
xmax=1;
outputs_real=(ymax-ymin)*(outputs-xmin)/(xmax-xmin) + ymin;
errors_real=gsubtract(outputs_real,sss'); % predict-real
% calculate the R2 and RMSE
vartrn = mean(var(sss(trnind),1));
%meantrna = mean(var(targets(trnind),0));
varval = mean(var(sss(valind),1));
vartst = mean(var(sss(tstind),1));
Nw = (I+1)*H+(H+1)*O; % maximum is 481 when H=60 (Hmax)
Ndof = Ntrneq-Nw;
MSEtrn(H,i) = sqrt(mse(errors_real(trnind)));
%MSEtrna = Ntrneq*MSEtrn/Ndof
MSEval(H,i) = sqrt(mse(errors_real(valind)));
MSEtst(H,i) = sqrt(mse(errors_real(tstind)));
R2trn(H,i) = 1-MSEtrn(H,i).^2/vartrn;
%R2trna = 1-MSEtrna/meantrna
R2val(H,i) = 1-MSEval(H,i).^2/varval;
R2tst(H,i) = 1-MSEtst(H,i).^2/vartst;
MB(H,i) = mean(outputs_real'-sss);
MR(H,i) = mean(outputs_real'./sss);
end
end
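One note on the manual denormalization above: it silently assumes the earlier mapminmax call mapped the full [min(sss), max(sss)] range onto [-1,1]. A less error-prone sketch keeps the settings structure that mapminmax returns and reverses it (the name psSss is illustrative, not from the original code):

```matlab
% Keep the mapminmax settings so the exact mapping can be undone later:
[nsss, psSss] = mapminmax(sss', -1, 1);   % psSss records gain and offset
% ... train the network, then:
outputs      = net(inputs);
outputs_real = mapminmax('reverse', outputs, psSss);
```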
Thank you very much, Greg.
%I did not quite understand about normalizing the outliers to [-1,1],
I did not say anything about [-1,1] normalization except NOT TO DO IT because it is a default.
% the equations you gave:
%   xmax = (meanx + a*stdx)*ones(1,N)
%   xmin = (meanx - b*stdx)*ones(1,N)
% Could you explain in detail?
If you look at a plot or a tabulation and there appear to be measurements with absolute values that are too large to be uncontaminated data, you have 3 options:
1. Keep them as is
2. Remove them
3. Modify them before keeping them
For option 3 look at the plots and decide how much you are going to reduce them. The most popular method is to choose a and b based on the rest of the data. a = b = 3 is a common choice. Of course you should compare option 2 and/or 3 with 1 to determine if it makes a significant difference.
%And could I just use the mapminmax function to do the normalization? % In this way, there would be no such problem.
No. The proportional sizes of good data and outliers are the same. Therefore you still have the outlier problem.
%Will these two different normalization method cause big difference? %I know the mapminmax is the default.
I don't think you understand
1. The default is MAPMINMAX. This is used automatically.
2. An alternative is MAPSTD, which is equivalent to ZSCORE. This requires overwriting MAPMINMAX and not dealing with outliers.
3. My preference would be to
   a. use ZSCORE or MAPSTD before training so that I can deal with outliers
   b. use no further normalization
4. However, using no further normalization requires the pain of adding code to remove the default MAPMINMAX.
5. Therefore I
   a. use ZSCORE before training so that I can deal with outliers
   b. accept the default MAPMINMAX
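Point 2, overwriting the default, can be sketched as follows (standard Neural Network Toolbox property names; whether this is worth doing is exactly Greg's caveat in point 4):

```matlab
net = fitnet(10);
% Replace the default mapminmax with mapstd (the zscore equivalent):
net.inputs{1}.processFcns  = {'removeconstantrows', 'mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows', 'mapstd'};
```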
% I will switch back to early stopping then.
You should have begun by accepting the default early stopping (valstop) because no effort is required.
If there were problems (e.g., overtraining an overfit (Nw>Ntrneq) net), then consider using msereg instead of mse or using TRAINBR.
I hope this is clear.
% After I modify my code, I will post it here, and ask you for a correction.
No. After you modify your code you should test it on one or more of the MATLAB data examples obtained from
help nndatasets and doc nndatasets
Hope this helps.
%Thanks!!!
OK
PS I spent a lot of time formatting this post. Then either MATLAB or my computer hiccupped and I had to resubmit this unformatted version.
Thank you very much for your help, Greg. I really appreciate your time and effort. Recently I ran a lot of experiments with my program; the training and testing results in terms of R2 and RMSE are acceptable, but when I apply the network to a totally new dataset there can be a big difference between the NN predicted values and the true values, which means the NN does not generalize well. Do you have any suggestions on this? Thanks!
1. Does the new data have the same summary statistics?
a. mean
b. variance
c. Significant correlation lags?
2. Are you using as few hidden nodes as possible?
3. If not, standardize both datasets (zscore or mapstd) and use a common set of lags that include significant lags of both series.
4. Use as few hidden nodes as possible.
5. Cross your fingers.
6. Let us know how it went.
Hope this helps.
Greg


Asked: on 18 Feb 2016
Edited: on 4 Mar 2016
