Struggling to Improve Neural Network Performance
George Tsitsopoulos
on 5 Jul 2016
Answered: Greg Heath
on 14 Jul 2016
I am working on creating a function-fitting neural network with the Neural Network Toolbox, but I haven't had much success getting it to work correctly. I have an input matrix with two features. I currently use fitnet (I've tried cascadeforwardnet/feedforwardnet without much difference) and have two hidden layers, each with 10 neurons. I've been using `trainbr` because it has given me better results than trainlm. I'm trying to normalize or standardize the data but haven't had much success. I know that fitnet uses mapminmax by default, and I've seen Greg Heath's suggestion to use zscore to standardize first. The problem is, every time I've used zscore standardization I haven't gotten very good neural network results. My output needs to be completely positive after de-standardization, yet I still get negative values. Because of this, I have used log10 to normalize the data, thereby keeping all of the values positive.
In order to see prediction error, I have found the maximum percent error at any individual output point. I cannot get error lower than 40%, and there are multiple other points with decently high error.
Is there anything else that I can do, whether it be normalization/standardization or network reconfiguration to improve my network performance?
EDIT:
I'm not sure if this is of any help, but the regression plot shows R = 0.99984, so it seems very accurate.
Thank you for the help,
George
Accepted Answer
Greg Heath
on 14 Jul 2016
George Tsitsopoulos wrote: Hi Greg,
> Why is it that MSE is the best measure of error and the way I was calculating error is no good? Is it because the toolbox focuses on minimizing that error, so me trying to minimize a different type of error is of little help?
That's part of it. The other part is: does percent error really make sense in a regression problem?
> I made the changes you suggested and ran 10 trials, where each trial has some multiple of 3 hidden neurons and the multiple is between 3 and 30. Each of these trials builds & simulates 10 ANNs, each with a random split of training/testing/validation data. I made it so that the training data must be between 60% and 93% of the total input data. I calculate the R-squared and output it to the below 10x10 matrix. Now that I have this, I see that many of the values in the matrices are above 0.99. How am I supposed to differentiate between these values?
Ideally, N is sufficiently large so that the tst results are accurate and UNBIASED while the val results are relatively accurate and only SLIGHTLY BIASED.
Typically, the training goal I use is to minimize H subject to the constraint Rtrnsq > 0.99. I first obtain four 10x10 Rsq matrices for trn, val, tst and all. Next, these are reduced to four 4x10 matrices containing the min, median, mean and maximum Rsq vs H. Finally, the four rows of the four matrices are plotted.
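Assuming `Rsq` is the 10x10 trials-by-H matrix of R-squared values and `hvec` the candidate H values (both names are illustrative, not from the Toolbox), the reduction and plot might look like:

```matlab
% Reduce a 10x10 Rsq matrix (rows = trials, columns = H values) to the
% 4x10 summary of min, median, mean and max Rsq vs H, then plot the rows.
summary = [ min(Rsq); median(Rsq); mean(Rsq); max(Rsq) ];  % 4 x 10
plot(hvec, summary', 'o-')            % one curve per summary statistic
legend('min', 'median', 'mean', 'max')
xlabel('H'), ylabel('R^2')
```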
As far as choosing one design, I would favor a net with Rsq > 0.99 at the smallest value of H.
>> Also, is there a way to bound my output so it is always positive? A negative output is impossible in the real world yet the neural net has several points that are output as negative.
>Using a bounded output transfer function will keep the output within bounds. Either TANSIG or LOGSIG will work. The scaling to your data will be done automatically.
>> Whether I use logsig or tansig as the hidden-layer transfer function doesn't make a difference in the output. I always end up with some negative values in the NN output, which is impossible for what I'm trying to do. The only thing that has ever guaranteed a positive output was using log10(target) before training/simulating and then 10^target afterwards.
You have misinterpreted what I said. Change the OUTPUT TRANSFER FUNCTION!
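As a minimal sketch (assuming `x` and `t` are the input and target matrices), changing the output transfer function of a fitnet could look like this:

```matlab
% fitnet's OUTPUT layer uses purelin by default, which is unbounded.
% Switching it to logsig bounds the (normalized) output in (0,1); the
% default mapminmax output processing then maps it back to the target range.
net = fitnet(10);                        % one hidden layer, H = 10
net.layers{2}.transferFcn = 'logsig';    % layer 2 is the OUTPUT layer here
[net, tr] = train(net, x, t);
y = net(x);                              % outputs stay within target bounds
```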
Hope this helps.
Greg
More Answers (1)
Greg Heath
on 7 Jul 2016
> I have an input matrix with two features.
[ I N ] = size(input)  % [ 2 N ], N = ?
[ O N ] = size(target) % [ O N ], O = ?
> I currently use fitnet ... and have two hidden layers, each with 10 neurons.
A single hidden layer is sufficient.
With no regularization, the number of "H"idden neurons, H, is limited by the number of training equations, Neq = N*O.
> I've been using `trainbr` because it has given me better results than trainlm.
TRAINBR uses regularization, which mitigates the risk of using a large H.
> I'm trying to normalize or standardize the data but haven't had much success. I know that fitnet uses mapminmax by default and I've seen Greg Heath's suggestion that I use zscore to standardize first.
I use zscore in order to detect outliers which may have to be modified or deleted.
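For example (a sketch, assuming `x` is the I-by-N input and `t` the O-by-N target), columns more than about 3 standard deviations out are worth inspecting:

```matlab
% Standardize each variable (row) to zero mean, unit variance,
% then flag columns containing any |z| > 3 as candidate outliers.
zx = zscore(x', 1)';                     % zscore works down columns
zt = zscore(t', 1)';
candidates = find(any(abs([zx; zt]) > 3, 1));
% Inspect, modify or delete these columns before training.
```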
> The problem is, every time I've used the zscore standardization I haven't gotten very good neural network results.
Perhaps you misused it. I cannot tell you how without details.
> My output needs to be completely positive after de-standardization yet I still get negative values. Because of this, I have used log10 to normalize the data, therefore keeping all of the values positive.
I don't think log10 is sufficient for positivity. The best way to impose output bounds is to use a bounded output transfer function like LOGSIG or TANSIG.
> In order to see prediction error, I have found the maximum percent error at any individual output point. I cannot get error lower than 40%, and there are multiple other points with decently high error.
If you are using fitnet, the default performance measure is MSE = mse(error). The corresponding scale-free measures that are trivial to understand are
NMSE = MSE/mean(var(target',1))
and
Rsquare = 1 - NMSE
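Given a trained `net` and the original `x`/`target` matrices (names assumed for illustration), these measures can be computed as:

```matlab
% Scale-free regression performance for a trained fitnet.
output = net(x);
err    = target - output;               % avoid shadowing builtin error()
MSE    = mse(err);
NMSE   = MSE / mean(var(target', 1));   % normalize by naive constant-model MSE
Rsq    = 1 - NMSE;                      % ~1 means nearly all variance modeled
```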
> Is there anything else that I can do, whether it be normalization/standardization or network reconfiguration to improve my network performance?
Using NMSE and Rsq is more reliable for measuring regression performance. I see no good reason for the log transformation.
> EDIT: I'm not sure if this is of any help but the regression plot shows that R = 0.99984, so it seems very accurate.
Given your log transformation, I'm not sure just what that means. However, my guess is that it is good. Make sure by plotting unnormalized target and output on the same graph.
Hope this helps.
Thank you for formally accepting my answer
Greg
5 Comments
Greg Heath
on 12 Jul 2016
> Unfortunately, there is still error around 2000% at certain points.
For regression, the only type of measure that makes sense is one that is linear w.r.t. MSE, the measure that you are trying to minimize directly. The most commonly used are the normalized mse
NMSE = mse(error)/mean(var(target',1))
and the corresponding R squared (see Wikipedia)
Rsq = 1 - NMSE
The denominator of NMSE is the smallest mse that could occur from the naive model output = constant. The minimizing solution is output = mean(target')'.
> You said that the number of hidden neurons is limited by the number of hidden equations. In your equation above, N=1767 and O=1 so I can have a maximum of 1767 neurons, correct?
No.
Using the default data division ratios of 0.7/0.15/0.15 yields the number of training examples
Ntrn = N - 2*round(0.15*N) % 1237
and the corresponding number of TRAINING equations
Ntrneq = Ntrn*O % 1237
To avoid the phenomenon of overfitting when using the default training function TRAINLM, keep the number of unknown weights
Nw = (I +1)*H+(H+1)*O
no greater than Ntrneq. However, for the purpose of numerical stability w.r.t. noise and measurement error, it should be considerably less.
Accordingly, Nw << Ntrneq yields the upper bound and reasonable maximum
H <= Hmax << Hub = (Ntrneq-O)/(I+O+1) = 309
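The arithmetic above, for this dataset, can be sketched as:

```matlab
% Sizing arithmetic for this dataset (N = 1767, I = 2, O = 1).
N = 1767; I = 2; O = 1;
Ntrn   = N - 2*round(0.15*N)        % 1237 training examples
Ntrneq = Ntrn*O                     % 1237 training equations
Nw     = @(H) (I+1)*H + (H+1)*O;    % unknown weights for H hidden neurons
Hub    = (Ntrneq - O)/(I + O + 1)   % 309: largest H with Nw <= Ntrneq
```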
Therefore I would probably first consider the 10 values
h = Hmin:dH:Hmax = 3:3:30
with
Ntrials = 1:10
random sets of initial weights and datadivisions each. Then display the 10x10 array of Rsq as I have done in many posted examples.
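A sketch of that double-loop search (variable names are illustrative; `x` and `t` are assumed to be the input and target matrices):

```matlab
% Double loop over candidate H values and random trials; each trial gets
% fresh random initial weights and a fresh random data division.
hvec = 3:3:30;  Ntrials = 10;
Rsq = zeros(Ntrials, numel(hvec));
for j = 1:numel(hvec)
    for i = 1:Ntrials
        net = fitnet(hvec(j));             % new random initial weights
        [net, tr] = train(net, x, t);      % dividerand resplits the data
        y = net(x);
        Rsq(i, j) = 1 - mse(t - y)/mean(var(t', 1));
    end
end
disp(Rsq)                                  % 10x10 array of Rsq vs trial, H
```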
> You also said that bayesian regularization limits the number of hidden neurons further.
That is not what I meant. If you have to increase Hmax too much, e.g. Hmax >~ Hub/2 ~ 155, then you should consider trainbr, whose output is much less sensitive to overfitting.
> Could it be that 15 hidden neurons is too few?
That will be revealed as a product of the double loop search over h and Ntrials.
> Also, is there a way to bound my output so it is always positive? A negative output is impossible in the real world yet the neural net has several points that are output as negative.
Using a bounded output transfer function will keep the output within bounds. Either TANSIG or LOGSIG will work. The scaling to your data will be done automatically.
Hope this helps.
Greg