I'm training a network to learn the sine function from 400 noisy samples.
If I use a 1-30-1 feedforwardnet with 'trainlm', the network generalises well. If I use a 1-200-1 feedforwardnet, the network overfits the training data, as expected. My understanding was that 'trainbr' (Bayesian regularization) would not overfit, even on a network with too many neurons. However, if I run 'trainbr' on a 1-200-1 network until convergence (Mu reaches its maximum), the resulting network still seems to overfit the data, despite a strong reduction in the "Effective # Param" reported during training.
This behaviour seems strange to me. Have I misunderstood Bayesian regularization? Can someone provide an explanation?
I can post my full code if necessary (a minimal sketch of the setup follows below), but first I want to know whether the following is correct:
'trainbr' will not overfit with large networks if run to convergence
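
For reference, here is a rough sketch of the kind of setup I mean; the noise level, input range, and epoch count are illustrative assumptions, not my exact values:

    % 400 noisy samples of sin(x) on [0, 2*pi]; noise std 0.1 is an assumption
    x = linspace(0, 2*pi, 400);
    t = sin(x) + 0.1*randn(size(x));

    % Oversized network trained with Bayesian regularization
    net = feedforwardnet(200, 'trainbr');
    net.divideFcn = 'dividetrain';   % use all data; trainbr has no validation stop
    net.trainParam.epochs = 1000;    % run long enough for Mu to reach its maximum
    [net, tr] = train(net, x, t);

    % Compare the fit against the noise-free target
    y = net(x);
    plot(x, t, '.', x, sin(x), '-', x, y, '--');
    legend('noisy samples', 'sin(x)', 'network output');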