Batch normalization layer

A batch normalization layer normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers.

The layer first normalizes the activations of each channel by subtracting the
mini-batch mean and dividing by the mini-batch standard deviation. Then, the layer
shifts the input by a learnable offset *β* and scales it by a learnable
scale factor *γ*.

`layer = batchNormalizationLayer` creates a batch normalization layer.

`layer = batchNormalizationLayer('Name',Value)` creates a batch normalization layer and sets the optional Batch Normalization, Parameters and Initialization, Learn Rate and Regularization, and `Name` properties using name-value pairs. For example, `batchNormalizationLayer('Name','batchnorm')` creates a batch normalization layer with the name `'batchnorm'`. You can specify multiple name-value pairs. Enclose each property name in quotes.

A batch normalization layer normalizes its inputs
*x _{i}* by first calculating the mean *μ _{B}* and variance
*σ _{B}^{2}* over a mini-batch and over each input channel. Then, it
calculates the normalized activations as

$$\widehat{{x}_{i}}=\frac{{x}_{i}-{\mu}_{B}}{\sqrt{{\sigma}_{B}^{2}+\epsilon}}.$$

Here, *ϵ* (the property `Epsilon`) improves
numerical stability when the mini-batch variance is very small. To allow for the
possibility that inputs with zero mean and unit variance are not optimal for the layer
that follows the batch normalization layer, the batch normalization layer further shifts
and scales the activations as

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta .$$

Here, the offset *β* and scale factor *γ* (the
`Offset` and `Scale` properties) are learnable parameters that are
updated during network training.
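The two equations above can be sketched in plain Python (an illustration of the math only, not the MATLAB implementation; the function name `batchnorm_forward` and the sample values are assumptions for the example):

```python
import math

def batchnorm_forward(x, gamma, beta, epsilon=1e-5):
    """Normalize a mini-batch of scalar activations for one channel,
    then apply the learnable scale factor (gamma) and offset (beta)."""
    n = len(x)
    mu = sum(x) / n                                 # mini-batch mean
    var = sum((v - mu) ** 2 for v in x) / n         # mini-batch variance
    # Normalized activations: (x_i - mu) / sqrt(var + epsilon)
    x_hat = [(v - mu) / math.sqrt(var + epsilon) for v in x]
    # Shift and scale: y_i = gamma * x_hat_i + beta
    y = [gamma * v + beta for v in x_hat]
    return y, mu, var

# One channel's activations across a mini-batch of four observations:
y, mu, var = batchnorm_forward([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
```

With *γ* = 1 and *β* = 0 the output is simply the normalized activations, which have approximately zero mean and unit variance across the mini-batch.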

When network training finishes, the batch normalization layer calculates the mean and
variance over the full training set and stores them in the `TrainedMean`
and `TrainedVariance` properties. When you use a trained network to make
predictions on new images, the layer uses the trained mean and variance instead of the
mini-batch mean and variance to normalize the activations.
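Inference-time normalization with the stored statistics can be sketched as follows (plain Python, again only an illustration of the behavior described above; the function name and values are assumptions):

```python
import math

def batchnorm_infer(x, trained_mean, trained_var, gamma, beta, epsilon=1e-5):
    """At prediction time, normalize with the statistics stored after
    training (the TrainedMean and TrainedVariance properties) instead of
    the statistics of the incoming mini-batch."""
    return [gamma * ((v - trained_mean) / math.sqrt(trained_var + epsilon)) + beta
            for v in x]

# Because the stored statistics are used, even a batch of one observation
# is normalized consistently with the rest of the training set:
y = batchnorm_infer([2.5], trained_mean=2.5, trained_var=1.25,
                    gamma=1.0, beta=0.0)
```

Using fixed statistics at prediction time makes the layer's output deterministic for a given input, independent of which other observations happen to be in the batch.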

[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep
network training by reducing internal covariate shift." *preprint, arXiv:1502.03167* (2015).

`convolution2dLayer` | `fullyConnectedLayer` | `groupNormalizationLayer` | `reluLayer` | `trainingOptions` | `trainNetwork`