Set Up Parameters and Train Convolutional Neural Network

To specify the training options for the trainnet function, use the trainingOptions function. Pass the resulting options object to the trainnet function.

For example, to create a training options object that specifies:

Train using the adaptive moment estimation (Adam) solver.
Train for at most four epochs.
Monitor the training progress in a plot and monitor the accuracy metric.
Disable the verbose output.

use:

options = trainingOptions("adam", ...
    MaxEpochs=4, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);

To train the network using these training options, use:

net = trainnet(data,layers,lossFcn,options);
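As a minimal end-to-end sketch of the call above, the following trains a very small convolutional network on random data. The data, class names, layer sizes, and mini-batch size are illustrative assumptions, not values from this topic, and the training plot is omitted so that the code also runs without a display. When the data is a numeric array, trainnet takes the predictors and targets as separate arguments.

```matlab
% Synthetic stand-in data (assumption): 96 grayscale 28-by-28 images in 3 classes.
numObservations = 96;
XTrain = rand(28,28,1,numObservations,"single");
TTrain = categorical(repelem(["a";"b";"c"],numObservations/3));

% A small illustrative CNN (not a recommended architecture).
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,Padding="same")
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(3)
    softmaxLayer];

options = trainingOptions("adam", ...
    MaxEpochs=4, ...
    MiniBatchSize=32, ...
    Metrics="accuracy", ...
    Verbose=false);

% For numeric arrays, pass predictors and targets as separate arguments.
net = trainnet(XTrain,TTrain,layers,"crossentropy",options);
```

Because the data is random, the resulting accuracy is meaningless; the sketch only shows how the options object flows into trainnet.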

Note

This topic outlines some commonly used training options. The options listed here are only a subset. For a complete list, see trainingOptions.

Solvers

The solver is the algorithm that the training function uses to optimize the learnable parameters. Specify the solver using the first argument of the trainingOptions function. For example, to create a training options object with the default settings for the Adam optimizer, use:

options = trainingOptions("adam");

For more information, see trainingOptions.

Tip

Different solvers work better for different tasks. The Adam solver is often a good optimizer to try first.
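One quick way to compare solvers is to create a default options object for each and inspect a property they share. The sketch below loops over three stochastic solver names and prints the default initial learning rate of each; the choice of solvers and of the property to print is only for illustration.

```matlab
% Create default options for several stochastic solvers and compare a setting.
solvers = ["sgdm" "rmsprop" "adam"];
for solver = solvers
    options = trainingOptions(solver);
    fprintf("%-8s InitialLearnRate = %g\n",solver,options.InitialLearnRate);
end
```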

Monitoring Options

To monitor the training progress, you can display training metrics in a plot. For example, to monitor the accuracy in a plot and disable the verbose output, use:

options = trainingOptions("adam", ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);

For more information, see trainingOptions.
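If you prefer command-line output to a plot, you can keep the verbose output and control how often it prints. The sketch below also tracks a second metric; it assumes a release in which "fscore" is accepted as a built-in metric name, and the print frequency of 25 iterations is an arbitrary choice.

```matlab
% Track two metrics and print progress every 25 iterations instead of plotting.
% Assumption: "fscore" is available as a built-in metric name in your release.
options = trainingOptions("adam", ...
    Metrics=["accuracy","fscore"], ...
    Verbose=true, ...
    VerboseFrequency=25);
```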

Data Format Options

Most deep learning networks and functions operate on different dimensions of the input data in different ways.

For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.

In most cases, you can pass your training data directly to the network. If your data has a different layout from the one the network expects, then you can specify the layout of the data using data formats.

A data format is a string of characters, where each character describes the type of the corresponding data dimension.

The characters are:

"S" — Spatial
"C" — Channel
"B" — Batch
"T" — Time
"U" — Unspecified

For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).

If your data has a different layout from the one the network expects, then it is usually easier to provide data format information than to reshape and preprocess your data. For example, to specify that you have sequence data, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, specify that the input data has format "CBT" (channel, batch, time) using:

options = trainingOptions("adam", ...
    InputDataFormats="CBT");

For more information, see trainingOptions.
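To see how a format string maps onto array dimensions, it can help to label an array explicitly. The sketch below uses dlarray purely as an illustration of the "CBT" labels; the array sizes are made-up values, and when you train with trainnet you would normally pass the unlabeled array and set InputDataFormats instead.

```matlab
% Hypothetical sizes: 3 channels, 8 observations, 20 time steps.
numChannels = 3;
numObservations = 8;
numTimeSteps = 20;
X = rand(numChannels,numObservations,numTimeSteps);

% Label the dimensions as channel (C), batch (B), and time (T).
dlX = dlarray(X,"CBT");
disp(dims(dlX))    % the dimension labels
disp(size(dlX))    % the size of each labeled dimension

% Look up which dimension holds the time steps.
timeDim = finddim(dlX,"T");

% The equivalent information for trainnet is passed through the options.
options = trainingOptions("adam", ...
    InputDataFormats="CBT");
```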

Stochastic Solver Options

Stochastic solvers train neural networks by iterating over mini-batches of data and updating the neural network learnable parameters. You can specify stochastic solver options that control the mini-batches, epochs (full passes of the training data), learning rate, and other solver-specific settings such as momentum for the stochastic gradient descent with momentum (SGDM) solver. For example, to specify a mini-batch size of 16 with an initial learning rate of 0.01, use:

options = trainingOptions("adam", ...
    MiniBatchSize=16, ...
    InitialLearnRate=0.01);

For more information, see trainingOptions.
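The following sketch combines several of the stochastic solver settings mentioned above for the SGDM solver: mini-batch size, momentum, shuffling, and a piecewise learning rate schedule. The specific values and the data set size of 1000 observations are illustrative assumptions. The iteration count assumes that observations that do not fill a final complete mini-batch are discarded in each epoch.

```matlab
miniBatchSize = 16;

% SGDM with momentum and a learning rate that drops by a factor of 0.2
% every 5 epochs (illustrative values).
options = trainingOptions("sgdm", ...
    MiniBatchSize=miniBatchSize, ...
    InitialLearnRate=0.01, ...
    Momentum=0.9, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropFactor=0.2, ...
    LearnRateDropPeriod=5, ...
    Shuffle="every-epoch");

% Iterations per epoch for a hypothetical data set of 1000 observations,
% assuming a final partial mini-batch is discarded.
numObservations = 1000;
iterationsPerEpoch = floor(numObservations/miniBatchSize);   % 62
```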

Tip

If the mini-batch loss during training ever becomes NaN, then the learning rate is likely too high. Try reducing the learning rate, for example by a factor of 3, and then restart network training.

L-BFGS Solver Options

The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) solver is a full-batch solver, which means that it processes the entire training set in a single iteration. You can specify L-BFGS solver options that control the iterations (full passes of the training data), line search, and other solver-specific settings. For example, to train for 2000 iterations using L-BFGS and find a learning rate that satisfies the strong Wolfe conditions, use:

options = trainingOptions("lbfgs", ...
    MaxIterations=2000, ...
    LineSearchMethod="strong-wolfe");

For more information, see trainingOptions.
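Beyond the iteration limit and line search method, the L-BFGS solver has settings for its history size and stopping tolerances. The sketch below assumes the option names HistorySize, GradientTolerance, and StepTolerance as listed on the trainingOptions reference page for your release; the values are arbitrary examples.

```matlab
% L-BFGS with a longer update history and tighter stopping tolerances.
% Assumption: these option names match the trainingOptions page for your release.
options = trainingOptions("lbfgs", ...
    MaxIterations=2000, ...
    LineSearchMethod="strong-wolfe", ...
    HistorySize=20, ...
    GradientTolerance=1e-6, ...
    StepTolerance=1e-6);
```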

Validation Options

You can monitor training progress using a held-out validation data set. Performing validation at regular intervals during training helps you to determine if your network is overfitting to the training data. To check if your network is overfitting, compare the training metrics to the corresponding validation metrics. If the training metrics are significantly better than the validation metrics, then the network could be overfitting. For example, to specify a validation data set and validate the network every 100 iterations, use:

options = trainingOptions("adam", ...
    ValidationData={XValidation,TValidation}, ...
    ValidationFrequency=100);

If your network has layers that behave differently during prediction than during training (for example, dropout layers), then the validation metrics can be better than the training metrics.

For more information, see trainingOptions.
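A common pattern is to hold out part of the data for validation and to validate roughly once per epoch. The sketch below does this for random stand-in data; the 80/20 split, the mini-batch size, and the patience value are illustrative assumptions.

```matlab
% Synthetic stand-in data (assumption): 1000 images in 3 classes.
numObservations = 1000;
X = rand(28,28,1,numObservations,"single");
T = categorical(randi(3,numObservations,1));

% Hold out 20% of the observations for validation.
idx = randperm(numObservations);
numValidation = round(0.2*numObservations);
idxValidation = idx(1:numValidation);
idxTrain = idx(numValidation+1:end);

XTrain = X(:,:,:,idxTrain);
TTrain = T(idxTrain);
XValidation = X(:,:,:,idxValidation);
TValidation = T(idxValidation);

% Validate about once per epoch, and stop early if the validation loss
% has not improved for 5 consecutive validations.
miniBatchSize = 64;
validationFrequency = floor(numel(idxTrain)/miniBatchSize);

options = trainingOptions("adam", ...
    MiniBatchSize=miniBatchSize, ...
    ValidationData={XValidation,TValidation}, ...
    ValidationFrequency=validationFrequency, ...
    ValidationPatience=5);
```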

Regularization and Normalization Options

You can prevent overfitting and improve convergence using regularization and normalization. Regularization can help prevent overfitting by adding a penalty term to the loss function. Normalization can improve convergence and stability by scaling input data to a standard range. For example, to specify an L₂ regularization factor of 0.0002, use:

options = trainingOptions("adam", ...
    L2Regularization=0.0002);

For more information, see trainingOptions.
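To make the penalty term concrete, the following sketch computes a regularized loss by hand for a tiny weight vector. It assumes the penalty has the form of half the sum of squared weights, as described on the trainingOptions reference page; the loss and weight values are made up for illustration.

```matlab
lambda = 0.0002;          % L2 regularization factor
w = [0.5 -1.0 2.0];       % hypothetical weights
loss = 0.75;              % hypothetical unregularized loss

% Penalty: half the sum of squared weights, 0.5*(0.25 + 1 + 4) = 2.625.
penalty = 0.5*sum(w.^2);

% Regularized loss: 0.75 + 0.0002*2.625 = 0.750525.
regularizedLoss = loss + lambda*penalty;
```

With a factor this small, the penalty barely changes the loss value but still nudges large weights toward zero through its gradient.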

Gradient Clipping Options

To prevent large gradients from introducing errors in the training process, you can limit the magnitude of the gradients. For example, to clip each partial derivative whose absolute value exceeds 2 so that it has magnitude 2 and keeps its sign, use:

options = trainingOptions("adam", ...
    GradientThresholdMethod="absolute-value", ...
    GradientThreshold=2);

For more information, see trainingOptions.
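The difference between clipping methods is easiest to see on a small example. The sketch below applies by hand what the "absolute-value" and "l2norm" settings of GradientThresholdMethod are documented to do, using a made-up gradient vector. It is an illustration of the arithmetic, not the toolbox implementation.

```matlab
g = [3 -4 0];     % hypothetical gradient of one learnable parameter
threshold = 2;

% "absolute-value": clip each element to the threshold, keeping its sign.
clippedAbs = sign(g).*min(abs(g),threshold);    % [2 -2 0]

% "l2norm": rescale the whole gradient if its L2 norm exceeds the threshold.
gNorm = norm(g);                                % 5
if gNorm > threshold
    clippedL2 = g*(threshold/gNorm);            % approximately [1.2 -1.6 0]
else
    clippedL2 = g;
end
```

Element-wise clipping can change the direction of the gradient, whereas norm-based rescaling preserves it.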

Sequence Options

Training a neural network usually requires data with fixed sizes, for example, sequences with the same number of channels and time steps. To transform batches of sequences so that the sequences have the same length, you can specify padding and truncation options. For example, to left-pad mini-batches so that the sequences in each mini-batch have the same length, use:

options = trainingOptions("adam", ...
    SequencePaddingDirection="left");

For more information, see trainingOptions.
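The following sketch shows by hand what left-padding a mini-batch to its longest sequence looks like, and then the options that request the same behavior during training. The sequences are random, with channels along the rows and time steps along the columns, and the padding value of 0 is an assumption chosen for illustration.

```matlab
% Three hypothetical sequences with 3 channels and 5, 8, and 6 time steps.
sequences = {rand(3,5); rand(3,8); rand(3,6)};
maxLength = max(cellfun(@(s) size(s,2),sequences));
padValue = 0;

% Left-pad: prepend padding columns so every sequence has maxLength steps.
leftPadded = cellfun( ...
    @(s) [padValue*ones(size(s,1),maxLength-size(s,2)) s], ...
    sequences,UniformOutput=false);

% Request the same behavior from the training function.
options = trainingOptions("adam", ...
    SequenceLength="longest", ...
    SequencePaddingDirection="left", ...
    SequencePaddingValue=padValue);
```

Left-padding keeps the real data at the end of each sequence, which matters when the network uses only the final time step of its output.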

Hardware and Acceleration Options

The software, by default, trains using a supported GPU if one is available. Using a GPU requires a Parallel Computing Toolbox™ license. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). You can specify additional hardware and acceleration options. For example, to specify to use multiple GPUs on one machine, using a local parallel pool based on your default cluster profile, use:

options = trainingOptions("adam", ...
    ExecutionEnvironment="multi-gpu");

For more information, see trainingOptions.
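If the same script has to run on machines with different hardware, you can choose the execution environment at run time. The sketch below is one possible approach: it checks for a usable GPU with canUseGPU and counts the available devices with gpuDeviceCount, which requires Parallel Computing Toolbox. The selection logic itself is an assumption, not a recommendation from this topic.

```matlab
% Count usable GPUs; gpuDeviceCount is called only when a GPU is usable.
if canUseGPU
    numGPUs = gpuDeviceCount("available");
else
    numGPUs = 0;
end

% Pick an execution environment that matches the available hardware.
if numGPUs > 1
    executionEnvironment = "multi-gpu";
elseif numGPUs == 1
    executionEnvironment = "gpu";
else
    executionEnvironment = "cpu";
end

options = trainingOptions("adam", ...
    ExecutionEnvironment=executionEnvironment);
```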

Checkpoint Options

For large networks and large data sets, training can take a long time to run. To periodically save the network during training, you can save checkpoint networks. For example, to save a checkpoint network every 5 epochs in the folder named "checkpoints", use:

options = trainingOptions("adam", ...
    CheckpointPath="checkpoints", ...
    CheckpointFrequency=5);

If the training is interrupted for some reason, you can resume training from the last saved checkpoint neural network.

For more information, see trainingOptions.
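The sketch below shows one way to prepare the checkpoint folder and to pick up the most recent checkpoint afterward. It assumes that checkpoint files are named with the prefix net_checkpoint__ and store the network in a variable named net; check the files in your own folder before relying on either. The variables data and lossFcn in the commented call are the same placeholders used at the start of this topic.

```matlab
checkpointFolder = "checkpoints";

% The checkpoint folder must exist before you create the options.
if ~isfolder(checkpointFolder)
    mkdir(checkpointFolder);
end

options = trainingOptions("adam", ...
    CheckpointPath=checkpointFolder, ...
    CheckpointFrequency=5);

% After an interruption, find the newest checkpoint file, if any.
% Assumption: files are named net_checkpoint__*.mat and contain "net".
files = dir(fullfile(checkpointFolder,"net_checkpoint__*.mat"));
if ~isempty(files)
    [~,newest] = max([files.datenum]);
    checkpoint = load(fullfile(files(newest).folder,files(newest).name));
    net = checkpoint.net;

    % Resume by passing the loaded network in place of the layers:
    % net = trainnet(data,net,lossFcn,options);
end
```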