Custom Layer Function Acceleration
If you do not specify a backward function when you define a custom layer, then the software automatically determines the gradients using automatic differentiation.
When you train a network with a custom layer without a backward function, the software traces each input dlarray object of the custom layer forward function to determine the computation graph used for automatic differentiation. This tracing process can take some time and can end up recomputing the same trace. By optimizing, caching, and reusing the traces, you can speed up gradient computation when training a network. The software can also reuse these traces to speed up network predictions after training.
The trace depends on the size, format, and underlying data type of the layer inputs. That is, the layer triggers a new trace for inputs with a size, format, or underlying data type not contained in the cache. Inputs that differ only by value from a previously cached trace do not trigger a new trace.
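For example, this sketch (which assumes a dlnetwork object named net that contains an acceleratable custom layer) illustrates which calls reuse a cached trace and which trigger a new one:

X1 = dlarray(rand(28,28,3,16,"single"),"SSCB");
Y1 = predict(net,X1);   % new size, format, and data type: traced and cached

X2 = dlarray(rand(28,28,3,16,"single"),"SSCB");
Y2 = predict(net,X2);   % same size, format, and data type: reuses the cached trace

X3 = dlarray(rand(32,32,3,16,"single"),"SSCB");
Y3 = predict(net,X3);   % new size: triggers a new trace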
To indicate that the custom layer supports acceleration, also inherit from the nnet.layer.Acceleratable class when defining the custom layer. When a custom layer inherits from nnet.layer.Acceleratable, the software automatically caches traces when passing data through a dlnetwork object.
For example, to indicate that the custom layer myLayer supports acceleration, use this syntax:

classdef myLayer < nnet.layer.Layer & nnet.layer.Acceleratable
    ...
end
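As a fuller illustration, here is a minimal hypothetical acceleratable layer (the name scalingLayer and its Scale learnable parameter are assumptions for this sketch, not part of the toolbox). Because the layer does not define a backward function, the software determines the gradients using automatic differentiation and can cache the forward trace:

classdef scalingLayer < nnet.layer.Layer & nnet.layer.Acceleratable
    % Hypothetical custom layer that multiplies its input by a learnable scale.

    properties (Learnable)
        Scale
    end

    methods
        function layer = scalingLayer(name)
            layer.Name = name;
            layer.Scale = 1;
        end

        function Y = predict(layer,X)
            % Element-wise scaling; no backward function is defined, so the
            % software traces this function and caches the trace.
            Y = layer.Scale .* X;
        end
    end
end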
Acceleration Considerations
Because of the nature of caching traces, not all functions support acceleration.
The caching process can cache values or code structures that you might expect to change or that depend on external factors. You must take care when accelerating custom layers that:
- Generate random numbers.
- Use if statements and while loops with conditions that depend on the values of dlarray objects.
Tip
If custom layer acceleration causes a slowdown or if your custom layer does not support acceleration, you might be able to speed up training by specifying a backward function for your layer. For more information, see Specify Custom Layer Backward Function.
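For reference, a backward function for the hypothetical scalingLayer sketched above might look like the following; this is a hedged sketch for a layer with one input, one output, and no state, and Specify Custom Layer Backward Function describes the full set of supported signatures.

function [dLdX,dLdScale] = backward(layer,X,Y,dLdY,memory)
    % Derivatives of the loss with respect to the layer input and the
    % Scale learnable, given Y = Scale .* X (hypothetical example).
    dLdX = layer.Scale .* dLdY;
    dLdScale = sum(dLdY .* X,"all");
end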
Because the caching process requires extra computation, acceleration can lead to longer running code in some cases. This scenario can happen when the software spends time creating new caches that do not get reused often. For example, when you pass multiple mini-batches of different sequence lengths to the function, the software triggers a new trace for each unique sequence length.
When custom layer acceleration causes a slowdown, you can disable acceleration by removing the Acceleratable class from the layer definition or by setting the Acceleration option of the predict and forward dlnetwork object functions to "none".
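For example, this call (a sketch that assumes an existing dlnetwork object net and input data X) runs prediction without trace caching:

Y = predict(net,X,Acceleration="none");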
Functions with Random Number Generation
You must take care when you accelerate functions that use random number generation, such as functions that generate random noise to add to the input. When the software caches the trace of a function that generates random numbers that are not dlarray objects, the software caches the resulting random samples in the trace. When reusing the trace, the accelerated function uses the cached random samples and does not generate new random values.
Random number generation using the "like" option of the rand function with a dlarray object supports acceleration. To use random number generation in an accelerated function, ensure that the function uses the rand function with the "like" option set to a traced dlarray object (a dlarray object that depends on an input dlarray object).
For example, consider the following layer predict function, which adds random noise to the input.
function Y = predict(layer,X)
    sz = size(X);
    noise = rand(sz);
    Y = X + noise;
end
To ensure that the rand function generates a new value for each evaluation, use the "like" option with the traced dlarray object X.
function Y = predict(layer,X)
    sz = size(X);
    noise = rand(sz,"like",X);
    Y = X + noise;
end
Functions with if Statements and while Loops
You must take care when you accelerate functions that use if statements and while loops. In particular, you can get unexpected results when you accelerate functions with if statements or while loops that yield different code paths for function inputs of the same size and format.
Accelerating functions with if statement or while loop conditions that depend on the values of the function input or values from external sources (for example, results of random number generation) can lead to unexpected behavior. When the accelerated function caches a new trace, if the function contains an if statement or while loop, then the software caches the trace of the resulting code path given by the if statement or while loop condition for that particular trace. Because changes in the value of the dlarray input do not trigger a new trace, when reusing the trace with different values, the software uses the same cached trace (which contains the same cached code path) even when a difference in value should result in a different code path.
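For example, a layer predict function along these lines (a hypothetical sketch, not taken from the toolbox) branches on the values of the traced input, so a reused trace always replays whichever branch was recorded when the trace was created:

function Y = predict(layer,X)
    % Hypothetical value-dependent branch: the condition depends on the
    % values in X, so a cached trace always follows the code path that
    % was active when the trace was created.
    if mean(X,"all") > 0
        Y = relu(X);
    else
        Y = -X;
    end
end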
Usually, accelerating functions that contain if statements or while loops with conditions that do not depend on the values of the function input or external factors (for example, while loops that iterate over elements in an array) does not result in unexpected behavior. For example, because changes in the size of a dlarray input trigger a new trace, when reusing the trace with inputs of the same size, the cached code path for inputs of that size remains consistent, even when there are differences in values.
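By contrast with the value-dependent branch above, a loop like this hypothetical sketch is bounded by the input size rather than the input values (the Bias learnable is an assumption), so reusing the cached trace for inputs of the same size is safe:

function Y = predict(layer,X)
    % Hypothetical size-dependent loop: the number of iterations depends
    % only on size(X,3), which already triggers a new trace when it
    % changes, not on the values in X.
    numChannels = size(X,3);
    channels = cell(1,numChannels);
    for i = 1:numChannels
        channels{i} = X(:,:,i,:) + layer.Bias(i);
    end
    Y = cat(3,channels{:});
end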
To avoid unexpected behavior from caching code paths of if statements, you can refactor your code so that it determines the correct result by combining the results of all branches and extracting the desired solution.
For example, consider this code.
if tf
    Y = funcA(X);
else
    Y = funcB(X);
end
You can rewrite this code so that it evaluates both branches and combines the results using the condition:

Y = tf*funcA(X) + ~tf*funcB(X);
Alternatively, you can concatenate the results of both branches and then index into the desired one:

Y = cat(3,funcA(X),funcB(X));
Y = Y(:,:,[tf ~tf]);
Both of these alternatives compute the results of both branches of the if statement, which can increase the running time.

dlode45 Does Not Support Acceleration When GradientMode Is "direct"
The software does not support accelerating the dlode45 function when the GradientMode option is "direct". Accelerating such code might return unexpected results. To accelerate code that calls the dlode45 function, set the GradientMode option to "adjoint".
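For example, a layer predict function that solves a neural ODE might call dlode45 along these lines (a sketch; the decay model, the time span, and the Theta learnable are assumptions):

function Y = predict(layer,X)
    % Sketch of a dlode45 call that supports acceleration because
    % GradientMode is set to "adjoint" rather than "direct".
    odefun = @(t,y,theta) -theta .* y;   % hypothetical ODE model
    tspan = [0 1];
    Y = dlode45(odefun,tspan,X,layer.Theta,GradientMode="adjoint");
end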
See Also
Related Topics
- Define Custom Deep Learning Layers
- Define Custom Deep Learning Layer with Learnable Parameters
- Define Custom Deep Learning Layer with Multiple Inputs
- Define Custom Deep Learning Layer with Formatted Inputs
- Define Custom Recurrent Deep Learning Layer
- Define Nested Deep Learning Layer Using Network Composition
- Specify Custom Layer Backward Function
- Define Custom Deep Learning Layer for Code Generation
- Check Custom Layer Validity