Define Custom Deep Learning Layer with Multiple Inputs
If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.
To define a custom deep learning layer, you can use the template provided in this example, which takes you through these steps:
1. Name the layer: Give the layer a name so that you can use it in MATLAB®.
2. Declare the layer properties: Specify the properties of the layer, including learnable parameters and state parameters.
3. Create a constructor function (optional): Specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the Name, Description, and Type properties with [] and sets the number of layer inputs and outputs to 1.
4. Create initialize function (optional): Specify how to initialize the learnable and state parameters when the software initializes the network. If you do not specify an initialize function, then the software does not initialize parameters when it initializes the network.
5. Create forward functions: Specify how data passes forward through the layer (forward propagation) at prediction time and at training time.
6. Create reset state function (optional): Specify how to reset state parameters.
7. Create a backward function (optional): Specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation). If you do not specify a backward function, then the forward functions must support dlarray objects.
This example shows how to create a weighted addition layer, which is a layer with multiple inputs and a learnable parameter, and use it in a convolutional neural network. A weighted addition layer scales and adds inputs from multiple neural network layers element-wise.
Custom Layer Template
Copy the custom layer template into a new file in MATLAB. This template gives the structure of a layer class definition. It outlines:
- The optional properties blocks for the layer properties, learnable parameters, and state parameters.
- The layer constructor function.
- The optional initialize function.
- The predict function and the optional forward function.
- The optional resetState function for layers with state properties.
- The optional backward function.
classdef myLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional)
        % & nnet.layer.Acceleratable % (Optional)

    properties
        % (Optional) Layer properties.

        % Declare layer properties here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Declare learnable parameters here.
    end

    properties (State)
        % (Optional) Layer state parameters.

        % Declare state parameters here.
    end

    properties (Learnable, State)
        % (Optional) Nested dlnetwork objects with both learnable
        % parameters and state parameters.

        % Declare nested networks with learnable and state parameters here.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Define layer constructor function here.
        end

        function layer = initialize(layer,layout)
            % (Optional) Initialize layer learnable and state parameters.
            %
            % Inputs:
            %         layer  - Layer to initialize
            %         layout - Data layout, specified as a networkDataLayout
            %                  object
            %
            % Outputs:
            %         layer - Initialized layer
            %
            %  - For layers with multiple inputs, replace layout with
            %    layout1,...,layoutN, where N is the number of inputs.

            % Define layer initialization function here.
        end

        function [Y,state] = predict(layer,X)
            % Forward input data through the layer at prediction time and
            % output the result and updated state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Input data
            % Outputs:
            %         Y     - Output of layer forward function
            %         state - (Optional) Updated layer state
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN,
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Y with
            %    Y1,...,YM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state
            %    with state1,...,stateK, where K is the number of state
            %    parameters.

            % Define layer predict function here.
        end

        function [Y,state,memory] = forward(layer,X)
            % (Optional) Forward input data through the layer at training
            % time and output the result, the updated state, and a memory
            % value.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Layer input data
            % Outputs:
            %         Y      - Output of layer forward function
            %         state  - (Optional) Updated layer state
            %         memory - (Optional) Memory value for custom backward
            %                  function
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN,
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Y with
            %    Y1,...,YM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state
            %    with state1,...,stateK, where K is the number of state
            %    parameters.

            % Define layer forward function here.
        end

        function layer = resetState(layer)
            % (Optional) Reset layer state.

            % Define reset state function here.
        end

        function [dLdX,dLdW,dLdSin] = backward(layer,X,Y,dLdY,dLdSout,memory)
            % (Optional) Backward propagate the derivative of the loss
            % function through the layer.
            %
            % Inputs:
            %         layer   - Layer to backward propagate through
            %         X       - Layer input data
            %         Y       - Layer output data
            %         dLdY    - Derivative of loss with respect to layer
            %                   output
            %         dLdSout - (Optional) Derivative of loss with respect
            %                   to state output
            %         memory  - Memory value from forward function
            % Outputs:
            %         dLdX   - Derivative of loss with respect to layer input
            %         dLdW   - (Optional) Derivative of loss with respect to
            %                  learnable parameter
            %         dLdSin - (Optional) Derivative of loss with respect to
            %                  state input
            %
            %  - For layers with state parameters, the backward syntax must
            %    include both dLdSout and dLdSin, or neither.
            %  - For layers with multiple inputs, replace X and dLdX with
            %    X1,...,XN and dLdX1,...,dLdXN, respectively, where N is
            %    the number of inputs.
            %  - For layers with multiple outputs, replace Y and dLdY with
            %    Y1,...,YM and dLdY1,...,dLdYM, respectively, where M is the
            %    number of outputs.
            %  - For layers with multiple learnable parameters, replace
            %    dLdW with dLdW1,...,dLdWP, where P is the number of
            %    learnable parameters.
            %  - For layers with multiple state parameters, replace dLdSin
            %    and dLdSout with dLdSin1,...,dLdSinK and
            %    dLdSout1,...,dLdSoutK, respectively, where K is the number
            %    of state parameters.

            % Define layer backward function here.
        end
    end
end
Name Layer and Specify Superclasses
First, give the layer a name. In the first line of the class file, replace the existing name myLayer with weightedAdditionLayer.
classdef weightedAdditionLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional)
        % & nnet.layer.Acceleratable % (Optional)
    ...
end
If you do not specify a backward function, then the layer functions, by default, receive unformatted dlarray objects as input. To specify that the layer receives formatted dlarray objects as input and also outputs formatted dlarray objects, also inherit from the nnet.layer.Formattable class when defining the custom layer.
The layer functions support acceleration, so also inherit from nnet.layer.Acceleratable. For more information about accelerating custom layer functions, see Custom Layer Function Acceleration. The layer does not require formattable inputs, so remove the optional nnet.layer.Formattable superclass.
classdef weightedAdditionLayer < nnet.layer.Layer ...
        & nnet.layer.Acceleratable
    ...
end
Next, rename the myLayer constructor function (the first function in the methods section) so that it has the same name as the layer.
    methods
        function layer = weightedAdditionLayer()
            ...
        end

        ...
    end
Save the Layer
Save the layer class file in a new file named weightedAdditionLayer.m. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.
Declare Properties and Learnable Parameters
Declare the layer properties in the properties section and declare learnable parameters by listing them in the properties (Learnable) section.
By default, custom layers have these properties. Do not declare these properties in the properties section.
Property | Description |
---|---|
Name | Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet and dlnetwork functions automatically assign names to layers with the name "". |
Description | One-line description of the layer, specified as a string scalar or a character vector. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name. |
Type | Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name. |
NumInputs | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1. |
InputNames | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}. |
NumOutputs | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1. |
OutputNames | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}. |
If the layer has no other properties, then you can omit the properties section.
Tip
If you are creating a layer with multiple inputs, then you must set either the NumInputs or the InputNames property in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or the OutputNames property in the layer constructor.
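As a minimal sketch of the tip above, the following hypothetical class (twoInputAddLayer and the input names 'main' and 'skip' are illustrative assumptions, not part of this example) sets InputNames in the constructor instead of NumInputs. Based on the property table, the software should then set NumInputs to the number of names.

```matlab
% Hypothetical file twoInputAddLayer.m, shown for illustration only.
classdef twoInputAddLayer < nnet.layer.Layer
    methods
        function layer = twoInputAddLayer(name)
            % Set the layer name.
            layer.Name = name;

            % Naming the inputs also determines the number of inputs.
            layer.InputNames = {'main','skip'};
        end

        function Y = predict(layer,X1,X2)
            % Two named inputs, so predict takes two data arguments.
            Y = X1 + X2;
        end
    end
end
```

With this file on the path, connections would refer to the inputs as "name/main" and "name/skip" rather than the default "name/in1" and "name/in2".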
A weighted addition layer does not require any additional properties, so you can remove the properties section.
A weighted addition layer has only one learnable parameter, the weights. Declare this learnable parameter in the properties (Learnable) section and call the parameter Weights.
properties (Learnable)
% Layer learnable parameters
% Scaling coefficients
Weights
end
Create Constructor Function
Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.
The weighted addition layer constructor function requires two inputs: the number of inputs to the layer and the layer name. The number of inputs to the layer specifies the size of the learnable parameter Weights. Specify two input arguments named numInputs and name in the weightedAdditionLayer function. Add a comment to the top of the function that explains the syntax of the function.
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            ...
        end
Initialize Layer Properties
Initialize the layer properties, including learnable parameters, in the constructor function. Replace the comment % Define layer constructor function here. with code that initializes the layer properties.
Set the NumInputs property to the input argument numInputs.
% Set number of inputs.
layer.NumInputs = numInputs;
Set the Name property to the input argument name.
% Set layer name.
layer.Name = name;
Give the layer a one-line description by setting the Description property of the layer. Set the description to describe the type of layer and its size.
% Set layer description.
layer.Description = "Weighted addition of " + numInputs + ...
    " inputs";
A weighted addition layer multiplies each layer input by the corresponding coefficient in Weights and adds the resulting values together. Initialize the learnable parameter Weights to be a random vector of size 1-by-numInputs. Weights is a property of the layer object, so you must assign the vector to layer.Weights.
% Initialize layer weights
layer.Weights = rand(1,numInputs);
View the completed constructor function.
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";

            % Initialize layer weights.
            layer.Weights = rand(1,numInputs);
        end
With this constructor function, the command weightedAdditionLayer(3,'add') creates a weighted addition layer with three inputs and the name 'add'.
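To see what the constructor sets up, you can inspect the resulting layer object. This is a minimal sketch that assumes the completed class file weightedAdditionLayer.m is already saved on the MATLAB path; the variable names are illustrative.

```matlab
% Create a weighted addition layer with three inputs named 'add'.
layer = weightedAdditionLayer(3,'add');

% The constructor sets NumInputs directly.
numInputs = layer.NumInputs;

% Because InputNames is not set, the software generates default names
% of the form 'in1',...,'inN'.
inputNames = layer.InputNames;

% Weights holds one random coefficient per input.
weightSize = size(layer.Weights);
```

The generated input names are the ones you use later when connecting layers, for example "add/in2".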
Because the constructor function does not require information from the layer input data to initialize the learnable parameters, defining the initialize function is optional. Some layers do require information from the input data to initialize their learnable parameters. For example, the weights of an SReLU layer must have the same number of channels as the input data. For such layers, you can implement a custom initialize function. For an example, see Define Custom Deep Learning Layer with Learnable Parameters.
Create Forward Functions
Create the layer forward functions to use at prediction time and training time.
Create a function named predict that propagates the data forward through the layer at prediction time and outputs the result.
The predict function syntax depends on the type of layer.
- Y = predict(layer,X) forwards the input data X through the layer and outputs the result Y, where layer has a single input and a single output.
- [Y,state] = predict(layer,X) also outputs the updated state parameter state, where layer has a single state parameter.
You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:
- For layers with multiple inputs, replace X with X1,...,XN, where N is the number of inputs. The NumInputs property must match N.
- For layers with multiple outputs, replace Y with Y1,...,YM, where M is the number of outputs. The NumOutputs property must match M.
- For layers with multiple state parameters, replace state with state1,...,stateK, where K is the number of state parameters.
Tip
If the number of inputs to the layer can vary, then use varargin instead of X1,…,XN. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi.
If the number of outputs can vary, then use varargout instead of Y1,…,YM. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Yj.
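The following sketch illustrates how varargin collects a variable number of arguments into a cell array. It uses plain MATLAB anonymous functions; countInputs and pickInput are hypothetical helpers for illustration and are not part of the layer class.

```matlab
% Hypothetical helpers, for illustration only.
countInputs = @(varargin) numel(varargin);   % number of arguments passed
pickInput   = @(i,varargin) varargin{i};     % varargin{i} corresponds to Xi

% Call the helpers with three "inputs".
n  = countInputs(10,20,30);   % varargin is the cell array {10,20,30}
x2 = pickInput(2,10,20,30);   % selects the second input
```

Inside a layer function declared as predict(layer,varargin), the cell array behaves the same way, with one cell per layer input.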
Tip
If the custom layer has a dlnetwork object for a learnable parameter, then in the predict function of the custom layer, use the predict function for the dlnetwork. When you do so, the dlnetwork object predict function uses the appropriate layer operations for prediction. If the dlnetwork has state parameters, then also return the network state.
Because a weighted addition layer has only one output and a variable number of inputs, the syntax for predict for a weighted addition layer is Y = predict(layer,varargin), where varargin{i} corresponds to Xi for positive integers i less than or equal to NumInputs.
By default, the layer uses predict as the forward function at training time. To use a different forward function at training time, or retain a value required for the backward function, you must also create a function named forward.
The dimensions of the inputs depend on the type of data and the output of the connected layers:
Layer Input | Example Shape | Example Data Format |
---|---|---|
2-D images | h-by-w-by-c-by-N numeric array, where h, w, c, and N are the height, width, number of channels of the images, and number of observations, respectively. | "SSCB" |
3-D images | h-by-w-by-d-by-c-by-N numeric array, where h, w, d, c, and N are the height, width, depth, number of channels of the images, and number of image observations, respectively. | "SSSCB" |
Vector sequences | c-by-N-by-s array, where c is the number of features of the sequence, N is the number of sequence observations, and s is the sequence length. | "CBT" |
2-D image sequences | h-by-w-by-c-by-N-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, N is the number of image sequence observations, and s is the sequence length. | "SSCBT" |
3-D image sequences | h-by-w-by-d-by-c-by-N-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, N is the number of image sequence observations, and s is the sequence length. | "SSSCBT" |
Features | c-by-N array, where c is the number of features, and N is the number of observations. | "CB" |
For layers that output sequences, the layers can output sequences of any length or output data with no time dimension.
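To make the data formats in the table concrete, the following sketch builds a formatted dlarray for a batch of 2-D images and queries its dimension labels. The sizes (28-by-28 images, 1 channel, 16 observations) are arbitrary illustrative values. Note that the weighted addition layer in this example does not inherit from nnet.layer.Formattable, so its predict function receives unformatted data.

```matlab
% A batch of 16 single-channel 28-by-28 images, labeled as
% spatial, spatial, channel, batch.
X = dlarray(rand(28,28,1,16,"single"),"SSCB");

% Query the dimension labels and locate the batch dimension.
fmt      = dims(X);          % character vector of dimension labels
batchDim = finddim(X,"B");   % index of the batch (observation) dimension
numObs   = size(X,batchDim); % number of observations
```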
The forward function propagates the data forward through the layer at training time and also outputs a memory value.
The forward function syntax depends on the type of layer:
- Y = forward(layer,X) forwards the input data X through the layer and outputs the result Y, where layer has a single input and a single output.
- [Y,state] = forward(layer,X) also outputs the updated state parameter state, where layer has a single state parameter.
- [__,memory] = forward(layer,X) also returns a memory value for a custom backward function using any of the previous syntaxes. If the layer has both a custom forward function and a custom backward function, then the forward function must return a memory value.
You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:
- For layers with multiple inputs, replace X with X1,...,XN, where N is the number of inputs. The NumInputs property must match N.
- For layers with multiple outputs, replace Y with Y1,...,YM, where M is the number of outputs. The NumOutputs property must match M.
- For layers with multiple state parameters, replace state with state1,...,stateK, where K is the number of state parameters.
Tip
If the number of inputs to the layer can vary, then use varargin instead of X1,…,XN. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi.
If the number of outputs can vary, then use varargout instead of Y1,…,YM. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Yj.
Tip
If the custom layer has a dlnetwork object for a learnable parameter, then in the forward function of the custom layer, use the forward function of the dlnetwork object. When you do so, the dlnetwork object forward function uses the appropriate layer operations for training.
The forward function of a weighted addition layer is
$$f\left(X^{(1)},\dots,X^{(n)}\right) = \sum_{i=1}^{n} W_i X^{(i)}$$
where $X^{(1)},\dots,X^{(n)}$ correspond to the layer inputs and $W_1,\dots,W_n$ are the layer weights.
Implement the forward function in predict. In predict, the output Y corresponds to $f\left(X^{(1)},\dots,X^{(n)}\right)$, the weighted sum of the layer inputs. The weighted addition layer does not require memory or a different forward function for training, so you can remove the forward function from the class file. Add a comment to the top of the function that explains the syntaxes of the function.
Tip
If you preallocate arrays using functions such as zeros, then you must ensure that the data types of these arrays are consistent with the layer function inputs. To create an array of zeros of the same data type as another array, use the "like" option of zeros. For example, to initialize an array of zeros of size sz with the same data type as the array X, use Y = zeros(sz,"like",X).
        function Y = predict(layer, varargin)
            % Y = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Y.

            X = varargin;
            W = layer.Weights;

            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Y = zeros(sz,'like',X1);

            % Weighted addition
            for i = 1:layer.NumInputs
                Y = Y + W(i)*X{i};
            end
        end
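As a quick sanity check of the arithmetic in predict, the following sketch repeats the same loop on small arrays outside the layer class. The inputs and weights are made-up values chosen so the result is easy to verify by hand: each output element is 0.5*1 + 2*2.

```matlab
% Two illustrative single-precision inputs and their weights.
X = {ones(2,2,"single"), 2*ones(2,2,"single")};
W = [0.5 2];

% Preallocate the output with the same size and data type as the
% first input, as recommended for layer functions.
Y = zeros(size(X{1}),"like",X{1});

% Weighted addition, mirroring the loop in predict.
for i = 1:numel(X)
    Y = Y + W(i)*X{i};
end
```

Because the output is preallocated with the "like" option, Y keeps the single data type of the inputs.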
Because the predict function uses only functions that support dlarray objects, defining the backward function is optional. For a list of functions that support dlarray objects, see List of Functions with dlarray Support.
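To illustrate why no backward function is needed, the following sketch applies automatic differentiation to the same weighted sum using dlfeval and dlgradient. The scalar "loss" here is simply the sum of all output elements, an assumption chosen so the gradient is easy to verify: the derivative with respect to each weight is the sum of the elements of the corresponding input.

```matlab
% Illustrative weights and inputs as dlarray objects.
W  = dlarray([0.5 2]);
X1 = dlarray(ones(2,2));
X2 = dlarray(2*ones(2,2));

% Gradient of sum(W(1)*X1 + W(2)*X2) with respect to W.
% dlgradient must be evaluated inside a function called by dlfeval.
lossGradient = @(W,X1,X2) dlgradient(sum(W(1)*X1 + W(2)*X2,"all"),W);
dLdW = dlfeval(lossGradient,W,X1,X2);

% Convert to a plain numeric array for inspection.
gradValues = extractdata(dLdW);
```

During training, the software performs the equivalent differentiation through predict to compute the derivatives for Weights.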
Completed Layer
View the completed layer class file.
classdef weightedAdditionLayer < nnet.layer.Layer ...
        & nnet.layer.Acceleratable
    % Example custom weighted addition layer.

    properties (Learnable)
        % Layer learnable parameters

        % Scaling coefficients
        Weights
    end

    methods
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";

            % Initialize layer weights.
            layer.Weights = rand(1,numInputs);
        end

        function Y = predict(layer, varargin)
            % Y = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Y.

            X = varargin;
            W = layer.Weights;

            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Y = zeros(sz,'like',X1);

            % Weighted addition
            for i = 1:layer.NumInputs
                Y = Y + W(i)*X{i};
            end
        end
    end
end
GPU Compatibility
If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray (Parallel Computing Toolbox).
Many MATLAB built-in functions support gpuArray (Parallel Computing Toolbox) and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).
In this example, the MATLAB functions used in predict all support dlarray objects, so the layer is GPU compatible.
Check Validity of Layer with Multiple Inputs
Check the validity of the custom layer weightedAdditionLayer.
Create an instance of the layer weightedAdditionLayer, attached to this example as a supporting file, and check its validity using checkLayer. Specify the valid input sizes to be the typical sizes of a single observation for each input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.
Specify the typical size of the input of an observation and set 'ObservationDimension' to 4.
layer = weightedAdditionLayer(2,'add');
validInputSize = {[24 24 20],[24 24 20]};
checkLayer(layer,validInputSize,'ObservationDimension',4)
Skipping initialization tests. The layer does not have an initialize function.
Skipping GPU tests. No compatible GPU device found.
Skipping code generation compatibility tests. To check validity of the layer for code generation, specify the CheckCodegenCompatibility and ObservationDimension options.
Running nnet.checklayer.TestLayerWithoutBackward
.......... ........
Done nnet.checklayer.TestLayerWithoutBackward
__________
Test Summary:
    18 Passed, 0 Failed, 0 Incomplete, 16 Skipped.
    Time elapsed: 0.32994 seconds.
Here, the function does not detect any issues with the layer.
Use Custom Weighted Addition Layer in Network
You can use a custom layer in the same way as any other layer in Deep Learning Toolbox. This section shows how to create and train a network for digit classification using the weighted addition layer you created earlier.
Load the example training data.
[XTrain,TTrain] = digitTrain4DArrayData;
Create a dlnetwork object.
net = dlnetwork;
Define the neural network architecture including the custom layer weightedAdditionLayer, attached to this example as a supporting file.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer('Name',"relu1")
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    weightedAdditionLayer(2,"add")
    fullyConnectedLayer(10)
    softmaxLayer];

net = addLayers(net,layers);
net = connectLayers(net,"relu1","add/in2");
Specify the training options. Choosing among the options requires empirical analysis. To explore different training option configurations by running experiments, you can use the Experiment Manager app.
- Train using the Adam optimizer.
- Train for 10 epochs.
- Monitor the accuracy.
options = trainingOptions("adam",'MaxEpochs',10,'Metrics',"accuracy");
Train the neural network using the trainnet function. For classification, use cross-entropy loss. By default, the trainnet function uses a GPU if one is available. Training on a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Otherwise, the trainnet function uses the CPU. To specify the execution environment, use the ExecutionEnvironment training option.
net = trainnet(XTrain,TTrain,net,"crossentropy",options);
Iteration    Epoch    TimeElapsed    LearnRate    TrainingLoss    TrainingAccuracy
_________    _____    ___________    _________    ____________    ________________
        1        1       00:00:00        0.001          2.3021                12.5
       50        2       00:00:08        0.001         0.89936              67.969
      100        3       00:00:15        0.001         0.34073                87.5
      150        4       00:00:22        0.001         0.10776              98.438
      200        6       00:00:29        0.001        0.042476              99.219
      250        7       00:00:38        0.001        0.019208                 100
      300        8       00:00:44        0.001        0.023558              99.219
      350        9       00:00:52        0.001       0.0053888                 100
      390       10       00:01:02        0.001        0.007815                 100
Training stopped: Max epochs completed
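If you prefer not to rely on the automatic hardware selection described above, you can set the ExecutionEnvironment training option explicitly. The following sketch is a variation on the options used in this example that forces training on the CPU; the variable name cpuOptions is illustrative.

```matlab
% Same settings as before, but always train on the CPU.
cpuOptions = trainingOptions("adam", ...
    'MaxEpochs',10, ...
    'Metrics',"accuracy", ...
    'ExecutionEnvironment',"cpu");
```

You would then pass cpuOptions to trainnet in place of options.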
View the weights learned by the weighted addition layer.
net.Layers(8).Weights
ans = 1x2 single row vector
1.0270 0.9974
Evaluate the network performance by predicting on new data and calculating the accuracy. To make predictions with multiple observations, use the minibatchpredict function. To convert the prediction scores to labels, use the scores2label function. The minibatchpredict function automatically uses a GPU if one is available.
[XTest,TTest] = digitTest4DArrayData;
classNames = categories(TTest);
scores = minibatchpredict(net,XTest);
YPred = scores2label(scores,classNames);
accuracy = mean(TTest==YPred)
accuracy = 0.9874
See Also
trainnet | trainingOptions | dlnetwork | functionLayer | checkLayer | setLearnRateFactor | setL2Factor | getLearnRateFactor | getL2Factor | findPlaceholderLayers | replaceLayer | PlaceholderLayer | networkDataLayout
Related Topics
- Define Custom Deep Learning Layers
- Define Custom Deep Learning Layer with Learnable Parameters
- Define Custom Deep Learning Layer with Formatted Inputs
- Define Custom Recurrent Deep Learning Layer
- Specify Custom Layer Backward Function
- Define Custom Deep Learning Layer for Code Generation
- Define Nested Deep Learning Layer Using Network Composition
- Check Custom Layer Validity