# gradient

Evaluate gradient of function approximator object given observation and action input data

*Since R2022a*

## Syntax
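The call forms below summarize how `gradient` is used in the examples and input arguments on this page. This is a sketch inferred from that material, not a verbatim copy of the reference signature, so confirm the exact forms for your release.

```matlab
% Gradient of the sum of the outputs with respect to the inputs
grad = gradient(fcnAppx,"output-input",inData)

% Gradient of the sum of the outputs with respect to the learnable parameters
grad = gradient(fcnAppx,"output-parameters",inData)

% Gradient of a user-defined loss (lossFcn) with respect to the learnable
% parameters; fcnData is passed to lossFcn as its second argument
grad = gradient(fcnAppx,lossFcn,inData,fcnData)
```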

## Description

## Examples

### Calculate Gradients for Continuous Gaussian Actor

Create observation and action specification objects (or alternatively use `getObservationInfo` and `getActionInfo` to extract the specification objects from an environment). For this example, define an observation space with three channels. The first channel carries an observation from a continuous three-dimensional space, so that a single observation is a column vector containing three doubles. The second channel carries a discrete observation made of a two-dimensional row vector that can take one of five different values. The third channel carries a discrete scalar observation that can be either zero or one. Finally, the action space is a continuous four-dimensional space, so that a single action is a column vector containing four doubles, each between `-10` and `10`.

```matlab
obsInfo = [rlNumericSpec([3 1]) rlFiniteSetSpec({[1 2],[3 4],[5 6],[7 8],[9 10]}) rlFiniteSetSpec([0 1])];
actInfo = rlNumericSpec([4 1], ...
    UpperLimit= 10*ones(4,1), ...
    LowerLimit=-10*ones(4,1) );
```

To approximate the policy within the actor, use a recurrent deep neural network. For a continuous Gaussian actor, the network must have two output layers (one for the mean values, the other for the standard deviation values), each having as many elements as the dimension of the action space.

Create the network, defining each path as an array of layer objects. Use `sequenceInputLayer` as the input layer and include an `lstmLayer` as one of the other network layers. Also use a softplus layer to enforce nonnegativity of the standard deviations and a scaling layer to scale the mean values by the upper limit of the action range. Get the dimensions of the observation and action spaces from the environment specification objects, and specify a name for the input layers, so you can later explicitly associate them with the appropriate environment channel.

```matlab
inPath1 = [
    sequenceInputLayer( ...
        prod(obsInfo(1).Dimension), ...
        Name="netObsIn1")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc1")
    ];
inPath2 = [
    sequenceInputLayer( ...
        prod(obsInfo(2).Dimension), ...
        Name="netObsIn2")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc2")
    ];
inPath3 = [
    sequenceInputLayer( ...
        prod(obsInfo(3).Dimension), ...
        Name="netObsIn3")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc3")
    ];

% Concatenate inputs along the first available dimension
jointPath = [
    concatenationLayer(1,3,Name="cat")
    tanhLayer(Name="tanhJnt")
    lstmLayer(8,OutputMode="sequence",Name="lstm")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="jntfc")
    ];

% Path layers for the mean values
% Use scalingLayer to multiply the mean values by the action upper limit
meanPath = [
    tanhLayer(Name="tanhMean")
    fullyConnectedLayer(prod(actInfo.Dimension))
    scalingLayer(Name="scale", ...
        Scale=actInfo.UpperLimit)
    ];

% Path layers for the standard deviations
% Use softplusLayer to make them nonnegative
sdevPath = [
    tanhLayer(Name="tanhStdv")
    fullyConnectedLayer(prod(actInfo.Dimension))
    softplusLayer(Name="splus")
    ];

% Add layers to network object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,inPath3);
net = addLayers(net,jointPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

% Connect layers
net = connectLayers(net,"infc1","cat/in1");
net = connectLayers(net,"infc2","cat/in2");
net = connectLayers(net,"infc3","cat/in3");
net = connectLayers(net,"jntfc","tanhMean/in");
net = connectLayers(net,"jntfc","tanhStdv/in");

% Plot network
plot(net)
```

```matlab
% Convert to dlnetwork
net = dlnetwork(net);

% Display the number of weights
summary(net)
```

```
   Initialized: true

   Number of learnables: 784

   Inputs:
      1   'netObsIn1'   Sequence input with 3 dimensions
      2   'netObsIn2'   Sequence input with 2 dimensions
      3   'netObsIn3'   Sequence input with 1 dimensions
```

Create the actor with `rlContinuousGaussianActor`, using the network, the observation and action specification objects, and the names of the network input and output layers.

```matlab
actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    ActionMeanOutputNames="scale", ...
    ActionStandardDeviationOutputNames="splus", ...
    ObservationInputNames=["netObsIn1","netObsIn2","netObsIn3"]);
```

To return the mean values and standard deviations of the Gaussian distribution as a function of the current observation, use `evaluate`.

```matlab
[prob,state] = evaluate(actor, ...
    {rand([obsInfo(1).Dimension 1 1]), ...
     rand([obsInfo(2).Dimension 1 1]), ...
     rand([obsInfo(3).Dimension 1 1])});
```

The result is a cell array with two elements, the first one containing a vector of mean values, and the second containing a vector of standard deviations.

```matlab
prob{1}
```

```
ans = 4x1 single column vector
  -1.5454
   0.4908
  -0.1697
   0.8081
```

```matlab
prob{2}
```

```
ans = 4x1 single column vector
   0.6913
   0.6141
   0.7291
   0.6475
```

To return an action sampled from the distribution, use `getAction`.

```matlab
act = getAction(actor, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension), ...
     rand(obsInfo(3).Dimension)});
act{1}
```

```
ans = 4x1 single column vector
  -3.2003
  -0.0534
  -1.0700
  -0.4032
```

Calculate the gradients of the sum of the outputs (all the mean values plus all the standard deviations) with respect to the inputs, given a random observation.

```matlab
gro = gradient(actor,"output-input", ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension), ...
     rand(obsInfo(3).Dimension)})
```

```
gro = 3×1 cell array
    {3x1 single}
    {2x1 single}
    {[ 0.1311]}
```

The result is a cell array with as many elements as the number of input channels. Each element contains the derivatives of the sum of the outputs with respect to each component of the input channel. Display the gradient with respect to the elements of the second channel.

```matlab
gro{2}
```

```
ans = 2x1 single column vector
  -1.3404
   0.6642
```
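
In the displayed output, the second channel returns a 2-by-1 gradient even though its observations are 1-by-2 row vectors. This suggests that each channel's gradient is returned as a column with `prod(Dimension)` elements. The loop below prints both sizes for every channel so you can check this reading. It is a sketch that assumes `gro` and `obsInfo` from this example are still in the workspace.

```matlab
% For each input channel, print the channel dimension stored in obsInfo
% next to the size of the gradient returned for that channel
for k = 1:numel(gro)
    fprintf("Channel %d: obsInfo dimension [%s], gradient size [%s]\n", ...
        k, num2str(obsInfo(k).Dimension), num2str(size(gro{k})));
end
```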

Obtain the gradient for a batch of five independent sequences, each one made of nine sequential observations.

```matlab
gro_batch = gradient(actor,"output-input", ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9]), ...
     rand([obsInfo(3).Dimension 5 9])})
```

```
gro_batch = 3×1 cell array
    {3x5x9 single}
    {2x5x9 single}
    {1x5x9 single}
```

Display the derivative of the sum of the outputs with respect to the third observation element of the first input channel, after the seventh sequential observation in the fourth independent batch.

```matlab
gro_batch{1}(3,4,7)
```

```
ans = single
    0.2020
```
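
The three indices of `gro_batch{1}` select, in order, an element of the input channel, an independent sequence, and a time step. The following sketch extracts the full time history of the gradient for the fourth sequence. It assumes `gro_batch` from this example is in the workspace.

```matlab
% Gradient with respect to the three elements of the first channel,
% at every time step of the fourth independent sequence
g4 = squeeze(gro_batch{1}(:,4,:));   % 3-by-9: rows are elements, columns are time steps
size(g4)

% Row 3, column 7 holds the same value displayed above
g4(3,7)
```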

Set the option to accelerate the gradient computations.

```matlab
actor = accelerate(actor,true);
```

Calculate the gradients of the sum of the outputs with respect to the parameters, given a random observation.

```matlab
grp = gradient(actor,"output-parameters", ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension), ...
     rand(obsInfo(3).Dimension)})
```

```
grp = 15×1 cell array
    { 4x3  single}
    { 4x1  single}
    { 4x2  single}
    { 4x1  single}
    { 4x1  single}
    { 4x1  single}
    {32x12 single}
    {32x8  single}
    {32x1  single}
    { 4x8  single}
    { 4x1  single}
    { 4x4  single}
    { 4x1  single}
    { 4x4  single}
    { 4x1  single}
```

Each array within a cell contains the gradient of the sum of the outputs with respect to a group of parameters. Now compute the same gradient for a batch of five independent sequences, each made of nine sequential observations.

```matlab
grp_batch = gradient(actor,"output-parameters", ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9]), ...
     rand([obsInfo(3).Dimension 5 9])})
```

```
grp_batch = 15×1 cell array
    { 4x3  single}
    { 4x1  single}
    { 4x2  single}
    { 4x1  single}
    { 4x1  single}
    { 4x1  single}
    {32x12 single}
    {32x8  single}
    {32x1  single}
    { 4x8  single}
    { 4x1  single}
    { 4x4  single}
    { 4x1  single}
    { 4x4  single}
    { 4x1  single}
```

If you use a batch of inputs, `gradient` uses the whole input sequence (in this case, nine steps), and all the gradients with respect to the independent batch dimensions (in this case, five) are added together. Therefore, the returned gradient always has the same size as the output from `getLearnableParameters`.
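
You can check this size relationship directly. The sketch below assumes `actor` and `grp_batch` from this example are in the workspace. It calls `getLearnableParameters` and compares the size of each parameter group with the size of the matching gradient.

```matlab
% Learnable parameters of the actor, one cell per parameter group
pars = getLearnableParameters(actor);

% Compare sizes group by group; (:) makes both cell arrays column vectors
% in case their orientations differ
sameSize = cellfun(@(p,g) isequal(size(p),size(g)), pars(:), grp_batch(:));
all(sameSize)
```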

### Calculate Gradients for Vector Q-Value Function

Create observation and action specification objects (or alternatively use `getObservationInfo` and `getActionInfo` to extract the specification objects from an environment). For this example, define an observation space made of two channels. The first channel carries an observation from a continuous four-dimensional space. The second carries a discrete scalar observation that can be either zero or one. Finally, the action space consists of a scalar that can be `-1`, `0`, or `1`.

```matlab
obsInfo = [rlNumericSpec([4 1]) rlFiniteSetSpec([0 1])];
actInfo = rlFiniteSetSpec([-1 0 1]);
```

To approximate the vector Q-value function within the critic, use a recurrent deep neural network. The output layer must have three elements, each one expressing the value of executing the corresponding action, given the observation.

Create the neural network, defining each network path as an array of layer objects. Get the dimensions of the observation and action spaces from the environment specification objects, use `sequenceInputLayer` as the input layer, and include an `lstmLayer` as one of the other network layers.

```matlab
inPath1 = [
    sequenceInputLayer( ...
        prod(obsInfo(1).Dimension), ...
        Name="netObsIn1")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc1")
    ];
inPath2 = [
    sequenceInputLayer( ...
        prod(obsInfo(2).Dimension), ...
        Name="netObsIn2")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc2")
    ];

% Concatenate inputs along the first available dimension
% The output layer has one element per possible action
jointPath = [
    concatenationLayer(1,2,Name="cct")
    tanhLayer(Name="tanhJnt")
    lstmLayer(8,OutputMode="sequence")
    fullyConnectedLayer(numel(actInfo.Elements))
    ];

% Add layers to network object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,jointPath);

% Connect layers
net = connectLayers(net,"infc1","cct/in1");
net = connectLayers(net,"infc2","cct/in2");

% Plot network
plot(net)
```

```matlab
% Convert to dlnetwork
net = dlnetwork(net);

% Display the number of weights
summary(net)
```

```
   Initialized: true

   Number of learnables: 386

   Inputs:
      1   'netObsIn1'   Sequence input with 4 dimensions
      2   'netObsIn2'   Sequence input with 1 dimensions
```

Create the critic with `rlVectorQValueFunction`, using the network and the observation and action specification objects.

```matlab
critic = rlVectorQValueFunction(net,obsInfo,actInfo);
```

To return the value of the actions as a function of the current observation, use `getValue` or `evaluate`.

```matlab
val = evaluate(critic, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
```

```
val = 1x1 cell array
    {3x1 single}
```

When you use `evaluate`, the result is a single-element cell array containing a vector with the values of all the possible actions, given the observation.

```matlab
val{1}
```

```
ans = 3x1 single column vector
  -0.0054
  -0.0943
   0.0177
```
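
Calling the two functions on the same input shows how their return types differ. This is a minimal sketch that assumes `critic` and `obsInfo` from this example are in the workspace. `getValue` is expected to return the Q-values as a numeric array, while `evaluate` wraps them in a cell array. Because both calls receive the same observation, they should report the same three values.

```matlab
% One random observation, shared by both calls
obs = {rand(obsInfo(1).Dimension), rand(obsInfo(2).Dimension)};

% getValue returns the Q-values directly
qv = getValue(critic,obs)

% evaluate returns the same values inside a single-element cell array
qe = evaluate(critic,obs);
qe{1}
```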

Calculate the gradients of the sum of the outputs with respect to the inputs, given a random observation.

```matlab
gro = gradient(critic,"output-input", ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
```

```
gro = 2×1 cell array
    {4x1 single}
    {[ -0.0396]}
```

The result is a cell array with as many elements as the number of input channels. Each element contains the derivative of the sum of the outputs with respect to each component of the input channel. Display the gradient with respect to the element of the second channel.

```matlab
gro{2}
```

```
ans = single
   -0.0396
```

Obtain the gradient for a batch of five independent sequences, each one made of nine sequential observations.

```matlab
gro_batch = gradient(critic,"output-input", ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9])})
```

```
gro_batch = 2×1 cell array
    {4x5x9 single}
    {1x5x9 single}
```

Display the derivative of the sum of the outputs with respect to the third observation element of the first input channel, after the seventh sequential observation in the fourth independent batch.

```matlab
gro_batch{1}(3,4,7)
```

```
ans = single
    0.0443
```

Set the option to accelerate the gradient computations.

```matlab
critic = accelerate(critic,true);
```

Calculate the gradients of the sum of the outputs with respect to the parameters, given a random observation.

```matlab
grp = gradient(critic,"output-parameters", ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
```

```
grp = 9×1 cell array
    {[-0.0101 -0.0097 -0.0039 -0.0065]}
    {[                        -0.0122]}
    {[                        -0.0078]}
    {[                        -0.0863]}
    {32x2 single                      }
    {32x8 single                      }
    {32x1 single                      }
    { 3x8 single                      }
    { 3x1 single                      }
```

Each array within a cell contains the gradient of the sum of the outputs with respect to a group of parameters. Now compute the same gradient for a batch of five independent sequences, each made of nine sequential observations.

```matlab
grp_batch = gradient(critic,"output-parameters", ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9])})
```

```
grp_batch = 9×1 cell array
    {[-1.5801 -1.2618 -2.2168 -1.3463]}
    {[                        -3.7742]}
    {[                        -6.2158]}
    {[                       -12.6841]}
    {32x2 single                      }
    {32x8 single                      }
    {32x1 single                      }
    { 3x8 single                      }
    { 3x1 single                      }
```

If you use a batch of inputs, `gradient` uses the whole input sequence (in this case, nine steps), and all the gradients with respect to the independent batch dimensions (in this case, five) are added together. Therefore, the returned gradient always has the same size as the output from `getLearnableParameters`.
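
The summation over independent sequences can also be checked numerically. This sketch assumes `critic` and `obsInfo` from this example are in the workspace, and `seqA` and `seqB` are hypothetical random sequences. It computes the parameter gradient for each sequence on its own and then for both sequences stacked along the batch dimension. If the per-sequence gradients are added as described, the largest difference should be at the level of single-precision roundoff.

```matlab
% Two hypothetical inputs, each holding one sequence of nine observations
% (third array dimension = batch, fourth = time, since Dimension is 2-D)
seqA = {rand([obsInfo(1).Dimension 1 9]), rand([obsInfo(2).Dimension 1 9])};
seqB = {rand([obsInfo(1).Dimension 1 9]), rand([obsInfo(2).Dimension 1 9])};

% Stack the two sequences along the batch dimension
seqAB = cellfun(@(a,b) cat(3,a,b), seqA, seqB, "UniformOutput", false);

% Parameter gradients for each sequence alone and for the stacked batch
gA  = gradient(critic,"output-parameters",seqA);
gB  = gradient(critic,"output-parameters",seqB);
gAB = gradient(critic,"output-parameters",seqAB);

% Largest deviation between the batch gradient and the sum of the
% individual gradients, over all parameter groups
dev = cellfun(@(ab,a,b) max(abs(ab(:) - a(:) - b(:))), gAB, gA, gB);
max(dev)
```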

## Input Arguments

`fcnAppx`

— Function approximator object

function approximator object

Function approximator object, specified as one of the following:

- `rlValueFunction` object: value function critic
- `rlQValueFunction` object: Q-value function critic
- `rlVectorQValueFunction` object: multi-output Q-value function critic with a discrete action space
- `rlContinuousDeterministicActor` object: deterministic policy actor with a continuous action space
- `rlDiscreteCategoricalActor` object: stochastic policy actor with a discrete action space
- `rlContinuousGaussianActor` object: stochastic policy actor with a continuous action space
- `rlContinuousDeterministicTransitionFunction` object: continuous deterministic transition function for a model-based agent
- `rlContinuousGaussianTransitionFunction` object: continuous Gaussian transition function for a model-based agent
- `rlContinuousDeterministicRewardFunction` object: continuous deterministic reward function for a model-based agent
- `rlContinuousGaussianRewardFunction` object: continuous Gaussian reward function for a model-based agent
- `rlIsDoneFunction` object: is-done function for a model-based agent

`inData`

— Input data for the function approximator

cell array

Input data for the function approximator, specified as a cell array with as many elements as the number of input channels of `fcnAppx`. In the following, $N_O$ denotes the number of observation channels.

- If `fcnAppx` is an `rlQValueFunction`, an `rlContinuousDeterministicTransitionFunction`, or an `rlContinuousGaussianTransitionFunction` object, then each of the first $N_O$ elements of `inData` must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a final matrix representing the action.
- If `fcnAppx` is a function approximator object representing an actor or critic (but not an `rlQValueFunction` object), `inData` must contain $N_O$ elements, each one being a matrix representing the current observation from the corresponding observation channel.
- If `fcnAppx` is an `rlContinuousDeterministicRewardFunction`, an `rlContinuousGaussianRewardFunction`, or an `rlIsDoneFunction` object, then each of the first $N_O$ elements of `inData` must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a matrix representing the action, and finally by $N_O$ elements, each one being a matrix representing the next observation from the corresponding observation channel.

Each element of `inData` must be a matrix of dimension $M_C$-by-$L_B$-by-$L_S$, where:

- $M_C$ corresponds to the dimensions of the associated input channel.
- $L_B$ is the batch size. To specify a single observation, set $L_B = 1$. To specify a batch of (independent) observations, specify $L_B > 1$. If `inData` has multiple elements, then $L_B$ must be the same for all elements of `inData`.
- $L_S$ specifies the sequence length (along the time dimension) for a recurrent neural network. If `fcnAppx` does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then $L_S = 1$. If `inData` has multiple elements, then $L_S$ must be the same for all elements of `inData`.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of `lstmLayer`.

**Example:** `{rand(8,3,64,1),rand(4,1,64,1),rand(2,1,64,1)}`
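
The following plain MATLAB sketch (no toolbox objects needed) shows this layout. It builds an `inData` cell array for the three observation channels of the Gaussian actor example, using the hypothetical values $L_B = 64$ and $L_S = 9$.

```matlab
% Hypothetical sizes: 64 independent sequences, each 9 time steps long
LB = 64;
LS = 9;

% Channel dimensions matching the Gaussian actor example:
% a 3-by-1 vector, a 1-by-2 row vector, and a scalar
chanDims = {[3 1], [1 2], [1 1]};

% Build one M_C-by-L_B-by-L_S array per input channel
inData = cellfun(@(d) rand([d LB LS]), chanDims, "UniformOutput", false);

% Display the size of each element
cellfun(@size, inData, "UniformOutput", false)
```

MATLAB drops trailing singleton dimensions. With $L_S = 1$, an array created as `rand([3 1 64 1])` therefore reports its size as `[3 1 64]`, which matches the form of the example above.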

`lossFcn`

— Loss function

function handle

Loss function, specified as a function handle to a user-defined function. The user-defined function can be either an anonymous function or a function on the MATLAB path. The first input argument of the function must be a cell array like the one returned from the evaluation of `fcnAppx`. For more information, see the description of `outData` in `evaluate`. The second, optional, input argument of `lossFcn` contains additional data that might be needed for the gradient calculation, as described below in `fcnData`. For an example of the signature that this function must have, see Train Reinforcement Learning Policy Using Custom Training Loop.

`fcnData`

— Additional input data for loss function

any MATLAB® data type

Additional data for the loss function, specified as any MATLAB data type, typically a structure or cell array. For an example see Train Reinforcement Learning Policy Using Custom Training Loop.
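
Here is a minimal sketch of the loss-function form. It assumes `critic` and `obsInfo` from the vector Q-value example are in the workspace. The loss (mean squared difference between every Q-value and a constant target), the `Target` field, and its value are hypothetical and only illustrate the shape of the call. `lossFcn` receives the cell array that `evaluate` would return as its first argument and `fcnData` as its second. Inside the loss, use only operations that support `dlarray`, so that automatic differentiation can trace it.

```matlab
% Hypothetical loss: mean squared difference between every Q-value
% returned by the critic and a constant target taken from lossData
lossFcn = @(outData,lossData) ...
    sum((outData{1} - lossData.Target).^2,"all") / numel(outData{1});

% Hypothetical additional data, passed to lossFcn as its second argument
lossData.Target = 1;

% Batch of five independent observations (sequence length 1)
inData = {rand([obsInfo(1).Dimension 5 1]), ...
          rand([obsInfo(2).Dimension 5 1])};

% Gradient of the loss with respect to the learnable parameters
grl = gradient(critic,lossFcn,inData,lossData)
```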

## Output Arguments

`grad`

— Value of the gradient

cell array

Value of the gradient, returned as a cell array.

When the type of gradient is from the sum of the outputs with respect to the inputs of `fcnAppx` (`"output-input"`), `grad` is a cell array in which each element contains the gradient of the sum of all the outputs with respect to the corresponding input channel.

The numerical array in each cell has dimensions $D$-by-$L_B$-by-$L_S$, where:

- $D$ corresponds to the dimensions of the input channel of `fcnAppx`.
- $L_B$ is the batch size (length of a batch of independent inputs).
- $L_S$ is the sequence length (length of the sequence of inputs along the time dimension) for a recurrent neural network. If `fcnAppx` does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then $L_S = 1$.

When the type of gradient is from the output with respect to the parameters of `fcnAppx` (`"output-parameters"`), `grad` is a cell array in which each element contains the gradient of the sum of outputs belonging to an output channel with respect to the corresponding group of parameters. The gradient is calculated using the whole history of $L_S$ inputs, and all the $L_B$ gradients with respect to the independent input sequences are added together in `grad`. Therefore, `grad` always has the same size as the result from `getLearnableParameters`.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of `lstmLayer`.

## Version History

**Introduced in R2022a**

## See Also

### Functions

### Objects

`rlValueFunction` | `rlQValueFunction` | `rlVectorQValueFunction` | `rlContinuousDeterministicActor` | `rlDiscreteCategoricalActor` | `rlContinuousGaussianActor` | `rlContinuousDeterministicTransitionFunction` | `rlContinuousGaussianTransitionFunction` | `rlContinuousDeterministicRewardFunction` | `rlContinuousGaussianRewardFunction` | `rlIsDoneFunction`
