gradient
Evaluate gradient of function approximator object given observation and action input data
Syntax
grad = gradient(fcnAppx,'output-input',inData)
grad = gradient(fcnAppx,'output-parameters',inData)
grad = gradient(fcnAppx,lossFcn,inData)
grad = gradient(fcnAppx,lossFcn,inData,fcnData)
Description
grad = gradient(fcnAppx,'output-input',inData) evaluates the gradient of the sum of the outputs of the function approximator object fcnAppx with respect to its inputs, at the input values given in inData.
grad = gradient(fcnAppx,'output-parameters',inData) evaluates the gradient of the sum of the outputs of fcnAppx with respect to its learnable parameters.
grad = gradient(fcnAppx,lossFcn,inData) evaluates the gradient of the loss function lossFcn with respect to the learnable parameters of fcnAppx.
grad = gradient(fcnAppx,lossFcn,inData,fcnData) also passes the additional data fcnData to the loss function.
Examples
Calculate Gradients for a Continuous Gaussian Actor
Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as consisting of three channels. The first channel carries an observation from a continuous three-dimensional space, so a single observation is a column vector containing three doubles. The second channel carries a discrete observation consisting of a two-dimensional row vector that can take one of five different values. The third channel carries a discrete scalar observation that can be either zero or one. Finally, the action space is a continuous four-dimensional space, so a single action is a column vector containing four doubles, each between -10 and 10.
obsInfo = [rlNumericSpec([3 1]) ...
           rlFiniteSetSpec({[1 2],[3 4],[5 6],[7 8],[9 10]}) ...
           rlFiniteSetSpec([0 1])];
actInfo = rlNumericSpec([4 1], ...
    'UpperLimit', 10*ones(4,1), ...
    'LowerLimit',-10*ones(4,1));
Create a deep neural network to be used as the approximation model within the actor. For a continuous Gaussian actor, the network must have two output layers (one for the mean values and one for the standard deviation values), each having as many elements as the dimension of the action space.
To create a recurrent neural network, use sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers. Also use a softplusLayer to enforce nonnegativity of the standard deviations, and a scalingLayer to map the mean values to the desired output range.
inPath1 = [ sequenceInputLayer(prod(obsInfo(1).Dimension), ...
                'Normalization','none','Name','netObsIn1')
            fullyConnectedLayer(prod(actInfo.Dimension), ...
                'Name','infc1') ];

inPath2 = [ sequenceInputLayer(prod(obsInfo(2).Dimension), ...
                'Normalization','none','Name','netObsIn2')
            fullyConnectedLayer(prod(actInfo.Dimension), ...
                'Name','infc2') ];

inPath3 = [ sequenceInputLayer(prod(obsInfo(3).Dimension), ...
                'Normalization','none','Name','netObsIn3')
            fullyConnectedLayer(prod(actInfo.Dimension), ...
                'Name','infc3') ];

% Concatenate inputs along the first available dimension
jointPath = [ concatenationLayer(1,3,'Name','cat')
              tanhLayer('Name','tanhJnt')
              lstmLayer(8,'OutputMode','sequence','Name','lstm')
              fullyConnectedLayer(prod(actInfo.Dimension), ...
                  'Name','jntfc') ];

% Path layers for the mean values
% Use a scalingLayer to scale the range from (-1,1) to (-10,10)
meanPath = [ tanhLayer('Name','tanhMean')
             fullyConnectedLayer(prod(actInfo.Dimension))
             scalingLayer('Name','scale', ...
                 'Scale',actInfo.UpperLimit) ];

% Path layers for the standard deviations
% Use a softplusLayer to make them nonnegative
sdevPath = [ tanhLayer('Name','tanhStdv')
             fullyConnectedLayer(prod(actInfo.Dimension))
             softplusLayer('Name','splus') ];

% Add layers to the network object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,inPath3);
net = addLayers(net,jointPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

% Connect layers
net = connectLayers(net,'infc1','cat/in1');
net = connectLayers(net,'infc2','cat/in2');
net = connectLayers(net,'infc3','cat/in3');
net = connectLayers(net,'jntfc','tanhMean/in');
net = connectLayers(net,'jntfc','tanhStdv/in');

% Plot the network
plot(net)
Create the actor with rlContinuousGaussianActor, using the network, the observation and action specification objects, and the names of the network input layers and of the output layers that return the mean values and standard deviations.
actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    'ActionMeanOutputNames','scale', ...
    'ActionStandardDeviationOutputNames','splus', ...
    'ObservationInputNames',{'netObsIn1','netObsIn2','netObsIn3'});
To return the mean values and standard deviations of the Gaussian distribution as a function of the current observation, use evaluate.
[prob,state] = evaluate(actor, {rand([obsInfo(1).Dimension 1 1]), ...
                                rand([obsInfo(2).Dimension 1 1]), ...
                                rand([obsInfo(3).Dimension 1 1])});
The result is a cell array with two elements, the first one containing a vector of mean values, and the second containing a vector of standard deviations.
prob{1}
ans = 4×1 single column vector
0.6371
0.6491
0.5877
-0.1155
prob{2}
ans = 4×1 single column vector
0.7048
0.7032
0.7377
0.7092
To return an action sampled from the distribution, use getAction.
act = getAction(actor, {rand(obsInfo(1).Dimension), ...
                        rand(obsInfo(2).Dimension), ...
                        rand(obsInfo(3).Dimension)});
act{1}
ans = 4×1 single column vector
-0.4660
-0.0028
-0.7374
-0.0750
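For illustration only, the following sketch reproduces the sampling step manually from the mean and standard deviation vectors that evaluate returned earlier (getAction performs this sampling for you).

% Illustration only: sample an action from the Gaussian parameters
% returned by evaluate (mean in prob{1}, standard deviation in prob{2}).
a = prob{1} + prob{2}.*randn(4,1,'single');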
Calculate the gradients of the sum of the outputs (all the mean values plus all the standard deviations) with respect to the inputs, given a random observation.
gro = gradient(actor,'output-input', ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension), ...
     rand(obsInfo(3).Dimension)})
gro=3×1 cell array
{3×1 single}
{2×1 single}
{[ -0.0070]}
The result is a cell array with as many elements as the number of input channels. Each element contains the derivatives of the sum of the outputs with respect to each component of the input channel. Display the gradients with respect to the elements of the second channel.
gro{2}
ans = 2×1 single column vector
0.0931
1.8796
Obtain the gradients for a batch of 5 independent sequences, each consisting of 9 sequential observations.
gro_batch = gradient(actor,'output-input', ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9]), ...
     rand([obsInfo(3).Dimension 5 9])})
gro_batch=3×1 cell array
{3×5×9 single}
{2×5×9 single}
{1×5×9 single}
Display the derivative of the sum of the outputs with respect to the third observation element of the first input channel, after the seventh sequential observation in the fourth independent sequence.
gro_batch{1}(3,4,7)
ans = single
-0.3318
Set the option to accelerate the gradient computations.
actor = accelerate(actor,true);
Calculate the gradients of the sum of the outputs with respect to the parameters, given a random observation.
grp = gradient(actor,'output-parameters', ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension), ...
     rand(obsInfo(3).Dimension)})
grp=15×1 cell array
{ 4×3 single}
{ 4×1 single}
{ 4×2 single}
{ 4×1 single}
{ 4×1 single}
{ 4×1 single}
{32×12 single}
{32×8 single}
{32×1 single}
{ 4×8 single}
{ 4×1 single}
{ 4×4 single}
{ 4×1 single}
{ 4×4 single}
{ 4×1 single}
Each array within a cell contains the gradient of the sum of the outputs with respect to a group of parameters.
Obtain the gradients with respect to the parameters for a batch of 5 independent sequences, each consisting of 9 sequential observations.

grp_batch = gradient(actor,'output-parameters', ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9]), ...
     rand([obsInfo(3).Dimension 5 9])})
grp_batch=15×1 cell array
{ 4×3 single}
{ 4×1 single}
{ 4×2 single}
{ 4×1 single}
{ 4×1 single}
{ 4×1 single}
{32×12 single}
{32×8 single}
{32×1 single}
{ 4×8 single}
{ 4×1 single}
{ 4×4 single}
{ 4×1 single}
{ 4×4 single}
{ 4×1 single}
If you use a batch of inputs, the gradient is calculated considering the whole input sequence (in this case 9 steps), and all the gradients with respect to the independent batch dimensions (in this case 5) are added together. Therefore, the returned gradient always has the same size as the output from getLearnableParameters.
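To verify this, you can compare the size of each gradient array with the size of the corresponding learnable parameter array. This check is a sketch that uses only functions shown on this page.

% Sketch: each gradient array has the same size as the corresponding
% learnable parameter array of the actor.
params = getLearnableParameters(actor);
all(cellfun(@(p,g) isequal(size(p),size(g)), params, grp_batch))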
Calculate Gradients for a Vector Q-Value Function
Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as consisting of two channels. The first channel carries an observation from a continuous four-dimensional space. The second carries a discrete scalar observation that can be either zero or one. Finally, the action space consists of a scalar that can be -1, 0, or 1.
obsInfo = [rlNumericSpec([4 1]) rlFiniteSetSpec([0 1])];
actInfo = rlFiniteSetSpec([-1 0 1]);
Create a deep neural network to be used as the approximation model within the critic. The output layer must have three elements, each one expressing the value of executing the corresponding action, given the observation. To create a recurrent neural network, use sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers.
inPath1 = [ sequenceInputLayer(prod(obsInfo(1).Dimension), ...
                'Normalization','none','Name','netObsIn1')
            fullyConnectedLayer(prod(actInfo.Dimension), ...
                'Name','infc1') ];

inPath2 = [ sequenceInputLayer(prod(obsInfo(2).Dimension), ...
                'Normalization','none','Name','netObsIn2')
            fullyConnectedLayer(prod(actInfo.Dimension), ...
                'Name','infc2') ];

% Concatenate inputs along the first available dimension
jointPath = [ concatenationLayer(1,2,'Name','cct')
              tanhLayer('Name','tanhJnt')
              lstmLayer(8,'OutputMode','sequence','Name','lstm')
              fullyConnectedLayer(numel(actInfo.Elements), ...
                  'Name','jntfc') ];

% Add layers to the network object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,jointPath);

% Connect layers
net = connectLayers(net,'infc1','cct/in1');
net = connectLayers(net,'infc2','cct/in2');

% Plot the network
plot(net)
Create the critic with rlVectorQValueFunction, using the network and the observation and action specification objects.
critic = rlVectorQValueFunction(net,obsInfo,actInfo);
To return the value of the actions as a function of the current observation, use getValue or evaluate.
val = evaluate(critic, {rand(obsInfo(1).Dimension), ...
                        rand(obsInfo(2).Dimension)})
val = 1x1 cell array
{3x1 single}
When using evaluate, the result is a single-element cell array containing a vector with the values of all the possible actions, given the observation.
val{1}
ans = 3x1 single column vector
-0.0054
-0.0943
0.0177
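Alternatively, getValue returns the action values directly as a vector, without the enclosing cell array. For example:

% getValue returns the vector of action values directly.
v = getValue(critic, {rand(obsInfo(1).Dimension), ...
                      rand(obsInfo(2).Dimension)})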
Calculate the gradients of the sum of the outputs with respect to the inputs, given a random observation.
gro = gradient(critic,'output-input', ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
gro=2×1 cell array
{4x1 single}
{[ -0.0396]}
The result is a cell array with as many elements as the number of input channels. Each element contains the derivative of the sum of the outputs with respect to each component of the input channel. Display the gradient with respect to the element of the second channel.
gro{2}
ans = single
-0.0396
Obtain the gradients for a batch of 5 independent sequences, each consisting of 9 sequential observations.
gro_batch = gradient(critic,'output-input', ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9])})
gro_batch=2×1 cell array
{4x5x9 single}
{1x5x9 single}
Display the derivative of the sum of the outputs with respect to the third observation element of the first input channel, after the seventh sequential observation in the fourth independent sequence.
gro_batch{1}(3,4,7)
ans = single
0.0443
Set the option to accelerate the gradient computations.
critic = accelerate(critic,true);
Calculate the gradients of the sum of the outputs with respect to the parameters, given a random observation.
grp = gradient(critic,'output-parameters', ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
grp=9×1 cell array
{[-0.0101 -0.0097 -0.0039 -0.0065]}
{[ -0.0122]}
{[ -0.0078]}
{[ -0.0863]}
{32x2 single }
{32x8 single }
{32x1 single }
{ 3x8 single }
{ 3x1 single }
Each array within a cell contains the gradient of the sum of the outputs with respect to a group of parameters.
Obtain the gradients with respect to the parameters for a batch of 5 independent sequences, each consisting of 9 sequential observations.

grp_batch = gradient(critic,'output-parameters', ...
    {rand([obsInfo(1).Dimension 5 9]), ...
     rand([obsInfo(2).Dimension 5 9])})
grp_batch=9×1 cell array
{[-1.5801 -1.2618 -2.2168 -1.3463]}
{[ -3.7742]}
{[ -6.2158]}
{[ -12.6841]}
{32x2 single }
{32x8 single }
{32x1 single }
{ 3x8 single }
{ 3x1 single }
If you use a batch of inputs, the gradient is calculated considering the whole input sequence (in this case 9 steps), and all the gradients with respect to the independent batch dimensions (in this case 5) are added together. Therefore, the returned gradient always has the same size as the output from getLearnableParameters.
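Because each gradient array matches the layout of the corresponding learnable parameter array, you can use the gradient in a custom update step. The following is a minimal sketch of a plain gradient-descent update using setLearnableParameters; the learning rate is a hypothetical value chosen only for illustration.

% Sketch: one plain gradient-descent step (illustration only).
lr = 1e-3;                              % hypothetical learning rate
params = getLearnableParameters(critic);
for k = 1:numel(params)
    params{k} = params{k} - lr*grp_batch{k};
end
critic = setLearnableParameters(critic,params);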
Input Arguments
fcnAppx — Function approximator object
rlValueFunction object | rlQValueFunction object | rlVectorQValueFunction object | rlDiscreteCategoricalActor object | rlContinuousDeterministicActor object | rlContinuousGaussianActor object | rlContinuousDeterministicTransitionFunction object | rlContinuousGaussianTransitionFunction object | rlContinuousDeterministicRewardFunction object | rlContinuousGaussianRewardFunction object | rlIsDoneFunction object
Function approximator object, specified as one of the following:

- rlValueFunction object
- rlQValueFunction object
- rlVectorQValueFunction object
- rlDiscreteCategoricalActor object
- rlContinuousDeterministicActor object
- rlContinuousGaussianActor object
- rlContinuousDeterministicTransitionFunction object
- rlContinuousGaussianTransitionFunction object
- rlContinuousDeterministicRewardFunction object
- rlContinuousGaussianRewardFunction object
- rlIsDoneFunction object
inData
— Input data for the function approximator
cell array
Input data for the function approximator, specified as a cell array with as many elements as the number of input channels of fcnAppx. In the following list, the number of observation channels is indicated by NO.
- If fcnAppx is an rlQValueFunction, an rlContinuousDeterministicTransitionFunction, or an rlContinuousGaussianTransitionFunction object, then each of the first NO elements of inData must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a final matrix representing the action (see the sketch following this list).
- If fcnAppx is a function approximator object representing an actor or critic (but not an rlQValueFunction object), inData must contain NO elements, each one a matrix representing the current observation from the corresponding observation channel.
- If fcnAppx is an rlContinuousDeterministicRewardFunction, an rlContinuousGaussianRewardFunction, or an rlIsDoneFunction object, then each of the first NO elements of inData must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a matrix representing the action, and finally by NO elements, each one a matrix representing the next observation from the corresponding observation channel.
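For example, the following sketch shows the input order for an rlQValueFunction with two hypothetical observation channels (sizes 4x1 and 1x1) followed by one action channel (size 1x1).

% Sketch: observations first (NO = 2 channels), then the action.
inData = {rand(4,1), rand(1,1), rand(1,1)};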
Each element of inData must be a matrix of dimension MC-by-LB-by-LS, where:

- MC corresponds to the dimensions of the associated input channel.
- LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of (independent) observations, specify LB > 1. If inData has multiple elements, then LB must be the same for all elements of inData.
- LS specifies the sequence length (along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then LS = 1. If inData has multiple elements, then LS must be the same for all elements of inData.
For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
Example: {rand(8,3,64,1),rand(4,1,64,1),rand(2,1,64,1)}
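For a recurrent approximator, you can also specify a sequence length greater than one. The following sketch builds input data for an approximator with two hypothetical observation channels (sizes 4x1 and 1x1), using a batch of LB = 4 independent sequences, each LS = 16 time steps long.

% Sketch: batch of 4 independent sequences, 16 time steps each.
inData = {rand(4,1,4,16), rand(1,1,4,16)};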
lossFcn
— Loss function
function handle
Loss function, specified as a function handle to a user-defined function. The user-defined function can be either an anonymous function or a function on the MATLAB path. For an example of the signature that this function must have, see Train Reinforcement Learning Policy Using Custom Training Loop.
fcnData
— Additional input data for the loss function
array
Additional data for the loss function, specified as an array. For an example, see Train Reinforcement Learning Policy Using Custom Training Loop.
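The following is a minimal sketch of this syntax, not the documented signature: it assumes the loss function receives the approximator output as a cell array together with fcnData, and returns a scalar loss. See the linked example for the required signature.

% Sketch only: assumed signature loss = lossFcn(outData,fcnData).
lossFcn = @(outData,fcnData) sum((outData{1} - fcnData.target).^2,'all');
fcnData = struct('target',rand(3,1));   % hypothetical additional data
grad = gradient(critic, lossFcn, ...
    {rand(obsInfo(1).Dimension), rand(obsInfo(2).Dimension)}, fcnData);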
Output Arguments
grad
— Value of the gradient
cell array
Value of the gradient, returned in a cell array.
When you calculate the gradient of the sum of the outputs with respect to the inputs of fcnAppx, grad is a cell array in which each element contains the gradient of the sum of all the outputs with respect to the corresponding input channel.
The numerical array in each cell has dimensions D-by-LB-by-LS, where:

- D corresponds to the dimensions of the input channel of fcnAppx.
- LB is the batch size (length of a batch of independent inputs).
- LS is the sequence length (length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then LS = 1.
When you calculate the gradient of the outputs with respect to the parameters of fcnAppx, grad is a cell array in which each element contains the gradient of the sum of the outputs belonging to an output channel with respect to the corresponding group of parameters. The gradient is calculated considering the whole history of LS inputs, and all the LB gradients with respect to the independent input sequences are added together in grad. Therefore, grad always has the same size as the result from getLearnableParameters.
For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
Version History
See Also
evaluate | accelerate | getLearnableParameters | rlValueFunction | rlQValueFunction | rlVectorQValueFunction | rlContinuousDeterministicActor | rlDiscreteCategoricalActor | rlContinuousGaussianActor | rlContinuousDeterministicTransitionFunction | rlContinuousGaussianTransitionFunction | rlContinuousDeterministicRewardFunction | rlContinuousGaussianRewardFunction | rlIsDoneFunction