update

Update the state of an optimizer object and a set of learnable parameters using a gradient value

Since R2022a

    Description

    [newFcnAppx,newOptimizer] = update(optimizer,fcnAppx,grad) updates the internal state of optimizer and the learnable parameters of fcnAppx according to the gradient value grad, and returns the updated approximator newFcnAppx and the updated optimizer newOptimizer.


    [newPars,newOptimizer] = update(optimizer,params,grad) updates the internal state of optimizer and the set of parameters params according to the gradient value grad, and returns the updated parameter set newPars and the updated optimizer newOptimizer.


    Examples


    For this example, create a value function critic and update its parameters.

    First, create a finite set observation specification for a scalar observation that can have four different values.

    obsInfo = rlFiniteSetSpec(1:4);

    Create a table object. Table values are initialized to zero by default.

    table = rlTable(obsInfo);

    Assign a different value to each table entry.

    table.Table = [1 -1 -10 100]';

    Create the critic.

    critic = rlValueFunction(table,obsInfo);

    Create an optimizer object.

    opt = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm",LearnRate=0.2))
    opt = 
      rlSGDMOptimizer with properties:
    
                       Momentum: 0.9000
                      LearnRate: 0.2000
         L2RegularizationFactor: 1.0000e-04
              GradientThreshold: Inf
        GradientThresholdMethod: "l2norm"
    
    

    For this example, assume a gradient value for the set of parameters equal to {dlarray([0.1 0.2 0.3 0.4]')}.

    Update the parameter set, and display the updated optimizer and parameter set.

    [newCritic,newOpt] = update(opt,critic,{dlarray([0.1 0.2 0.3 0.4]')})
    newCritic = 
      rlValueFunction with properties:
    
        ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {[4x1 dlarray]}
                  State: {}
    
    
    newOpt = 
      rlSGDMOptimizer with properties:
    
                       Momentum: 0.9000
                      LearnRate: 0.2000
         L2RegularizationFactor: 1.0000e-04
              GradientThreshold: Inf
        GradientThresholdMethod: "l2norm"
    
    

    Display the learnable parameters of the updated critic.

    newCritic.Learnables{1}
    ans = 
      4x1 dlarray
    
        0.9800
       -1.0400
      -10.0598
       99.9180
    
    

    You can continue to update the critic and the optimizer object in subsequent training iterations, passing the updated objects back to update. For examples that show how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
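    For instance, a minimal sketch of one iteration of such a loop follows. The names numIterations, myLossFcn, obsBatch, and lossData are hypothetical placeholders for your own training setup, and the gradient call assumes the gradient function for function approximator objects.

    for iteration = 1:numIterations
        % Compute the gradient of the loss with respect to the critic
        % learnable parameters (myLossFcn, obsBatch, and lossData are
        % placeholders for your own loss function and training data).
        criticGrad = gradient(critic,@myLossFcn,{obsBatch},lossData);

        % Apply one optimizer step, reassigning both outputs so the next
        % iteration uses the updated critic and optimizer state.
        [critic,opt] = update(opt,critic,criticGrad);
    end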

    Create a default optimizer object.

    opt = rlOptimizer
    opt = 
      rlADAMOptimizer with properties:
    
               GradientDecayFactor: 0.9000
        SquaredGradientDecayFactor: 0.9990
                           Epsilon: 1.0000e-08
                         LearnRate: 0.0100
            L2RegularizationFactor: 1.0000e-04
                 GradientThreshold: Inf
           GradientThresholdMethod: "l2norm"
    
    

    For this example, assume a parameter set given by the two-element cell array {1 -1}, and a gradient value of {0.1 0.1}.

    Update the parameter set, and display the updated optimizer and parameter set.

    [pars,opt] = update(opt,{1 -1},{0.1 0.1})
    pars=1×2 cell array
        {[0.9900]}    {[-1.0100]}
    
    
    opt = 
      rlADAMOptimizer with properties:
    
               GradientDecayFactor: 0.9000
        SquaredGradientDecayFactor: 0.9990
                           Epsilon: 1.0000e-08
                         LearnRate: 0.0100
            L2RegularizationFactor: 1.0000e-04
                 GradientThreshold: Inf
           GradientThresholdMethod: "l2norm"
    
    

    You can continue to update the parameter set and the optimizer object in subsequent iterations, passing the updated values back to update. For examples that show how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
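    For instance, this sketch applies five consecutive updates with the same made-up gradient value. Reassign both outputs on every call so that the optimizer state, such as the Adam moment estimates, carries over between calls.

    pars = {1 -1};
    for k = 1:5
        % Each call advances both the parameters and the optimizer state.
        [pars,opt] = update(opt,pars,{0.1 0.1});
    end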

    Input Arguments


    optimizer — Optimizer object

    Optimizer object to update, specified as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The runEpisode function uses the update method of the returned object to update the learnable parameters of an actor or critic.
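    For reference, this sketch creates one optimizer object of each supported type through rlOptimizerOptions.

    % Create one optimizer object of each supported type.
    adamOpt    = rlOptimizer(rlOptimizerOptions(Algorithm="adam"));
    sgdmOpt    = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm"));
    rmspropOpt = rlOptimizer(rlOptimizerOptions(Algorithm="rmsprop"));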

    fcnAppx — Function approximator object

    Function approximator object to update, specified as an actor, critic, or environment function approximator object (for example, an rlValueFunction object).

    params — Parameter set

    Parameter set to update, specified as a cell array.

    Example: {1 2 -1 -3}

    grad — Gradient values

    Value of the gradient, specified as a cell array whose elements are consistent in size and data type with the learnable parameters of fcnAppx or with params.

    Specifically, each element of grad contains the gradient of a loss function with respect to a group of learnable parameters of fcnAppx.

    The numerical array in each cell has dimensions D-by-LB-by-LS, where:

    • D corresponds to the dimensions of the input channel of fcnAppx.

    • LB is the batch size (length of a batch of independent inputs).

    • LS is the sequence length (length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, because they do not support recurrent neural networks), then LS = 1.

    The gradient is calculated using the whole history of LS inputs, and all the LB gradients with respect to the independent input sequences are added together in grad. Therefore, grad always has the same size as the output of getLearnableParameters.

    For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

    Example: {0.2 -0.1 0 -0.01}
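    As a sketch, one way to build a gradient cell array with the correct sizes is to mirror the output of getLearnableParameters. The zero gradients here are placeholders for values you would normally obtain from a loss gradient computation.

    pars = getLearnableParameters(critic);
    % Create zero gradients matching each learnable parameter array.
    grad = cellfun(@(p) zeros(size(p),"like",p),pars,UniformOutput=false);
    [critic,opt] = update(opt,critic,grad);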

    Output Arguments


    newFcnAppx — Updated function approximator object

    Updated function approximator object, returned as a function approximator object of the same type and configuration as fcnAppx.

    newOptimizer — Updated optimizer object

    Updated optimizer object, returned as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The object implements the corresponding optimization algorithm.

    newPars — Updated parameter set

    Updated parameter set, returned as a cell array with the same dimensions and data types as params.

    Version History

    Introduced in R2022a