update

Update the state of an optimizer object and a set of learnable parameters using a gradient value

Since R2022a

    Description

    [newFcnAppx,newOptimizer] = update(optimizer,fcnAppx,grad) updates the internal state of optimizer and the learnable parameters of fcnAppx according to the gradient value grad, and returns the updated approximator newFcnAppx and the updated optimizer newOptimizer.


    [newPars,newOptimizer] = update(optimizer,params,grad) updates the internal state of optimizer and the set of parameters params according to the gradient value grad, and returns the updated parameter set newPars and the updated optimizer newOptimizer.


    Examples


    For this example, create a value function critic and update its parameters.

    First, create a finite set observation specification for a scalar observation that can have four different values.

    obsInfo = rlFiniteSetSpec(1:4);

    Create a table object. Table values are initialized to zero by default.

    table = rlTable(obsInfo);

    Assign a different value to each table entry.

    table.Table = [1 -1 -10 100]';

    Create the critic.

    critic = rlValueFunction(table,obsInfo);

    Create an optimizer object.

    opt = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm",LearnRate=0.2))
    opt = 
      rlSGDMOptimizer with properties:
    
                       Momentum: 0.9000
                      LearnRate: 0.2000
         L2RegularizationFactor: 1.0000e-04
              GradientThreshold: Inf
        GradientThresholdMethod: "l2norm"
    
    

    For this example, assume a gradient value for the set of parameters equal to {dlarray([0.1 0.2 0.3 0.4]')}.

    Update the parameter set, and display the updated optimizer and parameter set.

    [newCritic,newOpt] = update(opt,critic,{dlarray([0.1 0.2 0.3 0.4]')})
    newCritic = 
      rlValueFunction with properties:
    
        ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {[4x1 dlarray]}
                  State: {}
    
    
    newOpt = 
      rlSGDMOptimizer with properties:
    
                       Momentum: 0.9000
                      LearnRate: 0.2000
         L2RegularizationFactor: 1.0000e-04
              GradientThreshold: Inf
        GradientThresholdMethod: "l2norm"
    
    

    Display the learnable parameters of the updated critic.

    newCritic.Learnables{1}
    ans = 
      4x1 dlarray
    
        0.9800
       -1.0400
      -10.0598
       99.9180
    
    

    You can continue to update the critic and the optimizer object in subsequent training iterations, passing the updated objects back to update. For examples that show how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
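    For instance, a minimal sketch of one iteration of such a loop follows. The names numIterations, myLossFcn, obsBatch, and lossData are hypothetical placeholders for your own training setup, and the gradient call assumes the gradient function for function approximator objects.

    for iteration = 1:numIterations
        % Compute the gradient of the loss with respect to the critic
        % learnable parameters (myLossFcn, obsBatch, and lossData are
        % placeholders for your own loss function and training data).
        criticGrad = gradient(critic,@myLossFcn,{obsBatch},lossData);

        % Apply one optimizer step, reassigning both outputs so the next
        % iteration uses the updated critic and optimizer state.
        [critic,opt] = update(opt,critic,criticGrad);
    end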

    Create a default optimizer object.

    opt = rlOptimizer
    opt = 
      rlADAMOptimizer with properties:
    
               GradientDecayFactor: 0.9000
        SquaredGradientDecayFactor: 0.9990
                           Epsilon: 1.0000e-08
                         LearnRate: 0.0100
            L2RegularizationFactor: 1.0000e-04
                 GradientThreshold: Inf
           GradientThresholdMethod: "l2norm"
    
    

    For this example, assume a parameter set given by the two-element cell array {1 -1}, and a gradient value of {0.1 0.1}.

    Update the parameter set, and display the updated optimizer and parameter set.

    [pars,opt] = update(opt,{1 -1},{0.1 0.1})
    pars=1×2 cell array
        {[0.9900]}    {[-1.0100]}
    
    
    opt = 
      rlADAMOptimizer with properties:
    
               GradientDecayFactor: 0.9000
        SquaredGradientDecayFactor: 0.9990
                           Epsilon: 1.0000e-08
                         LearnRate: 0.0100
            L2RegularizationFactor: 1.0000e-04
                 GradientThreshold: Inf
           GradientThresholdMethod: "l2norm"
    
    

    You can continue to update the parameter set and the optimizer object in subsequent iterations, passing the updated values back to update. For examples that show how to use update in a custom training loop, see Train Reinforcement Learning Policy Using Custom Training Loop and Create and Train Custom PG Agent.
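    For instance, this sketch applies five consecutive updates with the same made-up gradient value. Reassign both outputs on every call so that the optimizer state, such as the Adam moment estimates, carries over between calls.

    pars = {1 -1};
    for k = 1:5
        % Each call advances both the parameters and the optimizer state.
        [pars,opt] = update(opt,pars,{0.1 0.1});
    end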

    Input Arguments


    optimizer — Optimizer object

    Optimizer object to update, specified as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The runEpisode function uses the update method of the returned object to update the learnable parameters of an actor or critic.
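    For reference, this sketch creates one optimizer object of each supported type through rlOptimizerOptions.

    % Create one optimizer object of each supported type.
    adamOpt    = rlOptimizer(rlOptimizerOptions(Algorithm="adam"));
    sgdmOpt    = rlOptimizer(rlOptimizerOptions(Algorithm="sgdm"));
    rmspropOpt = rlOptimizer(rlOptimizerOptions(Algorithm="rmsprop"));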

    fcnAppx — Function approximator object

    Function approximator object to update, specified as an actor, critic, or environment function approximator object (for example, an rlValueFunction object).

    params — Parameter set

    Parameter set to update, specified as a cell array.

    Example: {1 2 -1 -3}

    grad — Gradient values

    Value of the gradient, specified as a cell array whose elements are consistent in size and data type with the learnable parameters of fcnAppx or with params.

    Specifically, each element of grad contains the gradient of a loss function with respect to a group of learnable parameters of fcnAppx.

    The numerical array in each cell has dimensions D-by-LB-by-LS, where:

    • D corresponds to the dimensions of the input channel of fcnAppx.

    • LB is the batch size (length of a batch of independent inputs).

    • LS is the sequence length (length of the sequence of inputs along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators, because they do not support recurrent neural networks), then LS = 1.

    The gradient is calculated using the whole history of LS inputs, and all the LB gradients with respect to the independent input sequences are added together in grad. Therefore, grad always has the same size as the output of getLearnableParameters.

    For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

    Example: {0.2 -0.1 0 -0.01}
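    As a sketch, one way to build a gradient cell array with the correct sizes is to mirror the output of getLearnableParameters. The zero gradients here are placeholders for values you would normally obtain from a loss gradient computation.

    pars = getLearnableParameters(critic);
    % Create zero gradients matching each learnable parameter array.
    grad = cellfun(@(p) zeros(size(p),"like",p),pars,UniformOutput=false);
    [critic,opt] = update(opt,critic,grad);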

    Output Arguments


    newFcnAppx — Updated function approximator object

    Updated function approximator object, returned as a function approximator object of the same type and configuration as fcnAppx.

    newOptimizer — Updated optimizer object

    Updated optimizer object, returned as an rlADAMOptimizer, rlSGDMOptimizer, or rlRMSPropOptimizer object. The object implements the corresponding optimization algorithm.

    newPars — Updated parameter set

    Updated parameter set, returned as a cell array with the same dimensions and data types as params.

    Version History

    Introduced in R2022a