
rlDeterministicActorRepresentation

Deterministic actor representation for reinforcement learning agents

Description

This object implements a function approximator to be used as a deterministic actor within a reinforcement learning agent with a continuous action space. A deterministic actor takes observations as inputs and returns as outputs the action that maximizes the expected cumulative long-term reward, thereby implementing a deterministic policy. After you create an rlDeterministicActorRepresentation object, use it to create a suitable agent, such as an rlDDPGAgent agent. For more information on creating representations, see Create Policy and Value Function Representations.

Creation

Description


actor = rlDeterministicActorRepresentation(net,observationInfo,actionInfo,'Observation',obsName,'Action',actName) creates a deterministic actor using the deep neural network net as approximator. This syntax sets the ObservationInfo and ActionInfo properties of actor to the inputs observationInfo and actionInfo, which contain the specifications for observations and actions, respectively. actionInfo must specify a continuous action space; discrete action spaces are not supported. obsName must contain the names of the input layers of net that are associated with the observation specifications. actName must contain the name of the output layer of net that is associated with the action specification.


actor = rlDeterministicActorRepresentation({basisFcn,W0},observationInfo,actionInfo) creates a deterministic actor using a custom basis function as underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle basisFcn to a custom basis function and the second element contains the initial weight matrix W0. This syntax sets the ObservationInfo and ActionInfo properties of actor to the inputs observationInfo and actionInfo, respectively.

actor = rlDeterministicActorRepresentation(___,options) creates a deterministic actor using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of actor to the options input argument. You can use this syntax with any of the previous input-argument combinations.
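A minimal sketch of the options syntax, reusing the four-element observation and two-element action layout from the examples on this page. The learning rate and gradient threshold values are illustrative only. The options object is passed as the last argument, after the name-value pairs.

```matlab
% Observation and action specifications (illustrative dimensions)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Minimal actor network: the input layer name is used for 'Observation',
% the output layer name for 'Action'
net = [featureInputLayer(4,'Normalization','none','Name','myobs')
       fullyConnectedLayer(2,'Name','myact')];

% Representation options (values chosen for illustration only)
opts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);

% The options object goes last, after the name-value pairs
actor = rlDeterministicActorRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'myobs'},'Action',{'myact'},opts);
```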

Input Arguments


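net

Deep neural network used as the underlying approximator within the actor, specified as a deep neural network object, for example an array of Layer objects (as in the examples on this page) or a layerGraph object.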

The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names listed in obsName.

The network output layer must have the same data type and dimension as the signal defined in ActionInfo. Its name must be the action name specified in actName.

rlDeterministicActorRepresentation objects support recurrent deep neural networks.

For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policy and Value Function Representations.

obsName

Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net.

Example: {'my_obs'}

actName

Action name, specified as a single-element cell array that contains a character vector. It must be the name of the output layer of net.

Example: {'my_act'}
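For a simple layer array with exactly one input layer at the start and one output layer at the end, one way to keep obsName and actName consistent with net is to read the names from the layers themselves. This is a sketch under that single-input, single-output assumption; networks with several inputs need the names listed explicitly.

```matlab
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

net = [featureInputLayer(4,'Normalization','none','Name','myobs')
       fullyConnectedLayer(2,'Name','myact')];

% Read the names from the first (input) and last (output) layers
% so that obsName and actName always match net
obsName = {net(1).Name};
actName = {net(end).Name};

actor = rlDeterministicActorRepresentation(net,obsInfo,actInfo, ...
    'Observation',obsName,'Action',actName);
```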

basisFcn

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can be either an anonymous function or a function on the MATLAB path. The output of the actor, that is, the action taken in response to the current observation, is the vector a = W'*B, where W is a weight matrix containing the learnable parameters and B is the column vector returned by the custom basis function.

When creating a deterministic actor representation, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in observationInfo.

Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]

W0

Initial value of the basis function weights W, specified as a matrix with as many rows as the length of the vector returned by the basis function and as many columns as the dimension of the action space.
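The formula a = W'*B and the sizing rule for W0 can be checked in plain MATLAB, without creating a representation. The sketch below evaluates the three-observation example basis function given above on hypothetical observation values. The observation sizes (obs2 needs at least five elements and obs3 at least two for the indexing to be valid) and the two-element action dimension are assumptions for illustration.

```matlab
% Example basis function from above: returns a 2-element column vector
basisFcn = @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))];

% Hypothetical observation values (sizes chosen so the indexing is valid)
obs1 = [2; 0];
obs2 = [0; 0; 0; 0; -4];
obs3 = [1; 3];

B = basisFcn(obs1,obs2,obs3);   % [3*2^2; abs(-4+1)] = [12; 3]

% W0 needs one row per basis element and one column per action element
numAct = 2;                     % assumed action dimension
W0 = ones(numel(B),numAct);     % 2-by-2 initial weights

a = W0'*B;                      % actor output for these observations
```

With all-ones weights, each action element is simply the sum of the basis values, here [15; 15].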

Properties


Options

Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.

ObservationInfo

Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.

rlDeterministicActorRepresentation sets the ObservationInfo property of actor to the input observationInfo.

You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
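Both approaches are sketched below. The predefined environment keyword 'DoubleIntegrator-Continuous', the limit values, and the name 'observations' are assumptions chosen for illustration; any environment object can be passed to getObservationInfo.

```matlab
% Option 1: extract the specification from an existing environment
% (predefined environment chosen for illustration)
env = rlPredefinedEnv('DoubleIntegrator-Continuous');
obsInfoFromEnv = getObservationInfo(env);

% Option 2: construct the specification manually
% (dimension, limits, and name are illustrative)
obsInfo = rlNumericSpec([4 1],'LowerLimit',-10,'UpperLimit',10);
obsInfo.Name = 'observations';
```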

ActionInfo

Action specifications for a continuous action space, specified as an rlNumericSpec object defining properties such as dimensions, data type, and name of the action signals. The deterministic actor representation does not support discrete actions.

rlDeterministicActorRepresentation sets the ActionInfo property of actor to the input actionInfo.

You can extract actionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlNumericSpec.
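A sketch of a manually constructed, bounded continuous action specification. The limits and the action name 'force' are hypothetical values used only for illustration.

```matlab
% Two-element continuous action bounded in [-1, 1] (illustrative limits)
actInfo = rlNumericSpec([2 1],'LowerLimit',-1,'UpperLimit',1);
actInfo.Name = 'force';   % hypothetical action name
```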

For custom basis function representations, the action signal must be a scalar or a column vector.

Object Functions

rlDDPGAgent: Deep deterministic policy gradient reinforcement learning agent
rlTD3Agent: Twin-delayed deep deterministic policy gradient reinforcement learning agent
getAction: Obtain action from agent or actor representation given environment observations

Examples


Create Deterministic Actor from Deep Neural Network

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

actInfo = rlNumericSpec([2 1]);

Create a deep neural network approximator for the actor. The input layer of the network (here called myobs) must accept a four-element vector, which is the observation defined by obsInfo. The output layer (here called myact) must return a two-element vector, which is the action defined by actInfo.

net = [featureInputLayer(4,'Normalization','none','Name','myobs') 
    fullyConnectedLayer(2,'Name','myact')];

Create the actor with rlDeterministicActorRepresentation, using the network, the observation and action specification objects, and the names of the network input and output layers.

actor = rlDeterministicActorRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'myobs'},'Action',{'myact'})
actor = 
  rlDeterministicActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlNumericSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor, use getAction to return the action from a random observation, using the current network weights.

act = getAction(actor,{rand(4,1)}); act{1}
ans = 2x1 single column vector

   -0.5054
    1.5390

You can now use the actor to create a suitable agent, such as an rlDDPGAgent or rlTD3Agent agent.
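A sketch of one way to do this for a DDPG agent, which also needs a Q-value critic that takes both the observation and the action as inputs. The critic architecture, its layer names ('obsIn', 'actIn', and so on), and its layer sizes are assumptions for illustration, not a recommended design; the agent uses default options.

```matlab
% Specifications and actor, as in the example above
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
actorNet = [featureInputLayer(4,'Normalization','none','Name','myobs')
            fullyConnectedLayer(2,'Name','myact')];
actor = rlDeterministicActorRepresentation(actorNet,obsInfo,actInfo, ...
    'Observation',{'myobs'},'Action',{'myact'});

% Critic network with separate observation and action input paths
% (layer names and sizes are illustrative)
obsPath = [featureInputLayer(4,'Normalization','none','Name','obsIn')
           fullyConnectedLayer(16,'Name','obsFC')];
actPath = [featureInputLayer(2,'Normalization','none','Name','actIn')
           fullyConnectedLayer(16,'Name','actFC')];
commonPath = [additionLayer(2,'Name','add')
              reluLayer('Name','relu')
              fullyConnectedLayer(1,'Name','qValue')];

% Assemble the graph and join both paths at the addition layer
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'obsFC','add/in1');
criticNet = connectLayers(criticNet,'actFC','add/in2');

% Q-value critic: observation and action in, scalar value out
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'obsIn'},'Action',{'actIn'});

% DDPG agent built from the deterministic actor and the critic
agent = rlDDPGAgent(actor,critic);
```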

Create Deterministic Actor from Custom Basis Function

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous three-dimensional space, so that a single observation is a column vector containing three doubles.

obsInfo = rlNumericSpec([3 1]);

The deterministic actor does not support discrete action spaces. Therefore, create a continuous action space specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

actInfo = rlNumericSpec([2 1]);

Create a custom basis function. Each element is a function of the observations defined by obsInfo.

myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); 2*myobs(2)+myobs(1); -myobs(3)]
myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(1);2*myobs(2)+myobs(1);-myobs(3)]

The output of the actor is the vector W'*myBasisFcn(myobs), which is the action taken as a result of the given observation. The weight matrix W contains the learnable parameters and must have as many rows as the length of the basis function output and as many columns as the dimension of the action space.
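The same computation can be reproduced in plain MATLAB. This sketch uses a fixed, hypothetical weight matrix instead of a random one so that the result is reproducible; it does not query the actor itself.

```matlab
% Basis function from this example: 4 features from a 3-element observation
myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); 2*myobs(2)+myobs(1); -myobs(3)];

obs = [1; 2; 3];
B = myBasisFcn(obs);        % [4; 1; 5; -3]

% Hypothetical fixed 4-by-2 weights: 4 basis elements, 2 action elements
W = [1 0; 0 1; 1 1; 0 2];

a = W'*B;                   % 2-element action: [4+5; 1+5-6] = [9; 0]
```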

Define an initial parameter matrix.

W0 = rand(4,2);

Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial weight matrix. The second and third arguments are, respectively, the observation and action specification objects.

actor = rlDeterministicActorRepresentation({myBasisFcn,W0},obsInfo,actInfo)
actor = 
  rlDeterministicActorRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlNumericSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor, use the getAction function to return the action from a given observation, using the current parameter matrix.

a = getAction(actor,{[1 2 3]'});
a{1}
ans = 
  2x1 dlarray

    2.0595
    2.3788

You can now use the actor (along with a critic) to create a suitable continuous action space agent.

Create Deterministic Actor with Recurrent Neural Network

Create observation and action specifications. You can also obtain these specifications from an environment.

obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);
numObs = obsinfo.Dimension(1);
numAct = actinfo.Dimension(1);

Create a recurrent deep neural network for the actor. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.

net = [sequenceInputLayer(numObs,'Normalization','none','Name','state')
       fullyConnectedLayer(10,'Name','fc1')
       reluLayer('Name','relu1')
       lstmLayer(8,'OutputMode','sequence','Name','ActorLSTM')
       fullyConnectedLayer(20,'Name','ActorFC2')
       fullyConnectedLayer(numAct,'Name','action')
       tanhLayer('Name','tanh1')];

Create representation options and a deterministic actor representation for the network. Pass the options object as the last input argument so that the actor uses it.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(net,obsinfo,actinfo,...
    'Observation',{'state'},'Action',{'tanh1'},actorOptions);

Introduced in R2020a