
rlContinuousGaussianActor

Stochastic Gaussian actor with a continuous action space for reinforcement learning agents

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent with a continuous action space. A continuous Gaussian actor takes an environment state as input and returns as output a random action sampled from a parametrized Gaussian probability distribution, thereby implementing a stochastic policy. After you create an rlContinuousGaussianActor object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating actors and critics, see Create Policies and Value Functions.

Creation

Description

actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdvActName) creates a Gaussian stochastic actor with a continuous action space, using the deep neural network net as function approximator. Here, net must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers calculate the mean and standard deviation of each component of the action. The actor uses these layers, according to the names specified in the strings netMeanActName and netStdvActName, to represent the Gaussian probability distribution from which the action is sampled. The function sets the ObservationInfo and ActionInfo properties of actor to the input arguments observationInfo and actionInfo, respectively.

Note

actor does not enforce constraints set by the action specification; therefore, when using this actor, you must enforce action space constraints within the environment.


actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdvActName,ObservationInputNames=netObsNames) specifies the names of the network input layers to be associated with the environment observation channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

actor = rlContinuousGaussianActor(___,UseDevice=useDevice) specifies the device used to perform computational operations on the actor object, and sets the UseDevice property of actor to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
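For example, assuming you have already created a network net whose output layers are named "scale" and "splus" and whose input layer is named "netOin" (as in the example later on this page), a minimal sketch of this syntax is:

% Create the actor and perform its computations on a GPU
actor = rlContinuousGaussianActor(net,obsInfo,actInfo, ...
    ActionMeanOutputNames="scale", ...
    ActionStandardDeviationOutputNames="splus", ...
    ObservationInputNames="netOin", ...
    UseDevice="gpu");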

Input Arguments


Deep neural network used as the underlying approximator within the actor. The network must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers calculate the mean and standard deviation of each component of the action. The actor uses these layers, according to the names specified in the strings netMeanActName and netStdvActName, to represent the Gaussian probability distribution from which the action is sampled.

Note

Standard deviations must be nonnegative and mean values must fall within the range of the action. Therefore, the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, and the output layer that returns the mean values must be a scaling layer, to scale the mean values to the output range.

You can specify the network as a dlnetwork object or as another neural network object from Deep Learning Toolbox, such as a layerGraph object, which is converted internally.

Note

Among the different network representation options, dlnetwork is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using it to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet=dlnetwork(net), where net is any neural network object from the Deep Learning Toolbox™. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice allows a greater level of insight and control for cases in which the conversion is not straightforward and might require additional specifications.
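For example, assuming net is an existing layerGraph or other Deep Learning Toolbox network object, a minimal sketch of this conversion is:

% Convert explicitly to a dlnetwork object before creating the actor
dlnet = dlnetwork(net);
summary(dlnet)   % check initialization, learnable parameters, and input layers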

rlContinuousGaussianActor objects support recurrent deep neural networks.

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.

Name of the network output layer corresponding to the mean values of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the mean values of each element of the action channel. Therefore, this network output layer must be named as indicated in netMeanActName. Furthermore, it must be a scaling layer that scales the returned mean values to the desired action range.

Note

Of the information specified in actionInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: "myNetOut_Force_Mean_Values"

Name of the network output layer corresponding to the standard deviations of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the standard deviations of each element of the action channel. Therefore, this network output layer must be named as indicated in netStdvActName. Furthermore, it must be a softplus or ReLU layer, to enforce nonnegativity of the returned standard deviations.

Note

Of the information specified in actionInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: "myNetOut_Force_Standard_Deviations"

Names of the network input layers corresponding to the environment observation channels, specified as a string array or a cell array of character vectors. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the network input layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.

Note

Of the information specified in observationInfo, the function uses only the data type and dimension of each channel, but not its (optional) name or description.

Example: {"NetInput1_airspeed","NetInput2_altitude"}

Properties


Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

rlContinuousGaussianActor sets the ObservationInfo property of actor to the input observationInfo.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually.
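For example, assuming env is an existing environment object (such as one created with rlPredefinedEnv), a minimal sketch is:

% Extract observation specifications from an existing environment
obsInfo = getObservationInfo(env);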

Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name. Note that the function does not use the name of the action channel specified in actionInfo.

Note

Only one action channel is allowed.

rlContinuousGaussianActor sets the ActionInfo property of actor to the input actionInfo.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specifications manually.
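For example, assuming env is an existing environment object, a minimal sketch is:

% Extract action specifications from an existing environment
actInfo = getActionInfo(env);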

Computation device used to perform operations such as gradient computation, parameter update and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations on a CPU.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

Example: 'UseDevice',"gpu"

Object Functions

rlACAgent - Actor-critic reinforcement learning agent
rlPGAgent - Policy gradient reinforcement learning agent
rlPPOAgent - Proximal policy optimization reinforcement learning agent
rlSACAgent - Soft actor-critic reinforcement learning agent
getAction - Obtain action from agent, actor, or policy object given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
setModel - Set function approximation model for actor or critic
getModel - Get function approximator model from actor or critic

Examples


Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous five-dimensional space, so that a single observation is a column vector containing five doubles.

obsInfo = rlNumericSpec([5 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that a single action is a column vector containing three doubles, each between -10 and 10.

actInfo = rlNumericSpec([3 1], ...
    LowerLimit=-10, ...
    UpperLimit=10);

To approximate the policy within the actor, use a deep neural network.

For a continuous Gaussian actor, the network must take the observation signal as input and return both a mean value and a standard deviation value for each action. Therefore it must have two output layers (one for the mean values, the other for the standard deviation values), each having as many elements as the dimension of the action space. You can obtain the number of dimensions of the observation and action spaces from the environment specification objects (for example, prod(obsInfo.Dimension) works regardless of whether the observation space is a column vector, row vector, or matrix).

Note that standard deviations must be nonnegative and mean values must fall within the range of the action. Therefore the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, while the output layer that returns the mean values must be a scaling layer, to scale the mean values to the output range.

Create each network path as an array of layer objects. Specify a name for the input and output layers, so you can later explicitly associate them with the correct channels.

% Input path layers
inPath = [ 
    featureInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="netOin")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc") 
    ];

% Path layers for mean value 
% Using scalingLayer to scale range from (-1,1) to (-10,10)
meanPath = [ 
    tanhLayer(Name="tanhMean");
    fullyConnectedLayer(prod(actInfo.Dimension));
    scalingLayer(Name="scale", ...
    Scale=actInfo.UpperLimit) 
    ];

% Path layers for standard deviations
% Using softplusLayer to make them nonnegative
sdevPath = [ 
    tanhLayer(Name="tanhStdv");
    fullyConnectedLayer(prod(actInfo.Dimension));
    softplusLayer(Name="splus") 
    ];

% Add layers to network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

% Connect layers
net = connectLayers(net,"infc","tanhMean/in");
net = connectLayers(net,"infc","tanhStdv/in");

% Plot the network
plot(net)


Convert the network to a dlnetwork object and display the number of learnable parameters (weights).

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 42

   Inputs:
      1   'netOin'   5 features

Create the actor with rlContinuousGaussianActor, using the network, the observation and action specification objects, and the names of the network input and output layers.

actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    ActionMeanOutputNames="scale",...
    ActionStandardDeviationOutputNames="splus",...
    ObservationInputNames="netOin");

To check your actor, use getAction to return an action from a random observation vector, using the current network weights. Each of the three elements of the action vector is a random sample from the Gaussian distribution with mean and standard deviation calculated, as a function of the current observation, by the neural network.

act = getAction(actor,{rand(obsInfo.Dimension)}); 
act{1}
ans = 3x1 single column vector

  -12.0285
    1.7628
   10.8733

To return the Gaussian distribution of the action, given an observation, use evaluate.

dist = evaluate(actor,{rand(obsInfo.Dimension)});

Display the vector of mean values.

dist{1}
ans = 3x1 single column vector

   -5.6127
    3.9449
    9.6213

Display the vector of standard deviations.

dist{2}
ans = 3x1 single column vector

    0.8516
    0.8366
    0.7004

You can now use the actor to create a suitable agent (such as rlACAgent, rlPGAgent, rlSACAgent, rlPPOAgent, or rlTRPOAgent).
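For example, a minimal sketch of creating an actor-critic agent, assuming criticNet is an existing value-function network compatible with obsInfo:

% Create a critic and combine it with the actor into an AC agent
critic = rlValueFunction(criticNet,obsInfo);
agent = rlACAgent(actor,critic);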

Version History

Introduced in R2022a