rlQValueRepresentation

Q-Value function critic representation for reinforcement learning agents

Description

This object implements a Q-value function approximator to be used as a critic within a reinforcement learning agent. A Q-value function maps an observation-action pair to a scalar value representing the expected cumulative long-term reward the agent accumulates when it starts from the given observation and executes the given action. Q-value function critics therefore need both observations and actions as inputs. After you create an rlQValueRepresentation critic, use it to create an agent relying on a Q-value function critic, such as an rlQAgent, rlDQNAgent, rlSARSAAgent, rlDDPGAgent, or rlTD3Agent. For more information on creating representations, see Create Policy and Value Function Representations.

Creation

Description

Scalar Output Q-Value Critic

example

critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName,'Action',actName) creates the Q-value function critic. net is the deep neural network used as an approximator, and must have both observations and action as inputs, and a single scalar output. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, containing the observations and action specifications. obsName must contain the names of the input layers of net that are associated with the observation specifications. The action name actName must be the name of the input layer of net that is associated with the action specifications.

example

critic = rlQValueRepresentation(tab,observationInfo,actionInfo) creates the Q-value function based critic with discrete action and observation spaces from the Q-value table tab. tab is a rlTable object containing a table with as many rows as the possible observations and as many columns as the possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, which must be rlFiniteSetSpec objects containing the specifications for the discrete observations and action spaces, respectively.

example

critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates a Q-value function based critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle basisFcn to a custom basis function, and the second element contains the initial weight vector W0. Here the basis function must have both observations and action as inputs, and W0 must be a column vector. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo.

Multi-Output Discrete Action Space Q-Value Critic

example

critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName) creates the multi-output Q-value function critic for a discrete action space. net is the deep neural network used as an approximator, and must have only the observations as input and a single output layer having as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, containing the observations and action specifications. Here, actionInfo must be an rlFiniteSetSpec object containing the specifications for the discrete action space. The observation names obsName must be the names of the input layers of net.

example

critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates the multi-output Q-value function critic for a discrete action space using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle basisFcn to a custom basis function, and the second element contains the initial weight matrix W0. Here the basis function must have only the observations as inputs, and W0 must have as many columns as the number of possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo.

Options

critic = rlQValueRepresentation(___,options) creates the Q-value function based critic using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of critic to the options input argument. You can use this syntax with any of the previous input-argument combinations.
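For instance, a minimal sketch of the options syntax (the layer names, option values, and specification dimensions here are illustrative, not prescribed by this page):

```matlab
% Illustrative observation and discrete action specifications
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Simple multi-output network: one Q-value per possible action
net = [featureInputLayer(4,'Normalization','none','Name','obs')
       fullyConnectedLayer(3,'Name','value')];

% Option set controlling the optimizer and learning rate
opts = rlRepresentationOptions('LearnRate',1e-3,'Optimizer','adam');

% Append the option set to the usual input-argument combination
critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'obs'},opts);
```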

Input Arguments

Deep neural network used as the underlying approximator within the critic, specified as a deep neural network object.

For single output critics, net must have both observations and actions as inputs, and a scalar output, representing the expected cumulative long-term reward when the agent starts from the given observation and takes the given action. For multi-output discrete action space critics, net must have only the observations as input and a single output layer having as many elements as the number of possible discrete actions. Each output element represents the expected cumulative long-term reward when the agent starts from the given observation and takes the corresponding action. The learnable parameters of the critic are the weights of the deep neural network.

The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names listed in obsName.

For single-output critics, the network input layer associated with the action must have the same data type and dimensions as the signal defined in ActionInfo. Its name must be the action name specified in actName.

rlQValueRepresentation objects support recurrent deep neural networks for multi-output discrete action space critics.

For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policy and Value Function Representations.

Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net.

Example: {'my_obs'}

Action name, specified as a single-element cell array that contains a character vector. It must be the name of the input layer of net that is associated with the action.

Example: {'my_act'}

Q-value table, specified as an rlTable object containing an array with as many rows as the possible observations and as many columns as the possible actions. The element (s,a) is the expected cumulative long-term reward for taking action a from observed state s. The elements of this array are the learnable parameters of the critic.

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user defined function can either be an anonymous function or a function on the MATLAB path. The output of the critic is c = W'*B, where W is a weight vector or matrix containing the learnable parameters, and B is the column vector returned by the custom basis function.

For a single-output Q-value critic, c is a scalar representing the expected cumulative long term reward when the agent starts from the given observation and takes the given action. In this case, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN,act)

For a multiple-output Q-value critic with a discrete action space, c is a vector in which each element is the expected cumulative long term reward when the agent starts from the given observation and takes the action corresponding to the position of the considered element. In this case, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here, obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in observationInfo, and act has the same data type and dimensions as the action specifications in actionInfo.

Example: @(obs1,obs2,act) [act(2)*obs1(1)^2; abs(obs2(5)+act(1))]

Initial value of the basis function weights, W. For a single-output Q-value critic, W is a column vector having the same length as the vector returned by the basis function. For a multiple-output Q-value critic with a discrete action space, W is a matrix which must have as many rows as the length of the basis function output, and as many columns as the number of possible actions.

Properties

Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.

Observation specifications, a reinforcement learning specification object or an array of specification objects defining properties such as the dimensions, data type, and names of the observation signals.

rlQValueRepresentation sets the ObservationInfo property of critic to the input observationInfo.

You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using a specification command such as rlFiniteSetSpec or rlNumericSpec.

Action specifications, a reinforcement learning specification object, defining properties such as the dimensions, data type and name of the action signals.

rlQValueRepresentation sets the ActionInfo property of critic to the input actionInfo.

You can extract actionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.

For custom basis function representations, the action signal must be a scalar, a column vector, or a discrete action.

Object Functions

rlDDPGAgent - Deep deterministic policy gradient reinforcement learning agent
rlTD3Agent - Twin-delayed deep deterministic policy gradient reinforcement learning agent
rlDQNAgent - Deep Q-network reinforcement learning agent
rlQAgent - Q-learning reinforcement learning agent
rlSARSAAgent - SARSA reinforcement learning agent
rlSACAgent - Soft actor-critic reinforcement learning agent
getValue - Obtain estimated value function representation
getMaxQValue - Obtain maximum state-value function estimate for Q-value function representation with discrete action space
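For a multi-output discrete action space critic, getMaxQValue returns both the maximum Q-value estimate and the index of the corresponding action. A minimal sketch (the layer names and specification values here are illustrative):

```matlab
% Illustrative specifications: 4-element observation, 3 possible actions
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([7 5 3]);

% Multi-output network: one Q-value estimate per possible action
net = [featureInputLayer(4,'Normalization','none','Name','obs')
       fullyConnectedLayer(3,'Name','value')];
critic = rlQValueRepresentation(net,obsInfo,actInfo,'Observation',{'obs'});

% maxQ is the largest of the three Q-value estimates for this observation;
% actIdx is the position of the corresponding action in actInfo.Elements
[maxQ,actIdx] = getMaxQValue(critic,{rand(4,1)});
```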

Examples

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles.

actInfo = rlNumericSpec([2 1]);

Create a deep neural network to approximate the Q-value function. The network must have two inputs, one for the observation and one for the action. The observation input (here called myobs) must accept a four-element vector (the observation vector defined by obsInfo). The action input (here called myact) must accept a two-element vector (the action vector defined by actInfo). The output of the network must be a scalar, representing the expected cumulative long-term reward when the agent starts from the given observation and takes the given action.

% observation path layers
obsPath = [featureInputLayer(4, 'Normalization','none','Name','myobs') 
    fullyConnectedLayer(1,'Name','obsout')];

% action path layers
actPath = [featureInputLayer(2, 'Normalization','none','Name','myact') 
    fullyConnectedLayer(1,'Name','actout')];

% common path to output layers
comPath = [additionLayer(2,'Name', 'add')  fullyConnectedLayer(1, 'Name', 'output')];

% add layers to network object
net = addLayers(layerGraph(obsPath),actPath); 
net = addLayers(net,comPath);

% connect layers
net = connectLayers(net,'obsout','add/in1');
net = connectLayers(net,'actout','add/in2');

% plot network
plot(net)

Create the critic with rlQValueRepresentation, using the network, the observations and action specification objects, as well as the names of the network input layers.

critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'myobs'},'Action',{'myact'})
critic = 
  rlQValueRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlNumericSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a random observation and action, using the current network weights.

v = getValue(critic,{rand(4,1)},{rand(2,1)})
v = single
    0.1102

You can now use the critic (along with an actor) to create an agent relying on a Q-value function critic (such as an rlQAgent, rlDQNAgent, rlSARSAAgent, or rlDDPGAgent agent).

This example shows how to create a multi-output Q-value function critic for a discrete action space using a deep neural network approximator.

This critic takes only the observation as input and produces as output a vector with as many elements as the possible actions. Each element represents the expected cumulative long term reward when the agent starts from the given observation and takes the action corresponding to the position of the element in the output vector.

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create a finite set action specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of three possible values (named 7, 5, and 3 in this case).

actInfo = rlFiniteSetSpec([7 5 3]);

Create a deep neural network approximator to approximate the Q-value function within the critic. The input of the network (here called myobs) must accept a four-element vector, as defined by obsInfo. The output must be a single output layer having as many elements as the number of possible discrete actions (three in this case, as defined by actInfo).

net = [featureInputLayer(4,'Normalization','none','Name','myobs') 
       fullyConnectedLayer(3,'Name','value')];

Create the critic using the network, the observations specification object, and the name of the network input layer.

critic = rlQValueRepresentation(net,obsInfo,actInfo,'Observation',{'myobs'})
critic = 
  rlQValueRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the values of a random observation, using the current network weights. There is one value for each of the three possible actions.

v = getValue(critic,{rand(4,1)})
v = 3x1 single column vector

    0.7232
    0.8177
   -0.2212

You can now use the critic (along with an actor) to create a discrete action space agent relying on a Q-value function critic (such as an rlQAgent, rlDQNAgent, or rlSARSAAgent agent).

Create a finite set observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment with a discrete observation space). For this example, define the observation space as a finite set with 4 possible values.

obsInfo = rlFiniteSetSpec([7 5 3 1]);

Create a finite set action specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example define the action space as a finite set with 2 possible values.

actInfo = rlFiniteSetSpec([4 8]);

Create a table to approximate the value function within the critic. rlTable creates a value table object from the observation and action specifications objects.

qTable = rlTable(obsInfo,actInfo);

The table stores a value (representing the expected cumulative long-term reward) for each possible observation-action pair. Each row corresponds to an observation and each column corresponds to an action. You can access the table using the Table property of the qTable object. The initial value of each element is zero.

qTable.Table
ans = 4×2

     0     0
     0     0
     0     0
     0     0

You can initialize the table to any value; in this case, an array containing the integers from 1 through 8.

qTable.Table=reshape(1:8,4,2)
qTable = 
  rlTable with properties:

    Table: [4x2 double]

Create the critic using the table as well as the observations and action specification objects.

critic = rlQValueRepresentation(qTable,obsInfo,actInfo)
critic = 
  rlQValueRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a given observation and action, using the current table entries.

v = getValue(critic,{5},{8})
v = 6

You can now use the critic (along with an actor) to create a discrete action space agent relying on a Q-value function critic (such as an rlQAgent, rlDQNAgent, or rlSARSAAgent agent).

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous three-dimensional space, so that a single observation is a column vector containing 3 doubles.

obsInfo = rlNumericSpec([3 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing 2 doubles.

actInfo = rlNumericSpec([2 1]);

Create a custom basis function to approximate the value function within the critic. The custom basis function must return a column vector. Each vector element must be a function of the observations and actions respectively defined by obsInfo and actInfo.

myBasisFcn = @(myobs,myact) [myobs(2)^2; myobs(1)+exp(myact(1)); abs(myact(2)); myobs(3)]
myBasisFcn = function_handle with value:
    @(myobs,myact)[myobs(2)^2;myobs(1)+exp(myact(1));abs(myact(2));myobs(3)]

The output of the critic is the scalar W'*myBasisFcn(myobs,myact), where W is a weight column vector that must have the same size as the custom basis function output. This output is the expected cumulative long-term reward when the agent starts from the given observation and takes the given action. The elements of W are the learnable parameters.

Define an initial parameter vector.

W0 = [1;4;4;2];

Create the critic. The first argument is a two-element cell containing both the handle to the custom function and the initial weight vector. The second and third arguments are, respectively, the observation and action specification objects.

critic = rlQValueRepresentation({myBasisFcn,W0},obsInfo,actInfo)
critic = 
  rlQValueRepresentation with properties:

         ActionInfo: [1×1 rl.util.rlNumericSpec]
    ObservationInfo: [1×1 rl.util.rlNumericSpec]
            Options: [1×1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the value of a given observation-action pair, using the current parameter vector.

v = getValue(critic,{[1 2 3]'},{[4 5]'})
v = 
  1×1 dlarray

  252.3926

You can now use the critic (along with an actor) to create an agent relying on a Q-value function critic (such as an rlQAgent, rlDQNAgent, rlSARSAAgent, or rlDDPGAgent agent).

This example shows how to create a multi-output Q-value function critic for a discrete action space using a custom basis function approximator.

This critic takes only the observation as input and produces as output a vector with as many elements as the possible actions. Each element represents the expected cumulative long term reward when the agent starts from the given observation and takes the action corresponding to the position of the element in the output vector.

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous two-dimensional space, so that a single observation is a column vector containing 2 doubles.

obsInfo = rlNumericSpec([2 1]);

Create a finite set action specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of 3 possible values (named 7, 5, and 3 in this case).

actInfo = rlFiniteSetSpec([7 5 3]);

Create a custom basis function to approximate the value function within the critic. The custom basis function must return a column vector. Each vector element must be a function of the observations defined by obsInfo.

myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); exp(myobs(2)); abs(myobs(1))]
myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(1);exp(myobs(2));abs(myobs(1))]

The output of the critic is the vector c = W'*myBasisFcn(myobs), where W is a weight matrix which must have as many rows as the length of the basis function output, and as many columns as the number of possible actions.

Each element of c is the expected cumulative long term reward when the agent starts from the given observation and takes the action corresponding to the position of the considered element. The elements of W are the learnable parameters.

Define an initial parameter matrix.

W0 = rand(4,3);

Create the critic. The first argument is a two-element cell containing both the handle to the custom function and the initial parameter matrix. The second and third arguments are, respectively, the observation and action specification objects.

critic = rlQValueRepresentation({myBasisFcn,W0},obsInfo,actInfo)
critic = 
  rlQValueRepresentation with properties:

         ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
    ObservationInfo: [1x1 rl.util.rlNumericSpec]
            Options: [1x1 rl.option.rlRepresentationOptions]

To check your critic, use the getValue function to return the values of a random observation, using the current parameter matrix. Note that there is one value for each of the three possible actions.

v = getValue(critic,{rand(2,1)})
v = 
  3x1 dlarray

    2.1395
    1.2183
    2.3342

You can now use the critic (along with an actor) to create a discrete action space agent relying on a Q-value function critic (such as an rlQAgent, rlDQNAgent, or rlSARSAAgent agent).

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for your critic. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.

Create a recurrent neural network for a multi-output Q-value function representation.

criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(50, 'Name', 'CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    lstmLayer(20,'OutputMode','sequence','Name','CriticLSTM');
    fullyConnectedLayer(20,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(numDiscreteAct,'Name','output')];

Create a representation for your critic using the recurrent neural network.

criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation','state',criticOptions);
Introduced in R2020a