getExplorationPolicy

Extract exploratory (stochastic) policy object from agent

Since R2023a

    Description

policy = getExplorationPolicy(agent) returns a stochastic policy object from the specified reinforcement learning agent. Stochastic policies are useful for exploration.

    Examples

    For this example, load the PG agent trained in Train PG Agent with Custom Actor Network to Balance Discrete Cart-Pole.

    load("MATLABCartpolePG.mat","agent")

Extract the agent's greedy policy using getGreedyPolicy.

    policyDtr = getGreedyPolicy(agent)
    policyDtr = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1×1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 1
                 Normalization: "none"
               ObservationInfo: [1×1 rl.util.rlNumericSpec]
                    ActionInfo: [1×1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true (displayed as 1). This means that the policy object always generates the maximum likelihood action in response to a given observation and is therefore greedy (and deterministic).

    Alternatively, you can extract a stochastic policy using getExplorationPolicy.

    policyXpl = getExplorationPolicy(agent)
    policyXpl = 
      rlStochasticActorPolicy with properties:
    
                         Actor: [1×1 rl.function.rlDiscreteCategoricalActor]
        UseMaxLikelihoodAction: 0
                 Normalization: "none"
               ObservationInfo: [1×1 rl.util.rlNumericSpec]
                    ActionInfo: [1×1 rl.util.rlFiniteSetSpec]
                    SampleTime: 1
    
    

This time, the extracted policy object has the UseMaxLikelihoodAction property set to false (displayed as 0). This means that the policy object generates a random action for a given observation. The policy is therefore stochastic and useful for exploration.
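    As a check, you can generate actions from both extracted policies for the same observation. This sketch assumes the policyDtr and policyXpl objects extracted above; the greedy policy returns the same action on every call, while the exploration policy samples from the actor's probability distribution, so repeated calls can return different actions.

    % Create a sample observation consistent with the policy's observation
    % specification.
    obs = rand(policyXpl.ObservationInfo.Dimension);

    % The greedy policy always returns the maximum likelihood action.
    actGreedy = getAction(policyDtr,{obs});

    % The exploration policy samples from the categorical distribution
    % computed by the actor, so these two actions can differ.
    actSample1 = getAction(policyXpl,{obs});
    actSample2 = getAction(policyXpl,{obs});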

    Input Arguments

    Agent, specified as one of the following reinforcement learning agent objects: rlQAgent, rlSARSAAgent, rlDQNAgent, rlDDPGAgent, rlTD3Agent, rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent.

    Note

    agent is a handle object, so a function that takes it as an input argument but does not return it as an output argument, such as train, can still update it. For more information about handle objects, see Handle Object Behavior.

    For more information on reinforcement learning agents, see Reinforcement Learning Agents.

    Example: agent = rlPPOAgent(rlNumericSpec([2 1]),rlNumericSpec([1 1])) creates the default rlPPOAgent object agent for an environment with an observation channel carrying a continuous two-element vector and an action channel carrying a continuous scalar.
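    Following that example line, you can create a default agent from observation and action specifications and then extract its exploration policy. This is a sketch assuming default agent options:

    % Define observation and action specifications: a continuous
    % two-element observation vector and a continuous scalar action.
    obsInfo = rlNumericSpec([2 1]);
    actInfo = rlNumericSpec([1 1]);

    % Create a default PPO agent and extract its exploration policy.
    agent = rlPPOAgent(obsInfo,actInfo);
    policy = getExplorationPolicy(agent);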

    Note

    If agent is an rlMBPOAgent object, extract the exploration policy from its base agent using getExplorationPolicy(agent.BaseAgent).
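    For example, assuming mbpoAgent is an existing rlMBPOAgent object, you would extract the exploration policy from the model-free agent it wraps:

    % Extract the exploration policy from the base (model-free) agent
    % inside the model-based policy optimization agent.
    policy = getExplorationPolicy(mbpoAgent.BaseAgent);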

    Output Arguments

    Policy object, returned as one of the following:

    • rlEpsilonGreedyPolicy object — Returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.

    • rlAdditiveNoisePolicy object — Returned when agent is an rlDDPGAgent or rlTD3Agent object.

    • rlStochasticActorPolicy object, with UseMaxLikelihoodAction set to false — Returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Because the returned policy object has the UseMaxLikelihoodAction property set to false, it always generates a random action (according to the policy probability distribution) in response to a given observation, and is therefore exploratory (and stochastic).

    • rlHybridStochasticActorPolicy object, with UseMaxLikelihoodAction set to false — Returned when agent is an rlSACAgent object with a hybrid action space. Because the returned policy object has the UseMaxLikelihoodAction property set to false, it always generates a random action (according to the policy probability distribution) in response to a given observation, and is therefore exploratory (and stochastic).

    Version History

    Introduced in R2023a