rlHindsightPrioritizedReplayMemory
Description
An off-policy reinforcement learning agent stores experiences in a circular experience buffer.
During training the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:
S is the current observation of the environment.
A is the action taken by the agent.
R is the reward for taking action A.
S' is the next observation after taking action A.
D is the is-done signal after taking action A.
The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.
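As a concrete illustration of this store-and-sample workflow, the sketch below appends a single experience to a generic rlReplayMemory buffer and then samples a mini-batch from it. The specification dimensions, buffer length, and mini-batch size are placeholder assumptions; the same append and sample pattern applies to the hindsight buffers described below.

% Placeholder observation and action specifications (assumed for illustration).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Create a generic circular experience buffer with an assumed capacity.
buffer = rlReplayMemory(obsInfo,actInfo,10000);

% Store one experience (S,A,R,S',D) in the buffer.
experience.Observation     = {rand(4,1)};   % S
experience.Action          = {rand(1,1)};   % A
experience.Reward          = 1;             % R
experience.NextObservation = {rand(4,1)};   % S'
experience.IsDone          = 0;             % D
append(buffer,experience);

% Sample a mini-batch of experiences (here a single experience) for training.
miniBatch = sample(buffer,1);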
By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. For goal-conditioned tasks, where the observation includes both the goal and a goal measurement, you can use an rlHindsightReplayMemory object. rlHindsightReplayMemory objects sample experiences uniformly from the buffer. To use prioritized nonuniform sampling, which can improve sample efficiency, use an rlHindsightPrioritizedReplayMemory object.
A hindsight replay memory experience buffer:
Generates additional experiences by replacing goals with goal measurements
Improves sample efficiency for tasks with sparse rewards
Requires a ground-truth reward function and is-done function
Is not necessary when you have a well-shaped reward function
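Because the buffer recomputes rewards and is-done signals for the relabeled goals, you must supply ground-truth functions for both. The sketch below shows one plausible pair of functions for a sparse-reward, goal-conditioned task; the three-argument signature, the observation layout (goal in elements 1:2, goal measurement in elements 3:4 of a single channel), and the success threshold are assumptions for illustration, not a prescribed interface.

% Hypothetical ground-truth reward function, saved as myRewardFcn.m.
% Assumed layout: one observation channel in which elements 1:2 are the goal
% and elements 3:4 are the goal measurement.
function reward = myRewardFcn(obs,action,nextObs)
    goal        = nextObs{1}(1:2);
    measurement = nextObs{1}(3:4);
    % Sparse reward: 0 when the measurement reaches the goal, -1 otherwise.
    if norm(measurement - goal) < 0.1
        reward = 0;
    else
        reward = -1;
    end
end

% Matching hypothetical is-done function, saved as myIsDoneFcn.m,
% using the same assumed observation layout and threshold.
function isDone = myIsDoneFcn(obs,action,nextObs)
    goal        = nextObs{1}(1:2);
    measurement = nextObs{1}(3:4);
    isDone = norm(measurement - goal) < 0.1;
end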
For more information on hindsight experience replay and prioritized sampling, see Algorithms.
Creation
Syntax
buffer = rlHindsightPrioritizedReplayMemory(obsInfo,actInfo,rewardFcn,isDoneFcn,goalConditionInfo)
Description
buffer = rlHindsightPrioritizedReplayMemory(obsInfo,actInfo,rewardFcn,isDoneFcn,goalConditionInfo) creates a hindsight prioritized replay memory experience buffer that is compatible with the observation and action specifications in obsInfo and actInfo, respectively. This syntax sets the RewardFcn, IsDoneFcn, and GoalConditionInfo properties.
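As a hedged creation sketch, the call below assumes a single four-element observation channel (goal in elements 1:2, goal measurement in elements 3:4), a scalar action, and the hypothetical myRewardFcn and myIsDoneFcn shown earlier. The nested cell array used for goalConditionInfo is an illustrative placeholder; consult the goalConditionInfo input argument for the exact format it expects.

% Assumed observation specification: one channel, elements 1:2 goal,
% elements 3:4 goal measurement. Scalar action.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Assumed mapping between goal-measurement elements and goal elements.
% This nested cell array is an illustrative placeholder; see the
% goalConditionInfo input argument for the documented encoding.
goalConditionInfo = {{1,3:4,1,1:2}};

% Create the hindsight prioritized replay memory buffer using the
% hypothetical reward and is-done functions sketched earlier.
buffer = rlHindsightPrioritizedReplayMemory(obsInfo,actInfo, ...
    @myRewardFcn,@myIsDoneFcn,goalConditionInfo);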
Input Arguments
Properties
Object Functions
append | Append experiences to replay memory buffer |
sample | Sample experiences from replay memory buffer |
resize | Resize replay memory experience buffer |
reset | Reset environment, agent, experience buffer, or policy object |
allExperiences | Return all experiences in replay memory buffer |
validateExperience | Validate experiences for replay memory |
generateHindsightExperiences | Generate hindsight experiences from hindsight experience replay buffer |
getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer |
getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer |
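As a brief, hedged illustration of a few of these functions, the sketch below queries the specifications stored in an existing buffer, retrieves its contents, and then resizes and resets it; the new maximum length is an arbitrary placeholder.

% Query the specifications stored in the buffer.
obsInfo = getObservationInfo(buffer);
actInfo = getActionInfo(buffer);

% Retrieve every experience currently stored in the buffer.
experiences = allExperiences(buffer);

% Grow the buffer to an assumed new maximum length, then clear its contents.
resize(buffer,100000);
reset(buffer);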
Examples
Limitations
Hindsight prioritized experience replay does not support agents that use recurrent neural networks.
Algorithms
References
[1] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv:1511.05952 [cs], 25 February 2016. https://arxiv.org/abs/1511.05952.
[2] Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. "Hindsight Experience Replay." 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017.
Version History
Introduced in R2023a