sample
Syntax
experience = sample(buffer,batchSize)
[experience,Mask] = sample(buffer,batchSize)
___ = sample(buffer,batchSize,Name=Value)
Description
experience = sample(buffer,batchSize) returns a mini-batch of N experiences from the replay memory buffer, where N is specified using batchSize.
[experience,Mask] = sample(buffer,batchSize) also returns a sequence padding mask that indicates which experiences at the end of a sampled sequence are padding.
___ = sample(buffer,batchSize,Name=Value) specifies additional sampling options using one or more name-value arguments.
Examples
Create Experience Buffer
Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.
obsInfo = rlNumericSpec([3 1],...
    LowerLimit=0,...
    UpperLimit=[1;5;10]);
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 20,000.
buffer = rlReplayMemory(obsInfo,actInfo,20000);
Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, next observation, reward, and is-done.
For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone
value to 0.
exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
exp.Action = {actInfo.UpperLimit.*rand(2,1)};
exp.Reward = 10*rand(1);
exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
exp.IsDone = 0;
Before appending an experience to the buffer, you can validate whether it is compatible with the buffer. The validateExperience function generates an error if the experience is incompatible with the buffer.
validateExperience(buffer,exp)
Append the experience to the buffer.
append(buffer,exp);
You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.
for i = 1:100
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
% Mark the final experience as a terminal condition
expBatch(100).IsDone = 1;
validateExperience(buffer,expBatch)
append(buffer,expBatch);
After appending experiences to the buffer, you can sample mini-batches of experiences to train your RL agent. For example, randomly sample a batch of 50 experiences from the buffer.
miniBatch = sample(buffer,50);
You can also sample a horizon of data from the buffer. For example, sample a horizon of up to 10 consecutive experiences with a discount factor of 0.95.
horizonSample = sample(buffer,1,...
    NStepHorizon=10,...
    DiscountFactor=0.95);
The returned sample includes the following information.
- Observation and Action are the observation and action from the first experience in the horizon.
- NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.
- Reward is the cumulative reward across the horizon using the specified discount factor.
You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.
sequenceSample = sample(buffer,1,...
SequenceLength=20);
Create Experience Buffer with Multiple Observation Channels
Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.
obsContinuous = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous actions in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 5,000.
buffer = rlReplayMemory(obsInfo,actInfo,5000);
Append a sequence of 50 random experiences to the buffer.
for i = 1:50
    exp(i).Observation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    exp(i).NextObservation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Reward = 10*rand(1);
    exp(i).IsDone = 0;
end
append(buffer,exp);
After appending experiences to the buffer, you can sample mini-batches of experiences to train your RL agent. For example, randomly sample a batch of 10 experiences from the buffer.
miniBatch = sample(buffer,10);
Input Arguments
buffer — Experience buffer
rlReplayMemory object | rlPrioritizedReplayMemory object | rlHindsightReplayMemory object | rlHindsightPrioritizedReplayMemory object
Experience buffer, specified as one of the following replay memory objects.
- rlReplayMemory
- rlPrioritizedReplayMemory
- rlHindsightReplayMemory
- rlHindsightPrioritizedReplayMemory
batchSize — Batch size
positive integer
Batch size of experiences to sample, specified as a positive integer.
If batchSize is greater than the current length of the buffer, then sample returns no experiences.
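Because an oversized request returns no experiences, a training loop typically waits until the buffer holds at least batchSize experiences. The following is a minimal sketch, assuming the Reinforcement Learning Toolbox is installed and that the buffer reports its current number of stored experiences through its Length property. The experiences are random placeholders, and requestedSize is an illustrative name.

```matlab
% Single observation channel and single action channel, as in the first example
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% Fill the buffer with 20 random placeholder experiences
for i = 1:20
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
append(buffer,expBatch);

requestedSize = 64;   % hypothetical training batch size
% Assumption: buffer.Length is the number of experiences currently stored
if buffer.Length >= requestedSize
    miniBatch = sample(buffer,requestedSize);
else
    % Not enough data yet; keep collecting experiences before training
    miniBatch = [];
end
```

In a real loop, the else branch would simply skip the learning step for that iteration.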
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: DiscountFactor=0.95
SequenceLength — Sequence length
1 (default) | positive integer
Sequence length, specified as a positive integer. For each batch element, sample up to SequenceLength consecutive experiences. If a sampled experience has a nonzero IsDone value, stop the sequence at that experience.
NStepHorizon — N-step horizon length
1 (default) | positive integer
N-step horizon length, specified as a positive integer. For each batch element, sample up to NStepHorizon consecutive experiences. If a sampled experience has a nonzero IsDone value, stop the horizon at that experience. Return the following experience information based on the sampled horizon.
- Observation and Action values from the first experience in the horizon
- NextObservation and IsDone values from the final experience in the horizon
- Cumulative reward across the horizon using the specified discount factor, DiscountFactor
Sampling an n-step horizon is not supported when sampling sequences. Therefore, if SequenceLength > 1, then NStepHorizon must be 1.
DiscountFactor — Discount factor
0.99 (default) | nonnegative scalar less than or equal to one
Discount factor, specified as a nonnegative scalar less than or equal to one. When you sample a horizon of experiences (NStepHorizon > 1), sample returns the cumulative reward R computed as follows.
$$R = \sum_{i=1}^{N} \gamma^{i-1} R_i$$
Here:
- γ is the discount factor.
- N is the sampled horizon length, which can be less than NStepHorizon.
- R_i is the reward for the ith horizon step.
DiscountFactor applies only when NStepHorizon is greater than one.
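As a worked check of the formula, the following plain MATLAB sketch computes R for a hypothetical four-step horizon. The reward values and the discount factor are made up for illustration, and no buffer is involved.

```matlab
% Hypothetical rewards R_1..R_N from a sampled horizon (illustrative values)
rewards = [1 2 3 4];
gamma = 0.5;                 % plays the role of DiscountFactor
N = numel(rewards);          % sampled horizon length

% R = sum over i = 1..N of gamma^(i-1) * R_i
R = sum(gamma.^(0:N-1) .* rewards);
disp(R)                      % 1 + 0.5*2 + 0.25*3 + 0.125*4 = 3.25
```

If the horizon is cut short by a nonzero IsDone value, N is smaller and the same sum applies to the rewards that were actually sampled.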
DataSourceID — Data source index
-1 (default) | nonnegative integer
Data source index, specified as one of the following:
- -1 — Sample from the experiences of all data sources.
- Nonnegative integer — Sample from the experiences of only the data source specified by DataSourceID.
ReturnDlarray — Option to return output as deep learning array
false (default) | true
Option to return output as deep learning array, specified as a logical value. When you specify ReturnDlarray as true, the fields of experience are dlarray objects.
Example: ReturnDlarray=true
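For custom training loops that operate on dlarray data, returning dlarray objects directly avoids a manual conversion. This is a minimal sketch, assuming the Reinforcement Learning Toolbox and Deep Learning Toolbox are installed, and assuming that with ReturnDlarray=true each element of the Observation cell array is itself a dlarray. The buffer holds random placeholder experiences.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% Fill the buffer with 20 random placeholder experiences
for i = 1:20
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
append(buffer,expBatch);

% Request the mini-batch as deep learning arrays
miniBatch = sample(buffer,8,ReturnDlarray=true);
obsBatch = miniBatch.Observation{1};   % assumed to be a dlarray
obsData = extractdata(obsBatch);       % plain numeric array, if needed elsewhere
```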
ReturnGpuArray — Option to return output as GPU array
false (default) | true
Option to return output as GPU array, specified as a logical value. When you specify ReturnGpuArray as true, the fields of experience are stored on the GPU.
Setting this option to true requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).
You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.
Example: ReturnGpuArray=true
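One way to keep the same script working on machines with and without a supported GPU is to choose the option at run time. This is a minimal sketch, assuming canUseGPU (part of base MATLAB) is a suitable availability check for your setup. When it returns false, the sample stays in host memory.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% Fill the buffer with 20 random placeholder experiences
for i = 1:20
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
append(buffer,expBatch);

% false when there is no supported GPU or no Parallel Computing Toolbox
useGpu = canUseGPU();
miniBatch = sample(buffer,8,ReturnGpuArray=useGpu);

obsBatch = miniBatch.Observation{1};
if useGpu
    obsBatch = gather(obsBatch);   % copy from GPU memory back to host memory
end
```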
Output Arguments
experience — Experiences sampled from the buffer
structure
Experiences sampled from the buffer, returned as a structure with the following fields.
Observation — Observation
cell array
Observation, returned as a cell array with length equal to the number of observation specifications specified when creating the buffer. Each element of Observation contains a DO-by-batchSize-by-SequenceLength array, where DO is the dimension of the corresponding observation specification.
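With several observation channels, Observation holds one cell per channel, in the order of the specifications used to create the buffer. This minimal sketch reuses the two-channel setup from the earlier example with random placeholder data and prints the size of each channel's batch, rather than assuming the exact sizes here.

```matlab
obsContinuous = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,5000);

% 50 random placeholder experiences
for i = 1:50
    exps(i).Observation = {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exps(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    exps(i).NextObservation = {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exps(i).Reward = 10*rand(1);
    exps(i).IsDone = 0;
end
append(buffer,exps);

miniBatch = sample(buffer,10);

% One cell per observation channel, in the same order as obsInfo
for k = 1:numel(miniBatch.Observation)
    fprintf("Observation channel %d: size %s\n",k, ...
        mat2str(size(miniBatch.Observation{k})));
end
```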
Action — Agent action
cell array
Agent action, returned as a cell array with length equal to the number of action specifications specified when creating the buffer. Each element of Action contains a DA-by-batchSize-by-SequenceLength array, where DA is the dimension of the corresponding action specification.
Reward — Reward value
scalar | array
Reward value obtained by taking the specified action from the observation, returned as a 1-by-batchSize-by-SequenceLength array.
NextObservation — Next observation
cell array
Next observation reached by taking the specified action from the observation, returned as a cell array with the same format as Observation.
IsDone — Termination signal
integer | array
Termination signal, returned as a 1-by-batchSize-by-SequenceLength array of integers. Each element of IsDone has one of the following values.
- 0 — This experience is not the end of an episode.
- 1 — The episode terminated because the environment generated a termination signal.
- 2 — The episode terminated by reaching the maximum episode length.
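The following minimal sketch tallies each IsDone code in a sampled mini-batch. It builds a small buffer with a few hand-placed IsDone values; the episode boundaries and counter names are illustrative, and the experiences are random placeholders.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% 30 random placeholder experiences forming three 10-step episodes
for i = 1:30
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
expBatch(10).IsDone = 1;   % environment signaled termination
expBatch(20).IsDone = 2;   % episode reached the maximum length
expBatch(30).IsDone = 1;
append(buffer,expBatch);

miniBatch = sample(buffer,16);
isDone = miniBatch.IsDone;

nRunning    = nnz(isDone == 0);   % mid-episode experiences
nTerminated = nnz(isDone == 1);   % termination signaled by the environment
nTruncated  = nnz(isDone == 2);   % maximum episode length reached
fprintf("running: %d, terminated: %d, truncated: %d\n", ...
    nRunning,nTerminated,nTruncated);
```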
Mask — Sequence padding mask
logical array
Sequence padding mask, returned as a logical array with length equal to SequenceLength. When the sampled sequence length is less than SequenceLength, the data returned in experience is padded. Each element of Mask is true for a real experience and false for a padded experience.
You can ignore Mask when SequenceLength is 1.
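When you sample sequences, padded steps should be excluded from any statistic or loss computed on the batch. This minimal sketch creates short three-step episodes so that sequences of length 5 are cut early, then averages the reward over real steps only. It uses implicit expansion so that it does not depend on whether Mask carries a batch dimension; the buffer contents are random placeholders.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% 30 random placeholder experiences grouped into 3-step episodes
for i = 1:30
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = double(mod(i,3) == 0);   % end an episode every 3 steps
end
append(buffer,expBatch);

% Sequences stop at a nonzero IsDone, so the tail of each sequence is padded
[seqBatch,mask] = sample(buffer,4,SequenceLength=5);
rewards = seqBatch.Reward;

% Expand Mask and Reward to a common size, then keep real steps only
validSteps = mask & true(size(rewards));
allRewards = rewards + zeros(size(validSteps));
meanReward = mean(allRewards(validSteps));
```

Selecting with the mask, rather than multiplying by it, also keeps the result unaffected by whatever values the padded entries contain.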
Version History
Introduced in R2022a