rlTurnBasedFunctionEnv
Create custom turn-based multiagent reinforcement learning environment
Since R2023b
Description
Use rlTurnBasedFunctionEnv to create a custom turn-based multiagent reinforcement learning environment in which agents execute in turns. To create your custom environment, you supply the observation and action specifications as well as your own reset and step MATLAB® functions. To verify the operation of your environment, rlTurnBasedFunctionEnv automatically calls validateEnvironment after creating the environment.
Creation
Description
env = rlTurnBasedFunctionEnv(observationInfo,actionInfo,stepFcn,resetFcn) creates a turn-based multiagent environment using observation and action specifications and custom step and reset functions. The cell arrays observationInfo and actionInfo must contain the observation and action specifications, respectively, for each agent. The stepFcn and resetFcn arguments are the names of your step and reset MATLAB functions, respectively, and they are used to set the StepFcn and ResetFcn properties of env.
Input Arguments
observationInfo — Observation specifications
cell array
Observation specifications, specified as a cell array with as many elements as the number of agents. Every element of the cell array must contain the observation specifications for a corresponding agent. The observation specification for an agent must be an rlFiniteSetSpec or rlNumericSpec object, or a vector containing a mix of such objects (in which case every element of the vector defines the properties of a specific observation channel for the agent).
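For example, in a minimal sketch with hypothetical dimensions, a two-agent setup in which the first agent has a single continuous observation channel and the second agent has two channels (one continuous, one discrete) might be specified as follows.
obsInfo = { ...
    rlNumericSpec([4 1]), ...                            % agent 1: one channel
    [rlNumericSpec([3 1]) rlFiniteSetSpec([1 2 3])] ...  % agent 2: two channels
    };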
actionInfo — Action specifications
cell array
Action specifications, specified as a cell array with as many elements as the number of agents. Every element of the cell array must contain the action specification for a corresponding agent. The action specification for an agent must be an rlFiniteSetSpec (for discrete action spaces) or rlNumericSpec (for continuous action spaces) object. This object defines the properties of the action channel for the agent.
Note
Only one action channel per agent is allowed.
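For example, in a hypothetical two-agent setup where the first agent has a discrete action set and the second a continuous scalar action, each agent gets exactly one specification object.
actInfo = { ...
    rlFiniteSetSpec([-1 1]), ...  % agent 1: discrete action set
    rlNumericSpec([1 1]) ...      % agent 2: continuous scalar action
    };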
Properties
StepFcn — Environment step function
function name | function handle | anonymous function handle
Environment step function, specified as a function name, function handle, or handle to an anonymous function. The sim and train functions call StepFcn to update the environment at every simulation or training step.
This function must have two inputs and four outputs, as illustrated by the following signature.
[NextObservation,Reward,IsDone,UpdatedInfo] = myStepFunction(Action,Info)
For a given action input, the step function returns the values of the next observation and reward, a logical value indicating whether the episode is terminated, and an updated environment information variable.
Specifically, the required input and output arguments are:
- Action — Cell array containing the current actions from the agents that are currently executing. It must contain as many elements as the number of agents that are executing at the current step. Each element of the Action cell array must match the dimensions and data type specified in the corresponding element of the actionInfo(ActiveAgentIndex) cell array.
- Info and UpdatedInfo — Structure containing the field ActiveAgentIndex, which is a scalar or vector of indices indicating the agents that are active in the current step. The environment step function can modify the value to control the execution of agents in the next step. Other optional fields can contain the environment state and parameters or any data that you want to pass from one step to the next. The simulation or training functions (train or sim) handle this variable by:
  - Initializing Info using the second output argument returned by ResetFcn, at the beginning of the episode
  - Passing Info as the second input argument to StepFcn at each training or simulation step
  - Updating Info using the fourth output argument returned by StepFcn, UpdatedInfo
- NextObservation — Cell array containing the next observations for all the agents. These are the observations related to the next state (the transition to the next state is caused by the current actions contained in Action). Therefore, NextObservation must contain as many elements as the number of agents, and each element must match the dimensions and data types specified in the corresponding element of the observationInfo cell array.
- Reward — Vector containing the rewards for all the agents. These are the rewards generated by the transition from the current state to the next one. Each element of the vector must be a numeric scalar.
- IsDone — Logical value indicating whether to end the simulation or training episode.
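For illustration only, a minimal step-function skeleton for a hypothetical two-agent environment might look like the following sketch, in which the state update, observations, and rewards are placeholders.
function [nextObs,reward,isDone,info] = myStepFunction(action,info)
% Minimal sketch for a hypothetical two-agent environment.
idx = info.ActiveAgentIndex;    % indices of the agents acting this step
for k = 1:numel(idx)
    % action{k} is the action from agent idx(k); apply it to the
    % environment state here (placeholder).
end
nextObs = {rand([4 1]), rand([2 1])};  % next observation for every agent
reward  = zeros(2,1);                  % reward for every agent
isDone  = false;                       % termination condition placeholder
info.ActiveAgentIndex = mod(idx,2)+1;  % alternate between the two agents
end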
To use additional input arguments beyond the allowed two, define your additional arguments in the MATLAB workspace, then specify stepFcn as an anonymous function that in turn calls your custom function with the additional arguments defined in the workspace, as shown in the example Create Custom Environment Using Step and Reset Functions.
Example: StepFcn="myStepFcn"
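A minimal sketch of this pattern, assuming a hypothetical custom function myStepFcn that takes one extra parameter arg1:
% arg1 is a hypothetical extra parameter captured from the workspace.
arg1 = 0.1;
env = rlTurnBasedFunctionEnv(obsInfo,actInfo, ...
    @(action,info) myStepFcn(action,info,arg1), ...
    @myResetFcn);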
ResetFcn — Environment reset function
function name | function handle | anonymous function handle
Environment reset function, specified as a function name, function handle, or handle to an anonymous function. The sim function calls your reset function to reset the environment at the start of each simulation, and the train function calls it at the start of each training episode.
The reset function that you provide must have no inputs and two outputs, as illustrated by the following signature.
[InitialObservation,Info] = myResetFunction()
The reset function sets the environment to an initial state and computes the initial value of the observation. For example, you can create a reset function that randomizes certain state values such that each training episode begins from different initial conditions. The InitialObservation output must be a cell array containing the initial observations for all the agents. Therefore, InitialObservation must contain as many elements as the number of agents, and each element must match the dimensions and data types specified in the corresponding element of the observationInfo cell array.
The Info output of ResetFcn initializes the Info property of your environment and contains any data that you want to pass from one step to the next. This can be the environment state or a structure containing state and parameters. The simulation or training function (train or sim) supplies the current value of Info as the second input argument of StepFcn, then uses the fourth output argument returned by StepFcn to update the value of Info.
To use input arguments beyond the allowed zero, define your additional arguments in the MATLAB workspace, then specify resetFcn as an anonymous function that in turn calls your custom function with the additional arguments defined in the workspace, as shown in the example Create Custom Environment Using Step and Reset Functions.
Example: ResetFcn="myResetFcn"
Info — Information to pass to next step
structure
Information to pass to the next step, specified as a structure containing the field ActiveAgentIndex, which is a scalar or vector of indices indicating the agents that are active in the current step. The environment step function can modify this field to control the execution of agents in the next step. Other optional fields can contain the environment state and parameters or any data that you want to pass from one step to the next.
When ResetFcn is called, whatever you define as the Info output of ResetFcn initializes this property. When a step occurs, the simulation or training function (train or sim) uses the current value of Info as the second input argument for StepFcn. Once StepFcn completes, the simulation or training function then updates the current value of Info using the fourth output argument returned by StepFcn.
Example: Info.ActiveAgentIndex=[2 3]
Object Functions
getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
train | Train reinforcement learning agents within a specified environment
sim | Simulate trained reinforcement learning agents within a specified environment
validateEnvironment | Validate custom reinforcement learning environment
Examples
Create Custom Turn-Based Multiagent Function Environment
Create a custom turn-based multiagent environment by supplying custom MATLAB® functions. Using rlTurnBasedFunctionEnv, you can create a custom MATLAB reinforcement learning environment in which agents execute in turns. To create your custom turn-based environment, you must define observation specifications, action specifications, and step and reset functions.
For this example, consider an environment containing four agents, all with continuous observation spaces, receiving observation vectors of four, two, five, and three elements, respectively.
Define the agent observation spaces using a cell array.
obsInfo = { rlNumericSpec([4 1]), ...
            rlNumericSpec([2 1]), ...
            rlNumericSpec([5 1]), ...
            rlNumericSpec([3 1]) };
For this example, the first and fourth agents have a finite action set containing two and four elements, respectively, while the second and the third have continuous action spaces consisting of a scalar and a two-dimensional vector. Define the agent action sets and spaces using a cell array.
actInfo = { rlFiniteSetSpec([1 2]), ...
            rlNumericSpec([1 1]), ...
            rlNumericSpec([2 1]), ...
            rlFiniteSetSpec([1 2 3 4]) };
Next, specify your step and reset functions. For this example, use the functions resetFcn and stepFcn defined at the end of the example.
To create the custom turn-based multiagent function environment, use rlTurnBasedFunctionEnv.
env = rlTurnBasedFunctionEnv( ...
    obsInfo,actInfo, ...
    @stepFcn,@resetFcn)
env =
  rlTurnBasedFunctionEnv with properties:
      StepFcn: @stepFcn
     ResetFcn: @resetFcn
         Info: [1x1 struct]
Note that while the custom reset and step functions that you pass to rlTurnBasedFunctionEnv must have exactly zero and two input arguments, respectively, you can work around this limitation by using anonymous functions. For an example of how to do this, see Create Custom Environment Using Step and Reset Functions.
You can now create agents for env and train or simulate them as you would for any other environment.
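As a quick sanity check, you can also call the example functions (defined below) directly and inspect their outputs. For instance, with an arbitrary valid action for agent 1:
[obs0,info0] = resetFcn();     % agent 1 is active after reset
act = {2};                     % agent 1 selects action 2
[obs1,rwd,done,info1] = stepFcn(act,info0);
info1.ActiveAgentIndex         % agents 2 and 3 are active next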
Environment Functions
Environment reset function.
function [initialObs, info] = resetFcn()
% RESETFCN sets the default state of the environment.
%
% - INITIALOBS is a 1xN cell array.
% - INFO is a structure with the field:
%   - ActiveAgentIndex: a scalar or vector of agent indices that
%     are active in the current step. The value can be modified to
%     control the execution of agents in the next step.
%   - Other fields of any MATLAB data type can be used to pass
%     information from one step to the next.

% For this example, initialize the agent observations randomly.
initialObs = { rand([4 1]), ...
               rand([2 1]), ...
               rand([5 1]), ...
               rand([3 1]) };

% Initialize the info structure.
info.EnvironmentState = initialObs;
info.ExecutionOrder = {1, [2,3], 4};
info.TurnCount = 1;
info.ActiveAgentIndex = 1;

end
Environment step function.
function [nextObs, reward, isdone, info] = stepFcn(action, info)
% STEPFCN specifies how the environment advances to the next state
% given the actions from all the agents.
%
% - NEXTOBS is a 1xN cell array (N is the total number of agents).
% - ACTION is a 1xP cell array (P is the number of active agents).
% - REWARD is a 1xN numeric array.
% - ISDONE is a logical or numeric scalar.
% - INFO is a structure with the field:
%   - ActiveAgentIndex: a scalar or vector of agent indices that
%     are active in the current step. The value can be modified to
%     control the execution of agents in the next step.
%   - Other fields of any MATLAB data type can be used to pass
%     information from one step to the next.

% For this example, just return to each agent a random observation.
nextObs = { rand([4 1]), ...
            rand([2 1]), ...
            rand([5 1]), ...
            rand([3 1]) };

% Return a random reward vector multiplied by the norm of the action
% of the first currently active agent.
reward = rand(4,1)*norm(action{1});

% Return a false is-done value.
isdone = false;

% Extract the execution order and turn count.
ord = info.ExecutionOrder;
tc = info.TurnCount;

% Reset the turn count to zero when it reaches the number of turns.
if mod(tc, numel(ord)) == 0
    tc = 0;
end

% Set the ActiveAgentIndex and TurnCount fields.
info.ActiveAgentIndex = ord{tc+1};
info.TurnCount = tc+1;

% Set the EnvironmentState field.
info.EnvironmentState = nextObs;

end
Version History
Introduced in R2023b
See Also
Functions
rlPredefinedEnv | rlCreateEnvTemplate | validateEnvironment | rlSimulinkEnv | getObservationInfo | getActionInfo