rlMultiAgentTrainingOptions

Options for training multiple reinforcement learning agents

Since R2022a

    Description

    Use an rlMultiAgentTrainingOptions object to specify training options for multiple agents. To train the agents using the specified options, pass this object to train.

    For more information on training agents, see Train Reinforcement Learning Agents.

    Creation

    Description

    trainOpts = rlMultiAgentTrainingOptions returns the default options for training multiple reinforcement learning agents. Use training options to specify parameters for the training session, such as the maximum number of episodes to train, criteria for stopping training, and criteria for saving agents. After configuring the options, use trainOpts as an input argument for train.

    trainOpts = rlMultiAgentTrainingOptions(Name,Value) creates a training option set and sets object Properties using one or more name-value pair arguments.

    Properties

    AgentGroups

    Agent grouping indices, specified as "auto", a cell array of positive integers, or a cell array of integer arrays.

    For instance, consider a training scenario with 4 agents. You can group the agents in the following ways:

    • Allocate each agent in a separate group:

      trainOpts = rlMultiAgentTrainingOptions("AgentGroups","auto")

    • Specify four agent groups with one agent in each group:

      trainOpts = rlMultiAgentTrainingOptions("AgentGroups",{1,2,3,4})

    • Specify two agent groups with two agents each:

      trainOpts = rlMultiAgentTrainingOptions("AgentGroups",{[1,2],[3,4]})

    • Specify three agent groups:

      trainOpts = rlMultiAgentTrainingOptions("AgentGroups",{[1,4],2,3})

    AgentGroups and LearningStrategy must be used together to specify whether agent groups learn in a centralized manner or decentralized manner.

    Example: AgentGroups={1,2,[3,4]}

    LearningStrategy

    Learning strategy for each agent group, specified as "decentralized", "centralized", or a string array with one entry per agent group. In decentralized training, agents collect their own set of experiences during the episodes and learn independently from those experiences. In centralized training, the agents share the collected experiences and learn from them together.

    AgentGroups and LearningStrategy must be used together to specify whether agent groups learn in a centralized manner or decentralized manner. For example, you can use the following command to configure training for three agent groups with different learning strategies. The agents with indices [1,2] and [3,5] learn in a centralized manner, while agent 4 learns in a decentralized manner.

    trainOpts = rlMultiAgentTrainingOptions(...
         AgentGroups={[1,2],4,[3,5]},...
         LearningStrategy=["centralized","decentralized","centralized"] )

    Example: LearningStrategy="centralized"

    MaxEpisodes

    Maximum number of episodes to train the agents, specified as a positive integer. Regardless of other criteria for termination, training terminates after MaxEpisodes episodes.

    Example: MaxEpisodes=1000

    MaxStepsPerEpisode

    Maximum number of environment steps to run per episode, specified as a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if other termination conditions are not met.

    Example: MaxStepsPerEpisode=1000

    StopOnError

    Option to stop training when an error occurs during an episode, specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.

    Example: StopOnError="off"

    SimulationStorageType

    Storage type for environment data, specified as "memory", "file", or "none". This option specifies the type of storage used for data generated during training or simulation by a Simulink® environment. Specifically, the software saves anything that appears as the output of a sim (Simulink) command.

    Note that this option does not affect (and is not affected by) any option to save agents during training specified within a training option object, or any data logged by a FileLogger or MonitorLogger object.

    The default value is "memory", indicating that data is stored in an internal memory variable. When you set this option to "file", data is stored to disk, in MAT-files in the directory specified by the SaveSimulationDirectory property, and using the MAT-file version specified by the SaveFileVersion property. When you set this option to "none", simulation data is not stored.

    You can use this option to prevent out-of-memory issues during training or simulation.

    Example: SimulationStorageType="none"

    SaveSimulationDirectory

    Folder used to save environment data, specified as a string or character vector. The folder name can contain a full or relative path. When you set the SimulationStorageType property to "file", the software saves data generated during training or simulation by a Simulink environment in MAT-files in this folder, using the MAT-file version specified by the SaveFileVersion property. If the folder does not exist, the software creates it.

    Example: SaveSimulationDirectory="envSimData"

    SaveFileVersion

    MAT-file version used to save environment data, specified as a string or character vector. When you set the SimulationStorageType property to "file", the software saves data generated by a Simulink environment in MAT-files in the version specified by SaveFileVersion, in the folder specified by the SaveSimulationDirectory property. For more information, see MAT-File Versions.

    Example: SaveFileVersion="-v7.3"
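    The three storage-related properties described above work together. As a minimal sketch, the following options object writes Simulink environment data to disk instead of keeping it in memory. The folder name "envSimData" is only an illustrative value.

```matlab
% Store Simulink environment data in MAT-files rather than in memory.
% "envSimData" is an illustrative folder name; it is created if missing.
trainOpts = rlMultiAgentTrainingOptions( ...
    SimulationStorageType="file", ...
    SaveSimulationDirectory="envSimData", ...
    SaveFileVersion="-v7.3");
```

    SaveSimulationDirectory and SaveFileVersion only take effect when SimulationStorageType is "file".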

    ScoreAveragingWindowLength

    Window length for averaging the scores, rewards, and number of steps for each agent, specified as a scalar or vector.

    If the training environment contains a single agent, specify ScoreAveragingWindowLength as a scalar.

    If the training environment is a multi-agent environment, specify a scalar to apply the same window length to all agents.

    To use a different window length for each agent, specify ScoreAveragingWindowLength as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.

    For options expressed in terms of averages, ScoreAveragingWindowLength is the number of episodes included in the average. For instance, if StopTrainingCriteria is "AverageReward", and StopTrainingValue is 500 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 500. For the other agents, training continues until:

    • All agents reach their stop criteria.

    • The number of episodes reaches MaxEpisodes.

    • You stop training by clicking the Stop Training button in Reinforcement Learning Training Monitor or pressing Ctrl-C at the MATLAB® command line.

    Example: ScoreAveragingWindowLength=10
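    To make the windowed average concrete, the following sketch reproduces the comparison by hand for one agent, using plain MATLAB. The reward values are invented for illustration, and the handling of episodes before the window fills (averaging over whatever episodes exist) is an assumption rather than documented toolbox behavior.

```matlab
% Hypothetical per-episode rewards for one agent (illustrative values only)
episodeRewards = [420 480 510 530 560 590];
windowLength   = 5;     % plays the role of ScoreAveragingWindowLength
stopValue      = 500;   % plays the role of StopTrainingValue

% Average the rewards of the most recent windowLength episodes.
% Assumption: before the window fills, average over the episodes available.
k = numel(episodeRewards);
firstIdx  = max(1, k - windowLength + 1);
avgReward = mean(episodeRewards(firstIdx:k));

% The "AverageReward" criterion is met when the average reaches the value
shouldStop = avgReward >= stopValue;
```

    Here the window covers episodes 2 through 6, so the average is 534 and the criterion is met for this agent.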

    StopTrainingCriteria

    Training termination condition, specified as one of the following strings:

    • "None" — Do not stop training until the number of episodes reaches MaxEpisodes.

    • "AverageSteps" — Stop training when the running average number of steps per episode equals or exceeds the critical value specified by the option StopTrainingValue. The average is computed using the window specified in ScoreAveragingWindowLength.

    • "AverageReward" — Stop training when the running average reward equals or exceeds the critical value.

    • "EpisodeReward" — Stop training when the reward in the current episode equals or exceeds the critical value.

    • "GlobalStepCount" — Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

    • "EpisodeCount" — Stop training when the number of training episodes equals or exceeds the critical value.

    • "EvaluationStatistic" — Stop training when the statistic returned by the evaluator object used with train (if any) equals or exceeds the specified value.

    • "Custom" — Stop training when the custom function specified in StopTrainingValue returns true.

    Example: StopTrainingCriteria="AverageReward"

    StopTrainingValue

    Critical value of the training termination condition, specified as a scalar, a vector, or a function name or handle.

    You can use a custom stop criterion by specifying StopTrainingValue as a function name or handle. Your function must have one input and one output, as shown in the following signature.

    flag = myTerminationFcn(trainingStats)

    Here, trainingStats is a structure that contains the following fields, all described in the trainStats output argument of train.

    • EpisodeIndex

    • EpisodeReward

    • EpisodeSteps

    • AverageReward

    • TotalAgentSteps

    • EpisodeQ0

    • SimulationInfo

    • EvaluationStatistic

    • TrainingOptions

    The training stops when the specified function returns true.
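    As a sketch of such a function, the anonymous function handle below stops training once a minimum number of episodes has run and every agent's average reward has reached a target. The thresholds are hypothetical, and the code assumes that EpisodeIndex is a scalar for the current episode and that AverageReward holds one value per agent; check the trainStats description in train before relying on those shapes.

```matlab
minEpisodes  = 50;    % hypothetical minimum number of episodes
rewardTarget = 480;   % hypothetical average-reward target

% One input (the statistics structure), one logical output.
% all(...) also works when AverageReward is a scalar.
myTerminationFcn = @(trainingStats) ...
    trainingStats.EpisodeIndex >= minEpisodes && ...
    all(trainingStats.AverageReward >= rewardTarget);

% Use the handle as the critical value of the "Custom" criterion
trainOpts = rlMultiAgentTrainingOptions( ...
    StopTrainingCriteria="Custom", ...
    StopTrainingValue=myTerminationFcn);
```

    You can exercise the handle outside of training by passing it a structure with the same field names, for example struct("EpisodeIndex",60,"AverageReward",[490 500]).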

    When not using a custom termination criterion, the following guidelines apply.

    If the training environment contains a single agent, specify StopTrainingValue as a scalar. If the training environment is a multi-agent environment, specify a scalar to apply the same termination criterion to all agents. To use a different termination criterion for each agent, specify StopTrainingValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.

    For a given agent, training ends when the quantity selected by the StopTrainingCriteria option equals or exceeds this value. For the other agents, the training continues until:

    • All agents reach their stop criteria.

    • The number of episodes reaches MaxEpisodes.

    • You stop training by clicking the Stop Training button in Reinforcement Learning Training Monitor or pressing Ctrl-C at the MATLAB command line.

    For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 100 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 100.

    Example: StopTrainingValue=100

    SaveAgentCriteria

    Condition for saving agents during training, specified as one of the following strings:

    • "None" — Do not save any agents during training.

    • "EpisodeReward" — Save all the agents when an agent reward in the current episode equals or exceeds the critical value specified in SaveAgentValue.

    • "AverageSteps" — Save the agents when the running average number of steps per episode equals or exceeds the critical value specified by the option SaveAgentValue. The average is computed using the window specified in ScoreAveragingWindowLength.

    • "AverageReward" — Save the agents when the running average reward over all episodes equals or exceeds the critical value.

    • "GlobalStepCount" — Save the agents when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

    • "EpisodeCount" — Save the agents when the number of training episodes equals or exceeds the critical value.

    • "EpisodeFrequency" — Save the agents with a period specified in SaveAgentValue. For example, if SaveAgentCriteria is specified as "EpisodeFrequency" and SaveAgentValue is specified as 10, the agents are saved after every ten episodes.

    • "EvaluationStatistic" — Save the agents when the statistic returned by the evaluator object used with train (if any) equals or exceeds the specified value.

    • "Custom" — Save the agents when the custom function specified in SaveAgentValue returns true.

    Set this option to store candidate agents that perform well according to the criteria you specify. When you set this option to a value other than "none", the software sets the SaveAgentValue option to 500. You can change that value to specify the condition for saving the agent.

    For instance, suppose you want to store for further testing any agent that yields an episode reward that equals or exceeds 100. To do so, set SaveAgentCriteria to "EpisodeReward" and set the SaveAgentValue option to 100. When an episode reward equals or exceeds 100, train saves the corresponding agent (or all the corresponding agents for a multiagent environment) in a MAT-file in the folder specified by the SaveAgentDirectory option. The MAT-file is called AgentK.mat (or AgentsK.mat for a multiagent environment), where K is the number of the corresponding episode. The agents are stored within that MAT-file as the saved_agent array. Note that the MAT-file also includes the variable savedAgentResult which contains the training result information up to the corresponding episode.
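    A saved file can be read back with load. The sketch below assumes a multiagent run, the folder name "savedAgents", and a save at episode 100; both the folder and the episode number are illustrative, so the code checks that the file exists before loading it.

```matlab
% Illustrative path: folder "savedAgents", multiagent file for episode 100
agentFile = fullfile("savedAgents","Agents100.mat");

if isfile(agentFile)
    data = load(agentFile);
    savedAgents = data.saved_agent;       % agents stored at that episode
    savedResult = data.savedAgentResult;  % training results up to that episode
end
```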

    Example: SaveAgentCriteria="EpisodeReward"

    SaveAgentValue

    Critical value of the condition for saving agents, specified as a scalar, a vector, or a function name or handle.

    You can use a custom save criterion by specifying SaveAgentValue as a function name or handle. Your function must have one input and one output, as shown in the following signature.

    flag = mySaveFcn(trainingStats)

    Here, trainingStats is a structure that contains the following fields, all described in the trainStats output argument of train.

    • EpisodeIndex

    • EpisodeReward

    • EpisodeSteps

    • AverageReward

    • TotalAgentSteps

    • EpisodeQ0

    • SimulationInfo

    • EvaluationStatistic

    • TrainingOptions

    The agents are saved when the specified function returns true.
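    A custom save function could look like the following sketch, which saves on every 25th episode in which at least one agent reaches a reward floor. The period and the floor are hypothetical, and the code assumes that EpisodeIndex is a scalar and that EpisodeReward holds one value per agent.

```matlab
savePeriod  = 25;    % hypothetical: consider saving every 25th episode
rewardFloor = 100;   % hypothetical: require at least this episode reward

% One input (the statistics structure), one logical output
mySaveFcn = @(trainingStats) ...
    mod(trainingStats.EpisodeIndex, savePeriod) == 0 && ...
    any(trainingStats.EpisodeReward >= rewardFloor);

% Use the handle as the critical value of the "Custom" save criterion
trainOpts = rlMultiAgentTrainingOptions( ...
    SaveAgentCriteria="Custom", ...
    SaveAgentValue=mySaveFcn);
```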

    When not using a custom save criterion, the following guidelines apply.

    If the training environment contains a single agent, specify SaveAgentValue as a scalar.

    If the training environment is a multi-agent environment, specify a scalar to apply the same saving criterion to each agent. To save the agents when one meets a particular criterion, specify SaveAgentValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used when creating the environment. When a criterion for saving an agent is met, all agents are saved in the same MAT-file.

    When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.

    Example: SaveAgentValue=100

    SaveAgentDirectory

    Folder name for saved agents, specified as a string or character vector. The folder name can contain a full or relative path. When an episode occurs in which the conditions specified by the SaveAgentCriteria and SaveAgentValue options are satisfied, the software saves the current agent in a MAT-file in this folder. If the folder does not exist, the training function creates it. When SaveAgentCriteria is "none", this option is ignored and no folder is created.

    Example: SaveAgentDirectory = pwd + "\run1\Agents"

    Verbose

    Option to display training progress at the command line, specified as the logical value false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.

    Example: Verbose=true

    Plots

    Option to display training progress with Reinforcement Learning Training Monitor, specified as "training-progress" or "none". By default, calling train opens Reinforcement Learning Training Monitor, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. For more information, see train. To turn off this display, set this option to "none".

    Example: Plots="none"
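    The two display options are independent of each other. For a run where no figure window is wanted, one possible combination is to keep the command-line output and turn off the monitor, as in this short sketch.

```matlab
% Print per-episode information at the command line, but do not open
% Reinforcement Learning Training Monitor.
trainOpts = rlMultiAgentTrainingOptions( ...
    Verbose=true, ...
    Plots="none");
```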

    Object Functions

    train: Train reinforcement learning agents within a specified environment

    Examples

    Create an options set for training 5 reinforcement learning agents in three different learning groups. Set the maximum number of episodes and the maximum number of steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and Reinforcement Learning Training Monitor for displaying training results. You can set the options using name-value pair arguments when you create the options set. Any options that you do not explicitly set have their default values.

    trainOpts = rlMultiAgentTrainingOptions(...
        AgentGroups={[1,2],3,[4,5]},...
        LearningStrategy= ...
            ["centralized","decentralized","centralized"],...
        MaxEpisodes=1000,...
        MaxStepsPerEpisode=1000,...
        StopTrainingCriteria="AverageReward",...
        StopTrainingValue=480,...
        Verbose=true,...
        Plots="training-progress")
    trainOpts = 
      rlMultiAgentTrainingOptions with properties:
    
                       AgentGroups: {[1 2]  [3]  [4 5]}
                  LearningStrategy: ["centralized"    "decentralized"    "centralized"]
                       MaxEpisodes: 1000
                MaxStepsPerEpisode: 1000
                       StopOnError: "on"
             SimulationStorageType: "memory"
           SaveSimulationDirectory: "savedSims"
                   SaveFileVersion: "-v7"
        ScoreAveragingWindowLength: 5
              StopTrainingCriteria: "AverageReward"
                 StopTrainingValue: 480
                 SaveAgentCriteria: "none"
                    SaveAgentValue: "none"
                SaveAgentDirectory: "savedAgents"
                           Verbose: 1
                             Plots: "training-progress"
    
    

    Alternatively, create a default options set and use dot notation to change some of the values.

    trainOpts = rlMultiAgentTrainingOptions;
    
    trainOpts.AgentGroups = {[1,2],3,[4,5]};
    trainOpts.LearningStrategy = ...
        ["centralized","decentralized","centralized"];
    trainOpts.MaxEpisodes = 1000;
    trainOpts.MaxStepsPerEpisode = 1000;
    trainOpts.StopTrainingCriteria = "AverageReward";
    trainOpts.StopTrainingValue = 480;
    trainOpts.Verbose = true;
    trainOpts.Plots = "training-progress";
    
    trainOpts
    trainOpts = 
      rlMultiAgentTrainingOptions with properties:
    
                       AgentGroups: {[1 2]  [3]  [4 5]}
                  LearningStrategy: ["centralized"    "decentralized"    "centralized"]
                       MaxEpisodes: 1000
                MaxStepsPerEpisode: 1000
                       StopOnError: "on"
             SimulationStorageType: "memory"
           SaveSimulationDirectory: "savedSims"
                   SaveFileVersion: "-v7"
        ScoreAveragingWindowLength: 5
              StopTrainingCriteria: "AverageReward"
                 StopTrainingValue: 480
                 SaveAgentCriteria: "none"
                    SaveAgentValue: "none"
                SaveAgentDirectory: "savedAgents"
                           Verbose: 1
                             Plots: "training-progress"
    
    

    You can now use trainOpts as an input argument to the train command.

    Create an options object for concurrently training three agents in the same environment.

    Set the maximum number of episodes and the maximum steps per episode to 1000. Configure the options to stop training the first agent when its average reward over 5 episodes equals or exceeds 400, the second agent when its average reward over 10 episodes equals or exceeds 500, and the third when its average reward over 15 episodes equals or exceeds 600. The order of agents is the one used during environment creation.

    Save the agents when the reward for the first agent in the current episode equals or exceeds 100, when the reward for the second agent equals or exceeds 120, or when the reward for the third agent equals or exceeds 140.

    Turn on both the command-line display and Reinforcement Learning Training Monitor for displaying training results. You can set the options using name-value pair arguments when you create the options set. Any options that you do not explicitly set have their default values.

    trainOpts = rlMultiAgentTrainingOptions(...
        MaxEpisodes=1000,...
        MaxStepsPerEpisode=1000,...    
        ScoreAveragingWindowLength=[5 10 15],...        
        StopTrainingCriteria="AverageReward",...
        StopTrainingValue=[400 500 600],...    
        SaveAgentCriteria="EpisodeReward",...
        SaveAgentValue=[100 120 140],...    
        Verbose=true,...
        Plots="training-progress")
    trainOpts = 
      rlMultiAgentTrainingOptions with properties:
    
                       AgentGroups: "auto"
                  LearningStrategy: "decentralized"
                       MaxEpisodes: 1000
                MaxStepsPerEpisode: 1000
                       StopOnError: "on"
             SimulationStorageType: "memory"
           SaveSimulationDirectory: "savedSims"
                   SaveFileVersion: "-v7"
        ScoreAveragingWindowLength: [5 10 15]
              StopTrainingCriteria: "AverageReward"
                 StopTrainingValue: [400 500 600]
                 SaveAgentCriteria: "EpisodeReward"
                    SaveAgentValue: [100 120 140]
                SaveAgentDirectory: "savedAgents"
                           Verbose: 1
                             Plots: "training-progress"
    
    

    Alternatively, create a default options set and use dot notation to change some of the values.

    trainOpts = rlMultiAgentTrainingOptions;
    trainOpts.MaxEpisodes = 1000;
    trainOpts.MaxStepsPerEpisode = 1000;
    
    trainOpts.ScoreAveragingWindowLength = [5 10 15];
    
    trainOpts.StopTrainingCriteria = "AverageReward";
    trainOpts.StopTrainingValue = [400 500 600];
    
    trainOpts.SaveAgentCriteria = "EpisodeReward";
    trainOpts.SaveAgentValue = [100 120 140];
    
    trainOpts.Verbose = true;
    trainOpts.Plots = "training-progress";
    
    trainOpts
    trainOpts = 
      rlMultiAgentTrainingOptions with properties:
    
                       AgentGroups: "auto"
                  LearningStrategy: "decentralized"
                       MaxEpisodes: 1000
                MaxStepsPerEpisode: 1000
                       StopOnError: "on"
             SimulationStorageType: "memory"
           SaveSimulationDirectory: "savedSims"
                   SaveFileVersion: "-v7"
        ScoreAveragingWindowLength: [5 10 15]
              StopTrainingCriteria: "AverageReward"
                 StopTrainingValue: [400 500 600]
                 SaveAgentCriteria: "EpisodeReward"
                    SaveAgentValue: [100 120 140]
                SaveAgentDirectory: "savedAgents"
                           Verbose: 1
                             Plots: "training-progress"
    
    

    You can specify a scalar to apply the same criterion to all agents. For example, use a window length of 10 for all three agents.

    trainOpts.ScoreAveragingWindowLength = 10
    trainOpts = 
      rlMultiAgentTrainingOptions with properties:
    
                       AgentGroups: "auto"
                  LearningStrategy: "decentralized"
                       MaxEpisodes: 1000
                MaxStepsPerEpisode: 1000
                       StopOnError: "on"
             SimulationStorageType: "memory"
           SaveSimulationDirectory: "savedSims"
                   SaveFileVersion: "-v7"
        ScoreAveragingWindowLength: 10
              StopTrainingCriteria: "AverageReward"
                 StopTrainingValue: [400 500 600]
                 SaveAgentCriteria: "EpisodeReward"
                    SaveAgentValue: [100 120 140]
                SaveAgentDirectory: "savedAgents"
                           Verbose: 1
                             Plots: "training-progress"
    
    

    You can now use trainOpts as an input argument to the train command.
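    The call to train might look like the following sketch. The variables env (a multi-agent environment) and agents (an array of agents, in the order used when creating the environment) are placeholders that this page does not define, so the call is guarded and only runs if both already exist in the workspace.

```matlab
% Options object used for the training session
trainOpts = rlMultiAgentTrainingOptions(MaxEpisodes=1000);

% env and agents are hypothetical placeholders: create them beforehand.
if exist("env","var") && exist("agents","var")
    trainResults = train(agents, env, trainOpts);
end
```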

    Version History

    Introduced in R2022a