

Set up reinforcement learning environment to run multiple simulations


    When you define a custom training loop for reinforcement learning, you can simulate an agent or policy against an environment using the runEpisode function. Use the setup function to configure the environment for running simulations using multiple calls to runEpisode.

    setup(env) sets up the specified reinforcement learning environment for running multiple simulations using runEpisode.


    setup(env,Name=Value) specifies nondefault configuration options using one or more name-value arguments.



    Create a reinforcement learning environment and extract its observation and action specifications.

    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a discrete categorical actor to use as the function approximator.

    actorNetwork = [...
    actorNetwork = dlnetwork(actorNetwork);
    actor = rlDiscreteCategoricalActor(actorNetwork,obsInfo,actInfo);
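
    The layer array in the listing above is truncated on this page. As a minimal sketch, assuming a small fully connected network is sufficient for the cart-pole task, the actor could be built as follows. The layer sizes and the variable numActions are illustrative assumptions, not the values from the original example.

```matlab
% Recreate the environment and specifications so this block runs on its own.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Number of discrete actions (hypothetical helper variable).
numActions = numel(actInfo.Elements);

% Illustrative layer stack: the hidden-layer size 24 is an assumption.
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(24)
    reluLayer
    fullyConnectedLayer(numActions)
    softmaxLayer];                      % one probability per action
actorNetwork = dlnetwork(actorNetwork);
actor = rlDiscreteCategoricalActor(actorNetwork,obsInfo,actInfo);
```

    The softmax output layer matters here because a discrete categorical actor expects the network to return a probability for each possible action.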

    Create a policy object using the function approximator.

    policy = rlStochasticActorPolicy(actor);

    Create an experience buffer.

    buffer = rlReplayMemory(obsInfo,actInfo);

    Set up the environment for running multiple simulations. For this example, configure the environment to log any errors rather than send them to the command window.

    setup(env,StopOnError="off")

    Simulate multiple episodes using the environment and policy. After each episode, append the experiences to the buffer. For this example, run 100 episodes.

    for i = 1:100
        output = runEpisode(env,policy,MaxSteps=300);
        append(buffer,output.AgentData.Experiences)   % store this episode's experiences
    end

    Clean up the environment.

    cleanup(env)

    Sample a mini-batch of experiences from the buffer. For this example, sample 10 experiences.

    batch = sample(buffer,10);

    You can then learn from the sampled experiences and update the policy and actor.
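
    Before writing an update step, it can help to look at what a sampled mini-batch contains. The sketch below is self-contained and fills a buffer with random placeholder experiences (not real simulation data) purely to show the structure that sample returns. The specification sizes mirror the cart-pole example and are assumptions.

```matlab
% Specifications that mimic the cart-pole example (assumed sizes).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-10 10]);
buffer = rlReplayMemory(obsInfo,actInfo);

% Placeholder experiences, for illustration only.
for i = 1:20
    expBatch(i).Observation = {rand(4,1)};
    expBatch(i).Action = {actInfo.Elements(randi(2))};
    expBatch(i).Reward = 1;
    expBatch(i).NextObservation = {rand(4,1)};
    expBatch(i).IsDone = 0;
end
append(buffer,expBatch);

% Sample 10 experiences and list the fields of the returned structure.
batch = sample(buffer,10);
disp(fieldnames(batch))
```

    A custom learning step would typically read the observation, action, reward, next-observation, and termination fields from this structure to compute a loss and update the actor.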

    Input Arguments


    env: Reinforcement learning environment, specified as a reinforcement learning environment object, for example the predefined environment returned by rlPredefinedEnv.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: StopOnError="on"

    StopOnError: Option to stop an episode when an error occurs, specified as one of the following:

    • "on" — Stop the episode when an error occurs and generate an error message in the MATLAB® command window.

    • "off" — Log errors in the SimulationInfo output of runEpisode.

    UseParallel: Option for using parallel simulations, specified as a logical value. Parallel computing lets you use multiple cores, processors, computer clusters, or cloud resources to speed up simulation.

    When you set UseParallel to true, the output of a subsequent call to runEpisode is an rl.env.Future object, which supports deferred evaluation of the simulation.
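
    The deferred evaluation described above might be used as in the following sketch. It assumes Parallel Computing Toolbox is available and that the rl.env.Future object supports fetchOutputs in the same way as other MATLAB future objects; treat that call as an assumption to check against your release. The network layers are illustrative.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Minimal illustrative actor and policy (layer choice is an assumption).
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(numel(actInfo.Elements))
    softmaxLayer]);
policy = rlStochasticActorPolicy(rlDiscreteCategoricalActor(net,obsInfo,actInfo));

% Configure the environment for parallel simulation.
setup(env,UseParallel=true)

% runEpisode now returns a future instead of the simulation output.
ftr = runEpisode(env,policy,MaxSteps=300);

% Assumption: fetchOutputs blocks until the episode finishes and returns its output.
output = fetchOutputs(ftr);

cleanup(env)
```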

    SetupFcn: Function to run on each worker before running an episode, specified as a handle to a function with no input arguments. Use this function to perform any preprocessing required before running an episode.

    CleanupFcn: Function to run on each worker when cleaning up the environment, specified as a handle to a function with no input arguments. Use this function to clean up the workspace or perform other processing after calling runEpisode.
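
    As a rough sketch of how the two worker functions could be supplied, the example below passes anonymous functions with no inputs. It assumes the arguments are named SetupFcn and CleanupFcn and that Parallel Computing Toolbox is available. The handles workerSetup and workerCleanup are hypothetical placeholders that only print a message.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");

% Hypothetical hooks: each takes no input arguments.
workerSetup = @() fprintf("Worker starting\n");
workerCleanup = @() fprintf("Worker finished\n");

% Register the hooks when configuring the environment for parallel runs.
setup(env,UseParallel=true,SetupFcn=workerSetup,CleanupFcn=workerCleanup)

% ... calls to runEpisode go here ...

% Cleaning up the environment triggers the cleanup hook on the workers.
cleanup(env)
```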

    TransferBaseWorkspaceVariables: Option to send model and workspace variables to parallel workers, specified as "on" or "off". When the option is "on", the client sends variables used in models and defined in the base MATLAB workspace to the workers.

    AttachedFiles: Additional files to attach to the parallel pool before running an episode, specified as a string or string array.

    WorkerRandomSeeds: Worker random seeds, specified as one of the following:

    • -1 — Set the random seed of each worker to the worker ID.

    • Vector with length equal to the number of workers — Specify the random seed for each worker.
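
    The two seed options above could be used as in this sketch. It assumes the argument is named WorkerRandomSeeds and that Parallel Computing Toolbox is available, since the vector length must match the number of workers in the pool. The seed values are arbitrary.

```matlab
env = rlPredefinedEnv("CartPole-Discrete");

% Get the current parallel pool, starting one if needed.
pool = gcp;

% One arbitrary seed per worker, so runs are repeatable.
seeds = 1:pool.NumWorkers;
setup(env,UseParallel=true,WorkerRandomSeeds=seeds)

% Alternative: seed each worker with its worker ID.
% setup(env,UseParallel=true,WorkerRandomSeeds=-1)

cleanup(env)
```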

    Version History

    Introduced in R2022a