RL toolbox train on continuous simulation with delay between episodes

I have a Simulink simulation of an environment that runs continuously and cannot be interrupted. I want to train a PPO agent using this simulation. Episodes would have to start and run for a set number of time steps (receiving observations and rewards and sending out actions) and then end without the environment simulation stopping. After an episode ends I would like to have a delay before the next episode starts and during this delay I want to apply a pre-defined control that stabilizes the simulation.
Is there any way to set up learning like this? If delays are not possible, I'm still very interested in learning on a continuous simulation (continuous in the sense of running for a long time, not in terms of the action and observation spaces).
Thanks for any help on this.

Answers (2)

Emmanouil Tzorakoleftherakis
Hi Joe,
I believe the setup you mention may be possible, but it will require some work. Essentially, you need to set up training to have a single very long episode and put the RL Agent block in an enabled subsystem. Once some condition A is met, the enabled subsystem will be OFF and the input to the system will be directed by some other source until A no longer holds (at which point the RL Agent will become active again). The downside is that you would not be able to view the evolution of episode rewards, since you only have one episode.
I would be curious to find out more, though. What you are describing, i.e., having one very long episode and periodically introducing a delay during which you stabilize the simulation, is more naturally implemented by having distinct episodes and resetting the states in between. What is the application? Is there any reason you want to set up the problem that way?
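To make the gating idea concrete, here is a minimal Python sketch of the enable logic only, not a Simulink or Reinforcement Learning Toolbox implementation. It assumes condition A is purely time-based: the agent is active for a fixed number of steps and then a stabilizing controller takes over for a fixed delay. The names `agent_enabled`, `select_action`, `episode_steps`, and `delay_steps` are hypothetical and do not come from the toolbox.

```python
def agent_enabled(step, episode_steps, delay_steps):
    """Return True while the RL agent should drive the plant.

    Assumption: the schedule is periodic, with `episode_steps` steps of
    agent control followed by `delay_steps` steps of stabilizing control.
    """
    period = episode_steps + delay_steps
    return (step % period) < episode_steps


def select_action(step, agent_action, stabilizing_action,
                  episode_steps, delay_steps):
    """Route either the agent's action or the predefined control to the plant."""
    if agent_enabled(step, episode_steps, delay_steps):
        return agent_action
    return stabilizing_action


if __name__ == "__main__":
    # Three agent steps, then two stabilizing steps, repeated.
    schedule = [agent_enabled(k, 3, 2) for k in range(10)]
    print(schedule)
```

In a Simulink model, the Boolean returned by `agent_enabled` would correspond to the enable signal of the subsystem holding the RL Agent block, and `select_action` to a switch that chooses between the two control sources.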
  5 Comments
Bipin Paudel on 19 Apr 2023
Hi @Emmanouil Tzorakoleftherakis, I'm currently facing a similar issue. I have a Simscape-based model that needs some time to reach a steady state. My objective is to train a reinforcement learning agent only once the system is in a steady state. Could you suggest a suitable method to pause the training for a period and resume it once the model reaches the desired state in each episode? I don't want to train my RL agent before the model is in a steady state.
I would greatly appreciate any assistance.
Emmanouil Tzorakoleftherakis
I think your question is a bit different, so ideally this would be a new thread so that the answer is more discoverable. In any case, since R2022a, you can place the RL Agent block inside conditionally executed subsystems, so you can initiate training whenever it makes sense.

Bipin Paudel on 24 Apr 2023
Thanks for this. I found a block called 'Enabled Subsystem'. As the control input to this subsystem, I guess I can use a Step block that outputs 0 at first and switches to 1 after some time (when I want my agent to start training).
This should solve the problem, right?
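As a rough illustration of the enable signal described here, the following Python sketch shows two variants. `step_enable` mimics a step source that switches from 0 to 1 at a fixed time. `steady_state_enable` is an assumed alternative, not something proposed in the thread: it latches to 1 once a monitored signal has stayed within a tolerance over a window of samples. All names and parameters (`t_start`, `window`, `tol`) are hypothetical.

```python
def step_enable(t, t_start):
    """Step-style enable: 0 before t_start, 1 from t_start onward."""
    return 1 if t >= t_start else 0


def steady_state_enable(samples, window, tol):
    """Latching enable based on a simple steady-state check.

    Assumption: the plant counts as settled once the last `window`
    samples differ by at most `tol`. After that the enable stays at 1.
    """
    enabled = False
    out = []
    for i in range(len(samples)):
        if not enabled and i + 1 >= window:
            recent = samples[i + 1 - window:i + 1]
            if max(recent) - min(recent) <= tol:
                enabled = True
        out.append(1 if enabled else 0)
    return out


if __name__ == "__main__":
    print([step_enable(t, 3) for t in range(6)])
    print(steady_state_enable([5.0, 3.0, 2.0, 1.5, 1.5, 1.5, 4.0], 3, 0.1))
```

A fixed switching time is the simpler option, but it only works if the settling time is known in advance and is roughly the same in every episode. A condition-based enable avoids that assumption.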
