Reinforcement Learning Sample Time

Braydon Westmoreland on 27 Jun 2020
Commented: Kai Tybussek on 15 Jul 2020
Sorry if this is a dumb question, but I am not sure how to configure the sample time on my reinforcement learning agent so that it properly interacts with the Simscape Electrical environment I've created. My goal is for the RL agent to output an action every 1 second, and that action is then used to update the MOSFET gate voltages in the environment. The environment then uses the new gate voltages to perform a 100-microsecond pulse, during which the MOSFETs' drain-source currents are measured at the midpoint. The measured currents determine the agent's reward and also determine when an episode is over: an episode ends when the agent has (mostly) balanced the 4 measured currents to within a defined threshold.
My confusion comes when trying to set up the timing in the environment, particularly the timing of the outputs that go to the RL agent. The agent requires the environment to produce an output every Ts (1 s), but I need an additional delay of roughly 100 microseconds for the pulse and the subsequent current measurements to take place.
I believe I have a fundamental misunderstanding of the way sample time works here. Any help is greatly appreciated. Thank you
Additional note: there is also a bug where the agent outputs the same sequence of actions every episode, regardless of the previous observation.

Answers (1)

Emmanouil Tzorakoleftherakis
Hi Braydon,
The agent sample time effectively determines how often the agent outputs a decision/action. Think of it as the equivalent of your control application's sample time. If you need new actions every 100 µs, that should be your sample time. If new actions every 1 second are enough, the environment can consume the same action for 10,000 consecutive time steps (assuming a 100 µs sample time for the environment) until a new action is available 1 second later.
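As a minimal sketch of the two-rate setup described above (the model name `mosfet_balancer`, the observation/action dimensions, and the choice of a DDPG agent are assumptions for illustration, not from the thread), the agent's `SampleTime` is set independently of the Simscape solver step, and the RL Agent block in Simulink holds the last action between decisions:

```matlab
% Hypothetical example: agent decides once per second while the
% Simscape environment runs at a 100 us step. The RL Agent block
% zero-order-holds the action between agent sample hits.
Ts_agent = 1;       % agent decision period [s]
Ts_env   = 100e-6;  % environment/solver step [s] (pulse resolution)

obsInfo = rlNumericSpec([4 1]);   % e.g. 4 measured drain-source currents
actInfo = rlNumericSpec([4 1]);   % e.g. 4 MOSFET gate voltages

% Environment wraps the Simulink model containing the RL Agent block
env = rlSimulinkEnv('mosfet_balancer', ...
                    'mosfet_balancer/RL Agent', obsInfo, actInfo);

% The agent's sample time is set in its options object
agentOpts = rlDDPGAgentOptions('SampleTime', Ts_agent);
```

With this arrangement the environment sees a piecewise-constant action that updates every `Ts_agent / Ts_env = 10000` solver steps, which matches the "consume the same action until a new one is available" behavior described above.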
If you want to add a delay on the observation inputs, you can always use a Delay block.
This may not be exactly the same application, but this video, which shows how to use RL for motor control by setting PWM references, may be helpful.
  1 Comment
Kai Tybussek on 15 Jul 2020
What do I have to do if I want the agent to perform one action, check whether isDone = 1, and if not, reset to the initial observation and take another action? Is my sample time in this case 1, and do my steps per episode need to be 1 too?

