Reinforcement Learning Sample Time

Question

Braydon Westmoreland on 27 Jun 2020

https://in.mathworks.com/matlabcentral/answers/555799-reinforcement-learning-sample-time

Commented: Kai Tybussek on 15 Jul 2020

Sorry if this is a dumb question, but I am not sure how to configure the sample time on my reinforcement learning agent so that it will properly interact with the Simscape Electrical environment I've created. My goal is for the RL agent to output an action every 1 second, and for that action to then update the MOSFET gate voltages in the environment. The environment then uses the new gate voltages to perform a 100 microsecond pulse, during which the MOSFETs' drain-source currents are measured midway through the pulse. The measured currents are used to determine the agent's rewards and to determine when an episode is over. An episode is over when the agent has (mostly) balanced the 4 measured currents within a defined threshold.

My confusion comes when trying to set up the timing in the environment, particularly the timing of the outputs that go to the RL agent. The agent requires the environment to output every Ts (1 sec), but I need an additional delay of roughly 100 microseconds in order for the pulse and subsequent current measurements to take place.

I believe I have a fundamental misunderstanding of the way sample time works here. Any help is greatly appreciated. Thank you

Additional note: there is a bug where the agent is outputting the same sequence of actions every episode, regardless of the previous observation.

Answer 1

Emmanouil Tzorakoleftherakis on 2 Jul 2020

https://in.mathworks.com/matlabcentral/answers/555799-reinforcement-learning-sample-time#answer_460051

Hi Braydon,

The agent sample time determines how often the agent outputs a decision/action. Think of it as the equivalent of the sample time of your control application. If you need new actions every 100 us, that should be your sample time. If new actions every 1 second are enough, then the environment could consume the same action for 10,000 consecutive time steps (assuming a 100 us sample time for the environment) until a new action is available 1 second later.
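To make the two-rate arithmetic concrete, here is a minimal Python sketch (not Simulink or Reinforcement Learning Toolbox code). The function names `env_steps_per_action` and `held_actions` are made up for illustration, and it assumes the agent sample time is an integer multiple of the environment sample time.

```python
def env_steps_per_action(agent_ts: float, env_ts: float) -> int:
    """Number of environment time steps that reuse one held agent action."""
    ratio = agent_ts / env_ts
    steps = round(ratio)
    # Assumption: the agent rate is an integer multiple of the environment rate.
    if steps < 1 or abs(ratio - steps) > 1e-9 * max(1.0, ratio):
        raise ValueError("agent_ts must be an integer multiple of env_ts")
    return steps


def held_actions(actions, steps_per_action):
    """Zero-order hold: repeat each agent action for every environment step."""
    for action in actions:
        for _ in range(steps_per_action):
            yield action


if __name__ == "__main__":
    # 1 s agent sample time, 100 us environment sample time
    print(env_steps_per_action(1.0, 100e-6))
```

With a 1 second agent sample time and a 100 us environment step, each action is held for 10,000 environment steps, so a 100 us pulse occupies only the first step of each agent interval.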

If you want to add a delay in the observation inputs, you can always use a delay block.
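As a rough illustration of what a fixed sample delay does to an observation signal, the Python sketch below mimics the behavior with a small buffer. It is not the Simulink Delay block itself; the class name `SampleDelay` and its parameters are hypothetical.

```python
from collections import deque


class SampleDelay:
    """Delay a signal by a fixed number of samples (hypothetical helper)."""

    def __init__(self, n_samples: int, initial=0.0):
        if n_samples < 0:
            raise ValueError("n_samples must be non-negative")
        # Pre-fill the buffer so the first n_samples outputs are the initial value.
        self._buf = deque([initial] * n_samples)

    def step(self, value):
        """Push the newest sample and return the one from n_samples steps ago."""
        self._buf.append(value)
        return self._buf.popleft()


if __name__ == "__main__":
    delay = SampleDelay(n_samples=1, initial=0.0)
    for current in [1.5, 1.2, 1.0]:
        print(delay.step(current))
```

With a one-sample delay, the value the agent sees at a given step is the measurement taken during the previous step, which is one way a current sampled mid-pulse could be presented at the next agent sample.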

This may not be exactly the same application, but this video that shows how to use RL for motor control by setting PWM references may be helpful.

1 Comment

Kai Tybussek on 15 Jul 2020

What do I have to do if I want the agent to perform one action, check whether "isDone=1", and if not, reset to the initial observation and take another action? My sample time in this case is 1; do my steps per episode need to be 1 too?
