Measures to improve computation time with reinforcement learning block in Simulink

Question

Enrico Anderlini on 13 Dec 2019

https://in.mathworks.com/matlabcentral/answers/496460-measures-to-improve-computation-time-with-reinforcement-learning-block-in-simulink

Edited: Emmanouil Tzorakoleftherakis on 27 Jan 2020

I am using the Reinforcement Learning Toolbox to run control tasks, in particular with the DDPG agent. Each episode lasts 100 seconds with a 0.01 s simulation time step (the control time step is 0.1 s, i.e. the RL control block is called every 0.1 s). The computation time is unmanageably high.

I have tried to reduce the training of the actor and critic neural networks to every 5 episodes by using a periodic TargetUpdateMethod and changing the TargetUpdateFrequency. However, a deeper analysis shows that it is the computation time taken by each episode that is too high, which points to the RL Simulink block as the culprit.

The way I see it, the block should run the neural networks (which is a matrix multiplication) and store the additional experience point in the memory (so some more matrix calculations, if the memory is full). This does not fully explain the large overhead to me.

My code runs (more) efficiently in Python, so it is clear I am not fully exploiting the MATLAB/C++ implementation.

Any advice on how I could try to improve the computational efficiency?

Answer 1

Emmanouil Tzorakoleftherakis on 27 Jan 2020

https://in.mathworks.com/matlabcentral/answers/496460-measures-to-improve-computation-time-with-reinforcement-learning-block-in-simulink#answer_412231

Edited: Emmanouil Tzorakoleftherakis on 27 Jan 2020

Hi Enrico,

Changing the values of TargetUpdateMethod and TargetUpdateFrequency will not change how often training happens, but only how often the actor and critic copies are synced (remember DDPG is an off-policy method, so it keeps two copies of the actor and the critic).
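To make the distinction concrete, here is a toy Python sketch (not the toolbox's implementation) that counts learning updates and target-network syncs separately. The function name `count_updates` is hypothetical, and the sketch assumes the target update frequency is measured in agent steps.

```python
def count_updates(num_agent_steps, target_update_frequency):
    """Count learning updates and target syncs over a number of agent steps.

    Toy model only: it assumes one learning update per agent step and a
    periodic target sync every `target_update_frequency` agent steps.
    """
    learning_updates = 0
    target_syncs = 0
    for step in range(1, num_agent_steps + 1):
        # The actor and critic are trained at every agent step,
        # regardless of the target update setting.
        learning_updates += 1
        # The target copies are overwritten only periodically.
        if step % target_update_frequency == 0:
            target_syncs += 1
    return learning_updates, target_syncs


if __name__ == "__main__":
    for freq in (1, 5):
        updates, syncs = count_updates(1000, freq)
        print(f"frequency={freq}: learning updates={updates}, target syncs={syncs}")
```

Under these assumptions, raising the frequency from 1 to 5 cuts the number of syncs but leaves the number of learning updates unchanged, which is why the episode time barely moves.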

If you look at the algorithm description here, you will see that learning happens at steps 6 and 7, and these happen at every agent time step (0.1 s in your example), which is why you see this slowdown. So the quick things to try are 1) increase the agent sample time, 2) reduce the episode duration and 3) reduce the mini-batch size.
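A rough back-of-the-envelope estimate shows how each of the three suggestions reduces the learning work per episode. This is a minimal Python sketch, assuming one mini-batch update per agent step; the function name `learning_cost` and the mini-batch size of 64 are hypothetical, since the question does not state the mini-batch size used.

```python
def learning_cost(episode_duration_s, agent_sample_time_s, mini_batch_size):
    """Estimate learning work per episode.

    Assumes one mini-batch update per agent step (a simplification).
    Returns (agent steps per episode, experience samples processed per episode).
    """
    agent_steps = round(episode_duration_s / agent_sample_time_s)
    samples_processed = agent_steps * mini_batch_size
    return agent_steps, samples_processed


if __name__ == "__main__":
    # Values from the question: 100 s episodes, 0.1 s agent sample time.
    # The mini-batch size of 64 is an assumed placeholder.
    scenarios = {
        "baseline": (100, 0.1, 64),
        "larger sample time": (100, 0.2, 64),
        "shorter episode": (50, 0.1, 64),
        "smaller mini-batch": (100, 0.1, 32),
    }
    for name, args in scenarios.items():
        steps, samples = learning_cost(*args)
        print(f"{name}: {steps} updates, {samples} samples per episode")
```

With the question's settings, the agent performs about 1000 learning updates per 100 s episode. Doubling the sample time or halving the episode duration halves the number of updates, while halving the mini-batch size keeps the number of updates but halves the samples processed in each one.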

One additional thing to try is to parallelize training. You can use Parallel Computing Toolbox for that, and to set this up, you pretty much need to set a flag in training options (see e.g. here).

We are also working on adding more training algorithms for continuous action spaces that are more sample efficient, so I would check back when R2020a goes live.
