Custom Action Space DDPG Reinforcement Learning Agent

Question

Hans-Joachim Steinort on 4 Mar 2020

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/508917-custom-action-space-ddpg-reinforcement-learning-agent

Edited: Hans-Joachim Steinort on 12 Mar 2020

Accepted Answer: Emmanouil Tzorakoleftherakis

After running into a challenge with my reinforcement learning agent, I hope you can help me with at least a little hint.

My DDPG agent has a continuous action space, which works totally fine. Unfortunately, it cannot be transferred to a real-life system this way: while searching for optimal action values in different situations, the agent should avoid certain combinations.

The action space is defined like:

actionInfo = rlNumericSpec([4 1], ...
                           'LowerLimit', [0; 0; 0; 0], ...
                           'UpperLimit', [maxA1; maxA2; maxA3; maxA4]);

But due to restrictions in the real-life system, it should be more like

A1 = (0 || [minA1; maxA1])

to avoid actions in the range

A1 = ]0; minA1[

Is there any possibility to define my action space this way?

Note:

I have already tried to steer the agent away from actions in this range by penalizing them via the reward, but that does not seem to work. Instead of steadily improving over the episodes, training now tends to plateau after reaching a certain (undesirable) level.

Thanks in advance!


Answer 1

Emmanouil Tzorakoleftherakis on 4 Mar 2020

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/508917-custom-action-space-ddpg-reinforcement-learning-agent#answer_418493

To my knowledge, you cannot implement a custom action space with rlNumericSpec. What you could do instead (since adding penalty terms to the reward does not help) is add some logic that manipulates the agent's actions, i.e. the output of the RL Agent block. Your policy would then be the combination of the neural network and the new logic. Just an idea.
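
As a minimal sketch of what such post-processing logic could look like (this is not from the original answer; the function name snapToAllowed and the choice to snap to the nearer boundary are assumptions made for illustration), each raw action that falls inside the forbidden range ]0; minA[ is mapped to either 0 or minA, and everything else passes through unchanged:

```matlab
function a = snapToAllowed(aRaw, minA, maxA)
% snapToAllowed  Hypothetical helper: map raw actions onto {0} U [minA, maxA].
%   aRaw, minA and maxA must be vectors of equal size (one entry per action).
    a = min(max(aRaw, 0), maxA);        % clip to the nominal limits [0, maxA]
    inGap  = a > 0 & a < minA;          % entries inside the forbidden range
    snapUp = inGap & (a >= minA / 2);   % at least as close to minA as to 0
    a(snapUp) = minA(snapUp);           % round up to the lower allowed bound
    a(inGap & ~snapUp) = 0;             % round down to "off"
end
```

For example, with minA = 1 and maxA = 2 for all four actions, snapToAllowed([0; 0.2; 0.6; 3], ones(4,1), 2*ones(4,1)) returns [0; 0; 1; 2]. Whether snapping to the nearer boundary is appropriate depends on the real system; always mapping the gap to 0 would be an equally simple alternative.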

3 Comments

Emmanouil Tzorakoleftherakis on 5 Mar 2020

This will change the data stored in experience buffers/mini batches during training, as well as logged data when you perform simulations after training. For the latter, you can just choose to log the respective signal after the action transformation. For the former, I don't think it will cause issues. You can think of the additional logic as an extra layer in your neural network that only does algebraic manipulations (like a scaling layer for instance). There are no weights/parameters to be learned.

The three candidate places you mentioned should lead to the same results. Just for visualization purposes (I am assuming you use Simulink, since you mentioned 'AgentWrapper'), I would add the logic right after the agent block and put both in a separate subsystem, so that you can treat agent + logic as your new decision-making system.
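
One way to realize the logic in Simulink would be a MATLAB Function block placed directly after the RL Agent block. The sketch below is only an assumption about how such a block could look, not the original poster's implementation: the function name actionLogic, the input names, and the extra offThreshold parameter are all hypothetical. Raw values below offThreshold are switched off, and the remaining raw range is rescaled linearly so that every value in [minA, maxA] stays reachable. Like a scaling layer, it is purely algebraic and has no learnable parameters.

```matlab
function action = actionLogic(rawAction, minA, maxA, offThreshold)
%#codegen
% Hypothetical MATLAB Function block body placed after the RL Agent block.
%   rawAction    : 4x1 agent output, nominally in [0, maxA]
%   minA, maxA   : 4x1 bounds of the allowed "on" range
%   offThreshold : 4x1 raw values below which the action is switched off
%                  (assumed to satisfy 0 < offThreshold < maxA)
    raw   = min(max(rawAction, 0), maxA);             % enforce nominal limits
    isOff = raw < offThreshold;                       % entries to switch off
    frac  = (raw - offThreshold) ./ (maxA - offThreshold); % 0..1 when "on"
    action = minA + frac .* (maxA - minA);            % rescale onto [minA, maxA]
    action(isOff) = 0;                                % forbidden gap is never output
end
```

The output of this block, rather than the raw agent output, is then the signal to feed to the plant and to log after training.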

Hans-Joachim Steinort on 6 Mar 2020

Edited: Hans-Joachim Steinort on 12 Mar 2020

Thank you for your explanation!

This actually helped me wrap my head around the issue. I will definitely try out your suggestion with the additional logic and will come back to you afterwards.

EDIT:

It worked the way you suggested, thanks a lot!
