MATLAB Answers

Custom Action Space DDPG Reinforcement Learning Agent

After running into a challenge with my reinforcement learning agent, I hope you can help me with at least a little hint.
My DDPG agent has a continuous action space, which works perfectly fine in simulation. Unfortunately, it cannot be transferred to a real-life system this way. While searching for optimal action values in different situations, the agent should avoid certain combinations.
The action space is defined like:
actionInfo = rlNumericSpec([4 1], ...
'LowerLimit', [0; 0; 0; 0], ...
'UpperLimit', [maxA1; maxA2; maxA3; maxA4]);
But due to restrictions in the real-life system, it should instead be
A1 ∈ {0} ∪ [minA1; maxA1]
so that actions in the open interval
A1 ∈ ]0; minA1[
are avoided.
Is there any possibility to define my action space this way?
I have already tried to steer the agent away from actions in this range by penalizing them via the reward, but it doesn't seem to work out. Instead of steadily improving over the episodes, training now tends to move sideways after reaching a certain (undesirable) level.
Thanks in advance!



Accepted Answer

Emmanouil Tzorakoleftherakis
To my knowledge, you cannot implement a custom action space with rlNumericSpec. What you could possibly do instead (since adding penalty terms to the reward does not help) is add some additional logic that manipulates the agent's actions, i.e. the output of the RL Agent block. Your policy would then be the combination of the neural network and the new logic. Just an idea.
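One possible sketch of that idea (the function name, snapping rule, and per-component thresholds are my own illustration, not from this thread): any action component that lands in the forbidden gap is snapped to the nearest allowed value, either 0 or the lower operating limit.

```matlab
function a = constrainAction(a, minA)
% Hypothetical post-processing logic: snap each action component that falls
% in the forbidden open interval ]0, minA(i)[ to the nearest allowed value.
for i = 1:numel(a)
    if a(i) > 0 && a(i) < minA(i)
        if a(i) < minA(i)/2
            a(i) = 0;       % closer to zero: switch the actuator off
        else
            a(i) = minA(i); % closer to the lower operating limit
        end
    end
end
end
```

Snapping to the nearest allowed value keeps the transformation close to the identity, so the agent's learned preferences are distorted as little as possible.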


Hans-Joachim Steinort on 5 Mar 2020
Thank you for your suggestion.
This seems feasible, but where would I add this new logic? There are multiple options:
  • outside of the agent and the environment (between the two)
  • after entering the environment
  • inside the agent (right before passing the action to the output)
It also raises a concern: doesn't this approach corrupt the (s, a, r, s') tuples in the memory if I change the action after the actor has selected one? Or will it be memorized correctly if the new logic is added within the AgentWrapper?
Emmanouil Tzorakoleftherakis
This will change the data stored in experience buffers/mini batches during training, as well as logged data when you perform simulations after training. For the latter, you can just choose to log the respective signal after the action transformation. For the former, I don't think it will cause issues. You can think of the additional logic as an extra layer in your neural network that only does algebraic manipulations (like a scaling layer for instance). There are no weights/parameters to be learned.
The three candidate places you mentioned should lead to the same results. Just for visualization purposes (I am assuming you use Simulink since you mentioned 'AgentWrapper'), I would add the logic right after the agent block, and put both under a separate subsystem so that you can treat the agent+logic as your new decision making system.
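If you go the Simulink route, the body of a MATLAB Function block wired between the RL Agent block's action output and the plant input could look like this (a sketch with hypothetical limit values; a vectorized variant of the same snapping rule as above):

```matlab
function aOut = transformAction(aIn)
% Body of a MATLAB Function block placed directly after the RL Agent block,
% inside the same subsystem, so that agent + logic form one decision-making
% system. The limit values below are placeholders for illustration only.
minA = [0.2; 0.1; 0.3; 0.15];    % hypothetical lower limits for A1..A4
aOut = aIn;
gap  = (aIn > 0) & (aIn < minA); % components inside the forbidden range
aOut(gap & (aIn >= minA/2)) = minA(gap & (aIn >= minA/2));
aOut(gap & (aIn <  minA/2)) = 0;
end
```

Logging the signal on the output side of this block then records the actions that actually reach the plant, as suggested above for post-training simulations.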
Hans-Joachim Steinort on 6 Mar 2020
Thank you for your explanation!
This actually helped me wrap my head around the issue. I will definitely try out your suggestion with the additional logic and come back to you afterwards.
It worked the way you suggested, thanks a lot!


More Answers (0)
