After running into a problem with my reinforcement learning agent, I hope you can help me with at least a small hint.
My DDPG agent has a continuous action space, which works totally fine. Unfortunately, it cannot be transferred to a real-life system this way. While searching for optimal action values in different situations, the agent should avoid certain combinations.
The action space is defined as follows:
actionInfo = rlNumericSpec([4 1], ...
'LowerLimit', [0; 0; 0; 0], ...
'UpperLimit', [maxA1; maxA2; maxA3; maxA4]);
But due to restrictions in the real-life system, it should be more like
A1 = (0 || [minA1; maxA1])
to avoid actions in the range 0 < A1 < minA1.
Is there any possibility to define my action space this way?
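To make the restriction concrete, here is a minimal sketch of the set of values A1 should be allowed to take, written as a mapping that snaps a raw action onto either 0 or the interval [minA1, maxA1]. The numeric limits and the name snapA1 are placeholders for illustration only and are not part of my actual model. Snapping to the nearer end of the gap is just one possible choice.

```matlab
% Hypothetical limits for illustration only; the real values differ.
minA1 = 2;
maxA1 = 10;

% Map a raw action onto the allowed set {0} U [minA1, maxA1].
% Values at or above minA1 are kept (capped at maxA1).
% Values in the forbidden gap (0, minA1) snap to the nearer end:
% below minA1/2 they become 0, otherwise they become minA1.
snapA1 = @(a) (a >= minA1) .* min(a, maxA1) + ...
              (a > 0 & a < minA1) .* (a >= minA1/2) .* minA1;

raw = [0, 0.5, 1.5, 2, 7, 12];
allowed = snapA1(raw);   % allowed is [0 0 2 2 7 10]
```

Applying a mapping like this outside the agent might be one way to enforce the restriction, but I am not sure whether that is a sound approach for DDPG training, which is why I am asking whether the action space itself can be defined this way.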
I have already tried to steer the agent away from actions in this range by penalizing them via the reward, but it doesn't seem to work. Instead of steadily improving over the episodes, the agent now tends to plateau after reaching a certain (undesirable) level.
Thanks in advance!