Enforce action space constraints within the environment

Hi,
My agent is training!!! But it's pretty much 0 reward every episode right now. I think it might be due to this:
contActor does not enforce constraints set by the action specification, therefore, when using this actor, you must enforce action space constraints within the environment.
How can I do this?
Also, is there a way to view the logged signals as the agent is training?
Thanks!
  1 Comment
John Doe on 24 Feb 2021
There's something odd going on. It's not 0 reward, but it's not growing. I do have that first-action method I mentioned in the other question implemented (so for 4 of the continuous actions, it only chooses the first action), and 1 action is used every time step. I guess I need to check the logged signals to really determine what's going on. I was probably too excited, expecting it to work on the first or second try lol


Accepted Answer

Emmanouil Tzorakoleftherakis
If the environment is in Simulink, you can set up scopes and observe what's happening during training. If the environment is in MATLAB, you need to do some extra work and plot things yourself.
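For example, inside your environment's step method you could append the value you want to monitor and refresh a plot every so often. This is just a minimal sketch; LoggedAltitude and StepCount are hypothetical properties of your own environment class, not part of the toolbox API:

% Minimal sketch (hypothetical property names), run inside your step method
this.LoggedAltitude(end+1) = Observation(1);
this.StepCount = this.StepCount + 1;

if mod(this.StepCount, 50) == 0      % refresh only occasionally to keep training fast
    figure(1);                       % reuse the same figure window
    plot(this.LoggedAltitude);
    xlabel('Step'); ylabel('Logged signal');
    drawnow limitrate;               % redraw without blocking training
end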
For your constraints question, which agent are you using? Some agents are stochastic, and some, like DDPG, add noise for exploration on top of the action output. To be certain, you can use a Saturation block in Simulink or an if statement to clip the action as needed in MATLAB.
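As a concrete illustration of the clipping approach in a MATLAB environment (a minimal sketch; the limits and the State property are placeholders for your own action specification and dynamics):

% Minimal sketch of clipping inside a custom environment's step method,
% so the dynamics only ever see in-range actions. Limits are placeholders.
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
    actionLower = -1;   % lower limit from your action spec (placeholder)
    actionUpper =  1;   % upper limit from your action spec (placeholder)

    % Saturate the raw action requested by the actor
    Action = min(max(Action, actionLower), actionUpper);

    % Placeholder dynamics: apply the clipped Action to your system here,
    % then compute the real observation, reward, and termination flag
    Observation = this.State;
    Reward = 0;
    IsDone = false;
    LoggedSignals = [];
end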
  28 Comments
John Doe on 2 Mar 2021
Edited: John Doe on 2 Mar 2021
How can I scale the inputs to the network? That seems like the best way forward.
The environment is already constraining the actions, but training is extremely sample-inefficient and basically bounces between the upper and lower limits of the actions for hundreds of episodes.
Emmanouil Tzorakoleftherakis
Multiply the observations inside the 'step' function by a number that makes sense.
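Something along these lines (a minimal sketch; the raw values and scale factors are placeholders, chosen so each channel lands roughly in [-1, 1]):

% Minimal sketch: scale the raw observations inside step()
obsRaw      = [250; 0.8; -4.2];     % e.g. position (m), angle (rad), velocity (m/s)
obsScale    = [1/500; 1/pi; 1/10];  % reciprocal of the expected max magnitude of each channel
Observation = obsScale .* obsRaw;   % scaled vector returned to the agent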


More Answers (1)

John Doe on 17 Mar 2021
Edited: John Doe on 17 Mar 2021
Hi,
I feel like I'm really close to getting this, but I haven't gotten a successful run yet. For thousands of episodes, the agent keeps choosing actions way outside the limits. I've tried adding min/max clipping to force them within range in the environment. Do you have any tips on how I can make it converge to stay within the limits? I even tried changing the rewards to encourage actions close to the limits.
I'm also wondering whether this is a known issue, and whether having the continuous agent pick actions within the spec limits is on the roadmap?
  5 Comments
John Doe on 18 Mar 2021
Here's an example training run. I gave it a negative reward for going outside the bounds of the action, and it demonstrates how far outside the range the actor is picking. The same thing occurs over more episodes (5000), although I don't have a screenshot of that. Surely there must be something I'm doing wrong? How can I make this converge?
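The penalty I'm describing is roughly of this shape (a minimal sketch; the limits, weight, and linear form are placeholders, not my exact values):

% Minimal sketch of an out-of-bounds penalty added to the step reward
Action      = 1.7;    % example raw action requested by the actor (placeholder)
taskReward  = 0.5;    % example reward from the task itself (placeholder)
actionLower = -1;  actionUpper = 1;  penaltyWeight = 10;

% Distance outside the limits (zero when the action is within range)
violation = max(0, Action - actionUpper) + max(0, actionLower - Action);

% Total reward handed back to the agent
Reward = taskReward - penaltyWeight * sum(violation);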
John Doe on 25 Mar 2021
I had a bug where I was using normalized values instead of the real values! After fixing that and changing the action to discrete, I was able to solve the environment! Thanks for all your help and this wonderful toolbox!
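For anyone who hits the same thing, the switch from a continuous to a discrete action specification looks roughly like this (a minimal sketch; the sizes and the discrete action values are placeholders, not my actual setup):

% Minimal sketch: continuous vs. discrete action specs in Reinforcement Learning Toolbox
obsInfo = rlNumericSpec([4 1]);                 % e.g. 4 continuous observations (placeholder)

% Continuous action: some actors do not enforce these limits (hence the warning above)
actInfoCont = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Discrete action: the agent can only pick from this finite set,
% so out-of-range actions are impossible by construction
actInfoDisc = rlFiniteSetSpec([-1 -0.5 0 0.5 1]);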


Release

R2020b
