SAC RL agent does not explore properly (rlSACAgent)
Hi,
I'm trying to create a SAC RL agent. The agent can set 8 separate continuous actions with the same upper and lower bound (-10 and 10).
During training I observe that the chosen actions are (almost!) always one of the two bounds, so they often fluctuate between the minimum and the maximum. Only sporadically is another value chosen for one of the actions.
I've found a similar question HERE, but the answer given did not solve the issue (the range of the action space is already the same for all actions, and changing EntropyWeight did not make a difference). I've also tried scaling the reward, as suggested in this article.
Are there any other methods for solving this problem? Or could it be that I simply need more patience and should train the agent for more episodes, so that the problem resolves itself?
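For reference, adjusting the entropy term of a SAC agent looks roughly like the sketch below. The property names are assumed from the Reinforcement Learning Toolbox's rlSACAgentOptions, and the numeric values are placeholders, not recommendations:

```matlab
% Sketch (untested): encouraging exploration via the entropy term.
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.EntropyWeight = 1;    % initial temperature
agentOpts.EntropyWeightOptions.TargetEntropy = -8;   % often -(number of actions)
agentOpts.EntropyWeightOptions.LearnRate     = 3e-4; % adapt the weight online

% actor, critic1, critic2 are assumed to be previously constructed
% rlContinuousGaussianActor / rlQValueFunction objects.
agent = rlSACAgent(actor, [critic1 critic2], agentOpts);
```

With a learn rate set, SAC tunes the entropy weight toward the target entropy during training, so a single fixed EntropyWeight value may indeed have little visible effect on its own.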
Thanks in advance for any reply.
Kind regards,
3 Comments
Emmanouil Tzorakoleftherakis
on 24 Jun 2021
Can you share the actor architecture? The behavior you describe is most likely related to that.
Willemijn Remmerswaal
on 24 Jun 2021
Touleen Ibrahim
on 2 Apr 2024
Hi, I see the question was posted a long time ago, but I have faced the same problem and found the root cause, so I would like to share it, hoping it will help others.
If the input consists of two or more types of data, normalization of the components should be considered. Otherwise, the output of the actor neural network will be biased toward the components with larger values.
BR
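The normalization idea above can be sketched as follows. The bounds and observation values here are hypothetical, just to illustrate mapping every component to the same scale:

```matlab
% Sketch (assumed bounds and values): rescale each observation component
% to [-1, 1] so no single large-valued component dominates the actor input.
obs = [75; 310];   % e.g. [position (m); temperature (K)] -- hypothetical
lb  = [0; 200];    % per-component lower bounds (assumed)
ub  = [100; 400];  % per-component upper bounds (assumed)

% Linear map: lb -> -1, ub -> +1, applied element-wise.
obsNorm = 2*(obs - lb)./(ub - lb) - 1;   % both components now in [-1, 1]
```

This can be done in the environment's step function before the observation is returned, or (if I recall the toolbox correctly) with a scalingLayer at the input of the actor network; either way the per-component bounds need to be known.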
Answers (0)