Looks like training was not successful. There could be many things at fault here - some suggestions:
1) Make sure you are randomizing the target locations at the beginning of each episode. It would help to add visualization so you can verify that the targets actually move and debug the agent's behavior during training.
2) The agent may not have enough information available to make decisions. Make sure the observations give the agent what it needs, for example where the target is relative to the agent.
3) What does the episode manager plot look like when training stops? You may need to train the agent for more time
4) Why are you using a dropout layer? Unless your observations are images, this layer is likely not required (at least I don't think I have seen it in any shipping examples in Reinforcement Learning Toolbox). Your neural network architecture may therefore also have something to do with this behavior.
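
To illustrate suggestions 1 and 2, here is a minimal sketch of a reset function that draws a new target location every episode and includes the target's relative position in the observation. It assumes a custom function-based environment with a 2-D agent and target; the function name, the arena size, the field names in the logged signals, and the observation layout are all hypothetical, so adapt them to your own environment.

```matlab
function [initialObs, loggedSignals] = resetWithRandomTarget()
% Hypothetical reset function for a custom function-based environment.
% Assumes a 2-D task in a square arena; all names and sizes are placeholders.

arenaHalfWidth = 5;                               % assumed arena: [-5, 5] on each axis

% Draw a new target location uniformly inside the arena on every reset
targetPos = (2*rand(2,1) - 1) * arenaHalfWidth;

% Start the agent at the origin (assumption)
agentPos = zeros(2,1);

% Keep the state in the logged signals so the step function and any
% plotting code can read where the target is in this episode
loggedSignals.TargetPos = targetPos;
loggedSignals.AgentPos  = agentPos;

% Observation: agent position plus target position relative to the agent,
% so the agent has enough information to decide where to move
initialObs = [agentPos; targetPos - agentPos];
end
```

A function like this would typically be passed as the reset function when creating the environment with rlFunctionEnv. Plotting loggedSignals.TargetPos over a few episodes is a quick way to confirm that the target really changes between episodes.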