I'm training a DQN agent from the new Reinforcement Learning Toolbox. During training, the critic network produces a long-term reward estimate (Q0) at the start of each episode; these estimates are displayed in green on the training progress plot, with the episode reward in blue and the running average reward in red. As you can see, the actual rewards average around -1000, but the first few Q0 estimates were orders of magnitude larger, so they permanently skew the y-axis and make it impossible to discern the progress of the actual rewards during training.
It seems I either need to bound the critic's estimate, or set limits on the Reinforcement Learning Episode Manager's y-axis. I haven't found a way to do either.
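As a sketch of the kind of workaround I have in mind (and which I couldn't get to behave reliably), something like the following could clamp the y-axis after training starts, assuming the Episode Manager plot is built on standard MATLAB axes objects that `findall` can reach; the `[-2000 0]` limits are just placeholders for my reward range:

```matlab
% Hypothetical workaround: locate any axes in open figures and clamp their
% y-limits. This assumes the Episode Manager uses standard graphics objects,
% which may not be the case.
figs = findall(groot, 'Type', 'Figure');   % all open figures, incl. Episode Manager
ax   = findall(figs, 'Type', 'Axes');      % any axes inside them
for k = 1:numel(ax)
    ylim(ax(k), [-2000 0]);                % clamp to the expected reward range
end
```

Even if this worked, the Episode Manager might reset the limits on each update, so a supported option (either bounding the critic's initial estimate or setting axis limits directly) would be preferable.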