When I train the path following control example in MATLAB, the training process produces the behaviour shown in the picture:
- The steering angle is constantly fluctuating.
- The acceleration is also constantly fluctuating.
- The reward curve is very noisy and seems to jump between a high reward and a low reward instead of converging.
The example from here shows that training should have converged by this point and that the actions should be smooth.
What could be causing this issue? The same thing happened in other projects I worked on. One thing I tried was to penalise the fluctuation in the reward function with this term, inspired by a paper published by Wang et al.:
10*[ (d/dt(current_action) * d/dt(previous_action)) < 0 ]
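To make the term above concrete, here is a minimal sketch of how it could be computed. It assumes that d/dt is approximated by a finite difference over the sample time, that the bracket evaluates to 1 when the condition holds and 0 otherwise, and that the result is subtracted from the reward. The names Ts, weight, rate and flipPenalty, and all the numeric values, are placeholders for illustration and are not taken from the example or from the paper.

```matlab
% Hypothetical values, for illustration only
Ts     = 0.1;   % sample time in seconds (assumed)
weight = 10;    % penalty weight from the term above

% Finite-difference approximation of d/dt for an action vector
rate = @(aNew, aOld) (aNew - aOld) / Ts;

% Count the action channels whose rate of change flipped sign between
% the previous step and the current step, then scale by the weight
flipPenalty = @(aPrev2, aPrev, aCurr) ...
    weight * sum(rate(aCurr, aPrev) .* rate(aPrev, aPrev2) < 0);

% Example: actions are [acceleration; steering angle] at three consecutive steps
aPrev2 = [0.0; 0.10];
aPrev  = [0.5; 0.20];
aCurr  = [0.2; 0.30];   % acceleration reverses direction, steering keeps rising

penalty = flipPenalty(aPrev2, aPrev, aCurr);   % 10: only one channel flipped
```

In this sketch the value of penalty would be subtracted from the step reward, so a smooth action sequence (no sign flip in either channel) leaves the reward unchanged.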
Please let me know how to avoid this problem. Thank you very much!