Th "Rocket Lander" example does not converge with the stated hyperparameters. Someone was helpful enough to give me the following values:
learning rate = 1e-4
clip factor = 0.1
mini-batch size = 128
Although these values work better, the algorithm still does not converge. After about 14,000 episodes there are many successful landings, but they are interspersed with violent crash landings. Anybody at MathWorks or otherwise have any suggestions? Thank you.
Averill M. LAW