How can I simulate Direct ADP?
Show older comments
I want to simulate this article ( On-Line Learning Control by Association and Reinforcement ) but I have a problem in obtaining optimum weight for critic neural network. The critic neural network error in this article is [ e_c = J(t) - (J(t-1) - r(t)) ] , and at the begining the critic weights are selected randomly. My question is that, at the begining we dont have any J(t-1) and also we know that J(t) and r(t) are positive functions, so if we consider J(t-1) = 0, then J(t) will converg to -r(t) and become a negative number that is false.

Answers (0)
Categories
Find more on Reinforcement Learning Toolbox in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!