How can I simulate Direct ADP?

I want to simulate the article "On-Line Learning Control by Association and Reinforcement", but I have a problem obtaining the optimal weights for the critic neural network. The critic network error in this article is e_c = J(t) - (J(t-1) - r(t)), and at the beginning the critic weights are selected randomly. My question is this: at the beginning we don't have any J(t-1), and we also know that J(t) and r(t) are positive functions. So if we take J(t-1) = 0, then J(t) will converge to -r(t) and become a negative number, which is wrong.
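
To make the setup concrete, here is a minimal Python sketch of the critic update described above. It is an illustration under stated assumptions, not the article's implementation. Assumptions not taken from the post: the critic is linear (J = w . x) rather than a neural network, `alpha` is a discount factor (alpha = 1 reproduces the error formula exactly as quoted), the learning rate `lr` and the sample data are made up, and `critic_step` is a hypothetical helper. One possible way to deal with the missing J(t-1) is to skip the weight update at the first time step and only record J there, instead of setting J(t-1) = 0. Whether that matches the article should be checked against the article itself.

```python
def critic_error(j_now, j_prev, r_now, alpha=1.0):
    """e_c(t) = alpha*J(t) - (J(t-1) - r(t)); alpha=1 matches the quoted formula."""
    return alpha * j_now - (j_prev - r_now)


def critic_step(w, x, j_prev, r_now, lr=0.1, alpha=1.0):
    """One gradient step on E_c = 0.5*e_c^2 for a linear critic J = w . x.

    j_prev is treated as a fixed target (no gradient flows through it).
    If j_prev is None (first time step), the weights are left unchanged.
    Returns (new_weights, j_now, e_c).
    """
    j_now = sum(wi * xi for wi, xi in zip(w, x))
    if j_prev is None:
        # Assumption: no J(t-1) exists yet, so only record J(t) and move on.
        return list(w), j_now, None
    e_c = critic_error(j_now, j_prev, r_now, alpha)
    # dE_c/dw = e_c * alpha * x for the linear critic
    new_w = [wi - lr * e_c * alpha * xi for wi, xi in zip(w, x)]
    return new_w, j_now, e_c


# Made-up trajectory of (state features, reinforcement) pairs.
trajectory = [([1.0, 0.5], 0.0), ([0.8, 0.4], 0.0), ([0.6, 0.1], 1.0)]
w = [0.3, -0.2]   # stand-in for randomly selected initial weights
j_prev = None     # nothing to compare against at the first step
for x, r in trajectory:
    w, j_prev, e_c = critic_step(w, x, j_prev, r)
```

With this arrangement the first sample only initializes J(t-1), so the critic is never pulled toward -r(t) by an artificial zero. The value stored as J(t-1) is the critic output computed before the weight update at that step.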

Answers (0)

Asked: 10 Sep 2021

Edited: 10 Sep 2021
