To choose an action, is it correct to compute the value of the successor state, or do we need to compute the values of the states along the entire path to the end state?

While selecting an action, the action whose Q(s,a) is maximum is chosen, where Q(s,a) is the sum of the immediate reward and the discounted value of the next state.
From a given state, when I compute the best action, do I need to keep computing (iterating) the values of successor states along the path until the end state,
or
is it enough to compute the value of the immediate successor state alone and choose the action that yields the maximum value?
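In symbols, the relation described above can be written as Q(s,a) = r + γ·V(s'), where r is the immediate reward, γ the discount factor, and V(s') the value of the next state (this notation is standard shorthand for the description above, not taken from the original post).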

Accepted Answer

Emmanouil Tzorakoleftherakis
Hi Gowri,
The Q value for a state-action pair encodes all the information up to 'the end of the path', weighted by the discount factor (assuming you keep following the same policy).
So, assuming you have a critic that approximates the Q function relatively well, you shouldn't need to check the Q values of successor states.
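To make this concrete, here is a minimal sketch of greedy action selection from a learned Q function with a tabular critic; the variable names and the 4-state/2-action sizes are illustrative assumptions, not part of any toolbox API.

% Minimal sketch: greedy action selection from a learned Q-table.
% Assumed setup: 4 discrete states, 2 discrete actions, Q already learned.
numStates  = 4;
numActions = 2;
Q = rand(numStates, numActions);   % stand-in for a learned critic

s = 3;                             % current state
[~, a] = max(Q(s, :));             % pick the action with the largest Q(s,a)

% Only the Q values of the current state are consulted.
% No successor states are evaluated, because Q(s,a) already
% contains the discounted value of everything that follows.
fprintf('Greedy action in state %d: %d\n', s, a);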
  3 Comments
Emmanouil Tzorakoleftherakis
If the approximation of the Q function is relatively accurate (whether it is a table, a neural network, a polynomial, or something else), then yes, looking at the Q value of the current state/action pair should be sufficient when you are trying to 'extract' the policy.
In fact, if you look at vanilla DQN, even during training the Bellman equation only looks one step ahead. I am not saying that n-step learning is not an option, but you certainly don't need all subsequent Q values.
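As a rough illustration of the one-step lookahead mentioned above, this is what a Q-learning-style update with a DQN-like target looks like for a single tabular transition; the discount factor, learning rate, and transition values are assumptions made up for the example.

% Minimal sketch: one-step (vanilla DQN / Q-learning style) Bellman update.
% Assumed example values; in practice (s, a, r, sPrime) come from experience.
gamma = 0.99;                       % discount factor
alpha = 0.1;                        % learning rate
Q = zeros(4, 2);                    % tabular critic: 4 states x 2 actions

s = 1; a = 2; r = 1.0; sPrime = 3;  % one observed transition
isTerminal = false;

% The target looks only one step ahead: r + gamma * max over next-state actions.
if isTerminal
    target = r;
else
    target = r + gamma * max(Q(sPrime, :));
end
Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));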


