Can observation and reward be the same signal in an RL system?

Question

Jize Liu on 25 Apr 2022

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/1704200-can-observation-and-reward-be-the-same-signal-in-a-rl-system

Commented: Jize Liu on 6 Apr 2024

I created a Simulink model to train an RL agent. The model has only one action and one observation, and the observation is also the reward. When I tried to train the agent, I got an error saying the model was "containing algebraic loop". I wonder whether the way I defined the observation and the reward caused this problem.

I defined the reward and the observation as the same signal because they play the same role in this system. I want the agent to receive only this signal from the environment, so I defined a single observation that represents both the observation and the reward, to avoid redundancy.

Answer 1

Poorna on 31 Mar 2024

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/1704200-can-observation-and-reward-be-the-same-signal-in-a-rl-system#answer_1433946

Hi Jize,

I see that you want to use the same signal as both the observation and the reward in your reinforcement learning setup. Note that the observation and the reward do not occur at the same time.

In a reinforcement learning setting, you first make an observation (i.e., the current state of the system), then pick an action and execute it. Your system then moves to a new state. The reward that you get at the end of this transition is a function of your initial state, the action, and the resulting next state. So when you say you want to use the same signal as reward and observation, it means that the reward you get at time step 't' will be the observation at time step 't+1'.
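The timing described above can be sketched in a few lines of Python. This is only an illustration, not your Simulink model: the toy plant in `step`, the initial values, and all names are assumptions made up for the example.

```python
def step(state, action):
    """Hypothetical toy plant: the action shifts a scalar state."""
    next_state = state + action
    reward = -abs(next_state)  # reward is computed from the resulting state
    return next_state, reward


def run_episode(actions, initial_state=3.0, initial_observation=0.0):
    """Run the loop and log (t, observation, action, reward) per time step."""
    state = initial_state
    # Assumed starting value: no reward exists yet at t = 0.
    observation = initial_observation
    log = []
    for t, action in enumerate(actions):
        # The observation for step t is already known before the action is applied.
        state, reward = step(state, action)
        log.append((t, observation, action, reward))
        # The reward from step t becomes the observation at step t + 1.
        observation = reward
    return log


log = run_episode([-1.0, -1.0, -1.0])
for t, observation, action, reward in log:
    print(f"t={t} observation={observation} action={action} reward={reward}")
```

In each row of the log, the observation equals the reward of the previous row, never the reward of the same row.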

The algebraic loop error you're encountering arises from using the reward at time step 't' directly as the observation at the same time step 't'. The action depends on the observation, and the reward depends on the action, so the signal ends up depending on its own value within a single time step. In other words, the system is being asked to observe a signal that has not yet been generated.

So, you should try adding a "Unit Delay" block on the path where you pass the reward to the system as the observation. By doing this, you are essentially sending the reward of the previous transition as the observation for the current transition.

To learn more about the "Unit Delay" block, refer to the following documentation:
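To show what the delay does to the signal, here is a minimal Python stand-in for a one-step delay. It is not the Simulink block itself; the class name `UnitDelay`, its methods, and the initial condition of 0.0 are assumptions chosen for the example.

```python
class UnitDelay:
    """Illustrative one-step delay: the output is the input from the previous step."""

    def __init__(self, initial_condition=0.0):
        # Value produced at the first step, before any input has been stored.
        self._stored = initial_condition

    def output(self):
        # Reading the output does not need the current input, which breaks the loop.
        return self._stored

    def update(self, value):
        # Store the current input so it appears at the output on the next step.
        self._stored = value


delay = UnitDelay(initial_condition=0.0)
rewards = [-2.0, -1.0, 0.0]  # made-up rewards for three time steps
observations = []
for reward in rewards:
    observations.append(delay.output())  # observation is available first
    delay.update(reward)                 # current reward is stored for the next step

print(observations)  # [0.0, -2.0, -1.0]
```

Because the output at each step is read before the current reward is written, the observation no longer depends on a value computed in the same step.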

https://www.mathworks.com/help/simulink/slref/unitdelay.html

Hope this helps!

1 Comment

Jize Liu on 6 Apr 2024

Thank you for your reply. This should help. I have one point I want to confirm: in one cycle (t), the system starts by receiving an observation and ends with a reward, and in the next cycle (t+1), the new observation, which could be the reward from the last cycle, is input to the system and starts a new cycle. Is that right?
