Action Value can't be constrained

Zhengyang Chen on 3 Aug 2020
Commented: Asvin Kumar on 16 Aug 2020
I am a beginner to RL and am now trying to use a policy gradient agent. Here is something weird I found when trying to keep the action output within a certain range.
In the "Create Continuous Stochastic Actor from Deep Neural Network" example at this link:
The action value limits are set first in rlNumericSpec(), but the constraint seems to have no effect on the actual actor output. If I change the lower limit to 0, the actor still yields negative values.
My question is: to actually get an action output within range, do I need to achieve this via the neural network construction? Say I want a range of 0 to 5; how should I modify the network then?
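For concreteness, the spec I am talking about looks like this (a minimal sketch; the numbers are just an example):
% Scalar action spec: I expected these limits to constrain the actor output
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',5);
actInfo.LowerLimit   % 0, yet getAction on the actor still returns negatives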
BTW, why should the output of the neural network have twice as many elements as the actual action output? What's happening inside rlStochasticActorRepresentation()?

Answers (1)

Asvin Kumar on 6 Aug 2020
For your first question:
In short, it might be because of the noise added to the predicted action. If I'm not wrong, you should be able to modify the properties of the noise in such a way that it doesn't affect your range.
For your second question:
The documentation for rlStochasticActorRepresentation says that the network output layer must have twice as many elements as the number of dimensions of the continuous action space and that they represent all the mean values followed by all the variances (which must be non-negative) of the Gaussian distributions for the dimensions of the action space.
The reason for the mean and variance is the nature of stochastic actors. From the description of rlStochasticActorRepresentation, a stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. This random action is sampled from the Gaussian distribution described by the mean and variance.
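Here is a hedged sketch of one way to build such a network (this is not the exact code from the documentation example; the layer sizes and the names bodyFC, meanScale, varSoftplus, etc. are placeholders I made up). The mean path is squashed with a tanhLayer and rescaled with a scalingLayer so the mean always lies in [0, 5], and the variance path is kept nonnegative with a softplusLayer; a reluLayer would also work there. Both scalingLayer and softplusLayer ship with Reinforcement Learning Toolbox, if I remember correctly.
% Illustrative specs: 4-element observation, scalar action meant for [0, 5]
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',5);

% Shared body
lg = layerGraph([
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(16,'Name','bodyFC')
    reluLayer('Name','bodyRelu')]);

% Mean path: tanh squashes to [-1, 1]; scalingLayer maps that to [0, 5]
lg = addLayers(lg,[
    fullyConnectedLayer(1,'Name','meanFC')
    tanhLayer('Name','meanTanh')
    scalingLayer('Name','meanScale','Scale',2.5,'Bias',2.5)]);

% Variance path: the variance outputs must be nonnegative
lg = addLayers(lg,[
    fullyConnectedLayer(1,'Name','varFC')
    softplusLayer('Name','varSoftplus')]);

% Output: 2*1 = 2 elements, mean first, then variance
lg = addLayers(lg,concatenationLayer(3,2,'Name','meanAndVar'));
lg = connectLayers(lg,'bodyRelu','meanFC');
lg = connectLayers(lg,'bodyRelu','varFC');
lg = connectLayers(lg,'meanScale','meanAndVar/in1');
lg = connectLayers(lg,'varSoftplus','meanAndVar/in2');

actor = rlStochasticActorRepresentation(lg,obsInfo,actInfo, ...
    'Observation',{'state'});

act = getAction(actor,{rand(4,1)});   % sample an action from the policy
Note that even with the mean bounded this way, a sampled action can still land outside [0, 5] when the variance is large, so clip the action inside the environment if you need a hard limit.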
  4 Comments
Zhengyang Chen on 15 Aug 2020
Hi, I think I kind of understand the first question now, i.e. why there are still negative outputs. It is not enough to simply force all the outputs of the neural network to be above zero, because the action is ultimately decided by the mean and variance, which are the NN outputs. So basically, if we have a large variance, the action value is more likely to land far out in either tail of the normal distribution, which means a negative value is still possible. Am I correct?
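A quick numeric sanity check of that intuition (the numbers are made up; 2.5 is just the middle of the 0-to-5 range):
% Gaussian policy output with a nonnegative mean but a large variance
mu = 2.5;                      % mean inside the desired [0, 5] range
sigma = 3;                     % large standard deviation
a = mu + sigma*randn(1,1e5);   % 100,000 sampled actions
fracNegative = mean(a < 0)     % about 0.20, i.e. normcdf(-mu/sigma)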
Asvin Kumar on 16 Aug 2020
Perfect. Your explanation works better than mine.
