Action Value can't be constrained

Zhengyang Chen on 3 Aug 2020
Commented: Asvin Kumar on 16 Aug 2020
I am a beginner to RL and am now trying to use a policy gradient agent. Here is something weird I found when trying to keep the action output within a certain range.
In the "Create Continuous Stochastic Actor from Deep Neural Network" example at this link:
The action value limits are set first in rlNumericSpec(), but the constraint seems to have no effect on the actual actor output. If I change the lower limit to 0, the actor still yields negative values.
My question is: to actually get an action output within range, do I need to achieve this via the neural network construction? Say I want a range of 0 to 5; how should I modify the network then?
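For concreteness, the spec I am talking about looks like this (a minimal sketch; the numbers are just an example):
% Scalar action spec: I expected these limits to constrain the actor output
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',5);
actInfo.LowerLimit   % 0, yet getAction on the actor still returns negatives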
BTW, why should the output of the neural network have twice as many elements as the actual action output? What's happening inside rlStochasticActorRepresentation()?

Answers (1)

Asvin Kumar on 6 Aug 2020
For your first question:
In short, it might be because of the noise added to the predicted action. If I'm not wrong, you should be able to modify the properties of the noise in such a way that it doesn't affect your range.
For your second question:
The documentation for rlStochasticActorRepresentation says that the network output layer must have twice as many elements as the number of dimensions of the continuous action space and that they represent all the mean values followed by all the variances (which must be non-negative) of the Gaussian distributions for the dimensions of the action space.
The reason for the mean and variance is the nature of stochastic actors. From the description of rlStochasticActorRepresentation, a stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. This random action is sampled from the Gaussian distribution described by the mean and variance.
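Here is a hedged sketch of one way to build such a network (this is not the exact code from the documentation example; the layer sizes and the names bodyFC, meanScale, varSoftplus, etc. are placeholders I made up). The mean path is squashed with a tanhLayer and rescaled with a scalingLayer so the mean always lies in [0, 5], and the variance path is kept nonnegative with a softplusLayer; a reluLayer would also work there. Both scalingLayer and softplusLayer ship with Reinforcement Learning Toolbox, if I remember correctly.
% Illustrative specs: 4-element observation, scalar action meant for [0, 5]
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',5);

% Shared body
lg = layerGraph([
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(16,'Name','bodyFC')
    reluLayer('Name','bodyRelu')]);

% Mean path: tanh squashes to [-1, 1]; scalingLayer maps that to [0, 5]
lg = addLayers(lg,[
    fullyConnectedLayer(1,'Name','meanFC')
    tanhLayer('Name','meanTanh')
    scalingLayer('Name','meanScale','Scale',2.5,'Bias',2.5)]);

% Variance path: the variance outputs must be nonnegative
lg = addLayers(lg,[
    fullyConnectedLayer(1,'Name','varFC')
    softplusLayer('Name','varSoftplus')]);

% Output: 2*1 = 2 elements, mean first, then variance
lg = addLayers(lg,concatenationLayer(3,2,'Name','meanAndVar'));
lg = connectLayers(lg,'bodyRelu','meanFC');
lg = connectLayers(lg,'bodyRelu','varFC');
lg = connectLayers(lg,'meanScale','meanAndVar/in1');
lg = connectLayers(lg,'varSoftplus','meanAndVar/in2');

actor = rlStochasticActorRepresentation(lg,obsInfo,actInfo, ...
    'Observation',{'state'});

act = getAction(actor,{rand(4,1)});   % sample an action from the policy
Note that even with the mean bounded this way, a sampled action can still land outside [0, 5] when the variance is large, so clip the action inside the environment if you need a hard limit.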
  4 Comments
Zhengyang Chen on 15 Aug 2020
Hi, I think I kind of understand the first question now, i.e. why there are still negative outputs. It is not enough to simply force all the outputs of the neural network to be above zero, because the action is ultimately decided by the mean and variance, which are the NN outputs. So basically, if we have a large variance, the action value is more likely to land far out in either tail of the normal distribution, which means a negative value is still possible. Am I correct?
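A quick numeric sanity check of that intuition (the numbers are made up; 2.5 is just the middle of the 0-to-5 range):
% Gaussian policy output with a nonnegative mean but a large variance
mu = 2.5;                      % mean inside the desired [0, 5] range
sigma = 3;                     % large standard deviation
a = mu + sigma*randn(1,1e5);   % 100,000 sampled actions
fracNegative = mean(a < 0)     % about 0.20, i.e. normcdf(-mu/sigma)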
Asvin Kumar on 16 Aug 2020
Perfect. Your explanation works better than mine.
