SAC RL agent does not explore properly (rlSACAgent)

Hi,
I'm trying to create a SAC RL agent. The agent sets 8 separate continuous actions, all with the same lower and upper bounds (-10 and 10).
During training I observe that the chosen actions are (almost!) always one of the two bounds, so they mostly fluctuate between the minimum and the maximum; only sporadically is another value chosen for one of the actions.
I've found a similar question HERE, but the answer given did not solve the issue (the range of the action space is already the same for all actions, and changing EntropyWeight did not help). I've also tried scaling the reward, as suggested in this article.
Are there any other methods for solving such a problem? Or could it be that I just need some patience and should train the agent for more episodes, so that the problem resolves itself?
Thanks in advance for any reply.
Kind regards,
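One knob worth checking beyond a fixed EntropyWeight (a hedged sketch, not from the original post): SAC regulates exploration through an entropy term, and in the Reinforcement Learning Toolbox this is exposed via the EntropyWeightOptions property of rlSACAgentOptions. Verify the option and field names against the R2020b docs; critic1/critic2 below are placeholders for your critic representations.

```matlab
% Hedged sketch: let the entropy weight adapt instead of fixing it.
% Field names follow the RL Toolbox docs -- verify against your release.
agentOpts = rlSACAgentOptions('DiscountFactor', 0.99);
agentOpts.EntropyWeightOptions.TargetEntropy = -8;   % common heuristic: -numActions
agentOpts.EntropyWeightOptions.LearnRate     = 3e-4; % >0 so the weight is learned
agent = rlSACAgent(actor, [critic1 critic2], agentOpts); % critic1/critic2: your critics
```

With a learnable entropy weight the agent tunes its own exploration pressure toward the entropy target, which often behaves better than hand-picking a constant.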

3 Comments

Can you share the actor architecture? The issue most likely has to do with it.
Hi Emmanouil,
When I use the default initialized actor network, it looks as follows:
I've also tried building a customized actor network, though I'm not sure how much sense it makes. It is as follows, and it showed the same behaviour:
nI = no_states;  % (101) number of inputs (states)
nA = no_actions; % (8) number of continuous actions
nL1 = 128;
nL2 = 64;        % note: defined but currently unused
statePath = [
    featureInputLayer(nI,'Normalization','none','Name','state')
    fullyConnectedLayer(nL1,'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
meanPath = [
    fullyConnectedLayer(nL1,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(nA,'Name','Mean')];
stdPath = [
    fullyConnectedLayer(nL1,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(nA,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')]; % the standard deviation must always be positive
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');
actorOptions = rlRepresentationOptions('LearnRate',LearningRate);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'state'},actorOptions);
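A quick diagnostic for an actor like the one above (a hedged sketch; getAction usage per the toolbox docs): sample actions for a few random observations before any training. If the samples already pile up at the -10/10 bounds, the pre-squash mean is too large, which points at input scaling rather than at the training loop.

```matlab
% Hedged sketch: sample the untrained actor to check for saturation.
obs = rand(nI, 1);              % one random observation (illustrative scale)
act = getAction(actor, {obs});  % draws an action from the Gaussian policy
disp(act{1})                    % values hugging -10/10 suggest saturation
```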
Hi, I see the question was posted a long time ago, but I faced the same problem, found the root cause, and would like to share it, hoping it will help others.
When the input consists of two or more types of data, normalization of the components should be considered. Otherwise, the output of the actor neural network will be biased toward the components with larger values.
BR
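To make the normalization point above concrete (a hedged sketch; obsBuffer, obsMean, and obsStd are illustrative names you would compute from logged observations): featureInputLayer supports per-component z-score normalization, so no single large-valued component dominates the actor's output.

```matlab
% Hedged sketch: per-component z-score normalization at the input layer.
% obsBuffer is an illustrative nI-by-numSamples log of observations.
obsMean = mean(obsBuffer, 2);          % per-component mean
obsStd  = std(obsBuffer, 0, 2) + 1e-8; % guard against zero variance
inputLayer = featureInputLayer(nI, ...
    'Normalization','zscore', ...
    'Mean', obsMean, 'StandardDeviation', obsStd, ...
    'Name','state');
```

This input layer would replace the 'Normalization','none' layer in the actor's statePath, keeping the rest of the network unchanged.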


Answers (0)

Release

R2020b
