SAC RL agent does not explore properly (rlSACAgent)

Hi,
I'm trying to create a SAC RL agent. The agent sets 8 separate continuous actions, all with the same lower and upper bounds (-10 and 10).
During training I observe that the chosen actions are (almost!) always one of the two bounds, so they mostly fluctuate between the minimum and the maximum; only sporadically is another value chosen for one of the actions.
I've found a similar question HERE, but the answer given did not solve the issue (the range of the action space is already the same for all actions, and changing EntropyWeight did not help). I've also tried scaling the reward, as suggested in this article.
Are there any other methods for solving such a problem? Or could it be that I just need some patience and should train the agent for more episodes, so that the problem resolves itself?
Thanks in advance for any reply.
Kind regards,
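One knob worth checking beyond a fixed EntropyWeight (a hedged sketch, not from the original post): SAC regulates exploration through an entropy term, and in the Reinforcement Learning Toolbox this is exposed via the EntropyWeightOptions property of rlSACAgentOptions. Verify the option and field names against the R2020b docs; critic1/critic2 below are placeholders for your critic representations.

```matlab
% Hedged sketch: let the entropy weight adapt instead of fixing it.
% Field names follow the RL Toolbox docs -- verify against your release.
agentOpts = rlSACAgentOptions('DiscountFactor', 0.99);
agentOpts.EntropyWeightOptions.TargetEntropy = -8;   % common heuristic: -numActions
agentOpts.EntropyWeightOptions.LearnRate     = 3e-4; % >0 so the weight is learned
agent = rlSACAgent(actor, [critic1 critic2], agentOpts); % critic1/critic2: your critics
```

With a learnable entropy weight the agent tunes its own exploration pressure toward the entropy target, which often behaves better than hand-picking a constant.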

3 Comments

Can you share the actor architecture? The issue most likely has to do with it.
Hi Emmanouil,
When I use the default initialized actor network, it looks as follows:
I've also tried building a customized actor network, though I'm not sure how much sense it makes. It is as follows, and it showed the same behaviour:
nI = no_states;  % (101) number of inputs (states)
nA = no_actions; % (8) number of continuous actions
nL1 = 128;
nL2 = 64;        % note: defined but currently unused
statePath = [
    featureInputLayer(nI,'Normalization','none','Name','state')
    fullyConnectedLayer(nL1,'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
meanPath = [
    fullyConnectedLayer(nL1,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(nA,'Name','Mean')];
stdPath = [
    fullyConnectedLayer(nL1,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(nA,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')]; % the standard deviation must always be positive
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');
actorOptions = rlRepresentationOptions('LearnRate',LearningRate);
actor = rlStochasticActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'state'},actorOptions);
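A quick diagnostic for an actor like the one above (a hedged sketch; getAction usage per the toolbox docs): sample actions for a few random observations before any training. If the samples already pile up at the -10/10 bounds, the pre-squash mean is too large, which points at input scaling rather than at the training loop.

```matlab
% Hedged sketch: sample the untrained actor to check for saturation.
obs = rand(nI, 1);              % one random observation (illustrative scale)
act = getAction(actor, {obs});  % draws an action from the Gaussian policy
disp(act{1})                    % values hugging -10/10 suggest saturation
```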
Hi, I see the question was posted a long time ago, but I faced the same problem, found the root cause, and would like to share it, hoping it will help others.
When the input consists of two or more types of data, normalization of the components should be considered. Otherwise, the output of the actor neural network will be biased toward the components with larger values.
BR
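To make the normalization point above concrete (a hedged sketch; obsBuffer, obsMean, and obsStd are illustrative names you would compute from logged observations): featureInputLayer supports per-component z-score normalization, so no single large-valued component dominates the actor's output.

```matlab
% Hedged sketch: per-component z-score normalization at the input layer.
% obsBuffer is an illustrative nI-by-numSamples log of observations.
obsMean = mean(obsBuffer, 2);          % per-component mean
obsStd  = std(obsBuffer, 0, 2) + 1e-8; % guard against zero variance
inputLayer = featureInputLayer(nI, ...
    'Normalization','zscore', ...
    'Mean', obsMean, 'StandardDeviation', obsStd, ...
    'Name','state');
```

This input layer would replace the 'Normalization','none' layer in the actor's statePath, keeping the rest of the network unchanged.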


Answers (0)

Release

R2020b
