Implementation of Proximal Policy Optimisation

Question

shoki kobayashi on 11 Sep 2020


Direct link to this question

https://in.mathworks.com/matlabcentral/answers/592048-implementation-of-proximal-policy-optimisation

Commented: Kashish Dhal on 12 Oct 2021

Accepted Answer: Emmanouil Tzorakoleftherakis


I am trying to control a custom Simulink environment with a PPO agent.

However, creating the actor fails with the following error, and I have not been able to resolve it.

How can I fix this?

Error: rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
Error: rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...

My code:

clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;  % sample time (s)
Tf = 20;    % episode duration (s)
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
% create PPO agent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ... 
                                            'Weights',weights.criticFC1, ...
                                            'Bias',bias.criticFC1)
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
                                            'Weights',weights.criticFC2, ... 
                                            'Bias',bias.criticFC2)
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','CriticOutput',...
                          'Weights',weights.criticOut,...
                          'Bias',bias.criticOut)];
                      
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
                          'Observation',{'observations'},criticOpts);
                      
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1',...
                                           'Weights',weights.actorFC1,...
                                           'Bias',bias.actorFC1)
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2',...
                                           'Weights',weights.actorFC2,...
                                           'Bias',bias.actorFC2)
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(numAct,'Name','Action',...
                               'Weights',weights.actorOut,...
                               'Bias',bias.actorOut)
    softmaxLayer('Name','actionProbability')
    ];  
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%%  ↓error   %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,... 
                         'Observation',{'observations'}, actorOptions);
%%%%  ↑error   %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512,...
                        'ClipFactor',0.2,...
                        'EntropyLossWeight',0.02,...
                        'MiniBatchSize',64,...
                        'NumEpoch',3,...
                        'AdvantageEstimateMethod','gae',...
                        'GAEFactor',0.95,...
                        'SampleTime',0.05,...
                        'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);  
% train agent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxEpisodes,...
    'MaxStepsPerEpisode',maxSteps,...
    'ScoreAveragingWindowLength',250,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeCount',...
    'StopTrainingValue',maxEpisodes,...
    'SaveAgentCriteria','EpisodeCount',...
    'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
% Result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);

1 Comment

Kashish Dhal on 12 Oct 2021

Could you please post the corrected code for the actor network? I am getting the same error and cannot work out the fix from the comments.


Answer 1

Emmanouil Tzorakoleftherakis on 15 Sep 2020


Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/592048-implementation-of-proximal-policy-optimisation#answer_494926

Hello,

It seems you want to use PPO with a continuous action space. If that's the case, your actor network does not have the right architecture. For a stochastic actor with continuous actions, the neural network should end with one path that outputs the 'mean' values and another path that outputs the 'variance', so the network output has twice as many elements as there are actions (10 for your 5 actions), which is what the error message is asking for. In your case you have only a single path with numAct outputs, followed by a softmax layer. Please refer to this example here to get an idea of how to set up your actor network. Also make sure you are using R2020a (as far as I remember, PPO for continuous actions was not available in earlier releases).

Hope that helps
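
For readers who hit the same error, the following is a minimal sketch of a two-path actor of the kind described in the answer. It is not the code the original poster ended up with. It assumes the R2020a-era Reinforcement Learning Toolbox and Deep Learning Toolbox functions (softplusLayer, scalingLayer, concatenationLayer, layerGraph) and reuses the sizes from the question. The layer names (MeanFC, SpreadFC, and so on) are illustrative, and the custom initial weights from createNetworkWeights are left out. The second half of the output must be non-negative; whether it is interpreted as a variance or a standard deviation should be checked against the documentation for your release.

```matlab
% Sketch only: continuous-action stochastic actor with separate mean and spread paths.
numObs = 15;
numAct = 5;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';

% Shared path: observation input followed by two hidden layers.
commonPath = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')];

% Mean path: numAct outputs, squashed to (-1,1) and scaled to the action range.
meanPath = [
    fullyConnectedLayer(numAct,'Name','MeanFC')
    tanhLayer('Name','MeanTanh')
    scalingLayer('Name','MeanScale','Scale',max(actInfo.UpperLimit))];

% Spread path: numAct outputs, kept non-negative by a softplus layer.
spreadPath = [
    fullyConnectedLayer(numAct,'Name','SpreadFC')
    softplusLayer('Name','SpreadSoftplus')];

% Concatenate the two paths so the network has 2*numAct outputs.
outLayer = concatenationLayer(3,2,'Name','MeanAndSpread');

net = layerGraph(commonPath);
net = addLayers(net,meanPath);
net = addLayers(net,spreadPath);
net = addLayers(net,outLayer);

% The mean path must feed the FIRST input of the concatenation layer.
net = connectLayers(net,'ActorRelu2','MeanFC');
net = connectLayers(net,'ActorRelu2','SpreadFC');
net = connectLayers(net,'MeanScale','MeanAndSpread/in1');
net = connectLayers(net,'SpreadSoftplus','MeanAndSpread/in2');

actorOptions = rlRepresentationOptions('LearnRate',1e-3);
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,...
    'Observation',{'observations'},actorOptions);
```

The resulting actor can be passed to rlPPOAgent(actor,critic,opt) exactly as in the question. The softmax layer is gone because the outputs are no longer action probabilities.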

1 Comment

shoki kobayashi on 24 Sep 2020

I was able to get it working.

Thank you very much.
