RL SAC agent structure
I’ve created an SAC agent, but I'm encountering the error below.
Error using rl.internal.validate.mapFunctionMeanStdOutput (line 10)
Deep neural network for continuous gaussian function must have 2 output layers, one for mean and one for standard deviation.
Error in rlContinuousGaussianActor (line 93)
model = rl.internal.validate.mapFunctionMeanStdOutput(model,nameValueArgs.ActionMeanOutputNames,nameValueArgs.ActionStandardDeviationOutputNames,"actor");
Error in RL_agent_1 (line 158)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
My code for the RL agents is below. The relevant part is the actor network definitions (bolded in my original post). They show that I already define two output layers, one for the mean and one for the standard deviation.
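Layers listed inside `[ ... ]` form a layer array, and a layer array is always connected in sequence: each layer takes the previous layer as its input. You can check how many outputs such an array really has by converting it to a `dlnetwork` and reading `OutputNames`. Here is a minimal sketch. It assumes Deep Learning Toolbox and a hypothetical 4-element observation, and its layer names are illustrative only:

```matlab
% Hypothetical observation size, used only for illustration
obsDim = 4;

% Same pattern as the actor networks: "mean" and "std" listed one after the other
layers = [
    featureInputLayer(obsDim, Name="state")
    fullyConnectedLayer(64, Name="fc")
    reluLayer(Name="relu")
    fullyConnectedLayer(1, Name="mean")   % input comes from "relu"
    fullyConnectedLayer(1, Name="std")    % input comes from "mean", not from "relu"
    ];

% Convert to a dlnetwork and list the layers that have no outgoing connection
net = dlnetwork(layers);
disp(net.OutputNames)   % only 'std' is listed
```

In this sketch, the "std" layer is fed by the "mean" layer, so the network has a single output. That is what the "must have 2 output layers" check rejects.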
% Create environment
env = createOpfEnv();
% Retrieve observation and action specifications
obsInfo = getObservationInfo(env); % Observation info for all agents
actInfo = getActionInfo(env); % Action info for all agents
% Separate the observation and action information for each agent
numAgents = 3; % Example with 3 agents
obsInfo1 = obsInfo{1}; % Observation info for agent 1
obsInfo2 = obsInfo{2}; % Observation info for agent 2
obsInfo3 = obsInfo{3}; % Observation info for agent 3
actInfo1 = actInfo{1}; % Action info for agent 1
actInfo2 = actInfo{2}; % Action info for agent 2
actInfo3 = actInfo{3}; % Action info for agent 3
%% Define actor networks for each agent
% Define the actor network for Agent 1
actorNetwork1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', 'Name', 'state1')
fullyConnectedLayer(64, 'Name', 'fc1_1')
reluLayer('Name', 'relu1_1')
fullyConnectedLayer(64, 'Name', 'fc2_1')
reluLayer('Name', 'relu2_1')
fullyConnectedLayer(64, 'Name', 'fc3_1')
reluLayer('Name', 'relu3_1')
fullyConnectedLayer(1, 'Name', 'mean1') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std1') % Output for the standard deviation
];
% Define the actor network for Agent 2
actorNetwork2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', 'Name', 'state2')
fullyConnectedLayer(64, 'Name', 'fc1_2')
reluLayer('Name', 'relu1_2')
fullyConnectedLayer(64, 'Name', 'fc2_2')
reluLayer('Name', 'relu2_2')
fullyConnectedLayer(64, 'Name', 'fc3_2')
reluLayer('Name', 'relu3_2')
fullyConnectedLayer(1, 'Name', 'mean2') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std2') % Output for the standard deviation
];
% Define the actor network for Agent 3
actorNetwork3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', 'Name', 'state3')
fullyConnectedLayer(64, 'Name', 'fc1_3')
reluLayer('Name', 'relu1_3')
fullyConnectedLayer(64, 'Name', 'fc2_3')
reluLayer('Name', 'relu2_3')
fullyConnectedLayer(64, 'Name', 'fc3_3')
reluLayer('Name', 'relu3_3')
fullyConnectedLayer(1, 'Name', 'mean3') % Output for the mean
fullyConnectedLayer(1, 'Name', 'std3') % Output for the standard deviation
];
% For each agent, we'll define a critic network that combines the state and action
statePath1 = [
featureInputLayer(obsInfo1.Dimension(1), 'Normalization', 'none', Name="state1")
fullyConnectedLayer(64, Name="state_fc1_1")
reluLayer(Name="state_relu1_1")
];
actionPath1 = [
featureInputLayer(actInfo1.Dimension(1), 'Normalization', 'none', Name="action1")
fullyConnectedLayer(64, Name="action_fc1_1")
reluLayer(Name="action_relu1_1")
];
commonPath1 = [
concatenationLayer(1, 2, Name="concat1")
fullyConnectedLayer(64, Name="common_fc1_1")
reluLayer(Name="common_relu1_1")
fullyConnectedLayer(64, Name="common_fc2_1")
reluLayer(Name="common_relu2_1")
fullyConnectedLayer(1, Name="value1")
];
statePath2 = [
featureInputLayer(obsInfo2.Dimension(1), 'Normalization', 'none', Name="state2")
fullyConnectedLayer(64, Name="state_fc2_2")
reluLayer(Name="state_relu2_2")
];
actionPath2 = [
featureInputLayer(actInfo2.Dimension(1), 'Normalization', 'none', Name="action2")
fullyConnectedLayer(64, Name="action_fc2_2")
reluLayer(Name="action_relu2_2")
];
commonPath2 = [
concatenationLayer(1, 2, Name="concat2")
fullyConnectedLayer(64, Name="common_fc1_2")
reluLayer(Name="common_relu1_2")
fullyConnectedLayer(64, Name="common_fc2_2")
reluLayer(Name="common_relu2_2")
fullyConnectedLayer(1, Name="value2")
];
statePath3 = [
featureInputLayer(obsInfo3.Dimension(1), 'Normalization', 'none', Name="state3")
fullyConnectedLayer(64, Name="state_fc3_3")
reluLayer(Name="state_relu3_3")
];
actionPath3 = [
featureInputLayer(actInfo3.Dimension(1), 'Normalization', 'none', Name="action3")
fullyConnectedLayer(64, Name="action_fc3_3")
reluLayer(Name="action_relu3_3")
];
commonPath3 = [
concatenationLayer(1, 2, Name="concat3")
fullyConnectedLayer(64, Name="common_fc1_3")
reluLayer(Name="common_relu1_3")
fullyConnectedLayer(64, Name="common_fc2_3")
reluLayer(Name="common_relu2_3")
fullyConnectedLayer(1, Name="value3")
];
%% Assemble critic networks for each agent
% Combine state and action paths
criticNetwork1 = layerGraph(statePath1);
criticNetwork1 = addLayers(criticNetwork1, actionPath1);
criticNetwork1 = addLayers(criticNetwork1, commonPath1);
criticNetwork1 = connectLayers(criticNetwork1, 'state_relu1_1', 'concat1/in1');
criticNetwork1 = connectLayers(criticNetwork1, 'action_relu1_1', 'concat1/in2');
criticNetwork2 = layerGraph(statePath2);
criticNetwork2 = addLayers(criticNetwork2, actionPath2);
criticNetwork2 = addLayers(criticNetwork2, commonPath2);
criticNetwork2 = connectLayers(criticNetwork2, 'state_relu2_2', 'concat2/in1');
criticNetwork2 = connectLayers(criticNetwork2, 'action_relu2_2', 'concat2/in2');
criticNetwork3 = layerGraph(statePath3);
criticNetwork3 = addLayers(criticNetwork3, actionPath3);
criticNetwork3 = addLayers(criticNetwork3, commonPath3);
criticNetwork3 = connectLayers(criticNetwork3, 'state_relu3_3', 'concat3/in1');
criticNetwork3 = connectLayers(criticNetwork3, 'action_relu3_3', 'concat3/in2');
%% Set optimizer options for the actor and critic
actorOptions = rlOptimizerOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
criticOptions = rlOptimizerOptions('LearnRate', 1e-4, 'GradientThreshold', 1);
%% Create actors for each agent
% Use a continuous Gaussian actor for each agent (as required by SAC)
actor1 = rlContinuousGaussianActor(actorNetwork1, obsInfo1, actInfo1, ...
'ActionMeanOutputNames', 'mean1', 'ActionStandardDeviationOutputNames', 'std1');
actor2 = rlContinuousGaussianActor(actorNetwork2, obsInfo2, actInfo2, ...
'ActionMeanOutputNames', 'mean2', 'ActionStandardDeviationOutputNames', 'std2');
actor3 = rlContinuousGaussianActor(actorNetwork3, obsInfo3, actInfo3, ...
'ActionMeanOutputNames', 'mean3', 'ActionStandardDeviationOutputNames', 'std3');
%% Create Q-value critics for each agent
% Each critic network has two inputs, so name the observation and action inputs explicitly
critic1 = rlQValueFunction(criticNetwork1, obsInfo1, actInfo1, ...
'ObservationInputNames', 'state1', 'ActionInputNames', 'action1');
critic2 = rlQValueFunction(criticNetwork2, obsInfo2, actInfo2, ...
'ObservationInputNames', 'state2', 'ActionInputNames', 'action2');
critic3 = rlQValueFunction(criticNetwork3, obsInfo3, actInfo3, ...
'ObservationInputNames', 'state3', 'ActionInputNames', 'action3');
%% Define the SAC agent for each agent
% The actor and critic optimizer options are passed through the agent options
agentOptions = rlSACAgentOptions('SampleTime', 1, ...
'TargetSmoothFactor', 1e-3, ...
'TargetUpdateFrequency', 1, ...
'ExperienceBufferLength', 1e6, ...
'ActorOptimizerOptions', actorOptions, ...
'CriticOptimizerOptions', criticOptions);
agent1 = rlSACAgent(actor1, critic1, agentOptions);
agent2 = rlSACAgent(actor2, critic2, agentOptions);
agent3 = rlSACAgent(actor3, critic3, agentOptions);
%% Training options and training process
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 500, ...
'MaxStepsPerEpisode', 100, ...
'ScoreAveragingWindowLength', 100, ...
'Verbose', true, ...
'Plots', 'training-progress');
%% Train the agents
% The environment contains all three agents, so train them together in a single call
train([agent1, agent2, agent3], env, trainOpts);
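A layer array can only describe a single chain. To get two heads, branch them off a shared trunk with a `layerGraph` and convert the result to a `dlnetwork`. The sketch below does this for agent 1 and assumes R2022a or later (for `rlContinuousGaussianActor` and the name=value syntax). The `rlNumericSpec` objects are hypothetical stand-ins for `obsInfo{1}` and `actInfo{1}` (4 observations, 1 action), and the layer name `std_fc1` is made up for the example. `softplusLayer` (Reinforcement Learning Toolbox) keeps the standard deviation positive, which a bare `fullyConnectedLayer` output does not guarantee.

```matlab
% Hypothetical placeholder specs standing in for obsInfo{1} and actInfo{1}
obsInfo1 = rlNumericSpec([4 1]);
actInfo1 = rlNumericSpec([1 1], LowerLimit=-1, UpperLimit=1);
numObs = obsInfo1.Dimension(1);
numAct = actInfo1.Dimension(1);

% Shared trunk: state -> hidden layers
commonPath = [
    featureInputLayer(numObs, Name="state1")
    fullyConnectedLayer(64, Name="fc1_1")
    reluLayer(Name="relu1_1")
    fullyConnectedLayer(64, Name="fc2_1")
    reluLayer(Name="relu2_1")
    ];

% Mean head: one output per action dimension
meanPath = fullyConnectedLayer(numAct, Name="mean1");

% Standard deviation head: softplus keeps the output positive
% ("std_fc1" is a hypothetical layer name)
stdPath = [
    fullyConnectedLayer(numAct, Name="std_fc1")
    softplusLayer(Name="std1")
    ];

% Assemble the graph and connect both heads to the end of the trunk
lgraph = layerGraph(commonPath);
lgraph = addLayers(lgraph, meanPath);
lgraph = addLayers(lgraph, stdPath);
lgraph = connectLayers(lgraph, "relu2_1", "mean1");
lgraph = connectLayers(lgraph, "relu2_1", "std_fc1");
actorNet1 = dlnetwork(lgraph);

% The network now has two outputs, which the Gaussian actor maps by name
actor1 = rlContinuousGaussianActor(actorNet1, obsInfo1, actInfo1, ...
    ActionMeanOutputNames="mean1", ...
    ActionStandardDeviationOutputNames="std1", ...
    ObservationInputNames="state1");
```

The same pattern applies to agents 2 and 3, each with its own layer names. With the real environment, use `obsInfo{k}` and `actInfo{k}` instead of the placeholder specs.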