
DDPG multiple action noise variance error

Tech Logg Ding on 6 Nov 2020
Commented: 勇刚 张 on 30 Mar 2022
Hi,
I am working on developing an adaptive PID for a water tank level controller shown here:
The outputs of the RL Agent block are the 3 controller gains. Since the 3 gains have very different ranges of values, I thought it would be a good idea to use a different variance for each action, as suggested on the rlDDPGAgentOptions page.
However, when I initiate training, I get the following error:
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
For 'Output Port 1' of 'rlwatertankAdaptivePID/RL Agent/AgentWrapper', the 'outputImpl' method of the System object
'rl.simulink.blocks.AgentWrapper' returned a value whose size [3x3], does not match the value returned by the 'getOutputSizeImpl' method. Either
change the size of the value returned by 'outputImpl', or change the size returned by 'getOutputSizeImpl'.
I defined the agent options as follows:
%% Specify DDPG agent options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.9;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 5e-3;
% because the action values have very different ranges, the variance needs
% to be defined individually for each action [kp, ki and kd]
% the ranges of kp, ki and kd should be taken into account:
% kp = [-6, 6], range = 12
% ki = [-0.2, 0.2], range = 0.4
% kd = [-2, 2], range = 4
% rule of thumb: Variance*sqrt(Ts) should be between 1% and 10% of the
% action range
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
agentOptions.NoiseOptions.Variance = [0.8, 0.02, 0.2];
%agentOptions.NoiseOptions.Variance = 0.2;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;
How do I work around this?
Note: if I only specify a single variance, training runs fine, but the exploration and the achieved results are not good.
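For reference, here is the setup I am trying to express, written out as a minimal, untested sketch. The column-vector layout for Variance (to match a [3 1] action spec) and the Ts value used here are my own assumptions, not something I have confirmed:
%% untested sketch: per-action variance from the 1%-10% rule of thumb
agentOptions = rlDDPGAgentOptions;
Ts = 1.0; % placeholder sample time, replace with the model's actual Ts
actionRange = [12; 0.4; 4]; % ranges of kp, ki and kd
fracOfRange = 0.05; % pick a value between 0.01 and 0.10
% rule of thumb: Variance*sqrt(Ts) should be 1% to 10% of the action range
agentOptions.NoiseOptions.Variance = fracOfRange*actionRange/sqrt(Ts);
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;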
  3 Comments
张 冠宇 on 18 Nov 2021
May I ask how I can get 3 actions (kp, ki and kd)? Should I set it as follows?
actInfo = rlNumericSpec([3 1]);
or [1 3], or some other setting? I ask because I get the error:
Input data dimensions must match the dimensions specified in the corresponding observation and action info
specifications.
obsInfo = rlNumericSpec([3 1],... % rlNumericSpec: continuous action/observation data; rlFiniteSetSpec: discrete action/observation data
'LowerLimit',[-inf -inf -inf]',...
'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1); % get the dimension of the observation spec
actInfo = rlNumericSpec([3 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);
% build the environment interface object
env = rlSimulinkEnv('Load_Freq_Ctrl_rl2','Load_Freq_Ctrl_rl2/RL Agent',...
obsInfo,actInfo);
% set a custom reset function to randomize the model's reference value
env.ResetFcn = @(in)localResetFcn(in);
% specify the simulation time Tf and the agent sample time Ts in seconds
Ts = 0.2;
Tf = 30;
% fix the random generator seed for reproducibility
rng(0)
% create the DDPG agent
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([3 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% view the critic network configuration
figure
plot(criticNetwork)
% specify options for the critic representation using rlRepresentationOptions
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
% create the actor network
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
% create the agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
% train the agent
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);% 'SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100',
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',5, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',2000); % 155 works better
% set doTraining to true yourself to run training
doTraining = true;
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
thank you
勇刚 张 on 30 Mar 2022
When constructing the deep network, why use imageInputLayer() rather than featureInputLayer() as the input layer?
Also, the upper and lower limits in actInfo need to be declared as well, just like in obsInfo; remember to use column vectors. See the sketch below.
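A minimal sketch of what I mean (untested; the limit values for the three actions below are placeholders, not taken from your model):
actInfo = rlNumericSpec([3 1],... % column vector, same orientation as obsInfo
'LowerLimit',[-1; -1; -1],... % placeholder limits, declared as column vectors
'UpperLimit',[ 1; 1; 1]);
actInfo.Name = 'flow';
% featureInputLayer (R2020b and later) instead of imageInputLayer for vector inputs
actorNetwork = [
featureInputLayer(3,'Normalization','none','Name','State')
fullyConnectedLayer(3,'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];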
Good luck


Answers (0)
