DDPG Agent Not Converging (Rotary Inverted Pendulum)
Hi everyone,
I'm currently training a DDPG agent in MATLAB (R2023b) to control a rotary inverted pendulum (RIP) modeled in Simscape.
However, training does not converge: the reward fluctuates up and down without a clear learning trend. The agent’s actions also appear erratic, even when the system starts near the upright position (alpha = 0).
I tried two different models:
- one with a DC motor block (to simulate the real RIP motor), and
- another without it, where the agent’s action is injected directly into the joint torque input.
The second setup (without the motor) worked much better at first. But after I added the DC motor model, training performance dropped: the reward stays almost constant and the agent doesn’t seem to explore properly.
Now, after several tests, neither model works well anymore ;-;
This is how I defined the observations and action:
obsInfo = rlNumericSpec([4 1], 'LowerLimit', -Inf, 'UpperLimit', Inf);
obsInfo.Name = 'Observations';
obsInfo.Description = 'alpha0, alpha_dot, theta0, theta_dot';
actInfo = rlNumericSpec([1 1], 'LowerLimit', -5, 'UpperLimit', 5);
actInfo.Name = 'Torque';
The actor:
actorNet = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1, 'Name', 'fc4')
    tanhLayer('Name', 'tanh')
];
actorOptions = rlOptimizerOptions;
actorOptions.Algorithm = "adam";
actorOptions.LearnRate = actorLearnRate;
actor = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo);
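One variant I'm considering, in case the range mismatch matters: the tanh layer bounds the actor output to [-1, 1], while actInfo allows ±5. A scalingLayer after the tanh is a common way to cover the full torque range (a sketch; the Scale value just mirrors actInfo.UpperLimit):

```matlab
% Variant actor: scale the tanh output ([-1, 1]) up to the ±5 action range.
actorNetScaled = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1, 'Name', 'fc3')
    tanhLayer('Name', 'tanh')
    scalingLayer('Name', 'scale', 'Scale', 5)  % 5 = actInfo.UpperLimit
];
actorScaled = rlContinuousDeterministicActor(actorNetScaled, obsInfo, actInfo);
```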
The critic:
statePath = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
];
actionPath = [
    featureInputLayer(1, 'Name', 'action')
    fullyConnectedLayer(256, 'Name', 'fc3')
    reluLayer('Name', 'relu3')
    fullyConnectedLayer(128, 'Name', 'fc4')
    reluLayer('Name', 'relu4')
];
commonPath = [
    additionLayer(2, 'Name', 'add')
    fullyConnectedLayer(256, 'Name', 'fc5')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(256, 'Name', 'fc6')
    reluLayer('Name', 'relu6')
    fullyConnectedLayer(1, 'Name', 'q_value')
];
criticNet = layerGraph();
criticNet = addLayers(criticNet, statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
criticNet = connectLayers(criticNet, 'relu2', 'add/in1');
criticNet = connectLayers(criticNet, 'relu4', 'add/in2');
criticOptions = rlOptimizerOptions;
criticOptions.Algorithm = "adam";
criticOptions.LearnRate = criticLearnRate;
critic = rlQValueFunction(criticNet, obsInfo, actInfo);
Parameters:
Ts = 0.01;
gamma = 0.99;
miniBatchSize = 128;
experienceBufferLength = 1e6;
tau = 1e-3;
actorLearnRate = 1e-5;
criticLearnRate = 5e-4;
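For completeness, this is roughly how the parameters above feed into the agent (a sketch; the noise values are illustrative placeholders, not my exact settings):

```matlab
% Sketch: wiring the hyperparameters above into a DDPG agent.
% The noise values below are illustrative placeholders, not tuned settings.
agentOptions = rlDDPGAgentOptions( ...
    'SampleTime', Ts, ...
    'DiscountFactor', gamma, ...
    'MiniBatchSize', miniBatchSize, ...
    'ExperienceBufferLength', experienceBufferLength, ...
    'TargetSmoothFactor', tau, ...
    'ActorOptimizerOptions', actorOptions, ...
    'CriticOptimizerOptions', criticOptions);
agentOptions.NoiseOptions.StandardDeviation = 0.3;            % placeholder
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-4;  % placeholder
agent = rlDDPGAgent(actor, critic, agentOptions);
```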
Reward function:
c = [0.1, 1, 5, 20, 1, 1];
theta_max = pi / 3;
alpha_max = pi / 6;
if abs(theta) > theta_max || abs(alpha) > alpha_max || abs(alpha_dot) > 2*pi
    F = 10;
    isDone = true;
else
    F = 0;
    isDone = false;
end
rw = -c(1)*(c(2)*(theta^2) + c(3)*(theta_dot^2) + c(4)*(alpha^2) + c(5)*(alpha_dot^2) + c(6)*(u_d1^2)) - F;
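As a quick sanity check on reward scale (my own calculation, mirroring the expression above): the running cost is zero at the upright equilibrium, and even at the angle limits it stays far smaller than the terminal penalty F = 10:

```matlab
% Quick scale check of the running cost (standalone; mirrors the reward above).
c = [0.1, 1, 5, 20, 1, 1];
r = @(theta, theta_dot, alpha, alpha_dot, u) ...
    -c(1)*(c(2)*theta^2 + c(3)*theta_dot^2 + c(4)*alpha^2 + ...
           c(5)*alpha_dot^2 + c(6)*u^2);
r(0, 0, 0, 0, 0)        % upright, no effort: 0
r(pi/3, 0, pi/6, 0, 0)  % at both angle limits: about -0.66
```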
What I’ve Tried
- Normalizing observations and actions.
- Adjusting actor/critic network sizes and learning rates.
- Adjusting noise.
- Randomizing initial conditions (alpha0) using the ResetFcn.
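The ResetFcn I use for the last point is along these lines (a sketch; the variable name alpha0 matches my model, but the range here is illustrative):

```matlab
function in = localResetFcn(in)
% Randomize the initial pendulum angle alpha0 at each episode.
% The ±15 degree range is illustrative, not my exact setting.
alpha0 = (pi/12) * (2*rand - 1);      % uniform in [-15, 15] degrees
in = setVariable(in, 'alpha0', alpha0);
end
```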
Observed Behavior
- The reward graph fluctuates randomly between episodes and never stabilizes.
- Even without the DC motor, the agent still behaves oddly.
In most tests the reward curve oscillates up and down, almost like a wave, and never settles.

Can someone help me, please? I don't know what else to do.
Thank you so much in advance.