DDPG Agent Not Converging (Rotary Inverted Pendulum)
Hi everyone,
I'm currently training a DDPG agent in MATLAB (R2023b) to control a rotary inverted pendulum (RIP) modeled in Simscape.
However, training does not converge: the reward fluctuates up and down without a clear learning trend. The agent’s actions also appear erratic, even when the system starts near the upright position (alpha = 0).
I tried two different models:
- one with a DC motor block (to simulate the real RIP motor), and
- another without it, where the agent’s action is injected directly into the joint torque input.
The second setup (without the motor) worked much better at first. But after I added the DC motor model, training performance dropped: the reward stays almost constant and the agent doesn’t seem to explore properly.
Now, after several tests, neither model works well anymore ;-;
This is how I defined the observations and action:
obsInfo = rlNumericSpec([4 1], 'LowerLimit', -Inf, 'UpperLimit', Inf);
obsInfo.Name = 'Observations';
obsInfo.Description = 'alpha0, alpha_dot, theta0, theta_dot';
actInfo = rlNumericSpec([1 1], 'LowerLimit', -5, 'UpperLimit', 5);
actInfo.Name = 'Torque';
The actor:
actorNet = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1, 'Name', 'fc4')
    tanhLayer('Name', 'tanh')
];
actorOptions = rlOptimizerOptions;
actorOptions.Algorithm = "adam";
actorOptions.LearnRate = actorLearnRate;
actor = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo);
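One variant I'm considering, in case the range mismatch matters: the tanh layer bounds the actor output to [-1, 1], while actInfo allows ±5. A scalingLayer after the tanh is a common way to cover the full torque range (a sketch; the Scale value just mirrors actInfo.UpperLimit):

```matlab
% Variant actor: scale the tanh output ([-1, 1]) up to the ±5 action range.
actorNetScaled = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(1, 'Name', 'fc3')
    tanhLayer('Name', 'tanh')
    scalingLayer('Name', 'scale', 'Scale', 5)  % 5 = actInfo.UpperLimit
];
actorScaled = rlContinuousDeterministicActor(actorNetScaled, obsInfo, actInfo);
```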
The critic:
statePath = [
    featureInputLayer(4, 'Name', 'state')
    fullyConnectedLayer(256, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(128, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
];
actionPath = [
    featureInputLayer(1, 'Name', 'action')
    fullyConnectedLayer(256, 'Name', 'fc3')
    reluLayer('Name', 'relu3')
    fullyConnectedLayer(128, 'Name', 'fc4')
    reluLayer('Name', 'relu4')
];
commonPath = [
    additionLayer(2, 'Name', 'add')
    fullyConnectedLayer(256, 'Name', 'fc5')
    reluLayer('Name', 'relu5')
    fullyConnectedLayer(256, 'Name', 'fc6')
    reluLayer('Name', 'relu6')
    fullyConnectedLayer(1, 'Name', 'q_value')
];
criticNet = layerGraph();
criticNet = addLayers(criticNet, statePath);
criticNet = addLayers(criticNet, actionPath);
criticNet = addLayers(criticNet, commonPath);
criticNet = connectLayers(criticNet, 'relu2', 'add/in1');
criticNet = connectLayers(criticNet, 'relu4', 'add/in2');
criticOptions = rlOptimizerOptions;
criticOptions.Algorithm = "adam";
criticOptions.LearnRate = criticLearnRate;
critic = rlQValueFunction(criticNet, obsInfo, actInfo);
Parameters:
Ts = 0.01;
gamma = 0.99;
miniBatchSize = 128;
experienceBufferLength = 1e6;
tau = 1e-3;
actorLearnRate = 1e-5;
criticLearnRate = 5e-4;
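For completeness, this is roughly how the parameters above feed into the agent (a sketch; the noise values are illustrative placeholders, not my exact settings):

```matlab
% Sketch: wiring the hyperparameters above into a DDPG agent.
% The noise values below are illustrative placeholders, not tuned settings.
agentOptions = rlDDPGAgentOptions( ...
    'SampleTime', Ts, ...
    'DiscountFactor', gamma, ...
    'MiniBatchSize', miniBatchSize, ...
    'ExperienceBufferLength', experienceBufferLength, ...
    'TargetSmoothFactor', tau, ...
    'ActorOptimizerOptions', actorOptions, ...
    'CriticOptimizerOptions', criticOptions);
agentOptions.NoiseOptions.StandardDeviation = 0.3;            % placeholder
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-4;  % placeholder
agent = rlDDPGAgent(actor, critic, agentOptions);
```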
Reward function:
c = [0.1, 1, 5, 20, 1, 1];
theta_max = pi / 3;
alpha_max = pi / 6;
if abs(theta) > theta_max || abs(alpha) > alpha_max || abs(alpha_dot) > 2*pi
    F = 10;
    isDone = true;
else
    F = 0;
    isDone = false;
end
rw = -c(1)*(c(2)*(theta^2) + c(3)*(theta_dot^2) + c(4)*(alpha^2) + c(5)*(alpha_dot^2) + c(6)*(u_d1^2)) - F;
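As a quick sanity check on reward scale (my own calculation, mirroring the expression above): the running cost is zero at the upright equilibrium, and even at the angle limits it stays far smaller than the terminal penalty F = 10:

```matlab
% Quick scale check of the running cost (standalone; mirrors the reward above).
c = [0.1, 1, 5, 20, 1, 1];
r = @(theta, theta_dot, alpha, alpha_dot, u) ...
    -c(1)*(c(2)*theta^2 + c(3)*theta_dot^2 + c(4)*alpha^2 + ...
           c(5)*alpha_dot^2 + c(6)*u^2);
r(0, 0, 0, 0, 0)        % upright, no effort: 0
r(pi/3, 0, pi/6, 0, 0)  % at both angle limits: about -0.66
```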
What I’ve Tried
- Normalizing observations and actions.
- Adjusting actor/critic network sizes and learning rates.
- Adjusting noise.
- Randomizing initial conditions (alpha0) using the ResetFcn.
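The ResetFcn I use for the last point is along these lines (a sketch; the variable name alpha0 matches my model, but the range here is illustrative):

```matlab
function in = localResetFcn(in)
% Randomize the initial pendulum angle alpha0 at each episode.
% The ±15 degree range is illustrative, not my exact setting.
alpha0 = (pi/12) * (2*rand - 1);      % uniform in [-15, 15] degrees
in = setVariable(in, 'alpha0', alpha0);
end
```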
Observed Behavior
- The reward graph fluctuates randomly between episodes and never stabilizes.
- Even without the DC motor, the agent still behaves oddly.
In most tests the reward curve oscillates up and down, almost like a wave, and never settles.

Can someone help me, please? I don't know what else to do.
Thank you so much in advance.