Why is this PPO agent not able to learn its task?

Hi, I am currently trying to learn the leg coordination of a hexapod robot, meaning it should learn when to lift its legs so that an efficient gait (tripod, wave, etc.) emerges.
I am relatively new to RL, and for the last two weeks I have been trying to get this to work, but no matter the algorithm parameters or reward function definition, the agent does not learn at all.
  • I am running a Simscape physics simulation of the hexapod robot (step size = 0.25 ms)
  • The hexapod has 3 joints per leg; the agent receives the α-angle (rad) of each leg as observations
  • The movement sequence of a hexapod leg consists of a swing phase (lift the leg and place it in front) and a stance phase (push the leg back to move forward)
  • The movement of a leg, i.e. the swing and stance phases, is predefined; the agent only has to decide when to initiate the swing
  • As said above, the agent receives the α-angles as observations and has to output a 1 to initiate the swing of a leg as an action (all other output values do nothing)
  • The reward is currently defined as a weighted sum of three terms: the movement speed in the x-direction (to reward moving forward), minus the y-position deviation (to discourage drifting from a straight line), minus the height difference between the nominal and current body height (to discourage stumbling or falling)
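For reference, such a reward could be computed per control step roughly as follows. This is only a sketch: the weights w1–w3 and the signal names vx, yPos, dHeight are assumptions for illustration, not the original implementation from the post.

```matlab
% Hypothetical per-step reward shaping for the hexapod.
% vx      : forward velocity in x (m/s), rewarded
% yPos    : lateral position (m), 0 on the straight line, penalized
% dHeight : |nominal body height - current body height| (m), penalized
w1 = 1.0;  w2 = 0.5;  w3 = 0.5;   % assumed weights, to be tuned

reward = w1*vx - w2*abs(yPos) - w3*dHeight;
```

If the forward-speed term is much smaller in magnitude than the penalty terms, the agent can maximize reward by standing still, which is a common cause of "no progress" curves; checking the relative scale of the three terms is a cheap first diagnostic.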
I appreciate any advice you can give me; I just find it very odd that there seems to be no progress. I have probably run this simulation about 10 times with parameter changes, for about 2000 episodes each, but the result always looks just like the graph above.
Does the agent lack information, or is the reward poorly defined?
Thank you in advance for any tips.
To give you as much information as possible, here are all the RL parameters:
  • I use the PPO architecture used here, but greatly reduced the size of the layers from 300/400 to 32 (I tested larger networks as well, without any success)
  • Agent options:
ExperienceHorizon=512, ...
MiniBatchSize=128, ...
ClipFactor=0.2,...
EntropyLossWeight=0.01,...
NumEpoch=3,...
AdvantageEstimateMethod="gae",...
GAEFactor=0.95,...
NormalizedAdvantageMethod="none",...
AdvantageNormalizingWindow=1e6,...
ActorOptimizerOptions=actorOpts,...
CriticOptimizerOptions=criticOpts,...
SampleTime=0.05,...
DiscountFactor=0.99
  • actorOpts and criticOpts only set LearnRate=0.02
  • Training options:
MaxEpisodes=10000,...
MaxStepsPerEpisode=512,...
ScoreAveragingWindowLength=50,...
Verbose=true,...
Plots="training-progress",...
StopTrainingCriteria="EpisodeCount",...
StopTrainingValue=maxEpisodes,...
SaveAgentCriteria="EpisodeReward",...
SaveAgentValue=65
  • During an episode (512 steps, 0.05 s sample time), a hexapod with a predefined tripod gait (no learning) receives a reward of >120
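For completeness, here is roughly how the options listed above would be assembled with MATLAB's Reinforcement Learning Toolbox. This is a sketch based on the parameters in the post; the variable names actorOpts, criticOpts, and maxEpisodes are taken from the post, everything else follows the standard rlPPOAgentOptions / rlTrainingOptions API.

```matlab
% Optimizer options for actor and critic (only the learning rate is set).
actorOpts  = rlOptimizerOptions(LearnRate=0.02);
criticOpts = rlOptimizerOptions(LearnRate=0.02);

% PPO agent options as listed above.
agentOpts = rlPPOAgentOptions( ...
    ExperienceHorizon=512, ...
    MiniBatchSize=128, ...
    ClipFactor=0.2, ...
    EntropyLossWeight=0.01, ...
    NumEpoch=3, ...
    AdvantageEstimateMethod="gae", ...
    GAEFactor=0.95, ...
    NormalizedAdvantageMethod="none", ...
    AdvantageNormalizingWindow=1e6, ...
    ActorOptimizerOptions=actorOpts, ...
    CriticOptimizerOptions=criticOpts, ...
    SampleTime=0.05, ...
    DiscountFactor=0.99);

% Training options as listed above.
maxEpisodes = 10000;
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=maxEpisodes, ...
    MaxStepsPerEpisode=512, ...
    ScoreAveragingWindowLength=50, ...
    Verbose=true, ...
    Plots="training-progress", ...
    StopTrainingCriteria="EpisodeCount", ...
    StopTrainingValue=maxEpisodes, ...
    SaveAgentCriteria="EpisodeReward", ...
    SaveAgentValue=65);
```

Note that with ExperienceHorizon=512 and MaxStepsPerEpisode=512, each PPO update sees at most one episode of data, which may make the advantage estimates noisy.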

Answers (0)

Release: R2023a
Asked: 25 Sep 2023
