DDPG Agent (used to set a temperature) 41% faster training time per Episode with Warm-up than without. Why?
Hi,
So I noticed something while training my DDPG Agent.
I use a DDPG Agent to set a temperature for a heating system depending on the weather forecast and other temperatures such as the outside temperature.
First I trained an agent without any warm-up, and then I trained a new agent with a warm-up of 700 episodes. The warm-up did what I had hoped: the agent converged faster and found a much better strategy than without it. I also noticed that training itself was much faster. By my calculation, each episode takes 41% less time to train than an episode without a warm-up.
Don't get me wrong, I really appreciate this, but I am trying to understand why.
I have not changed any of the agent options, just the warm-up.
If the agent were supposed to win a game as quickly as possible, I would understand it: thanks to the experience gathered during warm-up, the agent would find a winning strategy sooner, finish each game faster, and so spend less time per episode. But in my case the agent just sets a temperature. There is no faster way to set a temperature.
Am I missing an important point?
I mean, in every training step and every episode the process is more or less the same: take an action, get a reward, update the networks, update the policy, and so on. Where in those steps could the 41% time improvement come from?
Just to be clear, I understand why it converges faster, I just don't understand why the training time per episode is so much faster. Without a warm-up, the average training time per episode was 28.1 seconds. With a warm-up it was 16.5 seconds.
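For reference, the 41% figure follows directly from the two per-episode averages quoted above (a quick sanity check in Python, just restating the numbers from the question):

```python
t_no_warmup = 28.1  # average seconds per episode, no warm-up
t_warmup = 16.5     # average seconds per episode, with 700 warm-up episodes

saving = (t_no_warmup - t_warmup) / t_no_warmup
print(f"{saving:.1%}")  # prints "41.3%"
```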
These are my agent options, which I used for both agents:
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 128;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.5;
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-6;
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;
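As a side note on these options: a VarianceDecayRate of 1e-6 means the exploration noise shrinks very slowly. Assuming the usual multiplicative decay rule (variance multiplied by (1 - decay) once per agent sample time; the exact toolbox semantics may differ between releases, so treat this as a rough sketch), a quick estimate of how little the noise decays:

```python
variance = 0.5   # NoiseOptions.Variance
decay = 1e-6     # NoiseOptions.VarianceDecayRate

# Assumed multiplicative decay, applied once per agent step.
after_100k = variance * (1 - decay) ** 100_000
print(round(after_100k, 4))  # ~0.4524 left after 100,000 steps
```

So both agents explore with nearly the same noise level throughout training, which supports comparing their per-episode times directly.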
I also use the Reinforcement Learning Toolbox and normalised all my variables in both cases.
In general, everything works fine, but it drives me crazy that I can't understand why it's so much faster.
Maybe someone has an idea.
Accepted Answer
Venu
on 13 Jan 2024
Based on the information you have provided, I can infer the following points:
- With warm-up experiences, the agent may explore the state and action space more efficiently.
- The learning rates for your critic and actor networks are set for small updates. With a good initial experience buffer, the updates can be more stable and require fewer adjustments, leading to faster convergence and less time spent on each gradient update step.
- You mentioned that ResetExperienceBufferBeforeTraining is set to false. Since the buffer is not reset, the agent with warm-up starts with a full buffer of experiences, which can lead to more efficient sampling and no time lost waiting for the buffer to fill up.
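The experience-buffer point can be made concrete with a minimal, framework-agnostic sketch (Python here purely for illustration; `step_and_maybe_learn` is a made-up name, and the real Reinforcement Learning Toolbox internals will differ): an off-policy agent can only run a gradient update once the buffer holds at least one mini-batch, and with ResetExperienceBufferBeforeTraining set to false a warm-up leaves the buffer pre-filled, so sampling starts immediately.

```python
import random
from collections import deque

BUFFER_LEN = 1_000_000  # mirrors ExperienceBufferLength = 1e6
BATCH = 128             # mirrors MiniBatchSize = 128

def step_and_maybe_learn(buffer, experience):
    """Store one transition; run a learning update only when a full
    mini-batch can be sampled. Returns True if an update ran."""
    buffer.append(experience)
    if len(buffer) < BATCH:
        return False  # still filling the buffer: no gradient update yet
    batch = random.sample(buffer, BATCH)
    # ... critic/actor gradient updates on `batch` would happen here ...
    return True

# Fresh agent: the first BATCH-1 steps cannot learn yet.
fresh = deque(maxlen=BUFFER_LEN)
print(step_and_maybe_learn(fresh, ("s", "a", 0.0, "s2")))  # False

# Agent after warm-up (buffer kept, not reset): learns from step one.
warmed = deque((("s", "a", 0.0, "s2"),) * 10_000, maxlen=BUFFER_LEN)
print(step_and_maybe_learn(warmed, ("s", "a", 0.0, "s2")))  # True
```

Note this sketch only shows *when* updates begin, not why each update would be cheaper; the stability point above (fewer gradient-clipping interventions and smoother updates from a well-populated buffer) is where the bulk of a per-episode timing difference would plausibly come from.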