Using Reinforcement Learning in Real Experiments

Hello everyone!
I used a DDPG agent to control vibration with reinforcement learning, using the transfer function of a 1-DOF model in Simulink as the system, and it worked successfully.
I'm now trying to run an experiment with real hardware, but there is one thing I'm worried about. When running the simulation, I set the reinforcement learning (agent) sample time to 0.01 s and found that the simulation ran slower than real time (one episode is 15 seconds long, but it actually took over 20 seconds to finish one episode).
(It is not possible to make the sample time larger because of the natural frequency of the structure.)
This was not a problem in simulation, but I am concerned that it could make learning on the real structure impossible. In simulation, the environment only advances to the next state after the action is applied, but a real structure keeps vibrating on its own even if the force is applied late. In other words, in an experiment with real hardware, the action for the current displacement has to be output in real time.
However, since DDPG is an off-policy algorithm, I think it should still be possible to compute the actual action in real time even if the learning itself proceeds slowly.
I want to get an accurate answer, so I'm asking here.
Q1. Does a delay like the one that occurs in simulation also occur in real experiments?
Q2. In Simulink, the solver uses a fixed step. Is it okay to set the Simulink sample time and the reinforcement learning sample time to different values? (I want to make the Simulink sample time (0.001 s) smaller than the reinforcement learning sample time (0.01 s).)

Answers (1)

Ronit on 25 Apr 2024
Hi Sumin,
Q1: Delay Phenomenon in Real Experiments
Yes, delays that occur in simulation can also occur in real experiments, but they manifest differently. In real-world applications, delays come primarily from computation and actuation times. Since DDPG is off-policy, it can partly mitigate the impact of these delays by learning from past experiences stored in the replay buffer. However, significant delays can still degrade control performance.
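For the real-time concern, one common approach (a rough sketch, not specific to your setup) is to keep the slow training against the Simulink model and deploy only the trained policy to the hardware, where a single forward pass of the actor is fast enough to run at the control rate. The file name trainedAgent.mat and the observation values below are placeholders:

% Sketch: separate slow training from fast real-time inference.
% Assumes "agent" is a DDPG agent already trained against the Simulink model.
load('trainedAgent.mat','agent');   % hypothetical file containing the trained agent

% Generate a standalone policy function (creates evaluatePolicy.m and agentData.mat)
generatePolicyFunction(agent);

% At each hardware control step, compute the action from the measured observation.
obs = [0.002; 0];                   % placeholder observation, e.g. [displacement; velocity]
act = evaluatePolicy(obs);          % fast actor forward pass, suitable for real-time use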
Q2: Different Sample Times for Simulink and Reinforcement Learning
Yes, it is fine to set different sample times for Simulink (e.g., 0.001 s) and the reinforcement learning agent (e.g., 0.01 s). This requires some care to ensure that the RL agent receives state information that accurately represents the system's dynamics over its sampling period, and you should verify that this downsampling does not hide dynamics that are critical for learning and control performance.
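As a minimal sketch of how the two rates can be configured (the model name vibrationControlModel is a placeholder, and the actor/critic creation is omitted):

% Plant dynamics integrated at a 0.001 s fixed step
mdl = 'vibrationControlModel';                       % hypothetical model name
load_system(mdl);
set_param(mdl,'SolverType','Fixed-step','FixedStep','0.001');

% Agent acts every 0.01 s; the RL Agent block runs at the agent's SampleTime
agentOpts = rlDDPGAgentOptions('SampleTime',0.01);
% agent = rlDDPGAgent(actor,critic,agentOpts);       % actor/critic defined elsewhere

With this setup the plant is simulated at 1 kHz while the agent observes, acts, and (during training) learns only every 10th solver step.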
I hope this helps!
