Reinforcement Learning Agent continuous vs discrete sample time conflict with Stop Simulation

I am training a SAC agent for a parafoil using Reinforcement Learning Toolbox and Simulink. The physics are simulated in continuous time using the 6DOF block.
My architectural requirements:
  1. The RL Agent must execute actions strictly every 5 seconds.
  2. The simulation must terminate EXACTLY when the continuous altitude reaches 0 m, so that the terminal reward and the precision error are computed accurately, not 30 meters later.
The Problem:
  • If I put ZOH blocks at the inputs of the RL Agent (observation, reward, isdone), the agent correctly samples at 5 s. However, if the simulation stops at 0 m between two sampling instants (e.g., at t = 112 s), the agent's ZOH inputs were last updated at t = 110 s. As a result, the agent never receives the terminal reward.
  • If I remove the ZOH blocks so that the continuous signals reach the agent and it captures the exact terminal state, the agent loses its 5 s sample time and runs at the solver's continuous base rate. It does this even though the downstream Enabled Subsystem, which takes the RL Agent block's action output as its input, runs only every 5 s.
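The following plain-Python sketch (not Simulink) shows the timing concretely by sampling a continuous altitude signal with a 5 s zero-order hold. The linear descent `altitude(t)` (560 m start, 5 m/s sink rate, ground at t = 112 s) is a hypothetical profile. It was chosen only to reproduce the t = 112 s example and is not the 6DOF parafoil model.

```python
AGENT_TS = 5.0  # agent sample time in s (from the question)


def altitude(t):
    """Hypothetical descent: 560 m start, 5 m/s sink rate, ground at t = 112 s."""
    return 560.0 - 5.0 * t


def zoh_last_update(t, ts=AGENT_TS):
    """Time of the most recent zero-order-hold update at or before time t."""
    return ts * int(t // ts)


t_ground = 112.0                      # continuous touchdown time
t_held = zoh_last_update(t_ground)    # last instant the agent's inputs changed
print(f"agent last saw h = {altitude(t_held)} m at t = {t_held} s")
print(f"true terminal state: h = {altitude(t_ground)} m at t = {t_ground} s")
```

With these numbers, when the episode ends the held observation is 2 s old and shows the parafoil still 10 m above the ground.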
Question: What is the MathWorks best practice or recommended Simulink architecture for forcing the RL Agent block to execute strictly every 5 seconds, while still allowing the environment to terminate between agent steps and evaluate the exact final continuous state?

Answers (1)

Based on your requirements, you are looking for a hybrid setup. The RL Agent acts at a fixed 5 s interval, while the simulation can still detect, and respond at, the exact moment the continuous altitude reaches 0 m.
In this scenario, Rate Transition blocks can manage the signal transfer between the continuous-time dynamics and the discrete-time RL Agent in both directions. This lets the agent keep its 5 s sample time while the plant runs at the solver rate.
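One pattern that may satisfy both requirements is sketched below as a plain Python loop. All names are hypothetical; this is not RL Toolbox API and not a confirmed MathWorks recommendation. The agent stays on a fixed 5 s grid. The ground crossing is located inside the continuous integration, the exact touchdown state is latched (frozen), and isdone is reported together with the terminal reward at the next 5 s agent step. In Simulink terms, this would roughly mean capturing the state at the crossing and holding it until the agent's next sample hit, but that mapping is an assumption. The constant-sink plant, policy and landing target are placeholders.

```python
AGENT_TS = 5.0                      # agent sample time in s (from the question)
DT = 0.01                           # hypothetical plant integration step in s
SUBSTEPS = round(AGENT_TS / DT)     # plant steps per agent step
SINK_RATE = 5.0                     # hypothetical constant sink rate in m/s
TARGET_X = 300.0                    # hypothetical landing target in m


def policy(obs):
    """Hypothetical placeholder policy: returns a ground speed in m/s."""
    return 2.0


def run_episode(h0):
    """Agent on a fixed 5 s grid; touchdown state latched, isdone deferred."""
    t, h, x = 0.0, h0, 0.0
    latched = None                  # (t, h, x) frozen at the exact touchdown
    transitions = []                # (agent_time, observation, reward, isdone)
    k = 0                           # agent step index
    while True:
        if latched is not None:
            # First agent sample after touchdown: deliver the exact terminal
            # state and the terminal reward (landing precision error).
            reward = -abs(latched[2] - TARGET_X)
            transitions.append((k * AGENT_TS, latched, reward, True))
            return transitions
        transitions.append((k * AGENT_TS, (t, h, x), 0.0, False))
        u = policy((t, h, x))       # action held constant for the next 5 s
        for _ in range(SUBSTEPS):
            h_new = h - SINK_RATE * DT          # forward-Euler plant step
            if h_new <= 0.0:
                frac = h / (h - h_new)          # locate h = 0 inside the step
                latched = (t + frac * DT, 0.0, x + frac * u * DT)
                break                           # plant frozen after touchdown
            t, h, x = t + DT, h_new, x + u * DT
        k += 1


episode = run_episode(h0=561.02)
print(episode[-1])
```

With h0 = 561.02, touchdown occurs at about t = 112.204 s, inside the 110 to 115 s agent interval. The episode ends at the 115 s agent step, but the observation and reward come from the latched touchdown state, not from wherever the plant would be at 115 s. Deferring isdone this way keeps the agent on its grid. The trade-off is that the final transition is reported slightly after the physical touchdown.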
You may also find the following documentation helpful for designing and validating this type of architecture:
  1. Zero-Order Hold - Implement zero-order hold sample period - Simulink: Explains how continuous signals are sampled and held at a specified discrete rate, which is useful for interfacing continuous dynamics with discrete controllers.
  2. Hit Crossing - Detect crossing point - Simulink: Describes how to detect precise threshold crossings in continuous signals using zero-crossing detection, enabling accurate event handling such as altitude reaching zero.
  3. RL Agent can't inherint FiM sampling time - MATLAB Answers - MATLAB Central: Discusses why the RL Agent block cannot inherit fixed-in-minor-step sample times and outlines recommended approaches for enforcing discrete execution rates.
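The Hit Crossing reference concerns locating a crossing inside a step rather than on the sampling grid. The idea can be illustrated with bisection on a hypothetical nonlinear altitude profile. This is a conceptual sketch in plain Python; Simulink's own zero-crossing algorithm is not reproduced here.

```python
import math


def altitude(t):
    """Hypothetical nonlinear descent profile in m (not the 6DOF model)."""
    return 610.0 - 5.0 * t - 0.01 * t * t


def locate_crossing(f, t_lo, t_hi, tol=1e-9):
    """Bisection for f(t) = 0, assuming f(t_lo) > 0 >= f(t_hi)."""
    if not (f(t_lo) > 0.0 >= f(t_hi)):
        raise ValueError("interval does not bracket a downward crossing")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if f(mid) > 0.0:
            t_lo = mid          # still above ground: crossing is later
        else:
            t_hi = mid          # at or below ground: crossing is earlier
    return t_hi                 # f(t_hi) <= 0 holds throughout the loop


# The coarse 5 s grid only tells us the crossing lies between 100 s and 105 s.
t_cross = locate_crossing(altitude, 100.0, 105.0)
exact = (-5.0 + math.sqrt(25.0 + 4 * 0.01 * 610.0)) / (2 * 0.01)
print(t_cross, exact)
```

Returning the upper end of the bracket means the reported time is never earlier than the true crossing (up to `tol`). The terminal state is therefore evaluated at or just below 0 m rather than above it.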

Release: R2025b

Asked: 5 Apr 2026 at 20:33

Answered: 16 Apr 2026 at 10:39
