Sorry if this is a dumb question, but I'm not sure how to configure the sample time on my reinforcement learning agent so that it properly interacts with the Simscape Electrical environment I've created. My goal is for the RL agent to output an action every 1 second; that action updates the MOSFET gate voltages in the environment. The environment then uses the new gate voltages to perform a 100-microsecond pulse, and the MOSFETs' drain-source currents are measured midway through the pulse. The measured currents determine the agent's reward and also decide when an episode is over: an episode ends when the agent has balanced the 4 measured currents to within a defined threshold.
My confusion comes when trying to set up the timing in the environment, particularly the timing of the outputs that go to the RL agent. The agent requires the environment to produce an observation every Ts (1 s), but I also need an additional delay of roughly 100 microseconds after each action so the pulse and the subsequent current measurement can take place before the observation is read.
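In case it helps, here's roughly how I'm creating the environment and setting the agent's sample time. The model name, block path, and signal names below are placeholders, and I'm showing a DDPG agent only as an example; my actual model differs, but the timing question is the same:

```matlab
Ts = 1;  % agent sample time in seconds (agent acts once per Ts)

% Observation: the 4 measured drain-source currents
obsInfo = rlNumericSpec([4 1]);
obsInfo.Name = 'drainSourceCurrents';

% Action: the 4 MOSFET gate voltages
actInfo = rlNumericSpec([4 1]);
actInfo.Name = 'gateVoltages';

% 'rl_mosfet_env' and the RL Agent block path are placeholders for my model
env = rlSimulinkEnv('rl_mosfet_env', 'rl_mosfet_env/RL Agent', ...
    obsInfo, actInfo);

% The agent only takes a step every Ts; the Simscape solver still runs at
% its own (much smaller) step size between agent steps, which is where I
% assume the 100 us pulse and measurement are supposed to happen.
agentOpts = rlDDPGAgentOptions('SampleTime', Ts);
```

My understanding is that the solver can take many small steps within one agent step, so the pulse and measurement should fit inside a single Ts, but I'm not sure how to guarantee the observation the agent sees at each step is the post-pulse measurement rather than a stale value.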
I believe I have a fundamental misunderstanding of how sample time works here. Any help is greatly appreciated. Thank you.
Additional note: I'm also seeing a bug where the agent outputs the same sequence of actions every episode, regardless of the previous observations.