Variable Sample Time in Reinforcement Learning
Hey there,
I created a Simulink model that takes input values from an RL agent. A MATLAB Function block processes these values and outputs a set of actions for the model to perform. This set has a variable length.
While this action set is being performed, the reward function should still be calculated, but the agent should not output new values, because the MATLAB Function block does not take them into account.
Maybe an example makes it clearer:
- The agent outputs a value, let's say 5.
- Based on the current state of the model, an action set is created. Let's say the current state of the model is 2; the created action set would then consist of the steps 2, 3, 4 and 5. These action sets are of variable length and contain continuous numbers between a lower and an upper boundary, so values such as 2.7 or 4.01 are possible as well.
- While these actions are being performed, the model does not react to any values the agent outputs. Performing an action set takes the model between 0.1 and about 30 seconds.
- The state of the model while it performs the set of actions must be evaluated by the reward function.
- When the model has finished this set of actions, it is ready to take the next value from the agent.
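To make the example above concrete, here is a rough sketch of how such an action set could be built. It is written in Python only as a language-agnostic illustration, not as the actual MATLAB Function block. The function name `build_action_set`, the `step` parameter, and the even spacing between steps are all assumptions made for this sketch.

```python
import math
from typing import List


def build_action_set(state: float, target: float, step: float = 1.0) -> List[float]:
    """Hypothetical helper: walk from `state` to `target` in evenly spaced
    increments that are no larger than `step`.

    The length of the returned list depends on the distance to cover,
    which is what makes the action set variable in length.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    distance = target - state
    # Number of increments needed so that no increment exceeds `step`.
    n = math.ceil(abs(distance) / step)
    if n == 0:
        # Already at the target: the set contains only the current state.
        return [float(state)]
    return [state + distance * i / n for i in range(n + 1)]


# State 2 and agent output 5 give the steps 2, 3, 4 and 5.
print(build_action_set(2, 5))
# Non-integer values work the same way and give a shorter set.
print(build_action_set(2.7, 4.01))
```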
Currently I have a sample time of 0.1 seconds, which is the shortest amount of time a set of actions can keep the model busy.
If the first action set takes the model 10 seconds to perform, there are 99 suggestions from the RL agent that the model does not react to. I fear that this might lead to pretty bad training.
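The figure of 99 comes from simple arithmetic: 10 seconds at a sample time of 0.1 seconds is 100 agent outputs, of which only the first is used. A small Python sketch of that calculation follows. The function name `ignored_agent_outputs` is made up for this illustration, and it assumes the agent outputs exactly one value per sample.

```python
def ignored_agent_outputs(busy_time_s: float, sample_time_s: float) -> int:
    """Hypothetical helper: number of agent outputs the model does not react to
    while it is busy, assuming one agent output per sample."""
    if sample_time_s <= 0:
        raise ValueError("sample_time_s must be positive")
    # round() guards against floating-point noise in the division.
    samples_while_busy = round(busy_time_s / sample_time_s)
    # The first sample triggered the action set, the rest are ignored.
    return max(samples_while_busy - 1, 0)


print(ignored_agent_outputs(10.0, 0.1))  # the 10 second case from the text
print(ignored_agent_outputs(30.0, 0.1))  # the longest case, about 30 seconds
```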
I need the agent to output a value, wait for feedback from the model, and then output the next value. The reward function should be evaluated at a higher resolution than the agent's outputs. Is something like this possible?
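As a rough sketch of the behaviour I am after, the loop below asks the agent for a value only when the model is ready, while the reward is still evaluated on every fine time step and accumulated per decision. It is plain Python for illustration, not Simulink or Reinforcement Learning Toolbox code. The names `run_episode`, `policy`, `busy_ticks_for` and `reward_fn` are hypothetical, and it assumes every action set keeps the model busy for at least one tick.

```python
from typing import Callable, List, Tuple


def run_episode(
    policy: Callable[[int], float],
    busy_ticks_for: Callable[[float], int],
    reward_fn: Callable[[int], float],
    total_ticks: int,
) -> List[Tuple[int, float, float]]:
    """Return one (start_tick, value, accumulated_reward) entry per finished
    agent decision. One tick stands for one fine sample, e.g. 0.1 seconds."""
    decisions: List[Tuple[int, float, float]] = []
    remaining = 0        # ticks until the model is ready again
    value = None         # value the model is currently working on
    accumulated = 0.0    # reward collected since the last agent output
    start = 0
    for tick in range(total_ticks):
        if remaining == 0:
            # The model is ready: close the previous decision, then ask the agent.
            if value is not None:
                decisions.append((start, value, accumulated))
            value = policy(tick)
            remaining = busy_ticks_for(value)
            if remaining < 1:
                raise ValueError("an action set must last at least one tick")
            accumulated = 0.0
            start = tick
        # The reward is evaluated on every tick, even while the agent waits.
        accumulated += reward_fn(tick)
        remaining -= 1
    return decisions


# Toy usage: every decision keeps the model busy for 3 ticks, reward 1.0 per tick.
result = run_episode(lambda t: 3.0, lambda v: int(v), lambda t: 1.0, total_ticks=7)
print(result)
```

With this structure the agent produces one value per action set instead of one per sample, and a decision that is still running when the episode ends is not reported.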
Thank you.
Niklas Braun
on 10 Dec 2020