Level 1 s-function much faster than level 2 s-function

My level 1 s-function runs much faster than my level 2 s-function, even though the two have equivalent behavior. According to the Simulink profiler, the level 2 s-function is about an order of magnitude slower than the level 1 s-function. I'm not sure what exactly is causing this.
I've also found these threads [1] [2], where another person reports a similar issue. However, disabling direct feedthrough on the input port doesn't solve the issue in my case. Moreover, I would like to keep direct feedthrough enabled (it is also enabled for the level 1 s-function).
I have attached an example level 1 s-function and an equivalent level 2 s-function. Both implement a simple state-space system, and they show the same issue.
I discovered that even for the example MathWorks gives for converting a level 1 s-function to a level 2 s-function (sfundsc2.m and sfundsc2_level2.m, see https://nl.mathworks.com/help/simulink/sfg/maintaining-level-1-matlab-s-functions.html#bq3i98j), the level 2 s-function is significantly slower, at least on my system (see the screenshot).
Are level 2 s-functions just inherently slower, or is there a way to make their execution time comparable to that of level 1 s-functions?

