Simulink Coder NVIDIA Jetson GPU slower than MATLAB Coder

Question

marco fiorio on 6 Oct 2022

Commented: Hariprasad Ravishankar on 9 Oct 2022

onlyDetectNanoNoBuffer.slx

Dear All,

I don't understand why the following function (an object detection algorithm based on a single Gaussian background model), when compiled and deployed to my Jetson Nano board from MATLAB, runs much faster than the equivalent Simulink model, which performs exactly the same task. There is also a huge difference in build time: from MATLAB it takes only a few seconds, while from Simulink it takes more than 10 minutes to build the application on the target. It is worth mentioning that I am specifically targeting the GPU by generating CUDA code from both MATLAB and Simulink.

function fastdetection() %#codegen
hwobj = jetson;
imgSizeCamera = [640 360];
camObj = camera(hwobj,"vi-output, imx477 6-001a",imgSizeCamera);
dispObj = imageDisplay(hwobj);
% parameters
n_train     = 50;
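imgSize     = [imgSizeCamera(2) imgSizeCamera(1)]; % [rows cols]; assumes snapshot returns height-by-width frames
buffer      = zeros([imgSize,n_train]);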
alpha       = 0.5;
beta        = 0.01;
theta       = 20;
for i = 1:n_train
    img = snapshot(camObj);
    buffer(:,:,i) = rgb2gray(img); 
end
Mu = mean(buffer,3);
tmp = buffer-Mu;
variance = sum(tmp.^2,3)./(n_train-1);
sigma = sqrt(variance);
Mu = single(Mu);
while(true)
    frameGS = single(rgb2gray(snapshot(camObj)));
    sigma = sigma/5;
    sigma(sigma<1)  = 1;
    
    D               = sqrt(((frameGS-Mu).^2)./sigma);
    mask            = D > theta;
    % update background model 
    Mu(~mask)  = (1-alpha)*Mu(~mask)+alpha*(frameGS(~mask));
    sigma(~mask) = (1-alpha)*sigma(~mask)+ ...
                       alpha*(frameGS(~mask)-Mu(~mask)).^2;
    % update foreground model
    Mu(mask)   = (1-beta)*Mu(mask)+beta*(frameGS(mask));
    sigma(mask)  = (1-beta)*sigma(mask)+beta*(frameGS(mask)-Mu(mask)).^2;
    
    imgOut = uint8(mask*250);
    imrot = imrotate(imgOut,-90);
    image(dispObj,imrot);
end
end
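For comparison, the per-frame update inside the loop can also be written with element-wise arithmetic instead of logical indexing. The following is only a sketch on a toy 2-by-2 frame, assuming the same update order as the function above (Mu first, then sigma using the updated Mu); the variable `rate` and the toy values are hypothetical and not part of the original code. Whether this form changes the CUDA kernels generated from MATLAB or Simulink is not established here; it is just a variant that may be worth comparing.

```matlab
% Toy data (hypothetical values, chosen so that one pixel is foreground)
alpha   = 0.5;
beta    = 0.01;
theta   = 20;
Mu      = single([100 100; 100 100]);
sigma   = single([4 4; 4 4]);
frameGS = single([102 100; 200 100]);

% Same distance test as in the loop above
D    = sqrt(((frameGS - Mu).^2) ./ sigma);
mask = D > theta;

% Per-pixel learning rate: alpha for background pixels, beta for foreground
rate = alpha * single(~mask) + beta * single(mask);

% Update Mu first, then sigma with the updated Mu (same order as the original)
Mu    = (1 - rate) .* Mu    + rate .* frameGS;
sigma = (1 - rate) .* sigma + rate .* (frameGS - Mu).^2;
```

With these toy values only the pixel at row 2, column 1 is flagged as foreground, so it is the only one updated with `beta`.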

The following script performs the code generation and deployment from MATLAB.

hwobj = jetson; % assumes an earlier successful connection whose settings can be reused
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
codegen('-config ',cfg,'fastdetection','-report');
pid = runApplication(hwobj,'fastdetection');

I also attach the Simulink model, which performs almost the same task (apart from the training stage of the background model, which I removed for simplicity). When built and started on the target, it runs much slower; the difference in frame rate is noticeable by eye. I also cannot explain such a huge difference in build time when the generated code is moved to the board.
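To put a number on the frame-rate difference rather than judging it by eye, the loop can be timed with `tic` and `toc`. This is a minimal sketch that runs on the host with a stand-in workload; `nFrames` and the random frame are hypothetical placeholders for the real snapshot, detection and display steps.

```matlab
nFrames = 100;                 % hypothetical number of frames to time
t0 = tic;
for k = 1:nFrames
    % Stand-in for snapshot + detection + display
    frame = rand(360, 640, 'single');
    mask  = frame > 0.5;
end
elapsed = toc(t0);             % elapsed time in seconds
fps = nFrames / elapsed;
fprintf('%.1f frames per second\n', fps);
```

Timing the same number of frames in both the MATLAB and the Simulink build would give two figures that can be compared directly.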

What am I missing here?

Thanks in advance for any help.

Marco

3 Comments

marco fiorio on 8 Oct 2022

Hi Denis,

First of all, thank you very much for your answer and your interest in my question.

The difference is both in build time (a few seconds for the function compiled with codegen versus more than 10 minutes for the Simulink model) and in runtime. Inspecting the GPU usage of the Nano board with the command "jtop", I can clearly see that when running the ELF file generated from MATLAB, the GPU shows constant high usage, as it should, and the video frame rate is consequently very high. If I run the ELF file produced by Simulink and check the GPU usage again with "jtop", the usage fluctuates between 0% and around 50% and the algorithm runs much slower.

I am now studying and comparing the generated code of the two versions, but since I am an aerospace engineer with no academic background in C++/CUDA programming, it is not straightforward. My feeling is that either the MATLAB function in Simulink is not parallelized correctly (i.e. the CUDA kernels are not created properly), or the code generated by Simulink contains bottlenecks or synchronization barriers that slow down the program.

Thanks in advance for any follow up.

marco

Hariprasad Ravishankar on 9 Oct 2022

Hi Marco,

Thanks for your clarification. Our team is looking into this and we will get back with any findings we have.

Hari

Answers (0)
