Simulink Coder NVIDIA Jetson GPU slower than MATLAB Coder
Dear All,
I don't understand why the following function (an object detection algorithm based on a single-Gaussian background model) runs much faster when compiled and deployed to my Jetson Nano board via MATLAB than the equivalent Simulink model, which performs exactly the same task. There is also a huge difference in build time: on the MATLAB side it takes only a few seconds, while from Simulink it takes more than 10 minutes to build the app on the target. It is worth mentioning that I am specifically targeting the GPU by generating CUDA code from both MATLAB and Simulink.
function fastdetection() %#codegen
    hwobj = jetson;
    imgSizeCamera = [640 360];   % camera resolution as [width height]
    camObj = camera(hwobj,"vi-output, imx477 6-001a",imgSizeCamera);
    dispObj = imageDisplay(hwobj);

    % parameters
    n_train = 50;
    % frames are height-by-width, i.e. 360-by-640
    buffer = zeros([imgSizeCamera(2),imgSizeCamera(1),n_train]);
    alpha = 0.5;    % learning rate for background pixels
    beta = 0.01;    % learning rate for foreground pixels
    theta = 20;     % detection threshold

    % training stage: estimate the per-pixel mean and spread
    for i = 1:n_train
        img = snapshot(camObj);
        buffer(:,:,i) = rgb2gray(img);
    end
    Mu = mean(buffer,3);
    tmp = buffer-Mu;
    variance = sum(tmp.^2,3)./(n_train-1);
    sigma = sqrt(variance);
    Mu = single(Mu);

    while(true)
        frameGS = single(rgb2gray(snapshot(camObj)));
        sigma = sigma/5;
        sigma(sigma<1) = 1;
        D = sqrt(((frameGS-Mu).^2)./sigma);
        mask = D > theta;
        % update background model
        Mu(~mask) = (1-alpha)*Mu(~mask)+alpha*(frameGS(~mask));
        sigma(~mask) = (1-alpha)*sigma(~mask)+ ...
            alpha*(frameGS(~mask)-Mu(~mask)).^2;
        % update foreground model
        Mu(mask) = (1-beta)*Mu(mask)+beta*(frameGS(mask));
        sigma(mask) = (1-beta)*sigma(mask)+beta*(frameGS(mask)-Mu(mask)).^2;
        imgOut = uint8(mask*250);
        imrot = imrotate(imgOut,-90);
        image(dispObj,imrot);
    end
end
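For reference, the per-pixel work done in the main loop can be written compactly as below. This only restates the code above. The symbols are shorthand introduced here, not names from the code: I_t stands for frameGS, s for the variable named sigma, m for mask, and ρ for whichever of alpha or beta applies to the pixel.

```latex
\begin{aligned}
s    &\leftarrow \max\!\left(s/5,\ 1\right) \\
D    &= \sqrt{\frac{(I_t - \mu)^2}{s}} \\
m    &= \begin{cases} 1 & D > \theta \\ 0 & \text{otherwise} \end{cases} \\
\rho &= \begin{cases} \beta & m = 1 \\ \alpha & m = 0 \end{cases} \\
\mu  &\leftarrow (1-\rho)\,\mu + \rho\, I_t \\
s    &\leftarrow (1-\rho)\, s + \rho\,(I_t - \mu)^2
\end{aligned}
```

The last line uses the already updated μ, matching the order of the assignments in the loop. Every step is elementwise over the 360-by-640 frame, with alpha = 0.5, beta = 0.01 and theta = 20.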
This is the script that I use to perform code generation and deployment from MATLAB.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
codegen('-config ',cfg,'fastdetection','-report');
pid = runApplication(hwobj,'fastdetection');
I also attach the Simulink model that performs (almost) the same thing (apart from the training stage of the model, which I removed for simplicity). When built and started on the target, it runs much slower; the difference in FPS is noticeable by eye. I also can't explain the huge difference in build time when the generated code is moved to the board.
What am I missing here?
Thanks in advance for any help.
Marco
3 Comments
Hariprasad Ravishankar
on 9 Oct 2022
Hi Marco,
Thanks for your clarification. Our team is looking into this and we will get back with any findings we have.
Hari
Answers (0)