Massive increase in execution speed with MEX function?

Dear all,
I was experimenting with accelerating my code through MEX functions. See the code below (it solves a system of linear equations with constraints using the lsqlin function in each voxel of a computed tomography dataset - 512 x 512 x 163 voxels). It took about 90 minutes to process the whole dataset. When I converted the code to a MEX function and ran it, it only took 2.3 seconds.
I'm very happy with that; I just wanted to ask whether such a massive speed-up is something you would expect with MEX functions (this is the first time I've used them).
Thank you in advance.
Samuel
...
parfor k = 1:nslices
    for j = 1:ncols
        for i = 1:nrows
            if low_image(i,j,k) > -521
                d = [low_image_bf(i,j,k); high_image_bf(i,j,k)];
                Aeq = [1 1 1];
                beq = 1;
                opts = optimoptions("lsqlin","Display","off");
                x = lsqlin(C,d,[],[],Aeq,beq,[0 0 0],[1 1 1],[],opts);
                mat_Fe(i,j,k) = x(1); % iron fraction
                mat_Fat(i,j,k) = x(2); % oil fraction
                mat_Soft(i,j,k) = x(3); % water fraction (agar + a little iodine)
            end
        end
    end
    kstr = string(k);
    msg = strcat(' Slice ', kstr, ' processed');
    disp(msg)
end

5 Comments

Personally I don't believe the massive acceleration you observe.
@Samuel can you elaborate a bit on your process?
  • How did you create the MEX-file, by hand (i.e., rewrote the MATLAB code in C/Fortran and then compiled it) or used MATLAB Coder?
  • What part of the code did you convert to a MEX-file? The entire outer for-loop (k)? The inner for-loop (i)?
  • Are you running the parallel pool locally or on a cluster? How large of a pool?
  • How are you timing the code? wall clock? tic/toc? other?
  • We're only seeing a portion of the code. Have you preallocated mat_Fe, mat_Fat, and mat_Soft?
EDIT: I found one source of the huge speed difference. The code was meant to run only in voxels with a value higher than -521. However, I mixed up the variable names in the MATLAB version, so the algorithm ran in every single voxel.
The naming error wasn't present in the MEX version and therefore it correctly skipped all the irrelevant voxels.
The fixed MATLAB version now finishes in 1063 seconds. The MEX function is still almost 500 times faster though... I don't know, is this closer to what one would expect?
Thanks again
---
@Raymond Norris thank you for the reply, sure I can:
  1. I used MATLAB Coder
  2. yes, the entire outer loop
  3. locally on a workstation, 8 physical cores, 8 workers
  4. tic toc
  5. yes, I have preallocated them
This is the full relevant portion of the code (MATLAB version):
% nrows, ncols and nslices are constants pertaining to the size of the
% original CT volumes
mat_Fe = zeros(nrows, ncols, nslices);
mat_Fat = zeros(nrows, ncols, nslices);
mat_Soft = zeros(nrows, ncols, nslices);
% real-valued coefficients
C = [dect_L_Fe_HU dect_L_vz12_HU dect_L_vz1_HU; ...
     dect_H_Fe_HU dect_H_vz12_HU dect_H_vz1_HU];
tic
parfor k = 1:nslices
    for j = 1:ncols
        for i = 1:nrows
            if low_image(i,j,k) > -521
                d = [low_image_bf(i,j,k); high_image_bf(i,j,k)];
                Aeq = [1 1 1];
                beq = 1;
                opts = optimoptions("lsqlin","Display","off");
                x = lsqlin(C,d,[],[],Aeq,beq,[0 0 0],[1 1 1],[],opts);
                mat_Fe(i,j,k) = x(1); % iron fraction
                mat_Fat(i,j,k) = x(2); % oil fraction
                mat_Soft(i,j,k) = x(3); % water fraction (agar + a little iodine)
            end
        end
    end
    kstr = string(k);
    msg = strcat(' Slice ', kstr, ' processed');
    disp(msg)
end
toc
It almost seemed as though the compiled version wasn't running the exact same data set. What puzzles me though is if you are using the MATLAB Coder and not hand coding this yourself, wouldn't the compiled version also have the mistaken variable name? To ensure there aren't any other differences, have you compared the results (mat_*) to ensure that they are identical between compiled and non-compiled?
Is the local pool of 8 workers already started before you call tic?
Is 8 a factor of nslices?
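The two questions above can be checked with a few lines. This is only a sketch, assuming Parallel Computing Toolbox is installed and the machine can host 8 workers; the worker count and the 163 slices come from the thread, while `acc` is a hypothetical stand-in for the real per-slice work.

```matlab
% Sketch: start the pool before timing, then check slice/worker divisibility.
nworkers = 8;      % workers reported in the thread
nslices  = 163;    % slices in the CT volume from the question

pool = gcp('nocreate');          % returns [] when no pool is running
if isempty(pool)
    pool = parpool(nworkers);    % start-up cost is paid here, outside tic/toc
end

% How evenly do the slices divide among the workers?
leftover = mod(nslices, pool.NumWorkers);
fprintf('%d slices, %d workers, remainder %d\n', ...
        nslices, pool.NumWorkers, leftover);

acc = zeros(1, nslices);         % hypothetical stand-in output
tic
parfor k = 1:nslices
    acc(k) = k;                  % placeholder for the per-slice work
end
toc
```

For 8 workers the remainder is 3, since 163 = 8*20 + 3, so the slices cannot be split evenly across the pool.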
A small suggestion: pull
Aeq = [1 1 1];
beq = 1;
opts = optimoptions("lsqlin","Display","off");
to just after the parfor call. These are constant values that don't need to be reassigned in each iteration.
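A minimal sketch of that change, assuming Optimization Toolbox is available. The volume size, the matrix `C`, and the fractions `xTrue` are made-up values for illustration only; the loop structure and the `lsqlin` call follow the code posted in the thread.

```matlab
% Sketch with synthetic data (all numeric values below are hypothetical).
nrows = 4; ncols = 4; nslices = 2;       % tiny stand-in volume
C = [100 -50 20; 80 -60 25];             % made-up 2-by-3 coefficients
xTrue = [0.2; 0.3; 0.5];                 % made-up fractions, sum to 1
dTrue = C * xTrue;
low_image_bf  = dTrue(1) * ones(nrows, ncols, nslices);
high_image_bf = dTrue(2) * ones(nrows, ncols, nslices);
low_image     = zeros(nrows, ncols, nslices);   % every voxel passes > -521

mat_Fe   = zeros(nrows, ncols, nslices);
mat_Fat  = zeros(nrows, ncols, nslices);
mat_Soft = zeros(nrows, ncols, nslices);

parfor k = 1:nslices
    % Loop-invariant values: assigned once per slice, not once per voxel.
    Aeq  = [1 1 1];
    beq  = 1;
    opts = optimoptions("lsqlin", "Display", "off");
    for j = 1:ncols
        for i = 1:nrows
            if low_image(i,j,k) > -521
                d = [low_image_bf(i,j,k); high_image_bf(i,j,k)];
                x = lsqlin(C, d, [], [], Aeq, beq, [0 0 0], [1 1 1], [], opts);
                mat_Fe(i,j,k)   = x(1);
                mat_Fat(i,j,k)  = x(2);
                mat_Soft(i,j,k) = x(3);
            end
        end
    end
end
```

Defining the three values before the `parfor` line instead should also work, in which case they would be sent to the workers as broadcast variables.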
I'm not an Optimization guy, but I'm wondering if there's any value in tinkering with the Algorithm. This link is in reference to fmincon, but it should apply to lsqlin as well.
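As a rough illustration of what such tinkering might look like, the sketch below times two `lsqlin` algorithms on a single made-up problem. It assumes Optimization Toolbox in a release that offers the 'active-set' algorithm for `lsqlin`, and it assumes that this algorithm needs an initial point `x0`; `C`, `xTrue`, and `nrep` are hypothetical values, not taken from the thread.

```matlab
% Sketch: compare two lsqlin algorithms on one small synthetic problem.
C     = [100 -50 20; 80 -60 25];   % made-up coefficients
xTrue = [0.2; 0.3; 0.5];           % made-up fractions, sum to 1
d     = C * xTrue;
Aeq = [1 1 1];  beq = 1;
lb  = [0 0 0];  ub  = [1 1 1];
x0  = [1; 1; 1] / 3;               % feasible start point for active-set

optsIP = optimoptions("lsqlin", "Algorithm", "interior-point", "Display", "off");
optsAS = optimoptions("lsqlin", "Algorithm", "active-set",     "Display", "off");

nrep = 200;                        % repeat to get a measurable time
tic
for r = 1:nrep
    xIP = lsqlin(C, d, [], [], Aeq, beq, lb, ub, [], optsIP);
end
tIP = toc;

tic
for r = 1:nrep
    xAS = lsqlin(C, d, [], [], Aeq, beq, lb, ub, x0, optsAS);
end
tAS = toc;

fprintf('interior-point: %.4f s, active-set: %.4f s\n', tIP, tAS);
fprintf('difference between solutions: %.2e\n', norm(xIP - xAS));
```

Which algorithm is faster will depend on the problem and the release, so the timings are worth measuring on the real data before changing anything.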
MATLAB Coder dev team member here: speedups on the order of 100x are not unheard of, though that's a big speedup indeed. I second Raymond's advice to sanity check things.
If you fixed the variable naming bug in your MATLAB code, did you regenerate the MEX code with Coder?
Is the MEX returning equivalent answers to running your code in MATLAB? If you're using floating point, the answers can't be expected to be identical but should have similar quality. For things like optimization solvers it's possible that expected numeric differences cause convergence to a different answer making one implementation vastly slower than another.
If you wrap the call to your function / mex in tic;toc; do you see similar speedups? Namely tic; yourFunction(); toc vs tic; yourFunction_mex(); toc.
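The timing and equivalence checks suggested above could be combined along these lines. The two function handles are hypothetical stand-ins so that the sketch runs on its own; in practice they would wrap the MATLAB function and the generated MEX function, and the tolerance `tol` is an arbitrary choice.

```matlab
% Sketch: time two implementations and compare their outputs with a tolerance.
% Stand-ins only: replace with @() yourFunction(...) and @() yourFunction_mex(...).
refFcn = @() sqrt(linspace(0, 1, 1e6));
mexFcn = @() linspace(0, 1, 1e6) .^ 0.5;

% Warm-up calls so one-time loading costs are not counted.
refFcn();
mexFcn();

tic; outRef = refFcn(); tRef = toc;
tic; outMex = mexFcn(); tMex = toc;

% Floating-point results need not be bit-identical, so compare with a tolerance.
maxAbsDiff = max(abs(outRef(:) - outMex(:)));
tol = 1e-10;   % arbitrary tolerance for this illustration

fprintf('reference: %.4f s, alternative: %.4f s, ratio %.1fx\n', ...
        tRef, tMex, tRef / tMex);
fprintf('max abs difference: %.2e (within tol: %d)\n', ...
        maxAbsDiff, maxAbsDiff <= tol);
```

For the code in this thread the same comparison would be applied to each of mat_Fe, mat_Fat, and mat_Soft.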

Answers (0)

Release: R2022b
Asked: 11 Apr 2023
Edited: 12 Apr 2023
