I have attempted to make some improvements following the advice of @Riccardo Bonomelli. I have now created a new function update_row() which is placed inside a wrapper function modify_matrix() , whose job it is to loop over all rows. The rationale for doing this was so that this could easily be used with parfor. The new code is shown below:
%%%%%%%%%%%% Set up and initialisation %%%%%%%%%%%%
N_iter = 100; % Number of iterations
Ny = 500; % Dimensions of matrix M
Nx = 400; %
a_vec = rand(1,Nx) + rand(1, Nx)*1i; % Predefine some vectors
b_vec = rand(1,Nx) + rand(1, Nx)*1i; %
c_vec = rand(1,Nx) + rand(1, Nx)*1i; %
d_vec = rand(1,Nx) + rand(1, Nx)*1i; %
e_vec = rand(1,Nx) + rand(1, Nx)*1i; %
A = rand(1) + rand(1)*1i; % A scalar constant
M = rand(Ny, Nx) + rand(Ny, Nx)*1i; % Initialise the matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%% Main loop %%%%%%%%%%%%%%%%%%%%
tic
for k = 1:N_iter
% ...
M = modify_matrix_mex(M, Nx, Ny, A, a_vec, b_vec, c_vec, d_vec, e_vec);
% Do more processing on new M ...
% ...
end
t = toc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
and the functions are as follows:
%%%%%%%%%%%%% Function to modify matrix M %%%%%%%%%%%
function M = modify_matrix(M, Nx, Ny, A, a_vec, b_vec, c_vec, d_vec, e_vec)
parfor m = 2:(Ny-1) % Loop over axial values
M(m,:) = update_row(M(m,:), Nx, Ny, A, a_vec, b_vec, c_vec, d_vec, e_vec);
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% Function to modify a row of matrix M %%%%%%
function M_row = update_row(M_row, Nx, Ny, A, a_vec, b_vec, c_vec, d_vec, e_vec)
f_vec = complex(zeros(1,Nx));
B = complex(zeros(1,Nx));
B(2:Nx-1) = A*M_row(2:Nx-1) + a_vec(2:Nx-1).*M_row(3:Nx) + b_vec(2:Nx-1).*M_row(1:Nx-2);
for n = 2:(Nx-1) % Forward sweep
f_vec(n+1) = c_vec(n)*B(n) + d_vec(n)*f_vec(n);
end
for n = Nx:-1:2 % Backward sweep
M_row(n-1) = e_vec(n)*M_row(n) + f_vec(n); % Update the M entries
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
(In the last function I have moved the creation of B outside the loop, because I saw it can be vectorized. I don't believe the remaining loops can be removed though).
Running the original code for Nt = 100 iterations originally took 5.7 seconds on my machine. Converting modify_matrix() to a MEX function dropped this to 0.75 seconds. (I also tried making update_row() into a MEX function but since it is called from inside the other MEX function I wasn't sure how to compile them in a nested way).
Finally adding the parfor loop reduced again further to 0.43 seconds, on my quad core machine. So a total speedup of x13.
I would like to run this code for Nt = several tens / hundreds of thousands of iterations, not just 100 now, so need a better speedup. Can anyone provide support with how to achieve this?
Many thanks,
Tom