Optimize a vectorized code

Question

Christopher on 10 Nov 2014

https://in.mathworks.com/matlabcentral/answers/162044-optomize-a-vectorized-code

Edited: Christopher on 10 Nov 2014

I have a vectorized code. It runs fast (I think), but I would benefit enormously from reducing its computational cost further. I have attached a picture of the MATLAB profiler results, and the code itself is appended as text below.

This is a small part of a much larger code, but it is a huge bottleneck.

Some matrix sizes:

brdzeros: 4x4
CLa_wbrd: 128x128
CLa: 120x120
Qx_La, CPLa_t, etc.: 120x120

As you can see, the steps of the code are:

[ln. 978-986]   Pad each matrix with periodic borders.
[ln. 990-1006]  Create matrices that are offset by +1 or -1 columns or by +1 or -1 rows.
[ln. 1009-1025] Divide each offset matrix element-wise by a preallocated matrix.
[ln. 1029-1042] Solve a finite difference equation using the above preallocated matrices.
[ln. 1044-1045] Get ratios of some of the results.
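
Written out for one species, the finite difference update in the fourth step has the form below. It is read directly off the code in this post: the subscripts t, r, b, l, i stand for the top, right, bottom, left and centre matrices (C for CLa_*, P for PmatLa_*, D for DmatLa_*). The symbols s_x and s_y stand for twoxstp2 and twoystp2, which the code shown does not define; they are presumably 2Δx² and 2Δy², but that is an assumption.

```latex
\begin{aligned}
Q_x &= (D_l + D_i)\left(\frac{C_l}{P_l} - \frac{C_i}{P_i}\right)
     + (D_r + D_i)\left(\frac{C_r}{P_r} - \frac{C_i}{P_i}\right) \\
Q_y &= (D_t + D_i)\left(\frac{C_t}{P_t} - \frac{C_i}{P_i}\right)
     + (D_b + D_i)\left(\frac{C_b}{P_b} - \frac{C_i}{P_i}\right) \\
\Delta C &= \Delta t \left(\frac{Q_x}{s_x} + \frac{Q_y}{s_y}\right),
\qquad C \leftarrow C + \Delta C
\end{aligned}
```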

Are there faster ways of doing the same computations? Could I use GPU acceleration?

Any ideas or strategies are much appreciated.

Code:

for d=1:dstep
% new concentrations mapped to wbrd grid
CLa_wbrd = [brdzeros              CLa(end-brd+1:end,:)  brdzeros;
            CLa(:,end-brd+1:end)  CLa                   CLa(:,1:brd);
            brdzeros              CLa(1:brd,:)          brdzeros];
CSm_wbrd = [brdzeros              CSm(end-brd+1:end,:)  brdzeros;
            CSm(:,end-brd+1:end)  CSm                   CSm(:,1:brd);
            brdzeros              CSm(1:brd,:)          brdzeros];
CYb_wbrd = [brdzeros              CYb(end-brd+1:end,:)  brdzeros;
            CYb(:,end-brd+1:end)  CYb                   CYb(:,1:brd);
            brdzeros              CYb(1:brd,:)          brdzeros];
% create offset matrices of size(O): concentrations
CLa_t  = CLa_wbrd(5-1:end-4-1,5:end-4);
CLa_r  = CLa_wbrd(5:end-4,5+1:end-4+1);
CLa_b  = CLa_wbrd(5+1:end-4+1,5:end-4);
CLa_l  = CLa_wbrd(5:end-4,5-1:end-4-1);
CLa_i  = CLa_wbrd(5:end-4,5:end-4);
CSm_t  = CSm_wbrd(5-1:end-4-1,5:end-4);
CSm_r  = CSm_wbrd(5:end-4,5+1:end-4+1);
CSm_b  = CSm_wbrd(5+1:end-4+1,5:end-4);
CSm_l  = CSm_wbrd(5:end-4,5-1:end-4-1);
CSm_i  = CSm_wbrd(5:end-4,5:end-4);
CYb_t  = CYb_wbrd(5-1:end-4-1,5:end-4);
CYb_r  = CYb_wbrd(5:end-4,5+1:end-4+1);
CYb_b  = CYb_wbrd(5+1:end-4+1,5:end-4);
CYb_l  = CYb_wbrd(5:end-4,5-1:end-4-1);
CYb_i  = CYb_wbrd(5:end-4,5:end-4);
% preallocated C/P's
CPLa_t  = CLa_t./PmatLa_t;
CPLa_r  = CLa_r./PmatLa_r;
CPLa_b  = CLa_b./PmatLa_b;
CPLa_l  = CLa_l./PmatLa_l;
CPLa_i  = CLa_i./PmatLa_i;
CPSm_t  = CSm_t./PmatSm_t;
CPSm_r  = CSm_r./PmatSm_r;
CPSm_b  = CSm_b./PmatSm_b;
CPSm_l  = CSm_l./PmatSm_l;
CPSm_i  = CSm_i./PmatSm_i;
CPYb_t  = CYb_t./PmatYb_t;
CPYb_r  = CYb_r./PmatYb_r;
CPYb_b  = CYb_b./PmatYb_b;
CPYb_l  = CYb_l./PmatYb_l;
CPYb_i  = CYb_i./PmatYb_i;
% FLUXES, dC's, and new C's
Qx_La        =  ((DmatLa_l+DmatLa_i).*(CPLa_l-CPLa_i)+(DmatLa_r+DmatLa_i).*(CPLa_r-CPLa_i));
Qy_La        =  ((DmatLa_t+DmatLa_i).*(CPLa_t-CPLa_i)+(DmatLa_b+DmatLa_i).*(CPLa_b-CPLa_i));
dCLa         =  (Qx_La/twoxstp2+Qy_La/twoystp2)*dtimestep;           % CHANGE IN HEAT CONTENT FROM HEAT FLUX
CLa          =  CLa+dCLa;
Qx_Sm        =  ((DmatSm_l+DmatSm_i).*(CPSm_l-CPSm_i)+(DmatSm_r+DmatSm_i).*(CPSm_r-CPSm_i));
Qy_Sm        =  ((DmatSm_t+DmatSm_i).*(CPSm_t-CPSm_i)+(DmatSm_b+DmatSm_i).*(CPSm_b-CPSm_i));
dCSm         =  (Qx_Sm/twoxstp2+Qy_Sm/twoystp2)*dtimestep;           % CHANGE IN HEAT CONTENT FROM HEAT FLUX
CSm          =  CSm+dCSm;
Qx_Yb        =  ((DmatYb_l+DmatYb_i).*(CPYb_l-CPYb_i)+(DmatYb_r+DmatYb_i).*(CPYb_r-CPYb_i));
Qy_Yb        =  ((DmatYb_t+DmatYb_i).*(CPYb_t-CPYb_i)+(DmatYb_b+DmatYb_i).*(CPYb_b-CPYb_i));
dCYb         =  (Qx_Yb/twoxstp2+Qy_Yb/twoystp2)*dtimestep;           % CHANGE IN HEAT CONTENT FROM HEAT FLUX
CYb          =  CYb+dCYb;
CLaSm = CLa./CSm;
CSmYb = CSm./CYb;
end
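
One thing that may be worth checking: since only offsets of +1 and -1 are used, the padding and slicing steps appear to amount to periodic (circular) shifts of the unpadded matrix. The sketch below compares the two for CLa only. It assumes brd = 4 (consistent with brdzeros being 4x4 and the hard-coded 5 and 4 in the indices) and uses made-up random data; the names ending in 2 are hypothetical and exist only for the comparison.

```matlab
% Sketch only: check that pad-then-slice matches circshift on the raw matrix.
% Assumes brd = 4 and a 120x120 CLa as stated in the question; data is random.
brd      = 4;
brdzeros = zeros(brd);
CLa      = rand(120);

% Approach from the loop: pad with periodic borders, then slice offset windows.
CLa_wbrd = [brdzeros              CLa(end-brd+1:end,:)  brdzeros;
            CLa(:,end-brd+1:end)  CLa                   CLa(:,1:brd);
            brdzeros              CLa(1:brd,:)          brdzeros];
CLa_t  = CLa_wbrd(5-1:end-4-1,5:end-4);
CLa_r  = CLa_wbrd(5:end-4,5+1:end-4+1);
CLa_b  = CLa_wbrd(5+1:end-4+1,5:end-4);
CLa_l  = CLa_wbrd(5:end-4,5-1:end-4-1);

% Candidate replacement: circular shifts of CLa, no padded copy is built.
CLa_t2 = circshift(CLa,[ 1  0]);   % neighbour above (row 1 wraps to last row)
CLa_r2 = circshift(CLa,[ 0 -1]);   % neighbour to the right
CLa_b2 = circshift(CLa,[-1  0]);   % neighbour below
CLa_l2 = circshift(CLa,[ 0  1]);   % neighbour to the left

% True only if every offset matrix is identical between the two approaches.
same_offsets = isequal(CLa_t,CLa_t2) && isequal(CLa_r,CLa_r2) && ...
               isequal(CLa_b,CLa_b2) && isequal(CLa_l,CLa_l2);
```

Whether this is actually faster than the padded version has not been timed here; it would need to be measured inside the real loop (for example with tic/toc). Direct indexing such as CLa([end 1:end-1],:) gives the same result as the first shift and is another variant to compare.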