4 views (last 30 days)

Kaustubha Govind
on 9 Feb 2011

Kaustubha Govind
on 10 Feb 2011

Mike Hosea
on 23 May 2011

Using emlmex to speed up FFT/IFFT is not generally possible. MATLAB already uses FFTW, which is compiled and optimized, selects an algorithm that seems especially fast for the data and hardware. Currently, emlmex generates a generic radix-2 FFT/IFFT and then compiles it with your C compiler. How your C compiler manages the cache, however well it generates code, that's how it's going to work relative to what is even possible with a generic radix-2 FFT. Well, that's almost true. There are some arbitrary cutoffs there for whether the twiddle factors are pre-computed in a lookup table or computed on the fly using sine and cosine. That sort of thing matters. Especially considering that data has to marshaled to and from a mex file, the only way emlmex-generated FFT going to be competitive with FFTW is if we're talking about a small enough vector that the FFTW overhead dominates, which is pretty small, in my experience.

In general emlmex does not speed up anything that in MATLAB is already executed in compiled libraries, e.g. LAPACK-type linear algebra computations, including linear system solves, SVD, EIG, and, of course, FFT. The kinds of things that emlmex speeds up are operations that are heavier on the MATLAB interpreter, e.g. computations that require lots of logic and/or loops. It also tends to speed up fixedpoint applications dramatically.

Mike Hosea
on 2 Jun 2011

Hard for me to say about FFTW speedups. Seems like a lot sometimes to me as well, but we're basically talking about an operation here that is really fast, and we discover these problems by running it over large data sets or by running it lots of times, otherwise it finishes in milliseconds. Having looked at it a number of times, I've found that performance varies a lot in a relative sense, I presume because of cache management. If your data set is variable in length or large enough that MATLAB Coder does not precompute the twiddle factors, then this can easily make each FFT take significantly longer in the relative sense. It is difficult to decide a priori what the right time versus space trade-off is for that table of sines or cosines.

Anyway, if you have a problem for which FFTW is a lot faster, and you are just running mex files in MATLAB or simulating in Simulink, then you can use FFTW by declaring fft "extrinsic". If you pre-declare the output size, e.g.

eml.extrinsic('fft');

y = complex(zeros(1024,1));

y = fft(x);

when you know that the output of fft is, in this example, 1024-by-1 and 'double', then you'll get the speed of FFTW minus the overhead of shuffling the data back and forth to MATLAB so that it can call FFTW. You'll have to experiment to see if it helps the speed any.

The Simulink FFT block has its own code, but it is a similar situation.

No, I don't think code generation would benefit from rewriting the matrix multiplications. Hopefully you would have BLAS enabled, and this would be done with xGEMM, and then (assuming you're running MKL BLAS) it should go as fast as the guys at Intel know how to make it.

--

Mike

James Tursa
on 9 Feb 2011

Kaustubha Govind
on 14 Feb 2011

Mike Hosea
on 2 Jun 2011

Just for completeness here, let me note that probably about half of that, if not a bit more, is actually array bounds checking. You can turn that off by doing

cfg = emlcoder.MEXConfig;

cfg.IntegrityChecks = false;

emlmex emlmextest -eg {u} -o emlmextestmex -s cfg

Quite generally FFTW does better and better over the simple algorithm the larger the data, and I'm not sure, but I seem to recall that FFTW has some facility for doing FFT2 a bit more efficiently than just iterating through the columns (which is what the simple algorithm does). For n=512, my setup is giving me about 7x improvement from FFTW. This drops to around 3x at n=128, and at 32 I had to run it longer to get decent numbers, but I got this:

>> u = complex(randn(32),randn(32));

y = 0*u;

cfg = emlcoder.MEXConfig;

cfg.IntegrityChecks = false;

emlmex emlmextest -eg {u} -o emlmextestmex -s cfg

disp('m-file'); tic, for ii=1:1000,y=emlmextest(u);end,toc

disp('mex-file'); tic, for ii=1:1000,y=emlmextestmex(u);end,toc

disp('m-file'); tic, for ii=1:1000,y=emlmextest(u);end,toc

disp('mex-file'); tic, for ii=1:1000,y=emlmextestmex(u);end,toc

m-file

Elapsed time is 0.149498 seconds.

mex-file

Elapsed time is 0.084005 seconds.

m-file

Elapsed time is 0.106787 seconds.

mex-file

Elapsed time is 0.071497 seconds.

Opportunities for recent engineering grads.

Apply TodayFind the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!
## 0 Comments

Sign in to comment.