efficient ways to read and write complex valued data
I have large files that contain interleaved complex data. I read chunks of it at a time for processing. It seems that there is no way to read this efficiently. I can't find out how to do it without a copy that shouldn't really be necessary. Here are some things that I tried (y.fid points to a file opened with fopen(), y.type is a valid type):
% method 1: read in to a vector, interleave with complex
tic
cData = fread(y.fid,numSamp.*2, y.type);
t1 = toc;
tic
cData = complex(cData(1:2:end),cData(2:2:end));
t2 = toc;
The above method is slower than it has to be. I think the indexing really slows down the copy into cData.
% method 2: read into 2xN array, use complex and indexing
tic
cData = fread(y.fid,[2 numSamp], y.type);
t3 = toc;
tic
cData = complex(cData(1,:),cData(2,:)).';
t4 = toc;
The above method is indeed faster. It still requires a copy.
% method 3: read into 2xN array, use complex multiply
tic
cData = fread(y.fid,[2 numSamp], y.type);
t5 = toc;
tic
cData = cData.' * [1 ;1j];
t6 = toc;
The above is the slowest of them all.
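For anyone wanting to reproduce the comparison, here is a minimal sketch that writes a small interleaved test file and checks that the three conversions above return the same values. The temporary file, the 'double' type, and numSamp = 1000 are assumptions for illustration only; they are not taken from the original data.

```matlab
% Sketch: build a small interleaved file, then compare the three conversions.
% File name, data type and sample count are illustrative assumptions.
numSamp = 1000;
ref = complex(randn(numSamp,1), randn(numSamp,1));   % reference data

fname = [tempname '.bin'];
fid = fopen(fname, 'w');
% A 2xN matrix is written column by column, i.e. re1, im1, re2, im2, ...
fwrite(fid, [real(ref).'; imag(ref).'], 'double');
fclose(fid);

fid = fopen(fname, 'r');
v  = fread(fid, numSamp*2, 'double');        % method 1: one long vector
c1 = complex(v(1:2:end), v(2:2:end));

frewind(fid);                                % back to the start of the file
m  = fread(fid, [2 numSamp], 'double');      % methods 2 and 3: 2xN array
c2 = complex(m(1,:), m(2,:)).';              % method 2: row indexing
c3 = m.' * [1; 1j];                          % method 3: complex multiply
fclose(fid);
delete(fname);
```

Wrapping each conversion in tic/toc, as in the original snippets, then gives the relative timings for a given file size.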
I tried memmapfile, but it's inappropriate for very large files. I also looked into MEX files. A MEX routine has to create a new complex array for every read, although it could probably fill that array efficiently. The problem is that this is worse than an fread into an existing array of the correct size, because new memory has to be allocated on every call.
Thanks in advance.
5 Comments
dpb
on 29 Jun 2020
" y.type is a valid type):"
Well, what type is that?
How about attaching a small(ish) sample file for folks to play with...always nice to know the answer as well.
Walter Roberson
on 29 Jun 2020
Unfortunately MATLAB does not provide any InterleavedComplex constructor equivalent to complex() but with a single argument. And there is no typecast to interleaved complex either.
You are right that, now that the internals are interleaved complex, there should be tools to handle them directly.
James Tursa
on 30 Jun 2020
Edited: James Tursa
on 30 Jun 2020
Which version of MATLAB are you using? Do you have access to R2018a or later, which has native interleaved complex type?
If not, then you are probably stuck with workarounds like the copy or a mex routine that can read & separate the real & imag as you go.
I don't understand this comment about a drawback for a mex routine:
"... one has to allocate new memory for every call ..."
Even the m-file methods require that new memory be allocated for the reading, so why is this a perceived drawback for a mex routine?
Chris Gelrlich
on 30 Jun 2020
James Tursa
on 30 Jun 2020
Edited: James Tursa
on 30 Jun 2020
"... I thought the array would be copied to the left hand side argument when the mex function completed and that the array in the function would be freed ..."
No, that is not what happens. What happens is a shared data copy of plhs[0] is created and sent back to the caller, not a deep copy. Then when the mex function returns and plhs[0] is destroyed, only its header is destroyed, not the data. So no extra data copy is done in this case.
The mex function would be pretty simple to write, using all official API functions.
Accepted Answer
More Answers (2)
Well, dunno how much faster/slower it might be; didn't try to time it, but trying something different--
Iffen by "interleaved" you mean the data were written something like
c=complex(rand(10,1),rand(10,1)); % make a dummy array for playing around with
fid=fopen('complex.bin','w');
for i=1:10,fwrite(fid,[real(c(i)) imag(c(i))],'double');end
fid=fclose(fid);
fidr=fopen('complex.bin'); fidi=fopen('complex.bin'); % handles for real, imaginary parts
fseek(fidi,8,'bof'); % position file pointer at first imaginary part
complex(fread(fidr,inf,'double',8), fread(fidi,inf,'double',8))
fclose all; clear fidr fidi
Reproduced the original c array here...
Alternatively, you could use a single handle: read the real parts, frewind() the file, fseek() past the first real value, and then read the imaginary parts.
Whether the i/o would be quicker than the indexing operation I dunno...
Fortran handles it with reading into a complex variable--you might consider mex using Fortran instead of C
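The single-handle alternative mentioned above could look roughly like the sketch below. It assumes 8-byte doubles stored as re, im, re, im, ... and uses a temporary file name made up for the example; it has not been timed against the indexing approach.

```matlab
% Sketch: one file handle, two strided reads (assumes interleaved doubles).
c = complex(rand(10,1), rand(10,1));         % dummy data to round-trip
fname = [tempname '.bin'];                   % illustrative temporary file
fid = fopen(fname, 'w');
fwrite(fid, [real(c) imag(c)].', 'double');  % 2xN, written as re, im, re, im, ...
fclose(fid);

fid = fopen(fname, 'r');
re = fread(fid, inf, 'double', 8);           % read a double, skip the next 8 bytes
fseek(fid, 8, 'bof');                        % reposition at the first imaginary part
im = fread(fid, inf, 'double', 8);           % same stride, offset by one value
fclose(fid);
delete(fname);

c2 = complex(re, im);                        % still one copy into a complex array
```

Note that this still builds the complex result with complex(), so it trades the indexing step for a second pass over the file rather than removing the copy.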
2 Comments
James Tursa
on 30 Jun 2020
Fortran is simply reading interleaved complex file data into an interleaved complex variable, which can effectively be done with fread( ) directly. This doesn't get the real & imaginary parts separated.
Chris Gelrlich
on 30 Jun 2020
James Tursa
on 30 Jun 2020
Edited: James Tursa
on 30 Jun 2020
This may not apply to you, but if you have R2018a or later you can just fread( ) the interleaved data directly into a real variable and then use this FEX submission, which reinterprets the real variable as a complex variable with half the number of elements using the supplied real2complex mex routine. This returns a shared data copy so it is memory efficient. It uses unofficial hacks to accomplish this because MATLAB does not supply official API functions for this. You need to have a C compiler installed to compile it.
Hopefully MATLAB will eventually supply an official function for this and the reverse function, since it is a natural capability that many users will want.
5 Comments
Chris Gelrlich
on 30 Jun 2020
James Tursa
on 30 Jun 2020
Edited: James Tursa
on 30 Jun 2020
Can you respond to my comment on your post? Why wouldn't creating a complex variable of known size inside a C mex routine and then fread into it inside the C mex routine work for you? This would all be using official API functions, no hacking required. You would just need to make sure you used the -R2018a mex compile option to force the complex variable you create to be interleaved from the start.
How is this complex variable used downstream in your code? Anything that would cause it to be shared with other variables? If not, you could even write things to avoid the memory allocation beyond the first one (reading directly into the first such variable created).
Chris Gelrlich
on 9 Jul 2020
James Tursa
on 9 Jul 2020
Edited: James Tursa
on 9 Jul 2020
Well, your questions on how this works are certainly legitimate and not out of line, so no apology needed. Particularly in light of the fact that these details are not published in MATLAB documentation, so how could you even know?
But getting back to your point of memory allocation, your original supposition is true ... a new memory allocation is in fact required by the mex routine each time the file is read. My points were that (1) this is also true of m-file code so no disadvantage of the mex routine here, and (2) no extra copy is made by the mex routine when returning plhs[0] back to the caller. The freadcomplex performance gains are purely the result of being able to convert a real variable to a complex variable via pointer manipulation without requiring a data copy.
The mex routine posted in the FEX plays by the rules and allocates new memory each time it is called, because it calls MATLAB fread( ) in the background and that is what MATLAB fread( ) will do. There is a way to avoid that memory allocation each time, but you would have to abandon MATLAB fread( ) and write the code using the C fread( ) function, and you would have to modify a MATLAB variable inplace (reading data directly into its data area) which would violate the rules (you could inadvertently write into other variables sharing data with the one you modified). As long as you didn't care about potential side effects (e.g., you never use those other shared variables downstream in your code) you can get away with this. Or you could hold the variable inside the mex routine and return a shared data copy of it (which also violates the official rules). It might be interesting to see what kind of speedup this would get. But given your aversion to unofficial methods, I didn't write that.
Chris Gelrlich
on 10 Jul 2020