Speedup processing of larger binary files
Dear all,
I have to process thousands of binary files (16 MB each) by reading them in pairs and creating a bit-level data structure (usually a 1x134217728 array) so that I can process them at the bit level.
Currently I am doing this the following way:
conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];
end
However, reading a single file takes minutes and makes evaluation of the entire data set a very time-consuming task.
UPDATE: I replaced fopen successfully by memmapfile using the code below:
m=memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent=m.data.byteContent;
byteContent = double(byteContent);
I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false); % see first line of code for conv
Are there more efficient ways of transforming byteContent into an array that stores one bit per index?
UPDATE2: I received a suggestion from another source that the conv function introduces superfluous loops. The new code looks like this:
fid = fopen(fileName, 'rb');
bitContent = fread(fid,'*ubit64');
fclose(fid);
conv = @(ii) uint8(bitget(bitContent, ii));
bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
measurement = reshape(cat(2, bitRepresentation{:})', 1, []);
This brings the execution time of the line bitRepresentation = arrayfun[...] down from 39 s to 0.5 s. However, the bottleneck is now the very last line, which takes 5 s.
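One way to avoid the cell array and the transpose in that last line is to write each bit plane straight into a preallocated uint8 matrix. The following is only a sketch with a small stand-in for bitContent (the three values are made up for illustration); it has not been timed against the 16 MB files, so whether it is actually faster would need to be measured.

```matlab
% Small stand-in for the uint64 words read from the file (assumed data).
bitContent = uint64([5; 2^63; 0]);
numWords = numel(bitContent);

% Preallocate one row per bit position and one column per word.
bits = zeros(64, numWords, 'uint8');
for k = 1:64
    % bitget processes the whole vector at once; k = 1 is the least significant bit.
    bits(k, :) = bitget(bitContent, k).';
end

% Column-major reshape gives bits 1..64 of word 1, then those of word 2, and so on.
measurement = reshape(bits, 1, []);
```

The element order matches the reshape(cat(2, bitRepresentation{:})', 1, []) line above, but the 64-by-N layout is filled directly, so no intermediate N-by-64 matrix has to be transposed.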
5 Comments
KSSV
on 29 Nov 2016
m = memmapfile(file,'Format','double') ;
Try this...any error?
What is the purpose of your line:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
What I don't understand is why every single bit has to be stored as an individual number, wasting memory and processing time.
Computers already have a very efficient way of storing and processing arrays of bits. It's called uint8, uint16, etc.
Here is a novel idea: use a bit to store a bit rather than a byte to store a bit. Leave your numbers as is. Use 8 times less memory.
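As a rough illustration of working on the packed words directly: the sketch below assumes that the task involves comparing a pair of files bit by bit, which the question does not state explicitly. The variable names and the sample values are made up for the example.

```matlab
% Two small stand-ins for the uint32 words of a pair of files (assumed data).
wordsA = uint32([5 255 0]);
wordsB = uint32([4 0 0]);

% XOR marks every bit position in which the two inputs differ.
diffWords = bitxor(wordsA, wordsB);

% Count the differing bits without expanding to one element per bit.
numDiffBits = 0;
for k = 1:32
    numDiffBits = numDiffBits + sum(double(bitget(diffWords, k)));
end

% Test a single bit by its global index (1-based, least significant bit first).
bitIndex = 35;                               % hypothetical index: word 2, bit 3
wordIdx = floor((bitIndex - 1) / 32) + 1;    % which word holds the bit
bitIdx  = mod(bitIndex - 1, 32) + 1;         % position inside that word
isSet = bitget(wordsA(wordIdx), bitIdx) == 1;
```

The data stays at 4 bytes per 32 bits, and the loop runs 32 times regardless of the file size.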
Jan
on 29 Nov 2016
@Guillaume: Storing a bit in a bit is very efficient for storage. But the processing is much harder, e.g. for logical indexing. I'm using a C-MEX function for logical indexing with bit fields, which is remarkably faster than indexing with LOGICAL vectors. But the main effect is not the compact storage of the bits; I guess that Matlab does not pre-allocate efficiently. For a LOGICAL version see: FEX: CopyMask. I'm still astonished.
Walter Roberson
on 30 Nov 2016
Did you try timing dec2bin() or de2bi() compared to bitget()?
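Such a comparison could be set up with timeit, roughly as sketched here. The data size is an assumption chosen to keep the run short, and de2bi is left out because it requires the Communications Toolbox. No timings are claimed; the results depend on the MATLAB release and the machine.

```matlab
% Random bytes as stand-in data (assumed size; the real files are 16 MB).
x = uint8(randi([0 255], 1e5, 1));

% Variant 1: dec2bin returns characters with the most significant bit first,
% so subtract '0' to get numbers and flip to least significant bit first.
viaDec2bin = @() fliplr(dec2bin(double(x), 8) - '0');

% Variant 2: one bitget call per bit position, least significant bit first.
viaBitget = @() [bitget(x, 1), bitget(x, 2), bitget(x, 3), bitget(x, 4), ...
                 bitget(x, 5), bitget(x, 6), bitget(x, 7), bitget(x, 8)];

% timeit calls each handle several times and returns a typical run time.
tDec2bin = timeit(viaDec2bin);
tBitget  = timeit(viaBitget);
fprintf('dec2bin: %.4f s, bitget: %.4f s\n', tDec2bin, tBitget);
```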
Answers (1)
Omit this line:
measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)
Pre-allocation is a waste of time if the result is overwritten later.
If you want to access the data bitwise, use an integer type:
byteContent = fread(fid, '*uint32'); % Instead of storing it in a DOUBLE
Creating a large cell is not efficient. I assume that these lines can be replaced:
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];
If you explain the wanted result, a suggestion for a replacement is possible and I will expand my answer.
[EDITED]
fid = fopen(FileName, 'r');
if fid == -1
error('Cannot open file: %s', FileName);
end
Data = fread(fid, [8, inf], 'ubit1=>uint8');
fclose(fid);
Now each bit is stored as a UINT8 element with the value 1 or 0.
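A quick way to check this on a small scale is a round trip through a temporary file, sketched below. The three test bytes and the name tmpFile are made up for the example. Which bit of each byte ends up in row 1 is not stated here, so it is worth checking with known data before relying on a particular order.

```matlab
% Write three known bytes to a temporary file (hypothetical test data).
tmpFile = [tempname, '.bin'];
fid = fopen(tmpFile, 'w');
if fid == -1
    error('Cannot open file: %s', tmpFile);
end
fwrite(fid, uint8([1 255 0]), 'uint8');
fclose(fid);

% Read the file back bit by bit: one column per byte, one row per bit.
fid = fopen(tmpFile, 'r');
if fid == -1
    error('Cannot open file: %s', tmpFile);
end
Data = fread(fid, [8, inf], 'ubit1=>uint8');
fclose(fid);
delete(tmpFile);

% The number of set bits per byte does not depend on the bit order.
bitsPerByte = sum(Data, 1);
```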
Perhaps this is faster (at least it is in R2009a: 0.25 sec on a virtual machine for a 16MB file):
Data = fread(fid, inf, '*uint8');
Result = [bitget(Data, 1), bitget(Data, 2), bitget(Data, 3), ...
bitget(Data, 4), bitget(Data, 5), bitget(Data, 6), ...
bitget(Data, 7), bitget(Data, 8)];
What a pity that bitget(X, 1:8) is not valid in Matlab when X is not a scalar.