Speedup processing of larger binary files

Question

0 votes

Dear all,

I have to process thousands of binary files (16 MB each) by reading them in pairs and building a bit-level data structure (usually a 1x134217728 array) so that I can process them at the bit level.

Currently I am doing this the following way:

    conv = @(c) uint8(bitget(c,1:32));               % bits 1..32 of one 32-bit value
    measurement = NaN(1,(sizeOfMeasurements*8));     % (1,134217728)
    fid = fopen(fileName, 'rb');
    byteContent = fread(fid,'uint32');               % values are returned as double
    fclose(fid);
    bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
    measurement=[bitRepresentation1{:}];
  end

However, reading a single file takes minutes and makes evaluation of the entire data set a very time-consuming task.

UPDATE: I successfully replaced fopen with memmapfile using the code below:

    m=memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
    byteContent=m.data.byteContent;
    byteContent = double(byteContent);

I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:

    bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);  % see first line of code for conv

Are there more efficient ways of transforming byteContent into an array that stores one bit per index?

UPDATE2: I received a suggestion from another source that the conv function introduces superfluous loops. The new code looks like this:

    fid = fopen(fileName, 'rb');
    bitContent = fread(fid,'*ubit64');
    fclose(fid);
    conv = @(ii) uint8(bitget(bitContent, ii));
    bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
    measurement = reshape(cat(2, bitRepresentation{:})', 1, []);

This brings the execution time of the line bitRepresentation = arrayfun[...] down from 39 s to 0.5 s. However, the bottleneck is now the very last line, which takes 5 s.

5 Comments

Jan on 29 Nov 2016

@Guillaume: Storing a bit in a bit is very efficient for storage, but the processing is much harder, e.g. for logical indexing. I'm using a C-mex script for logical indexing with bit fields, which is remarkably faster than indexing with LOGICAL vectors. The main effect is not the compact storage of the bits, though; I guess that Matlab does not pre-allocate efficiently. For a LOGICAL version see: FEX: CopyMask. I'm still astonished.

Walter Roberson on 30 Nov 2016

Did you try timing dec2bin() or de2bi() compared to bitget()?


Answer 1

Jan on 29 Nov 2016

Edited: Jan on 29 Nov 2016

0 votes

Omit this line:

    measurement = NaN(1,(sizeOfMeasurements*8)) %(1,134217728)

Pre-allocation is a waste of time if the result is overwritten later.

If you want to access the data bitwise, use an integer type:

    byteContent = fread(fid, '*uint32'); % Instead of storing it in a DOUBLE
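
To illustrate the difference the asterisk makes, here is a minimal sketch. The temporary file and the variable names tmpFile, asDouble and asUint32 are assumptions made for this illustration only; they do not appear in the question.

```matlab
% Write three 32-bit values to a throwaway file (hypothetical test data).
tmpFile = [tempname '.bin'];
fid = fopen(tmpFile, 'w');
fwrite(fid, uint32([1 2 3]), 'uint32');
fclose(fid);

% Without the asterisk, fread converts the values to double.
fid = fopen(tmpFile, 'r');
asDouble = fread(fid, inf, 'uint32');
fclose(fid);

% With the asterisk, the output keeps the class of the source precision.
fid = fopen(tmpFile, 'r');
asUint32 = fread(fid, inf, '*uint32');
fclose(fid);

delete(tmpFile);
disp(class(asDouble));
disp(class(asUint32));
```

Keeping the data as uint32 uses half the memory of the double version and avoids a conversion before bitget is called.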

Creating a large cell is not efficient. I assume that these lines can be replaced:

    bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
    measurement=[bitRepresentation1{:}];

If you explain the desired result, I can suggest a replacement and will expand my answer.

[EDITED]

    fid = fopen(FileName, 'r');
    if fid == -1
      error('Cannot open file: %s', FileName);
    end
    Data = fread(fid, [8, inf], 'ubit1=>uint8');
    fclose(fid);

Now each bit is stored as a UINT8 element with the value 1 or 0.
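
A small self-contained sketch of this approach, assuming a one-byte test file containing 0x55. The temporary file and the names tmpFile and measurement are assumptions for illustration. The order in which the bits of each byte arrive is not checked here and should be verified against the expected layout before relying on it.

```matlab
% Create a one-byte test file containing 0x55 (hypothetical test data).
tmpFile = [tempname '.bin'];
fid = fopen(tmpFile, 'w');
fwrite(fid, uint8(85), 'uint8');
fclose(fid);

% Read the file bit by bit; each column of Data holds the 8 bits of one byte.
fid = fopen(tmpFile, 'r');
if fid == -1
    error('Cannot open file: %s', tmpFile);
end
Data = fread(fid, [8, inf], 'ubit1=>uint8');
fclose(fid);
delete(tmpFile);

% Flatten column by column into a 1 x (8 * number of bytes) row vector.
measurement = reshape(Data, 1, []);
```

Because reshape on a column-major matrix only changes the dimensions, flattening this way does not need a transpose or an intermediate cell array.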

Perhaps this is faster (at least it is in R2009a: 0.25 sec on a virtual machine for a 16MB file):

    % fid must still be open here, i.e. this replaces the fread call above
    Data   = fread(fid, inf, '*uint8');
    Result = [bitget(Data, 1), bitget(Data, 2), bitget(Data, 3), ...
              bitget(Data, 4), bitget(Data, 5), bitget(Data, 6), ...
              bitget(Data, 7), bitget(Data, 8)];

What a pity that bitget(X, 1:8) is not valid in Matlab when X is not a scalar.
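
A minimal sketch of how the bitget approach could produce the 1 x (8*N) row vector asked for in the question, assuming the most significant bit of each byte should come first (as in the 0x55 example from the comments). The names Bits and measurement and the two sample bytes are assumptions for illustration; in practice Data would come from fread.

```matlab
% Stand-in for Data = fread(fid, inf, '*uint8'): 0x55 and 0xF0.
Data = uint8([85; 240]);

% One column per byte, one row per bit; pre-allocated once as uint8.
Bits = zeros(8, numel(Data), 'uint8');
for k = 1:8
    % bitget(..., 1) is the least significant bit, so 9 - k puts the MSB in row 1.
    Bits(k, :) = bitget(Data(:).', 9 - k);
end

% Column-major flattening yields the bits byte by byte in one row.
measurement = reshape(Bits, 1, []);
disp(measurement);
```

Filling the rows of a pre-allocated 8-row matrix means the final reshape needs no transpose, which is the step that dominated the timing in UPDATE2 of the question.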

2 Comments

André Schaller on 29 Nov 2016

Dear Jan,

Given a 16 MB binary file, the desired result is an array A of dimensions 1x134217728, where each element stores the respective bit (either 0 or 1).

To give a more illustrative example: if the binary file consists of only one byte, 0x55, the array A shall be of size 1x8 with the values 01010101.

Jan on 29 Nov 2016

See [EDITED]
