Slowdown of Reading Large Binary Files
I am attempting to read from a large binary file (315 GB). The file is a .op4 file written out by Nastran, and it contains a single NxM matrix. I can determine ahead of time which of the M columns I need, so I do not need to read the entire file into memory. The file starts with a file header, followed by all the data. Each column's data is preceded by a header of 5 uint32 values, which tells you how many "words" to read and where within the NxM matrix that data belongs. This header-plus-data record is repeated M times. Below is sample code showing what I am doing.
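To make the layout described above concrete, here is a minimal MATLAB sketch that writes a tiny file with the same structure: a 5-word uint32 file header, an 8-character name, then one record per column made of a 5-word uint32 header (icol, irow and NW in words 3 to 5, as the reading code uses them) followed by NW/2 float64 values. The header words the question does not describe are written as 0, and the NF/NTYPE values, the matrix name and the temporary file name are hypothetical placeholders. This is not a faithful Nastran OUTPUT4 writer, only a test fixture for a reader that follows the question's layout.

```matlab
% Sketch: write a tiny file in the layout described in the question.
% ASSUMPTIONS: native byte order; header words the question does not
% describe are written as 0 placeholders; every column is stored in full
% (irow = 1, NW = 2*nRow). All values here are hypothetical test data.
nRow = 3; nCol = 4;
A = reshape(1:nRow*nCol, nRow, nCol);        % known test matrix
fname = [tempname '.op4'];                   % hypothetical temporary file
fid = fopen(fname, 'w');
fwrite(fid, [0 nCol nRow 2 2], 'uint32');    % file header: ?, NCOL, NROW, NF, NTYPE (NF/NTYPE are placeholders)
fwrite(fid, 'TESTMAT ', 'uchar');            % 8-character matrix name
for c = 1:nCol
    fwrite(fid, [0 0 c 1 2*nRow], 'uint32'); % column header: ?, ?, icol, irow, NW
    fwrite(fid, A(:, c), 'float64');         % NW/2 doubles = NW 4-byte words
end
fclose(fid);
info = dir(fname);                           % size = 28 + nCol*(20 + 8*nRow) bytes
```

For this 3x4 example the file should be 28 + 4*(20 + 24) = 204 bytes, which `info.bytes` lets you check before pointing a reader at it.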
The file I am reading has about 2,300,000 columns. The script runs well for the first ~300,000 columns, but then it suddenly starts to slow down exponentially.
When I time the script, it is clear that over 90% of the time is spent on the temp_header=fread(fid,5,'uint32') line. I have tried reading only the header lines ahead of time in a single fread call, using fread's skip argument, but that also bogs down after about 20% of the file.
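To pin down where the slowdown starts, one option is to log the file position and the time spent per block of columns while walking the headers the same way the loop in the question does. The sketch below is a minimal version of that idea: the chunk size of 100 columns and the small synthetic file are hypothetical; on the real file you would drop the synthetic writer, open the .op4 path instead, and use a much larger chunk.

```matlab
% Sketch: log [byte position, seconds per chunk] while skipping column data.
% ASSUMPTIONS: layout as described in the question; synthetic file and
% chunk size are hypothetical.
nRow = 4; nCol = 1000;
fname = [tempname '.op4'];
fid = fopen(fname, 'w');
fwrite(fid, [0 nCol nRow 2 2], 'uint32');     % file header (words 1, 4, 5 are placeholders)
fwrite(fid, 'TESTMAT ', 'uchar');             % 8-character name
for c = 1:nCol
    fwrite(fid, [0 0 c 1 2*nRow], 'uint32');  % column header: ?, ?, icol, irow, NW
    fwrite(fid, zeros(nRow, 1), 'float64');   % column data
end
fclose(fid);

chunk = 100;                                  % columns per timing sample
fid = fopen(fname, 'r');
hdr = fread(fid, 5, 'uint32');
fseek(fid, 8, 'cof');                         % skip the 8-character name
NCOL = hdr(2);
timing = zeros(floor(NCOL/chunk), 2);         % [ftell after chunk, seconds for chunk]
t = tic;
for col = 1:NCOL
    colHdr = fread(fid, 5, 'uint32');
    fseek(fid, colHdr(5)*4, 'cof');           % skip NW 4-byte words of data
    if mod(col, chunk) == 0
        timing(col/chunk, :) = [ftell(fid), toc(t)];
        t = tic;
    end
end
fclose(fid);
delete(fname);
```

Plotting `timing(:,2)` against `timing(:,1)` shows whether the per-chunk cost jumps at a particular byte offset or grows steadily with position.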
One additional note: the test case I am running keeps only about 20 of the 2,300,000 columns, so using too much memory in the workspace is not an issue.
% ind is a logical vector (length NCOL) marking which columns to keep in memory
fid = fopen(fullfile(Path, fname), 'r');
if fid > 0
    fseek(fid, 0, 'bof');                          % ensure we are at the beginning of the file
    header = fread(fid, 5, 'uint32');              % file header
    NCOL  = header(2);
    NROW  = header(3);
    NF    = header(4);
    NTYPE = header(5);
    NAME  = strtrim(fread(fid, [1, 8], '*char'));  % ASCII name of the matrix, if required
    data  = zeros(NROW, sum(ind));
    icol2 = 1;                                     % next free column in data
    tic
    for col = 1:NCOL
        if ~feof(fid)
            temp_header = fread(fid, 5, 'uint32'); % column record header
            icol = temp_header(3);                 % current column
            irow = temp_header(4);                 % first row stored in this record
            NW   = temp_header(5);                 % number of 4-byte words in this record
            if ind(icol)
                data(irow:irow+NW/2-1, icol2) = fread(fid, NW/2, 'float64');
                icol2 = icol2 + 1;
            else
                fseek(fid, NW*4, 'cof');           % skip NW/2 doubles (NW*4 bytes)
            end
        end
    end
    toc
    fclose(fid);
end
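Because only about 20 of the 2,300,000 columns are needed, one alternative worth trying is to compute each wanted column's byte offset and fseek straight to it with an absolute ('bof') seek, instead of reading every column header. This only works if every column record has the same length, so the sketch below assumes full columns (irow = 1, NW = 2*NROW) and checks the icol in each header it lands on, erroring out if that assumption does not hold. The small synthetic file at the top is hypothetical test data; for the real file, open the .op4 path instead and build ind as before.

```matlab
% Sketch: seek directly to each wanted column instead of reading every header.
% ASSUMPTION: fixed-length column records (irow = 1, NW = 2*NROW), so the
% offset of column k is computable. Synthetic file is hypothetical data.

% --- tiny synthetic file in the question's layout ---
nRow = 4; nCol = 6;
A = reshape(1:nRow*nCol, nRow, nCol);
fname = [tempname '.op4'];
fid = fopen(fname, 'w');
fwrite(fid, [0 nCol nRow 2 2], 'uint32');
fwrite(fid, 'TESTMAT ', 'uchar');
for c = 1:nCol
    fwrite(fid, [0 0 c 1 2*nRow], 'uint32');
    fwrite(fid, A(:, c), 'float64');
end
fclose(fid);

% --- direct-offset reader ---
ind = false(1, nCol);
ind([2 5]) = true;                         % columns to keep (example)
fid = fopen(fname, 'r');
header = fread(fid, 5, 'uint32');
NROW = header(3);
fileHeaderBytes = 5*4 + 8;                 % 5 uint32 words + 8-character name
recordBytes = 5*4 + NROW*8;                % column header + NROW doubles
cols = find(ind);
data = zeros(NROW, numel(cols));
for k = 1:numel(cols)
    offset = fileHeaderBytes + (cols(k) - 1)*recordBytes;
    if fseek(fid, offset, 'bof') ~= 0
        error('fseek failed: %s', ferror(fid));
    end
    colHdr = fread(fid, 5, 'uint32');
    if colHdr(3) ~= cols(k)
        error('Column %d is not at the expected offset; records are not fixed-length.', cols(k));
    end
    irow = colHdr(4);
    NW = colHdr(5);
    data(irow:irow+NW/2-1, k) = fread(fid, NW/2, 'float64');
end
fclose(fid);
```

With about 20 wanted columns this performs about 20 seeks and reads rather than 2,300,000 header reads. If some columns are stored with irow > 1 and a shorter NW, the offsets cannot be precomputed this way and the sequential loop remains necessary.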
Answers (1)
Image Analyst
on 13 Nov 2018
Maybe try memmapfile(). I've never used it myself so I can't offer anything beyond a suggestion to look into it.
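A minimal sketch of the memmapfile suggestion, assuming every column record has the same length (irow = 1, NW = 2*NROW) so the offset of a given column can be computed, and that the file uses native byte order. It maps a single column record using memmapfile's 'Offset', 'Format' and 'Repeat' name-value arguments and reads the header and values through `m.Data`. The synthetic file, the column number and the field names `hdr`/`vals` are hypothetical; in practice NROW would come from the file header.

```matlab
% Sketch: map one column record with memmapfile.
% ASSUMPTIONS: fixed-length column records, native byte order;
% synthetic file and chosen column are hypothetical.
nRow = 4; nCol = 6;
A = reshape(1:nRow*nCol, nRow, nCol);
fname = [tempname '.op4'];
fid = fopen(fname, 'w');
fwrite(fid, [0 nCol nRow 2 2], 'uint32');  % file header (placeholders in words 1, 4, 5)
fwrite(fid, 'TESTMAT ', 'uchar');          % 8-character name
for c = 1:nCol
    fwrite(fid, [0 0 c 1 2*nRow], 'uint32'); % column header: ?, ?, icol, irow, NW
    fwrite(fid, A(:, c), 'float64');
end
fclose(fid);

fileHeaderBytes = 5*4 + 8;                 % 5 uint32 words + 8-character name
recordBytes = 5*4 + nRow*8;                % column header + nRow doubles
col = 5;                                   % column to extract (example)
m = memmapfile(fname, ...
    'Offset', fileHeaderBytes + (col - 1)*recordBytes, ...
    'Format', {'uint32', [5 1], 'hdr'; 'double', [nRow 1], 'vals'}, ...
    'Repeat', 1);
colHdr = m.Data.hdr;                       % uint32 column header
vals = m.Data.vals;                        % column values (copied out as double)
clear m                                    % release the mapping
```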
Image Analyst
on 13 Nov 2018
It seems like fseek() should let you skip a given number of bytes. Is fseek() not working for you?
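One quick way to check whether fseek is doing what you expect is to test its return status (0 on success, -1 on failure, with ferror giving the message) and compare the position reported by ftell. The sketch below does that on a hypothetical 100-byte scratch file: after reading 20 bytes (the size of one 5-word header) and skipping 40 more, the position should be 60.

```matlab
% Sketch: confirm fseek skips bytes by checking its status and ftell.
% The scratch file is hypothetical test data.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, 0:99, 'uint8');                % 100 bytes with values 0..99
fclose(fid);

fid = fopen(fname, 'r');
fread(fid, 5, 'uint32');                   % read 20 bytes, like one column header
status = fseek(fid, 40, 'cof');            % skip 40 bytes from the current position
if status ~= 0
    error('fseek failed: %s', ferror(fid));
end
pos = ftell(fid);                          % position after the skip
nextByte = fread(fid, 1, 'uint8');         % value of the byte at that position
fclose(fid);
delete(fname);
```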