Reading strings from files since R2020a
I have a large binary data file with an ASCII-formatted metadata header at the beginning. To read this header, I use 'string-oriented' read functions such as fgetl(~), fscanf(~, '%s', ~) or fread(~, ~, '*char'). In MATLAB versions prior to R2020a (I have R2014b and R2019b) this worked fine; however, something changed in R2020a.
Now the very first (and only the first) attempt to read any string from the file uses a large amount of memory and freezes everything. My guess is that MATLAB is trying to read the whole file into memory. In my case the file is larger than the available RAM, which is probably what causes the freeze.
Here is what I do:
% Here everything works just fine
fd = fopen('file.name', 'r');
arr1 = fread(fd, 1);
arr2 = fread(fd, 1);
fclose(fd);
% Here I have a problem
fd = fopen('file.name', 'r');
arr1 = fread(fd, 1); % fast and smooth
arr2 = fread(fd, 1, '*char'); % slow and uses a large amount of RAM
arr3 = fread(fd, 1, '*char'); % fast and smooth again
fclose(fd);
1) It does not matter what part of the file I read.
2) All read functions that return numeric types are always fast.
3) The first read that returns a string is always slow, regardless of which function I use (as long as it returns a string).
4) All successive string reads are as fast as numeric ones.
5) Once the read function returns the string the memory is released.
6) The file position pointer is always at the expected position (it does not move to the end of the file).
7) It does not matter if the file is opened in text or binary mode.
8) The issue occurs on both Windows and Linux.
Any idea?
Accepted Answer
Sindar
on 22 Apr 2020
From the release notes:
"As of R2020a, character-oriented file I/O functions such as fscanf, fgets, and fgetl trigger automatic character set detection when reading a file that was opened using fopen without a specified encoding."
My suspicion then is that the "automatic character set detection" may require looking through the full file.
Try specifying the encoding in fopen, e.g.,
fd = fopen('file.name', 'r','n','UTF-8');
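A minimal sketch of the whole header-then-payload workflow with an explicit encoding, under hypothetical assumptions: the header is plain ASCII text terminated by a line reading END_HEADER, and the payload is little-endian uint16 samples. To keep it self-contained, it first writes a small demo file to a temporary path; in practice you would open your own file instead. 'UTF-8' is used because 7-bit ASCII is a subset of it. Per the release note quoted above, supplying the encoding should skip the automatic detection, though I have not checked this on every release.

```matlab
% Hypothetical demo file: ASCII header lines followed by binary uint16 data.
fname = [tempname '.bin'];
fd = fopen(fname, 'w');
fprintf(fd, 'VERSION 1\nROWS 4\nEND_HEADER\n');
fwrite(fd, uint16(1:4), 'uint16', 0, 'l');   % little-endian payload
fclose(fd);

% Open with an explicit encoding so character reads do not trigger
% automatic character set detection.
fd = fopen(fname, 'r', 'n', 'UTF-8');
assert(fd >= 0, 'Could not open %s', fname);

header = {};
while true
    tline = fgetl(fd);
    if ~ischar(tline)                 % fgetl returns -1 at end of file
        break
    end
    header{end+1} = tline; %#ok<AGROW>
    if strcmp(tline, 'END_HEADER')    % hypothetical header terminator
        break
    end
end

% The position is now just past the header; read the payload numerically.
payload = fread(fd, Inf, 'uint16=>uint16', 0, 'l');
fclose(fd);
delete(fname);
```

Only the fgetl calls depend on the encoding argument here; the numeric fread is unaffected. Observation 2) in the question suggests another route: read the header bytes numerically, e.g. raw = fread(fd, n, '*uint8'), and convert them with char(raw.'). Numeric reads appear not to trigger detection, but this decodes only single-byte text such as ASCII correctly.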
Walter Roberson
on 22 Apr 2020
See also the discussion at https://www.mathworks.com/matlabcentral/answers/512803-why-do-i-get-out-of-memory-when-reading-only-16-chars