matfile loads variables very slowly

I have a 15.5 GB MAT-file containing several separately saved variables. The smaller variables I'd like to load are of size [100000 1], but the larger variables are much bigger and need to be loaded via indexing. Loading one of the small variables without indexing takes 31 s, whereas it takes only 0.02 s when the same variable is saved to a separate file and loaded from there.
The problem seems to be repeated, very slow calls to genericWho (line 202 in matlab.io.MatFile), which is distinct from the problems in this question: https://www.mathworks.com/matlabcentral/answers/81232-matfile-runs-incredibly-slowly-on-large-files-what-might-be-the-problem?s_tid=srchtitle.
>> ver
----------------------------------------------------------------------------------------------------
MATLAB Version: 9.2.0.556344 (R2017a)
MATLAB License Number: ••••••
Operating System: Microsoft Windows Server 2012 Standard Version 6.2 (Build 9200)
Java Version: Java 1.7.0_60-b19 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
----------------------------------------------------------------------------------------------------
MATLAB Version 9.2 (R2017a)
Simulink Version 8.9 (R2017a)
Bioinformatics Toolbox Version 4.8 (R2017a)
Control System Toolbox Version 10.2 (R2017a)
Curve Fitting Toolbox Version 3.5.5 (R2017a)
Data Acquisition Toolbox Version 3.11 (R2017a)
Database Toolbox Version 7.1 (R2017a)
Datafeed Toolbox Version 5.5 (R2017a)
Econometrics Toolbox Version 4.0 (R2017a)
Financial Instruments Toolbox Version 2.5 (R2017a)
Financial Toolbox Version 5.9 (R2017a)
Fixed-Point Designer Version 5.4 (R2017a)
Fuzzy Logic Toolbox Version 2.2.25 (R2017a)
Global Optimization Toolbox Version 3.4.2 (R2017a)
Image Acquisition Toolbox Version 5.2 (R2017a)
Image Processing Toolbox Version 10.0 (R2017a)
Instrument Control Toolbox Version 3.11 (R2017a)
MATLAB Coder Version 3.3 (R2017a)
MATLAB Compiler Version 6.4 (R2017a)
MATLAB Compiler SDK Version 6.3.1 (R2017a)
Mapping Toolbox Version 4.5 (R2017a)
Neural Network Toolbox Version 10.0 (R2017a)
Optimization Toolbox Version 7.6 (R2017a)
Parallel Computing Toolbox Version 6.10 (R2017a)
Partial Differential Equation Toolbox Version 2.4 (R2017a)
Signal Processing Toolbox Version 7.4 (R2017a)
SimBiology Version 5.6 (R2017a)
Simscape Version 4.2 (R2017a)
Simscape Multibody Version 5.0 (R2017a)
Simscape Power Systems Version 6.7 (R2017a)
Simulink Coder Version 8.12 (R2017a)
Simulink Control Design Version 4.5 (R2017a)
Simulink Real-Time Version 6.6 (R2017a)
Stateflow Version 8.9 (R2017a)
Statistics and Machine Learning Toolbox Version 11.1 (R2017a)
Symbolic Math Toolbox Version 7.2 (R2017a)
Wavelet Toolbox Version 4.18 (R2017a)
512 GB RAM
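A minimal script for reproducing the timing comparison in the question on a small synthetic file. The file names, variable names (smallVar, bigVar) and sizes are hypothetical stand-ins for the real 15.5 GB file, and the file is saved with '-v7.3' because matfile can only read parts of variables from Version 7.3 MAT-files. On a file this small the delays will be far shorter than 31 s; the script only shows where the tic/toc measurements go.

```matlab
% Hypothetical names and sizes; a small stand-in for the 15.5 GB file.
bigFile   = [tempname '.mat'];
smallFile = [tempname '.mat'];
smallVar  = rand(100000, 1);
bigVar    = rand(2000, 2000);

% Version 7.3 is required for matfile to read parts of a variable.
save(bigFile, 'smallVar', 'bigVar', '-v7.3');
save(smallFile, 'smallVar');

tic;
m = matfile(bigFile);            % constructor (calls whos on the file)
tInit = toc;

tic;
xMatfile = m.smallVar;           % whole variable through the matfile object
tAccess = toc;

tic;
S = load(smallFile, 'smallVar'); % same variable from its own file
tSeparate = toc;

fprintf('matfile init %.3f s, matfile access %.3f s, separate file %.3f s\n', ...
    tInit, tAccess, tSeparate);

% Remove the temporary files.
clear m
delete(bigFile);
delete(smallFile);
```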

Accepted Answer

Jason Climer on 11 Apr 2018
I was able to work around this. Apparently the genericWho function is called when the matfile object is created, and it reads the information for every variable in the file, which is needed to load any of them. I added the property
partialWho = struct();
to matlab.io.MatFile and modified genericWho to:
function varargout = genericWho(obj, fcnHan, fcnName, varargin)
% Modified matlab.io.MatFile/genericWho: whos results are cached in the
% added property obj.partialWho (one field per variable name), so later
% subsref calls do not rescan the whole file.
% Assumes (untested) that each call requests at most one variable name.
nargoutchk(0,1);
validateFirstArgIsObj(obj, fcnName);
varargout = cell(1,nargout);
isCached = ismember(varargin, fieldnames(obj.partialWho));
% Return requested variables that are already cached.
for k = find(isCached)
    varargout{k} = obj.partialWho.(varargin{k});
end
% Query the file only if all variables were requested or some are not cached.
if isempty(varargin) || any(~isCached)
    if ~sourceExists(obj)
        % Use '~' to represent an impossible variable name, which
        % generates an empty return value of the right type.
        [varargout{1:nargout}] = fcnHan('~');
    else
        if isempty(varargin)
            inds = 1:nargout;
        else
            inds = ~isCached;
        end
        [varargout{inds}] = fcnHan('-file', ...
            obj.Properties.Source, varargin{~isCached});
    end
end
% Cache new whos entries; who returns a cell array of names, which is skipped.
for i = 1:numel(varargout)
    if isstruct(varargout{i})
        for j = find(~ismember({varargout{i}.name}, fieldnames(obj.partialWho)))
            obj.partialWho.(varargout{i}(j).name) = varargout{i}(j);
        end
    end
end
end
This takes 31 s when the matfile object is created, but only 0.04 s for each subsequent load. I haven't done enough testing to guarantee that it doesn't break anything elsewhere.
I'm surprised to see so many slow-loading problems with matfile for large files, since large files are the main practical use case for matfile. It appears to have been rolled out before it was ready for broad use.
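If patching a MathWorks-installed class file is not an option, the same caching idea can be applied in user code for variable metadata: scan the file once with whos('-file', ...) and keep the result in a struct indexed by variable name, much like partialWho. This is a sketch with hypothetical file and variable names; it only avoids repeated metadata scans in your own code, since matfile itself still calls whos internally whenever you access or index a variable through it.

```matlab
% Hypothetical file and variable names; a small -v7.3 file for illustration.
fileName = [tempname '.mat'];
smallVar = (1:100000)';
bar = reshape(1:12, 3, 4);
save(fileName, 'smallVar', 'bar', '-v7.3');

% Scan the file once. whos('-file', ...) returns one struct per variable
% with fields such as name, size, bytes and class.
info = whos('-file', fileName);

% Index the scan result by variable name, similar to partialWho.
meta = struct();
for k = 1:numel(info)
    meta.(info(k).name) = info(k);
end

% Later lookups read the cached struct instead of rescanning the file.
barSize  = meta.bar.size;
barClass = meta.bar.class;

% Whole variables can also be read by name with load; whether this avoids
% the delay on a very large file is an assumption worth measuring.
S = load(fileName, 'smallVar');
delete(fileName);
```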
Jason Climer on 12 Apr 2018
The built-in matfile class seems to take a long time to run whos on the file contents, and it does so repeatedly.
The matfile class calls whos during initialization:
foo=matfile('bar.mat');
> In matlab.io.MatFile/genericWho (line 202)
In matlab.io.MatFile/whos (line 309)
In matlab.io.MatFile (line 422)
In matfile (line 75)
whos is then called again whenever a variable in the matfile is accessed. This is the syntax I used for smallVar:
x=foo.bar;
> In matlab.io.MatFile/genericWho (line 202)
In matlab.io.MatFile/whos (line 309)
In matlab.io.MatFile/getVariableInfoIfItExistsInSource (line 127)
In matlab.io.MatFile/subsref (line 446)
It is also called when a variable is indexed. This is the syntax I used for the large variable bar:
i=1;x=foo.bar(1,i);
> In matlab.io.MatFile/genericWho (line 202)
In matlab.io.MatFile/whos (line 309)
In matlab.io.MatFile/getVariableInfoIfItExistsInSource (line 127)
In matlab.io.MatFile/subsref (line 446)
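To check how often genericWho runs and how much time it takes on your own file, the MATLAB profiler can count the calls. This is a sketch using a small hypothetical file; the way class methods are named in the profiler output may vary between releases, so the filter below matches loosely on 'genericWho' and 'whos'.

```matlab
% Hypothetical file; a -v7.3 MAT-file containing a matrix named bar.
fileName = [tempname '.mat'];
bar = rand(500, 500);
save(fileName, 'bar', '-v7.3');

profile clear
profile on
foo = matfile(fileName);          % constructor
x = foo.bar;                      % whole-variable access
for i = 1:3
    y = foo.bar(1, i);            % indexed access, as in the traces above
end
profile off

% FunctionTable has one entry per profiled function, with call counts
% (NumCalls) and cumulative time (TotalTime).
p = profile('info');
names = {p.FunctionTable.FunctionName};
hits = find(contains(names, 'genericWho') | contains(names, 'whos'));
for k = hits
    fprintf('%-60s calls: %3d  time: %.3f s\n', names{k}, ...
        p.FunctionTable(k).NumCalls, p.FunctionTable(k).TotalTime);
end

clear foo
delete(fileName);
```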
If you save the initialized matfile object, you can then avoid the long initialization time when you later need to load parts of the file.
Jelle Bosmans on 9 Nov 2022
This is an excellent solution, and honestly I cannot believe the vanilla matfile function from MathWorks does not include similar functionality. I hope it will be implemented in the future. The speed-up is more than a factor of 100 on my system.
Thanks Jason!