When loading .mat files in a parfor, the first time is way slower than the second time.

Hi all,
I've run into a behavior that I can't understand or find an explanation for.
I wrote a function that loads some files (data structures ranging from 40 to 100 MB) from a dataset in a parfor and does some operations on them.
I've noticed that the first time I launch the script, execution is much slower than on subsequent runs (38 seconds vs 1.8 seconds).
I tried replacing the parfor with a plain for loop; there is still a difference between the first and subsequent runs, though a smaller one (17 seconds vs 11 seconds).
I've also tried different datasets and see the same behavior. When I restart MATLAB and launch the same call for the first time, same thing. If I stop and restart the parpool, same thing.
I am wondering why this happens and whether I can do anything to avoid it.
MATLAB R2019a Update 4, Unix (64-bit)
PS: parpool was already started.
PPS: subsequent runs are faster even after calling clear all/clearvars.
PPPS: to rule out other influences, I've stripped the code down so that it now only loads the files. Same behavior.
  2 Comments
Daniel M on 15 Nov 2019
I think this is an issue with caching, with the just-in-time compiler organizing itself on the first run of the loop, or with broadcasting in the parfor.
Questions:
  1. Are you using "clear" or "clear all" at the top of your script? Try using "clearvars" instead.
  2. Are you loading the same files in every iteration? You could try loading them once before the loop instead.
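For readers unfamiliar with the term, here is a minimal sketch of what a broadcast variable looks like in a parfor, next to a sliced one. The variable names and sizes are purely illustrative and are not taken from the original poster's code.

```matlab
A = rand(200);        % used whole in every iteration: broadcast variable
B = rand(200, 8);     % indexed by the loop variable: sliced variable
s = zeros(1, 8);      % sliced output variable

parfor k = 1:8
    % A is sent in full to every worker, while B is sent in slices,
    % so a large broadcast variable adds transfer overhead.
    s(k) = sum(A * B(:, k));
end
```

If a large array ends up as a broadcast variable, the cost of sending it to the workers can dominate a short loop.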
Francesco Onorati on 15 Nov 2019
Edited: Francesco Onorati on 15 Nov 2019
There are no broadcast variables in the loop. I just tried clear all and clearvars, but subsequent runs are still much faster. I load the same data: first time, slow; second and later times, fast. If I change dataset, same thing: first time, slow; afterwards, fast.


Answers (1)

Daniel M on 15 Nov 2019
Edited: Daniel M on 15 Nov 2019
So you're doing something like this?
for k = 1:10
mydata = load('myfile.mat');
output = someFunction(mydata);
end
That's pretty inefficient. You should load the data once, outside the loop. Reading data that is already in memory is faster than loading it from disk on every iteration (memory is typically much faster than file I/O).
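A minimal sketch of that suggestion, assuming the same file is needed in every iteration. The temporary file and the stand-in for someFunction are hypothetical, added only so the example runs on its own.

```matlab
% Create a small example file so the sketch is self-contained.
fname = [tempname '.mat'];
data = rand(1, 1e5);
save(fname, 'data');

% Hypothetical stand-in for someFunction from the snippet above.
someFunction = @(s) sum(s.data);

% Load once, before the loop, and reuse the struct in every iteration.
mydata = load(fname);
output = zeros(1, 10);
for k = 1:10
    output(k) = someFunction(mydata);
end

delete(fname);  % remove the temporary file
```

This only helps when every iteration really uses the same file; if each iteration needs a different file, the load has to stay inside the loop.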
As for why the first iteration is slower, I believe that is due to the JIT compiler doing its magic. This is also referred to as 'warm-up time'. Hopefully someone with a deeper understanding can weigh in here.
Try running this script to test for warm up time. Note: run this in a script, not the command window (because the JIT effects may not take place in the command window).
clearvars
close all
clc
% Create some data, but only once
if ~exist('data.mat','file')
data = rand(1,1e8,'single');
save('data.mat','data');
clear data
end
fname = 'data.mat';
fprintf('loading\n')
tic
mydata = load(fname);
data = mydata.data;
loadtime = toc;
% display the loading time
fprintf('It took %f s to load the file.\n',loadtime)
% Run some stuff in a loop and time it.
iters = 20;
t2 = zeros(1,iters);
for k = 1:iters
t1 = tic;
% do some arbitrary processing on data
tmp1 = data.^2;
tmp2 = sin(tmp1);
t2(k) = toc(t1);
end
figure
stem(t2)
xlabel('Iteration')
ylabel('Time (s)')
% first couple iterations take longer
% get the warm up time (from first few iterations)
warmtime = max(t2(1:3))/mean(t2(end-3:end)) - 1;
fprintf('First few iterations were %.0f %% slower than last\n',warmtime*100)
fprintf('done!\n')
And the output:
loading
It took 2.576633 s to load the file.
First few iterations were 58 % slower than last
done!
% see attached figure
  5 Comments
Francesco Onorati on 15 Nov 2019
Edited: Francesco Onorati on 15 Nov 2019
function test_parfor(path)
files = dir(path);
parfor k = 1:length(files)
mydata = load(fullfile(path, files(k).name));
end
Daniel M on 15 Nov 2019
Edited: Daniel M on 15 Nov 2019
Can you write a self-contained test script, please? That one does not run, nor is it a test of parfor.
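A rough sketch of what such a self-contained test could look like, under stated assumptions: the file count, file size and temporary folder are made up for illustration, and this is not the original poster's code. It writes a few .mat files, then loads them all in a parfor twice and prints the wall-clock time of each pass.

```matlab
% Requires Parallel Computing Toolbox for real parallel execution;
% without it, parfor runs as an ordinary serial loop.
% Start the pool beforehand so pool start-up is not counted in run 1.
folder = tempname;
mkdir(folder);
nFiles = 8;                       % illustrative values, not from the thread
for k = 1:nFiles
    data = rand(1, 1e6);          % about 8 MB per file
    save(fullfile(folder, sprintf('file%02d.mat', k)), 'data');
end

% List only .mat files, so '.' and '..' are never passed to load.
files = dir(fullfile(folder, '*.mat'));

nRuns = 2;
elapsed = zeros(1, nRuns);
for r = 1:nRuns
    t = tic;
    parfor k = 1:numel(files)
        s = load(fullfile(folder, files(k).name));
    end
    elapsed(r) = toc(t);
    fprintf('Run %d: %.3f s\n', r, elapsed(r));
end

rmdir(folder, 's');               % clean up the temporary files
```

One caveat: files that were just written may still be held in the operating system's file cache, so this script may not show a slow first pass. Nobody in this thread confirmed that the file cache is the cause, but to test for it you could point the dir call at an existing dataset that has not been read since boot.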
