How to read file sequentially in Parfor loop?
I am running a parfor loop to analyze experiment data files on a local cluster of Intel quad-core Xeon CPUs (8 dual-CPU machines, 64 physical cores in total).
Each cluster node has Windows Server 2008 R2 Datacenter OS and MATLAB R2013a with Parallel Computing Toolbox and MATLAB Distributed Computing Server.
The files were written to the hard drive sequentially as wfm1.bin, wfm2.bin, wfm3.bin, ... until wfm42000.bin.
The file size is 7.00 MB each, and it took 11.7 seconds to analyze one file on a single core.
I built a parfor loop that lets each core in the cluster:
(1) Read 1 file from the shared directory in the storage server #1 (node00);
(2) Analyze the data extracted from this file and save the result (a 13 kB file) to another shared directory on storage server #2 (node16).
But when I open a matlabpool with more than 32 workers, the network traffic from the storage server jams easily: the output rate peaks at only 13 MB/s on a 1 Gbps network interface across the entire cluster, even though every node has a SATA3 6.0 Gb/s HDD and point-to-point file transfer through Windows Explorer can reach 100 MB/s. I believe the contention is caused by reading many non-consecutive files stored at different physical locations on the same hard drive.
Is there any way to make the parfor session read the files one after another, to avoid the network traffic jam?
Other parallel solutions are also appreciated.
Thank you!
Here is the skeleton of my code:
cluster_size=32;
Bin_Folder_Name = '\\node00\New_RawData\';
Dat_Folder_Name = '\\node16\Fitted_Data_Storage\';
matlabpool('Cluster', cluster_size)
parfor j = 1:42000
    File_name = sprintf('wfm%d.bin', j);
    Bin_File_name = strcat(Bin_Folder_Name, File_name);
    File_name = sprintf('result%d.dat', j);
    Dat_File_name = strcat(Dat_Folder_Name, File_name);
    API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
end
matlabpool close
2 Comments
Kirby Fears
on 16 Sep 2015
Reading files sequentially goes against the entire idea of simultaneous parallel computing with parfor. Have you benchmarked the speed of this with a regular for loop?
I'm not sure if it would help, but you could break your parfor into fewer iterations with a non-parallel for loop inside to give you sequential file reading.
Below is an example of only 4 parallel threads that are each reading a sequential subset of your files.
parfor j = 1:4
    % Worker j handles its own contiguous block of 10500 files.
    % Note: "k = (1:10500 + (j-1)*10500)" would parse as 1:(10500 + (j-1)*10500)
    % in MATLAB, giving overlapping ranges, so the bounds are written out explicitly.
    for k = ((j-1)*10500 + 1):(j*10500)
        File_name = sprintf('wfm%d.bin', k);
        Bin_File_name = strcat(Bin_Folder_Name, File_name);
        File_name = sprintf('result%d.dat', k);
        Dat_File_name = strcat(Dat_Folder_Name, File_name);
        API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
    end
end
You could play around with the number of parallel workers (try parfor j=1:2, j=1:4, etc.) to see if a smaller number of simultaneous jobs helps.
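The split above hard-codes 10500 files per worker (4 x 10500 = 42000). A more general sketch that divides the files into contiguous blocks for any worker count follows; the chunk-boundary arithmetic here is my own illustration, not from the original comment, and it assumes the same API_Mul_5_1_Sub and folder names as the question:

```matlab
Bin_Folder_Name = '\\node00\New_RawData\';
Dat_Folder_Name = '\\node16\Fitted_Data_Storage\';
numFiles   = 42000;                       % total files, as in the question
numWorkers = 4;                           % try 2, 4, 8, ... to find the sweet spot
chunk      = ceil(numFiles / numWorkers); % files per worker, rounded up

parfor j = 1:numWorkers
    firstIdx = (j-1)*chunk + 1;
    lastIdx  = min(j*chunk, numFiles);    % last chunk may be shorter
    for k = firstIdx:lastIdx              % sequential reads within each worker
        Bin_File_name = strcat(Bin_Folder_Name, sprintf('wfm%d.bin', k));
        Dat_File_name = strcat(Dat_Folder_Name, sprintf('result%d.dat', k));
        API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
    end
end
```

Since min() caps the last block, this also works when numWorkers does not divide numFiles evenly.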
Haoyu Wang
on 16 Sep 2015
Accepted Answer
More Answers (1)
Edric Ellis
on 17 Sep 2015
To get finer-grained control over the ordering of parallel operations, you can use spmd blocks instead of parfor loops. The basic pattern would be to do something like this:
spmd
for idx = 1:numlabs:(numFiles + numlabs)
myFileIdx = idx + labindex - 1;
if myFileIdx <= numFiles
% process file with index myFileIdx
else
% skip - we've passed the end
end
% The "labBarrier" call here forces all workers to
% wait until they all reach this call. This stops
% workers from racing ahead.
labBarrier();
end
end
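Applied to the file-processing loop from the question, the pattern might look like the following sketch (it reuses the folder names and API_Mul_5_1_Sub from the original code, and would run inside an open matlabpool session as before):

```matlab
Bin_Folder_Name = '\\node00\New_RawData\';
Dat_Folder_Name = '\\node16\Fitted_Data_Storage\';
numFiles = 42000;

spmd
    for idx = 1:numlabs:numFiles
        myFileIdx = idx + labindex - 1;   % each worker claims a distinct file
        if myFileIdx <= numFiles
            Bin_File_name = strcat(Bin_Folder_Name, sprintf('wfm%d.bin', myFileIdx));
            Dat_File_name = strcat(Dat_Folder_Name, sprintf('result%d.dat', myFileIdx));
            API_Mul_5_1_Sub(Bin_File_name, Dat_File_name)
        end
        labBarrier();   % keep workers in lockstep so disk reads stay near-sequential
    end
end
```

The barrier trades some throughput for ordering: no worker starts round n+1 until every worker has finished round n, so at most numlabs files are being read at once and they are consecutive on disk.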