parfor (file reading)

AP on 10 Nov 2011
Hi all,
I am trying to use parfor to speed up reading 1000 ASCII files. Each file has the following format:
  • 10 header lines describing the data.
  • The remaining lines are in the format '%f %f %f %f' and contain the values of the variables x, y, z1, and z2. There are up to 10000 such lines.
x and y define the rectangular domain over which z1 and z2 have been measured, so the domain is the same in all 1000 files. I want to use parfor and store one 10000×1 vector for x, one 10000×1 vector for y, one 10000×1000 array for z1, and one 10000×1000 array for z2.
I used the following pseudocode:
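parfor i = 1:1000
    fname = sprintf('file%04d.txt', i);   % hypothetical naming scheme; the real file names are not shown
    fid = fopen(fname, 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', 10);
    fclose(fid);                          % close the file so handles are not leaked
    x = data{1};                          % same in every file; temporary inside parfor
    y = data{2};
    z1(:, i) = data{3};                   % column i holds z1 from file i
    z2(:, i) = data{4};
end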
I get the error "The variable z1 in a parfor cannot be classified". The error may come from the restrictions that parfor places on how variables are indexed inside the loop.
Is there a better way to read these 1000 files in parallel?
Thanks.
  1 Comment
Edric Ellis on 10 Nov 2011
That code should work - in your real code, are you using 'z1' in some other way within the loop?
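For reference, here is a minimal sketch of a loop layout in which z1 and z2 are only ever indexed as (:, i), which is the pattern parfor can classify as a sliced output. It is not the poster's real code: the temporary folder, the file names (run0001.txt and so on), and the small sizes are assumptions made only so the snippet runs on its own. Since x and y are the same in every file, the sketch reads them once from the first file, outside the loop.

```matlab
% Hypothetical setup: write a few small test files so the sketch is runnable.
nFiles = 4; nRows = 5; nHeader = 10;
folder = tempname; mkdir(folder);
fnames = cell(nFiles, 1);
for k = 1:nFiles
    fnames{k} = fullfile(folder, sprintf('run%04d.txt', k));
    fid = fopen(fnames{k}, 'w');
    for h = 1:nHeader
        fprintf(fid, 'header line %d\n', h);
    end
    % Each column of this 4-by-nRows matrix becomes one "x y z1 z2" line.
    fprintf(fid, '%f %f %f %f\n', ...
        [1:nRows; 11:10+nRows; k*ones(1, nRows); -k*ones(1, nRows)]);
    fclose(fid);
end

% x and y are identical across files, so read them once, outside the loop.
fid = fopen(fnames{1}, 'r');
first = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
fclose(fid);
x = first{1};
y = first{2};

% Preallocate the outputs; inside parfor they are indexed only as (:, i).
z1 = zeros(nRows, nFiles);
z2 = zeros(nRows, nFiles);
parfor i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', nHeader);
    fclose(fid);
    z1(:, i) = data{3};   % sliced output: column i from file i
    z2(:, i) = data{4};
end
```

If some files hold fewer rows than the preallocated height, the column assignment fails with a size mismatch, so in that case the shorter columns would have to be padded (for example with NaN) before being stored.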


Answers (1)

Daniel Shub on 10 Nov 2011
I am not sure exactly how MATLAB handles file reading or how hard drives handle multiple read requests, but my guess is that distributing an I/O-limited job across multiple processors will not speed it up.
  1 Comment
Walter Roberson on 10 Nov 2011
Surprisingly, you can get better performance with parallel reads, at least if you are using SCSI drives with ENQ (enqueue) turned on, which allows the drive to reorder read requests according to which destination is "closest" to where it currently is. In common situations, performance increases up to four parallel reads. With some data access patterns it can continue to climb beyond four parallel reads, but the improvement past four is not wonderful (though if you have terabytes to get through, you'll take whatever performance increase you can get).
It also helps if the file you are reading is not compressed and you use scatter/gather I/O.
I do not have any information on drive queue management in the newer PC drives.
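Whether parallel reads help on a given machine is easiest to settle by timing both versions. The following is a rough sketch, not a benchmark: the temporary test files, their names, and their sizes are assumptions chosen only to make the snippet self-contained, and the real files would take their place.

```matlab
% Hypothetical test data: a handful of small files in a temporary folder.
nFiles = 8; nRows = 1000;
folder = tempname; mkdir(folder);
fnames = cell(nFiles, 1);
for k = 1:nFiles
    fnames{k} = fullfile(folder, sprintf('t%04d.txt', k));
    fid = fopen(fnames{k}, 'w');
    fprintf(fid, '%f %f %f %f\n', rand(4, nRows));
    fclose(fid);
end

% Serial read with an ordinary for loop.
nRead = zeros(nFiles, 1);
tic;
for i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, '%f %f %f %f');
    fclose(fid);
    nRead(i) = numel(data{1});      % rows read from file i
end
tSerial = toc;

% The same read with parfor.
nReadPar = zeros(nFiles, 1);
tic;
parfor i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, '%f %f %f %f');
    fclose(fid);
    nReadPar(i) = numel(data{1});   % sliced output, one element per file
end
tParallel = toc;

fprintf('serial: %.3f s, parfor: %.3f s\n', tSerial, tParallel);
```

Two things can distort the numbers: the first parfor call may include the time needed to start the worker pool, and the operating system may serve the second pass from its file cache. Running each variant more than once, on files large enough to matter, gives a fairer comparison.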
