I'm using datastore, but is it a correct implementation?

Morning all,
I have two CSV text files, each of which is over 1 GB in size and contains millions of rows of data.
I would like to read through each file and extract the rows that match particular values of some variables in the file, something like:
var1 = 1:5;
var2 = 1:20;
% Set up the datastore
dataStore = datastore('test.csv','ReadVariableNames',true);
for i = 1:length(var1)
    thisVar1 = var1(i);
    for j = 1:length(var2)
        thisVar2 = var2(j);
        % Start again from the top of the file for this (var1, var2) pair
        reset(dataStore);
        thisData = table;
        while hasdata(dataStore)
            % Read in chunk
            dataChunk = read(dataStore);
            % Keep only the rows that match the current pair
            thisData = [thisData; ...
                dataChunk((dataChunk.var1 == thisVar1 & dataChunk.var2 == thisVar2),:)];
        end
        .
        .
        .
    end
end
This code implementation does work, but it takes an age to run and I really don't have that sort of time to sit around waiting. Can anyone help me with more efficient code or more efficient tools?
Many thanks
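The loop above resets the datastore and re-reads the whole file once per (var1, var2) pair, which is 100 full passes for the ranges shown. One possible alternative is a single pass that sorts each chunk's rows into per-pair buckets as it is read. The following is only a minimal sketch: the function name bucketByPair is hypothetical, it assumes the columns really are named var1 and var2 as in the question, and it assumes the matching rows fit in memory.

```matlab
function buckets = bucketByPair(fileName, var1, var2)
% BUCKETBYPAIR  Hypothetical helper: one pass over a CSV file, with rows
% grouped by their (var1, var2) combination.
% buckets{r, c} holds the rows where var1 == var1(r) and var2 == var2(c),
% or [] when no row matches. Column names var1/var2 are an assumption.
ds = datastore(fileName, 'ReadVariableNames', true);

% One (initially empty) list of table pieces per combination.
pieces = repmat({{}}, numel(var1), numel(var2));

while hasdata(ds)
    chunk = read(ds);                          % each chunk is read once
    [inVar1, r] = ismember(chunk.var1, var1);  % r: position in var1, 0 if absent
    [inVar2, c] = ismember(chunk.var2, var2);  % c: position in var2, 0 if absent
    keep  = inVar1 & inVar2;
    chunk = chunk(keep, :);
    % Group the kept rows by the combinations present in this chunk.
    [pairs, ~, g] = unique([r(keep), c(keep)], 'rows');
    for k = 1:size(pairs, 1)
        pieces{pairs(k, 1), pairs(k, 2)}{end+1, 1} = chunk(g == k, :);
    end
end

% Concatenate once per combination instead of growing a table per chunk.
buckets = cell(size(pieces));
for n = 1:numel(pieces)
    if ~isempty(pieces{n})
        buckets{n} = vertcat(pieces{n}{:});
    end
end
end
```

With this sketch, buckets = bucketByPair('test.csv', 1:5, 1:20) would be called once, and buckets{i, j} then plays the role of thisData inside the original double loop. If the matching rows do not fit in memory, each bucket could instead be processed or written out per chunk.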
  6 Comments
jlt199 on 9 Dec 2016
Thanks for your response. I can't show the header row, but the first few lines of data are below:
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,1,2646.2402,16,128.8303,350.9423,1859.8547,-3898.9065,274.1336,37.9153,37.9153,1,1,1,1
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,2,2646.2402,16,123.8009,348.4928,1859.8547,-3898.9065,274.1336,37.9153,37.9153,1,2,1,1
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,2,2580.6321,14,138.3819,365.0617,1821.3754,-3898.7576,269.4419,38.8017,38.8017,1,2,1,2
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,3,2573.4738,18,131.014,360.082,1861.5846,-3898.9065,269.4419,38.8017,38.8017,1,3,1,1
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,3,2613.4361,15,117.5712,344.2184,1840.4812,-3898.7576,274.1336,37.5043,37.5043,1,3,1,2
L_204W_204Depth_25IE_1WT_127.txt,25,12.7,1,20.4,20.4,3,7,1,3,2613.4361,15,129.2886,360.8319,1821.3754,-3898.5992,274.1336,37.5043,37.5043,1,3,1,3
per isakson on 9 Dec 2016
Edited: per isakson on 9 Dec 2016
Try this with a complete file
>> cac = cssm()
cac =
    {6x1 cell}    [6x22 double]
>> cac{1}{1}
ans =
L_204W_204Depth_25IE_1WT_127.txt
>> cac{2}(1,1:6)
ans =
   25.0000   12.7000    1.0000   20.4000   20.4000    3.0000
where
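function cac = cssm()
    % Read one string column followed by 22 numeric columns
    fid = fopen( 'cssm.txt' );
    % 'CollectOutput' merges the 22 numeric columns into a single matrix
    cac = textscan( fid, ['%s',repmat('%f',[1,22])] ...
        , 'Headerlines',0, 'CollectOutput',true, 'Delimiter',',' );
    [~] = fclose( fid );
end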
and cssm.txt contains your six lines of data.
If you are on Windows, use the Task Manager to see how much memory is used. I assume you still have a lot of free memory.
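If the complete file does fit in memory when read this way, the row selection becomes a logical mask on the numeric matrix and the file is read only once. The following is a rough sketch of that idea: the function name selectRowsInMemory is hypothetical, and because the header row was not posted, the column positions passed in for var1 and var2 are placeholders, not known values.

```matlab
function [names, num] = selectRowsInMemory(fileName, col1, val1, col2, val2)
% SELECTROWSINMEMORY  Hypothetical helper: read the file once with the same
% textscan call as cssm(), then filter the rows in memory.
% col1 and col2 are positions within the 22 numeric columns. Which columns
% actually hold var1 and var2 is an ASSUMPTION the caller must supply.
fid = fopen(fileName);
assert(fid ~= -1, 'Could not open %s', fileName);
closer = onCleanup(@() fclose(fid));   % close the file even if textscan errors

% 'Headerlines' is 0 as in cssm(); a file that has a header row needs 1.
cac = textscan(fid, ['%s', repmat('%f', [1, 22])], ...
    'Headerlines', 0, 'CollectOutput', true, 'Delimiter', ',');

% cac{1} is the column of file names, cac{2} the N-by-22 numeric matrix.
mask  = cac{2}(:, col1) == val1 & cac{2}(:, col2) == val2;
names = cac{1}(mask);
num   = cac{2}(mask, :);
end
```

For repeated selections it would be cheaper to keep cac in the workspace and build a new mask for each (var1, var2) pair than to call textscan again.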


