Fastest way to add string

2 views (last 30 days)
Alan
Alan on 23 Sep 2014
Answered: Alan on 23 Sep 2014
I'm dealing with very large csv files. I'm having little to no problem with speed in reading from them with readtable. However, I have found (and reported) a bug in readtable where a blank value in the first column (the line starts with the delimiter, e.g. ',') throws off all the data. A lot of my files have blank values in the first column (due to the way the equipment I'm using records the data)
So, I have to "preprocess" the files and look for these blank columns in the csv file. The most efficient method I've found is the following:
fprintf('Reading File...');
ch = fread(YGID, [1,chunksize], 'int8=>char');
%cch = char(ch');
fprintf('Getting Number Of Lines...');
nol = sum(ch == sprintf('\n')); % number of lines
fprintf('%i\n',nol);
fprintf('Replacing final commas...\n');
cch = regexprep(ch,',(\r|\n)+','$1');
clear ch;
fprintf('Getting line locations...\n');
hlocs = regexp(cch,'\n');
fprintf('Writing Header File...\n');
fwrite(HDID,cch(hlocs(2)+1:hlocs(10)));
fprintf('Replacing Initial Commas\n');
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');
YGID is the file pointer from an fopen. Note that I'm purposely making new variables (not memory efficient) as I have 16 GB of RAM available on my machine and I find making a completely new variable is faster. However, once the file is of a sufficient size (>20 MB, I have some over 200MB), even this becomes very slow. The line it is getting stuck on is "ccch = regexprep(cch,'(\r|\n)+,','$1 ,');" I suspect it's because with each additional space being added (there are hundreds of thousands) it's reallocating memory for the variable. I've tried to "preallocate" the new variable with "ccch = blanks(chunksize + nol);" before it and it didn't seem to make a difference.
Is there any more efficient way to do this task?

Accepted Answer

Alan
Alan on 23 Sep 2014
Found my own answer. strrep is surprisingly faster than regexprep I had to add a conditional to check the OS, though:
if ispc || isunix
fpatt = sprintf('\n,');
rpatt = sprintf('\n, ');
else
fpatt = sprintf('\r,');
rpatt = sprintf('\r, ');
end
ccch = strrep(cch,fpatt,rpatt);

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!