I'm dealing with very large CSV files. Reading them with readtable is fast enough, but I have found (and reported) a bug in readtable: a blank value in the first column (i.e. the line starts with the delimiter, e.g. ',') throws off all the data on that line. A lot of my files have blank values in the first column, due to the way the equipment I'm using records the data.
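For illustration (the numbers here are made up), a problematic line starts with the delimiter itself:

```
,2.5,3.1,4.0
1.2,2.5,3.1,4.0
```

The first line has a blank first value, and readtable mis-assigns everything after it.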
So I have to preprocess the files and fix these blank first-column values before handing the file to readtable. The most efficient method I've found is the following:
ch = fread(YGID, [1,chunksize], 'int8=>char');  % read a chunk as a char row vector
fprintf('Getting number of lines...\n');
nol = sum(ch == sprintf('\n'));                 % count newline characters
fprintf('Replacing final commas...\n');
cch = regexprep(ch,',(\r|\n)+','$1');           % drop trailing commas before a line break
fprintf('Getting line locations...\n');
hlocs = regexp(cch,'\n');                       % indices of the newlines
fprintf('Writing header file...\n');            % (header-file write omitted here)
fprintf('Replacing initial commas...\n');
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');       % insert a space before a leading comma
YGID is the file identifier from an fopen call. Note that I'm purposely creating a new variable at each step (not memory efficient), since I have 16 GB of RAM on my machine and I find that building a completely new variable is faster. However, once the file is of a sufficient size (>20 MB; I have some over 200 MB), even this becomes very slow. The line it gets stuck on is "ccch = regexprep(cch,'(\r|\n)+,','$1 ,');". I suspect that with each additional space being inserted (there are hundreds of thousands of them) it reallocates memory for the variable. I tried to preallocate the new variable with "ccch = blanks(chunksize + nol);" beforehand, and it didn't seem to make a difference.
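For reference, my preallocation attempt looked roughly like this (a sketch, with `chunksize` and `nol` as above), which may be why it had no effect:

```matlab
ccch = blanks(chunksize + nol);            % preallocate worst-case output size
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');  % but regexprep returns a brand-new array,
                                           % so the assignment just discards the
                                           % preallocated one
```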
Is there any more efficient way to do this task?