How can I accelerate fprintf for a big matrix?

Hello everyone, I have a 2500-by-1500 matrix and I want to print every column to a text file, 5 numbers per line. I am using:
for i = 1:1500
    fprintf(fid, 'This is the %d coefficients\n', i);
    S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
    S(S == 'E') = 'D';
    fprintf(fid, '%s', S);
end
This takes several seconds. How can I speed it up?
  3 Comments
Tian on 10 Jan 2017
Edited: Tian on 10 Jan 2017
Apologies, I missed a '\n' in the first fprintf.
Actually, I am constructing a formatted file in a layout that many software packages already accept, so I have to add a title line, 'This is the %d coefficients' (just as an example), before printing each coeff(:,i).
Tian on 10 Jan 2017
Edited: Tian on 10 Jan 2017
By 'writing in binary', do you mean using fprintf(fid, '%s', double(S)); instead of fprintf(fid, '%s', S);?
I just tried this and found that fprintf(fid, '%s', double(S)); takes more than twice as long.
If I use fopen('test.txt', 'wb') instead of fopen('test.txt', 'w'), the time required is the same.
If I have misunderstood your suggestion, please let me know. Thank you.


Accepted Answer

Walter Roberson on 10 Jan 2017
You have a few different speed constraints:
  • the speed of formatting individual numeric items, but you are already using the fastest way
  • the overhead of calling fprintf() and sprintf() multiple times, which could potentially be reduced by formatting everything at one time and then writing it all
  • the cost of doing the substitution of 'E' to 'D', which could possibly be done more efficiently (but your current version looks pretty good as-is)
  • the overhead of doing the substitution multiple times, which could potentially be reduced by building the output matrix and then doing the substitution all at once.
  • the cost of writing to disk, which you cannot get away from (except to touch up the buffering strategy, perhaps, as Jan shows)
You are not calling sprintf() wastefully, for example with just one value at a time, so it is not obvious that there is much overhead left to cut by formatting everything at once.
Formatting everything at once is possible, but it drives up your memory costs a fair bit, to the point where you have to question whether the memory allocation costs of the large arrays will exceed the savings from calling sprintf() less often, especially once you make the adjustments needed for columns whose length is not always a multiple of 5.
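As a rough sketch of what "formatting everything at once" could look like (with no claim that it is faster), one format string can cover a whole column, title line included, and sprintf can recycle it across all columns. The small random coeff, the temporary output file, and the requirement that the row count be a multiple of 5 are assumptions made for illustration, not part of the question.

```matlab
% Hypothetical stand-in data; substitute the real coeff matrix.
coeff = rand(20, 4);
[nRows, nCols] = size(coeff);
perLine = 5;

% Assumption: every column fills whole lines. Columns whose length is not a
% multiple of perLine would need extra handling that this sketch omits.
assert(mod(nRows, perLine) == 0, 'Row count must be a multiple of perLine.');
linesPerCol = nRows / perLine;

% Format for one complete column: title line, then linesPerCol numeric lines.
numLine = [repmat(' %15.8E', 1, perLine) '\n'];
colFmt  = ['This is the %d coefficients\n', repmat(numLine, 1, linesPerCol)];

% sprintf consumes its data in column-major order and recycles the format,
% so putting the column index on top of each column feeds the %d.
data = [1:nCols; coeff];
S = sprintf(colFmt, data);

% One substitution for the whole output. This is safe here only because the
% title text contains no uppercase 'E'.
S(S == 'E') = 'D';

outFile = [tempname '.txt'];   % hypothetical output location
fid = fopen(outFile, 'W');
if fid == -1
    error('Cannot open file for writing: %s', outFile);
end
fwrite(fid, S, 'char');
fclose(fid);
```

For the full 2500-by-1500 matrix the character array S would hold roughly 60 million characters, which is the memory cost mentioned above, so it is worth timing this against the per-column loop before adopting it.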
My tests show that regexprep() is roughly 16 times slower than your existing S(S=='E')='D', so you would probably have difficulty being more efficient on that portion.
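A comparison like this can be reproduced with a simple tic/toc loop. The sketch below is not the test that produced the figure above; the test string, the repetition count, and the inclusion of strrep() are assumptions, and the ratios will differ between machines and MATLAB releases.

```matlab
% Hypothetical test string: one column's worth of formatted numbers.
S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', rand(2500, 1));
nRep = 200;   % repetitions, to average out timer noise

tic
for k = 1:nRep
    A = S;
    A(A == 'E') = 'D';           % logical indexing, as in the question
end
tIndex = toc / nRep;

tic
for k = 1:nRep
    B = strrep(S, 'E', 'D');     % plain substring replacement
end
tStrrep = toc / nRep;

tic
for k = 1:nRep
    C = regexprep(S, 'E', 'D');  % regular-expression replacement
end
tRegexp = toc / nRep;

fprintf('indexing:  %.3g s\nstrrep:    %.3g s\nregexprep: %.3g s\n', ...
        tIndex, tStrrep, tRegexp);
```

All three approaches produce the same string, so the choice between them is purely a matter of speed.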
Since you have already cut down on the overhead and are stuck with the numeric formatting time and the file I/O time, I think you are already about as fast as you can reasonably get for that output format.
  1 Comment
Tian on 11 Jan 2017
Thanks a lot for your detailed explanation. That's very helpful.


More Answers (1)

Jan on 10 Jan 2017
This could be slightly faster:
fid = fopen(FileName, 'W'); % Uppercase W for better buffering
if fid == -1
    error('Cannot open file for writing: %s', FileName);
end
for i = 1:1500
    fprintf(fid, 'This is the %d coefficients\n', i);
    S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
    fwrite(fid, strrep(S, 'E', 'D'), 'char');
end
fclose(fid); % Closing the file flushes the buffered output to disk
But I assume the bottleneck is the slow disk transfer. The 'W' can reduce this, but using an SSD would be better.
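Whether 'W' helps on a given machine is easy to check by timing the same loop with both permissions. This is only a sketch: the reduced matrix size and the temporary file names are assumptions, and the outcome depends heavily on the disk and the operating system's own caching.

```matlab
% Hypothetical stand-in data, smaller than the matrix in the question.
coeff = rand(2500, 50);
fmt = ' %15.8E %15.8E %15.8E %15.8E %15.8E\n';

modes = {'w', 'W'};            % lowercase: flush after each write; uppercase: no automatic flush
elapsed = zeros(1, numel(modes));

for m = 1:numel(modes)
    fileName = [tempname '.txt'];
    fid = fopen(fileName, modes{m});
    if fid == -1
        error('Cannot open file for writing: %s', fileName);
    end
    tic
    for i = 1:size(coeff, 2)
        fprintf(fid, 'This is the %d coefficients\n', i);
        S = sprintf(fmt, coeff(:, i));
        fwrite(fid, strrep(S, 'E', 'D'), 'char');
    end
    fclose(fid);               % include the final flush in the timing
    elapsed(m) = toc;
    delete(fileName);
end

fprintf('''w'': %.3f s, ''W'': %.3f s\n', elapsed(1), elapsed(2));
```

Running the comparison a few times is advisable, because the first pass often includes one-off costs such as file system caching.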
  2 Comments
Tian on 11 Jan 2017
Thanks. I'd like to try your method.
Scott Campbell on 7 Dec 2022
My 15 MB CSV file went from 30 seconds to 10 seconds.
