Table performance very slow
I have used tables within a physics model that is solved by ode23. The performance is very slow, and in troubleshooting (using profiler) I found that the majority of the time is spent in various table functions.
The three functions table.subsasgnDot, table.subsref, table.subsref alone take approximately 30% of the execution time. Within those functions it seems to be variable name checking that takes the majority of the time.
The variable names in every table are all known at the start of the program and don't change. It seems like it would be far more efficient to check them once rather than on every pass through the loop.
This is for a simulation that takes several minutes to run each case, and it would be used to run many cases. So, the slow performance is a significant problem.
I understand performance is better when the problem is vectorized. One of the subroutines can calculate 15,000 points in 10 sec if called as a vector, but takes 1 hr if called in a loop.
However, since this problem is being solved with ode23, the model is unavoidably called in a loop, and unfortunately I used tables everywhere before discovering how slow they are.
Is there any way to improve performance without major rewriting to remove all use of the table class?
Answers (4)
Oleg Komarov
on 28 Nov 2016
Edited: Oleg Komarov
on 28 Nov 2016
I have been using table() since long before it was introduced into the core package, since de facto it is the ported version of the dataset() class from the Statistics Toolbox. I also noticed, a long time ago, many limitations in terms of performance and functionality, and have logged enhancement requests with TMW.
To address the limitations of table() while waiting for the official implementation of my enhancement requests, I created tableutils(). Among the problems, you would be astonished to know that the disp() of a big table can literally freeze your PC until the next ice age (and I am not talking about the movies...). This is something that I fixed with a buffered disp method.
While my tableutils() does not directly address the problems in subsref/subsasgn, anyone is welcome to contribute to this effort to make the table() class better by submitting an issue or a Pull Request on GitHub.
Daniel Petrini
on 5 Oct 2016
Edited: dpb
on 5 Oct 2016
In my view, tables are very erratic in performance, ranging from quick to very slow. I mean, do a clear and then just >> table(). On my 2016b that can take many seconds. :-S I had to rewrite a (large) class based on tables to use multiple vectors of the same types. Performance is now much more linear and trustworthy. It seems that the JIT does not know what to do with tables? I wish MathWorks would post more about performance on these new data structures... In addition: the
<t=tic;my_class.insert_new_entry(...);toc(t)>
reported excellent times. The problem is that MATLAB is "busy", and the output of toc(t) could take 2 sec to display (0.12 s)... What am I missing? I'm guessing it is some overhead in creating tables, i.e., table_1(1:5,my_col) creates a new table and freezes...? Disclaimer: sitting on an 8 GB iCore7.
/Daniel Petrini, Stardots AB
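For anyone who wants to check the per-access overhead on their own machine, here is a minimal timing sketch. It assumes a one-column table with a made-up variable name x and an arbitrary loop length n; none of this comes from the code discussed above, and the timings will depend on the release and the hardware.

```matlab
% Compare scalar assignment into a table column with scalar assignment
% into a plain double vector. The names T, x and n are illustrative only.
n = 2000;                                % kept small so the loop finishes quickly
x = zeros(n, 1);
T = table(x, 'VariableNames', {'x'});    % hypothetical one-column table

tic
for i = 1:n
    T.x(i) = i;                          % goes through the table's subsasgn on every pass
end
tTable = toc;

tic
for i = 1:n
    x(i) = i;                            % ordinary array indexing
end
tVector = toc;

fprintf('table loop:  %.4f s\nvector loop: %.4f s\n', tTable, tVector);
```

tic/toc is used here only because it is the tool mentioned above; wrapping each loop in a function and calling timeit on it would probably give more stable numbers.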
2 Comments
Daniel Petrini
on 6 Oct 2016
My answer is probably not an answer, but rather a comment. Sorry. My first contribution to Matlab Answers.
Oleg Komarov
on 28 Nov 2016
The native table.disp() has a huge problem and can freeze your PC for a long time. I implemented a buffered disp that avoids this issue. See my answer in this thread.
jbpritts
on 24 Nov 2016
I have MATLAB 2016b. I can confirm that tables are terribly slow. Unless you really need them for heterogeneous data, avoid them in any performance-critical code. I will have to rewrite a fairly complicated section of code using legacy data structures. MathWorks should address this extreme performance deficiency.
2 Comments
Image Analyst
on 24 Nov 2016
They are tremendously better and faster than cell arrays though, and use far less memory.
Oleg Komarov
on 28 Nov 2016
Internally, table() stores data in a cell array, where each column is a cell. So your statement about speed and memory cannot be true, since there is additional overhead linked to VariableNames and the MATLAB-coded subsref/subsasgn.
I do agree that tables are more convenient.
Peter Perkins
on 7 Oct 2016
Byron, it's hard to make specific suggestions without knowing exactly what you're doing, but here are some thoughts.
Tables are best at managing data and doing vectorized operations. Based on your description, it sounds like you are probably doing scalar operations such as
t.Var(i) = x
in a loop. You've described your alternative as a complete rewrite to not use tables at all. But there is often a middle ground where you can find a localised scope in which you can pull some of the variables in a table out as, say, ordinary double vectors, do all the non-vectorized calculations on them in a scalar loop, and then assign back into the table. Sometimes you can even convert the table to a scalar struct and use the exact same syntax. Of course, separate variables or a scalar struct will not enforce equal number of rows, or provide a simple syntax for arbitrary rectangular selections, or the other things that tables are designed to do.
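A minimal sketch of that middle ground, assuming a made-up table with variables Time and Pos and a made-up one-row parameter table with variables k and c (none of these names come from the original model). It shows the three patterns in turn: pulling columns out as plain vectors around a scalar loop, switching to a scalar struct to keep the dot syntax, and converting a parameter table once before calling ode23 so the right-hand side never touches a table.

```matlab
% Hypothetical data table; Time and Pos are illustrative names only.
n = 1000;
t = table((1:n)', zeros(n,1), 'VariableNames', {'Time','Pos'});

% 1) Pull the columns out once, loop on plain double vectors, assign back once.
time = t.Time;
pos  = t.Pos;
for i = 2:n
    pos(i) = pos(i-1) + 0.5*time(i);     % no table indexing inside the loop
end
t.Pos = pos;                              % single assignment back into the table

% 2) Scalar struct: same dot syntax as the table, each field holds a whole column.
s = table2struct(t, 'ToScalar', true);
for i = 2:n
    s.Pos(i) = s.Pos(i-1) + 0.5*s.Time(i);
end
t2 = struct2table(s);                     % back to a table once the loop is done

% 3) ODE case: convert a one-row parameter table once, before the solver runs.
params = table(2.0, 0.5, 'VariableNames', {'k','c'});   % hypothetical parameters
p = table2struct(params);                 % 1x1 struct with scalar fields k and c
rhs = @(tt, y) [y(2); -p.k*y(1) - p.c*y(2)];            % damped oscillator, illustration only
[tout, yout] = ode23(rhs, [0 10], [1; 0]);
```

In the third pattern the struct is captured by the anonymous function when it is created, so the variable names are resolved once rather than on every call the solver makes.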
Hope this helps.