Improving speed of readtable
I have a large array stored in a .dat file (see Example.dat attached) and I need to import the array into MATLAB.
At the moment I am using the following approach to load the table and convert it to an array.
Example_Table = readtable("Example.dat");
Example_Array = table2array(Example_Table);
This process is, however, taking much longer than I would expect, given that I have a reasonably powerful PC.
I suspect that the issue is related to the array having a large number of zero entries.
The results of Run & Time are shown below.

It is clear that pretty much all of the time is spent reading the table, not converting it to an array.
The timing profile of table.readTextFile>textscanReadData is shown below.

Almost all of that time is spent on the TreatAsEmpty handling (perhaps because of the many zero entries?).
Below is a snapshot of the CPU and RAM usage while the table is being read.

Here it is clear that a lot of computational power is going unused, so it should be possible to speed this process up somehow.
How can I make this process run faster?
I have to read in lots of data like this and it is a very frustrating process.
Thanks in advance!
Accepted Answer
More Answers (1)
Bjorn Gustavsson
on 14 Apr 2021
It is a rather large data file to read. You might reduce the read time by using load instead of readtable; that should cut the overhead that comes with readtable's ability to handle all sorts of data formats.
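For a plain numeric text file, that could look like the sketch below (it assumes Example.dat is a whitespace-delimited numeric matrix, which may not hold for your files):

```matlab
% Assumes Example.dat is a plain whitespace-delimited numeric matrix.
% load in ASCII mode skips readtable's format detection and returns a
% double array directly, so no table2array step is needed.
Example_Array = load("Example.dat", "-ascii");

% readmatrix (R2019a and later) is another lightweight alternative
% worth timing against readtable:
% Example_Array = readmatrix("Example.dat");
```

Profile both against your actual files; which one wins depends on the file layout.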
If you have the ability to change the format of your files, that might be a far more successful way forward, given very sparse data: you might be better off saving only the non-zero components together with their row and column indices, and reconstructing the matrix when reading, instead of storing a large number of zeros. But maybe you're given the data and have to shovel zeros and zeros around...
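One way to store only the non-zeros is the three-column (row, column, value) layout that spconvert understands. A sketch, assuming you control both the writing and the reading side (the file name Example_sparse.dat is made up):

```matlab
% Writing side: keep only the non-zero entries as [row col value] triples.
% A is the original dense matrix.
S = sparse(A);
[i, j, v] = find(S);
% If the last row or column of A is entirely zero, append a [m n 0]
% marker row so spconvert recovers the full matrix size.
[m, n] = size(A);
writematrix([i, j, v; m, n, 0], "Example_sparse.dat");

% Reading side: rebuild the matrix from the triples.
T  = load("Example_sparse.dat", "-ascii");
S2 = spconvert(T);      % sparse matrix with the same non-zeros
A2 = full(S2);          % only if a dense array is truly needed
```

For very sparse data this shrinks the files dramatically, and many MATLAB operations can work on the sparse matrix directly, without the full() step.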
HTH
2 Comments
Daniel van Huyssteen
on 14 Apr 2021
Bjorn Gustavsson
on 14 Apr 2021
That's a double bummer. I'm really surprised that load takes longer; I would've bet good money that readtable's more general capability would cost time. Then perhaps you can save overall processing time by following Walter's suggestion of converting the data files to a sparse format. You might be able to bulk-process all the data files overnight, when it doesn't test your patience...
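That overnight bulk conversion could look something like this one-off script (a sketch; the "data" folder name and *.dat pattern are assumptions):

```matlab
% One-off batch job: convert every .dat file in a folder to a .mat file
% holding a sparse matrix. Later reads then become fast load() calls on
% compact binary files instead of slow text parsing.
files = dir(fullfile("data", "*.dat"));   % hypothetical folder
for k = 1:numel(files)
    src = fullfile(files(k).folder, files(k).name);
    A = load(src, "-ascii");              % slow dense text read, done once
    S = sparse(A);                        % keep only the non-zeros
    [~, base] = fileparts(files(k).name);
    save(fullfile("data", [base, '.mat']), "S");
end
```

Afterwards, each file is read back with load("data/whatever.mat"), which returns the sparse matrix S directly.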