Sometimes the datastore read() function reads a number of rows different from the 'ReadSize' parameter

Hello,
I have created a 2-D array (144000x52) in CSV format.
When I call readmatrix('M.csv'), the variable appears in the workspace with the correct size (144000x52).
The issue is that when I create a datastore and try to read the rows of the matrix batch by batch, the number of rows read is sometimes not equal to the 'ReadSize' parameter.
For example:
ds=datastore('M.csv','ReadSize',2000);
for i=1:72
i
size(read(ds))
end
What I expect from the code above is that, since ReadSize is 2000 and the total number of rows is 144000, there are 144000/2000 = 72 batches to read, and the returned size should be 2000x52 for every value of i.
However, when i=19 and i=39,
size(read(ds)) returns 186x52 (for i=19) and 193x52 (for i=39).
For all other values of i it returns 2000x52.

Answers (1)

Steven Lord on 4 Jul 2022
If you look at the description of the ReadSize property of the tabularTextDatastore class, the sentence describing the behavior when the property is a positive integer value is "If ReadSize is a positive integer, then each call to read reads at most ReadSize rows." [Emphasis on "at most" added.] There is no guarantee that read will read exactly that many rows.
I believe you should call hasdata on the datastore to determine if there is still data to be read from it rather than assuming a certain number of read calls will read the entire data set. This will also make your code more robust to changes in the size of your data; suppose that instead of reading data collected (as an example) 1 row per second for 40 hours:
hours(seconds(144000))
ans = 40
you instead collect 1 row per second for 60 hours. Or whatever processing you're measuring finishes more quickly than you expected and you only have 30 hours of data.
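As a rough illustration of the hasdata approach, here is a minimal sketch. It assumes a small made-up CSV file written to a temporary location (a stand-in for M.csv, not the original data), and the variable names csvFile, batch, totalRows, and numReads are only illustrative. The loop runs until the datastore is exhausted instead of assuming a fixed number of read calls, and it counts rows as they arrive rather than assuming each batch has exactly ReadSize rows.

```matlab
% Hypothetical stand-in for M.csv: a small numeric matrix written to a temp file.
csvFile = [tempname '.csv'];
writematrix(reshape(1:50000, 10000, 5), csvFile);

ds = datastore(csvFile, 'ReadSize', 2000);

totalRows = 0;   % rows actually read so far
numReads  = 0;   % number of read calls made

% Keep reading until the datastore reports that no data is left.
while hasdata(ds)
    batch = read(ds);                        % returns at most ReadSize rows
    numReads  = numReads + 1;
    totalRows = totalRows + size(batch, 1);  % do not assume 2000 rows per batch
    % ... process batch here ...
end

fprintf('Read %d rows in %d calls to read.\n', totalRows, numReads);
delete(csvFile);   % clean up the temporary file
```

With this pattern the number of read calls may be larger than (number of rows)/ReadSize if some batches come back short, but every row is still visited once.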
