Sometimes the datastore read() function returns a different number of rows than the 'ReadSize' parameter
Hello,
I have created a 2D array (144000x52) in CSV format.
When I call readmatrix('M.csv'), I can see the variable in the workspace with the correct size (144000x52).
The issue is that when I create a datastore and try to read the rows of the matrix batch by batch, the number of rows read is sometimes not equal to the 'ReadSize' parameter.
For example:
ds=datastore('M.csv','ReadSize',2000);
for i=1:72
i
size(read(ds))
end
What I expect from the code above is that, since ReadSize is 2000 and the total number of rows is 144000, there are 144000/2000 = 72 batches to read, and the returned size should be 2000x52 for every value of i.
However, when i=19 and i=39,
size(read(ds)) returns 186x52 and 193x52, respectively.
For all other values of i it returns 2000x52.
Answers (1)
Steven Lord
on 4 Jul 2022
If you look at the description of the ReadSize property of the tabularTextDatastore class, the sentence describing the behavior when the property is a positive integer is: "If ReadSize is a positive integer, then each call to read reads at most ReadSize rows." (The emphasis on "at most" is mine.) There is no guarantee that read will return exactly that many rows.
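If downstream code really does need batches of exactly 2000 rows, one option is to buffer whatever read returns and cut fixed-size blocks out of the buffer. The sketch below is one way to do that under stated assumptions; it is not from the original answer. It writes a hypothetical 10000x5 file, demo.csv, as a stand-in for M.csv, passes 'ReadVariableNames', false on the assumption that the file has no header row, and the names batchSize, buffer and batchHeights are illustrative only. It loops on hasdata, the datastore function that reports whether unread data remains.

```matlab
% Sketch: turn variable-size reads into fixed-size batches by buffering.
% demo.csv is a hypothetical stand-in for M.csv (10000 rows x 5 columns, no header).
writematrix(rand(10000, 5), 'demo.csv');
ds = datastore('demo.csv', 'ReadSize', 2000, 'ReadVariableNames', false);

batchSize = 2000;
buffer = [];        % rows already read but not yet handed out as a batch
batchHeights = [];  % height of every batch produced (only for checking)
while hasdata(ds)
    % read returns a table with AT MOST 2000 rows; append it as numbers
    buffer = [buffer; table2array(read(ds))]; %#ok<AGROW>
    % Emit full batches while the buffer holds at least batchSize rows
    while size(buffer, 1) >= batchSize
        batch = buffer(1:batchSize, :);
        buffer(1:batchSize, :) = [];           % drop the rows just emitted
        batchHeights(end+1) = size(batch, 1); %#ok<AGROW>
        % ... process batch (exactly batchSize rows) here ...
    end
end
% Any leftover rows form a final, shorter batch (none if rows divide evenly)
if ~isempty(buffer)
    batchHeights(end+1) = size(buffer, 1);
end
disp(batchHeights)
```

The extra copying costs some memory and time, so this is only worth doing when a fixed batch height is genuinely required; otherwise processing each chunk as it arrives is simpler.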
I believe you should call hasdata on the datastore to determine whether there is still data to be read, rather than assuming that a fixed number of read calls will consume the entire data set. This also makes your code more robust to changes in the size of your data. Suppose that instead of collecting data (as an example) at 1 row per second for 40 hours:
hours(seconds(144000))   % returns 40
you instead collect 1 row per second for 60 hours. Or the process you're measuring finishes sooner than you expected and you only have 30 hours of data.
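A minimal sketch of the hasdata approach follows. Going by the sizes reported in the question, a fixed count of 72 calls would not even cover the original file: 70 x 2000 + 186 + 193 = 140379 rows, leaving 3621 rows unread. Looping on hasdata avoids that and works unchanged for 30, 40 or 60 hours of data. The file demo.csv (7300 rows, deliberately not a multiple of ReadSize) is a hypothetical stand-in for M.csv, 'ReadVariableNames', false assumes there is no header row, and the running column mean is only an example of per-chunk processing.

```matlab
% Sketch: process a CSV of unknown length by reading until hasdata is false.
% demo.csv is a hypothetical stand-in for M.csv (7300 rows x 4 columns, no header).
writematrix(rand(7300, 4), 'demo.csv');
ds = datastore('demo.csv', 'ReadSize', 2000, 'ReadVariableNames', false);

nReads = 0;
nRows = 0;
colSum = zeros(1, 4);              % running column sums across all chunks
while hasdata(ds)
    X = table2array(read(ds));     % at most 2000 rows per call
    nReads = nReads + 1;
    nRows = nRows + size(X, 1);
    colSum = colSum + sum(X, 1);
end
colMean = colSum / nRows;          % should match mean(readmatrix('demo.csv')) up to rounding
fprintf('%d reads, %d rows\n', nReads, nRows);
```

To rerun the loop on the same datastore, call reset(ds) first; to use it on the real data, point the datastore at M.csv instead.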