Datastore read() sometimes returns a number of rows different from the 'ReadSize' parameter
I have created a 2D array (144000x52) stored in CSV format.
When I call readmatrix('M.csv'), the variable appears in the workspace with the correct size (144000x52).
The issue is that when I create a datastore and try to read the rows of the matrix batch by batch, the number of rows read is sometimes not equal to the 'ReadSize' parameter.
For example:
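A minimal sketch of the kind of loop described, assuming the datastore is a tabularTextDatastore over M.csv with ReadSize set to 2000 (the variable names `ds`, `nBatches`, and `sz` are illustrative and not taken from the original post):

```matlab
% Assumes M.csv (144000x52, numeric, no header row) is in the current folder.
ds = tabularTextDatastore('M.csv');
ds.ReadSize = 2000;              % requested rows per call to read

nBatches = 144000 / 2000;        % 72 batches expected if every read were full
for i = 1:nBatches
    sz = size(read(ds));         % read returns a table; size gives [rows cols]
    fprintf('i = %d: %d x %d\n', i, sz(1), sz(2));
end
```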
What I expect from the code above is that, since ReadSize is 2000 and the total number of rows is 144000, there are 144000/2000 = 72 batches to read and the returned size is 2000x52 for every value of i.
However, size(read(ds)) returns 186x52 for i=19 and 193x52 for i=39.
For all other values of i it returns 2000x52.
Steven Lord on 4 Jul 2022
If you look at the description of the ReadSize property of the tabularTextDatastore class, the sentence describing the behavior when the property is a positive integer value is "If ReadSize is a positive integer, then each call to read reads at most ReadSize rows." [Emphasis on "at most" added.] There is no guarantee that read will read exactly that many rows.
I believe you should call hasdata on the datastore to determine whether there is still data to be read from it, rather than assuming a certain number of read calls will read the entire data set. This will also make your code more robust to changes in the size of your data. Suppose that instead of reading data collected (as an example) at 1 row per second for 40 hours (40 x 3600 = 144000 rows), you collect 1 row per second for 60 hours. Or whatever processing you're measuring finishes more quickly than you expected and you only have 30 hours of data.
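As a rough illustration of the hasdata approach, the sketch below reads until the datastore is exhausted instead of looping a fixed number of times. The file name `demo_matrix.csv`, its 5000x4 size, and the variables `totalRows` and `largestBatch` are assumptions made so the example is self-contained; they stand in for M.csv and are not from the original post.

```matlab
% Hypothetical stand-in for M.csv: a small numeric matrix written to disk.
demoFile = 'demo_matrix.csv';
writematrix(rand(5000, 4), demoFile);

% Treat every row as data (the file has no header row).
ds = tabularTextDatastore(demoFile, 'ReadVariableNames', false);
ds.ReadSize = 2000;    % upper bound per read, not a guaranteed batch size

totalRows = 0;
largestBatch = 0;
while hasdata(ds)
    batch = read(ds);                          % table with at most ReadSize rows
    totalRows = totalRows + height(batch);     % accumulate rows actually read
    largestBatch = max(largestBatch, height(batch));
    % ... process the batch here ...
end
fprintf('Read %d rows in total; largest batch had %d rows.\n', totalRows, largestBatch);
```

Because the loop condition depends only on hasdata, the same code should work unchanged whether the file holds 30, 40, or 60 hours of data, and it does not rely on any individual read returning exactly ReadSize rows.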