Datastore with overlapping read function

ROSEMARIE MURRAY on 23 Apr 2022
Commented: Matt J on 27 Apr 2022
I have some data stored in 20 spreadsheets. Each spreadsheet has a size of about 4000x200. I want to store the data in a datastore and feed it into a temporal CNN in chunks of 100 rows. However, I want the chunks to overlap. For example, the first value the datastore returns should be a 100x200 array corresponding to rows 1:100 of the spreadsheet. The second value should be rows 2:101, then 3:102, and so on.
The only way I have found to do this so far is to read all the spreadsheets into an 80,000x200 array in MATLAB, create a 3D array of size 79,900x100x200, and then use a for loop to copy each 100x200 chunk from the 2D array into the 3D array. Finally, I put the 3D array into an arrayDatastore. However, this seems really inefficient, and I have to keep the batch size for the CNN pretty small to avoid memory errors.
I also tried saving each of the 100x200 arrays into a grayscale png, and then creating an imageDatastore with the 79,900 images. This lets me have a larger batch size, but A) it takes about 8 hours to convert all the data into images, and B) training the CNN takes about 8-10 times longer (4-5 hours instead of 30 mins).
Is there a better way to do this?
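To put rough numbers on the approach described above, here is a minimal sketch in plain Python (not MATLAB, and not code from this thread) of the window arithmetic. The names sliding_windows, window_count and bytes_needed are hypothetical, and the 8 bytes per value assumes double precision. It suggests that materializing every window needs on the order of 12.8 GB, while the underlying 80,000x200 array needs 128 MB, which is why producing each window on demand from a start index is attractive.

```python
from typing import Iterator, Sequence

Row = Sequence[float]


def sliding_windows(rows: Sequence[Row], window: int) -> Iterator[Sequence[Row]]:
    """Yield overlapping windows of `window` consecutive rows, stepping by one row."""
    for start in range(len(rows) - window + 1):
        # The 0-based slice [start, start + window) corresponds to rows
        # start+1 .. start+window in 1-based spreadsheet numbering.
        yield rows[start:start + window]


def window_count(n_rows: int, window: int) -> int:
    """Number of stride-1 windows of length `window` that fit in `n_rows` rows."""
    return max(n_rows - window + 1, 0)


def bytes_needed(n_windows: int, window: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Storage for all windows copied out in full (assumes 8-byte doubles)."""
    return n_windows * window * n_cols * bytes_per_value


# Sizes taken from the question: 20 spreadsheets of about 4000x200, windows of 100 rows.
per_sheet = window_count(4000, 100)          # windows inside one spreadsheet
concatenated = window_count(20 * 4000, 100)  # windows over all rows stacked together
print(per_sheet, concatenated)
print(bytes_needed(concatenated, 100, 200))  # every window materialized
print(bytes_needed(1, 20 * 4000, 200))       # the original 2D array only
```

Note that windows taken over the concatenated rows include ones that straddle two spreadsheets. If those should be excluded, windowing each spreadsheet separately gives 20 times the per-sheet count instead.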

Answers (1)

Matt J on 23 Apr 2022
Edited: Matt J on 23 Apr 2022
"For example, the first value the datastore returns should be a 100x200 array corresponding to rows 1:100 of the spreadsheet. The second value should be rows 2:101, then 3:102, and so on."
An equivalent approach would be to make the CNN fully convolutional (if it isn't already) with an input size of 4000x200. Then you could feed an entire spreadsheet in as a single input, and the network would effectively evaluate every overlapping 100-row window in one pass.
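The equivalence can be illustrated with a small 1-D sketch in plain Python. Everything here is an assumption made for illustration: tiny_net is a hypothetical two-layer stack (not the poster's network), the filter values are arbitrary, and all layers use stride 1 with no padding. Its receptive field is 4 samples, so running it once over the whole signal should give the same numbers as running it separately on every 4-sample window.

```python
from typing import List, Sequence


def conv_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Stride-1 cross-correlation with no padding ('valid' output only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def tiny_net(signal: Sequence[float]) -> List[float]:
    """Hypothetical stack: conv (length 3), ReLU, conv (length 2).

    The receptive field is 3 + 2 - 1 = 4 samples.
    """
    hidden = [max(v, 0.0) for v in conv_valid(signal, [1.0, -2.0, 0.5])]
    return conv_valid(hidden, [0.5, 1.5])


signal = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.0, -2.5, 1.8, 0.9]
window = 4

# One pass over the whole signal: one output per window position.
whole = tiny_net(signal)

# The same network applied to each overlapping window on its own:
# each call returns exactly one value.
per_window = [
    tiny_net(signal[i:i + window])[0]
    for i in range(len(signal) - window + 1)
]

print(len(whole), len(per_window))
print(all(abs(a - b) < 1e-12 for a, b in zip(whole, per_window)))
```

The match relies on every layer being shift-invariant. A fully connected layer, or any per-position operation, would break it.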
  11 Comments
ROSEMARIE MURRAY on 26 Apr 2022
Oh, I just took out most of the layers so that it would be easy to see the input layer and the fully-connected layer in one screenshot. I am using an architecture very similar to this example: https://www.mathworks.com/help/deeplearning/ug/hand-gesture-classification-using-radar-signals-and-deep-learning.html
It seems that it is not possible to create a network without an input layer. I can make the network with an imageInputLayer of size 4000x200, train it, and then remove the input layer and replace it with an imageInputLayer of size 100x200.
I finally got training to succeed with the simplified layer structure from my previous screenshots, but when I added back more convolution and pooling layers, I realized I need to be really careful when setting the filter sizes, pooling sizes, padding, and stride lengths to ensure that the time dimension shrinks only from 4000 to 3901. And I think that means that when I change the input size, the time dimension will not necessarily shrink from 100 to 1, so I have to choose the settings very carefully to achieve both goals.
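The size bookkeeping described here can be checked ahead of time with a short sketch in plain Python. It assumes the standard output-length formula for a convolution or pooling layer with dilation 1 and symmetric padding, floor((n + 2*padding - kernel) / stride) + 1. The two layer stacks are hypothetical examples chosen for illustration, not the architecture from this thread.

```python
from typing import List, Tuple

# One layer along the time dimension: (kernel, stride, padding).
Layer = Tuple[int, int, int]


def out_len(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a conv/pooling layer (dilation 1, symmetric padding)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace(n: int, layers: List[Layer]) -> List[int]:
    """Length of the time dimension after each layer, starting from n."""
    sizes = [n]
    for kernel, stride, padding in layers:
        n = out_len(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


# Hypothetical stack A: stride 1 everywhere, no padding.
stack_a = [(51, 1, 0), (50, 1, 0)]
print(trace(4000, stack_a))  # [4000, 3950, 3901]
print(trace(100, stack_a))   # [100, 50, 1]

# Hypothetical stack B: a stride-2 pooling layer in the middle.
stack_b = [(5, 1, 0), (2, 2, 0), (48, 1, 0)]
print(trace(4000, stack_b))  # [4000, 3996, 1998, 1951]
print(trace(100, stack_b))   # [100, 96, 48, 1]
```

With stride 1 in every layer, each layer removes a fixed number of rows (kernel - 1 - 2*padding) regardless of input length, so a stack that takes 4000 to 3901 also takes 100 to 1. Once a stride larger than 1 appears, as in stack B, the 100-row input can still reduce to 1 while the 4000-row input yields fewer than 3901 outputs, because only every other window position is evaluated.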
To be honest, I think I will just go back to my old method of copying the data into a 3D array, making an arrayDatastore, and living with a small batch size to avoid memory errors. I think having more flexibility with the filter sizes and other parameters is worth it. Thank you for all your help, though.
Matt J on 27 Apr 2022
"It seems that it is not possible to create a network without an input layer. I can make the network with an imageInputLayer of size 4000x200"
In that case, you would have to turn off the normalization that the layer performs, for example by setting its Normalization option to 'none'. The imageInputLayer is not a convolutional layer, so you can't get shift-invariant output if normalization is happening.

Release

R2022a
