Datastores for Deep Learning

Datastores in MATLAB® are a convenient way of working with and representing collections of data that are too large to fit in memory at one time. Because deep learning often requires large amounts of data, datastores are an important part of the deep learning workflow in MATLAB.

Select Datastore

For many applications, the easiest approach is to start with a built-in datastore. For more information about the available built-in datastores, see Select Datastore for File Format or Application. However, only some types of built-in datastores can be used directly as input for network training, validation, and inference. These datastores are:

Datastore | Description | Additional Toolbox Required
ImageDatastore | Datastore for image data | none
AugmentedImageDatastore | Datastore for resizing and augmenting training images. Datastore is nondeterministic. | none
PixelLabelDatastore (Computer Vision Toolbox) | Datastore for pixel label data | Computer Vision Toolbox™
boxLabelDatastore (Computer Vision Toolbox) | Datastore for bounding box label data | Computer Vision Toolbox
RandomPatchExtractionDatastore (Image Processing Toolbox) | Datastore for extracting random patches from image-based data. Datastore is nondeterministic. | Image Processing Toolbox™
blockedImageDatastore (Image Processing Toolbox) | Datastore for blockwise reading and processing of image data, including large images that do not fit in memory | Image Processing Toolbox
blockedPointCloudDatastore (Lidar Toolbox) | Datastore for blockwise reading and processing of point cloud data, including large point clouds that do not fit in memory | Lidar Toolbox™
DenoisingImageDatastore (Image Processing Toolbox) | Datastore to train an image denoising deep neural network. Datastore is nondeterministic. | Image Processing Toolbox
audioDatastore (Audio Toolbox) | Datastore for audio data | Audio Toolbox™
signalDatastore (Signal Processing Toolbox) | Datastore for signal data | Signal Processing Toolbox™

Other built-in datastores can be used as input for deep learning, but the data read from these datastores must be preprocessed into a format required by a deep learning network. For more information on the required format of read data, see Input Datastore for Training, Validation, and Inference. For more information on how to preprocess data read from datastores, see Transform and Combine Datastores.

For some applications, there may not be a built-in datastore type that fits your data well. For these problems, you can create a custom datastore. For more information, see Develop Custom Datastore. All custom datastores are valid inputs to deep learning interfaces as long as the read function of the custom datastore returns data in the required form.

Input Datastore for Training, Validation, and Inference

Datastores are valid inputs in Deep Learning Toolbox™ for training, validation, and inference.

Training and Validation

You can use an image datastore or other types of datastore as a source of training data when training using the trainnet or trainNetwork function. To use a datastore for validation, use the ValidationData name-value argument in the trainingOptions function.

To be a valid input for training or validation, the read function of a datastore must return data as either a cell array or a table. There are two exceptions: ImageDatastore objects can output numeric arrays, and custom mini-batch datastores must output tables.

For networks with a single input, the table or cell array returned by the datastore must have two columns. The first column of data represents inputs to the network and the second column of data represents responses. Each row of data represents a separate observation. For ImageDatastore only, trainnet, trainNetwork, and trainingOptions support data returned as integer arrays and as single-column cell arrays of integer arrays.
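
As a minimal sketch of the single-input case, the following code builds training and validation datastores from synthetic feature data, confirms that read returns a two-column cell array, and passes the datastores to trainnet and trainingOptions. The synthetic data, the small layer array, and all variable names are illustrative assumptions. The sketch also assumes a release that provides arrayDatastore and trainnet.

```matlab
% Synthetic data (assumption): 4 features and 3 classes.
numFeatures = 4;
numClasses = 3;

XTrain = rand(numFeatures,200);
YTrain = categorical(randi(numClasses,200,1),1:numClasses);
XVal = rand(numFeatures,40);
YVal = categorical(randi(numClasses,40,1),1:numClasses);

% Each read returns one observation: {c-by-1 predictor, categorical response}.
dsTrain = combine(arrayDatastore(XTrain,'IterationDimension',2), ...
    arrayDatastore(YTrain));
dsVal = combine(arrayDatastore(XVal,'IterationDimension',2), ...
    arrayDatastore(YVal));

data = read(dsTrain);   % 1-by-2 cell array: predictors, then responses
reset(dsTrain);         % rewind the datastore before training

% Small illustrative classification network.
layers = [
    featureInputLayer(numFeatures)
    fullyConnectedLayer(numClasses)
    softmaxLayer];

% The validation datastore is passed through trainingOptions.
options = trainingOptions('sgdm', ...
    'MaxEpochs',2, ...
    'MiniBatchSize',32, ...
    'ValidationData',dsVal, ...
    'Verbose',false);

net = trainnet(dsTrain,layers,'crossentropy',options);
```

The same pattern should apply to other data types, as long as read returns predictors in the first column and responses in the second.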

To use a datastore for networks with multiple input layers, use the combine and transform functions to create a datastore that outputs a cell array with (numInputs + 1) columns, where numInputs is the number of network inputs. In this case, the first numInputs columns specify the predictors for each input and the last column specifies the responses. The order of inputs is given by the InputNames property of the layer graph layers.

The following table shows example outputs of calling the read function for datastore ds.

Single input layer

Datastore output: Table or cell array with two columns. The first and second columns specify the predictors and targets, respectively. Table elements must be scalars, row vectors, or 1-by-1 cell arrays containing a numeric array. Custom mini-batch datastores must output tables.

Example output (table for a neural network with one input and one output):

data = read(ds)
data =

  4×2 table

        Predictors        Response
    __________________    ________

    {224×224×3 double}       2    
    {224×224×3 double}       7    
    {224×224×3 double}       9    
    {224×224×3 double}       9  

Cell array for neural network with one input and one output:

data = read(ds)
data =

  4×2 cell array

    {224×224×3 double}    {[2]}
    {224×224×3 double}    {[7]}
    {224×224×3 double}    {[9]}
    {224×224×3 double}    {[9]}

Multiple input layers

Datastore output: Cell array with (numInputs + 1) columns, where numInputs is the number of neural network inputs. The first numInputs columns specify the predictors for each input and the last column specifies the targets. The order of inputs is given by the InputNames property of the layer graph layers.

Example output (cell array for a neural network with two inputs and one output):

data = read(ds)
data =

  4×3 cell array

    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[9]}
    {224×224×3 double}    {128×128×3 double}    {[9]}
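
A datastore with this multiple-input layout can be sketched by combining one datastore per network input with a datastore of responses. The synthetic arrays and variable names below are assumptions chosen to mirror the sizes in the example output, and the sketch assumes that arrayDatastore is available.

```matlab
% Synthetic data (assumption): two image inputs and one label per observation.
numObs = 8;
X1 = rand(224,224,3,numObs);
X2 = rand(128,128,3,numObs);
Y = categorical(randi(10,numObs,1));

% A ReadSize of 4 makes each call to read return four observations (rows).
dsX1 = arrayDatastore(X1,'IterationDimension',4,'ReadSize',4);
dsX2 = arrayDatastore(X2,'IterationDimension',4,'ReadSize',4);
dsY = arrayDatastore(Y,'ReadSize',4);

% Predictor columns come first, in network input order; responses come last.
ds = combine(dsX1,dsX2,dsY);

data = read(ds);   % 4-by-3 cell array
```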

The format of the predictors depends on the type of data.

Data | Format of Predictors
2-D image | h-by-w-by-c numeric array, where h, w, and c are the height, width, and number of channels of the image, respectively.
3-D image | h-by-w-by-d-by-c numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the image, respectively.
Vector sequence | c-by-s matrix, where c is the number of features of the sequence and s is the sequence length.
1-D image sequence | h-by-c-by-s array, where h and c correspond to the height and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length.
2-D image sequence | h-by-w-by-c-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length.
3-D image sequence | h-by-w-by-d-by-c-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, and s is the sequence length. Each sequence in the mini-batch must have the same sequence length.
Features | c-by-1 column vector, where c is the number of features.

For predictors returned in tables, the elements must contain a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.

The trainNetwork function does not support networks with multiple sequence input layers.

The format of the responses depends on the type of task.

Task | Format of Responses
Classification | Categorical scalar
Regression | Scalar, numeric vector, or 3-D numeric array representing an image
Sequence-to-sequence classification | 1-by-s sequence of categorical labels, where s is the sequence length of the corresponding predictor sequence.
Sequence-to-sequence regression | R-by-s matrix, where R is the number of responses and s is the sequence length of the corresponding predictor sequence.

For responses returned in tables, the elements must be a categorical scalar, a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.
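
To make the sequence formats concrete, the following sketch builds a datastore for a sequence-to-sequence classification task. Each observation pairs a c-by-s predictor matrix with a 1-by-s categorical label sequence. The synthetic data and variable names are assumptions for illustration only.

```matlab
% Synthetic data (assumption): 5 sequences with 3 features and 2 classes.
numObs = 5;
numFeatures = 3;
X = cell(numObs,1);
Y = cell(numObs,1);
for i = 1:numObs
    s = randi([10 20]);                      % sequence length varies per observation
    X{i} = rand(numFeatures,s);              % c-by-s predictor matrix
    Y{i} = categorical(randi(2,1,s),1:2);    % 1-by-s label sequence
end

% 'OutputType','same' returns the cell elements without wrapping them again.
dsX = arrayDatastore(X,'OutputType','same');
dsY = arrayDatastore(Y,'OutputType','same');
ds = combine(dsX,dsY);

data = read(ds);   % 1-by-2 cell array: {c-by-s double, 1-by-s categorical}
```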

Prediction

For inference using predict, classify, and activations, a datastore is only required to yield the columns corresponding to the predictors. The inference functions use the first NumInputs columns and ignore the subsequent columns, where NumInputs is the number of network input layers.

Specify Read Size and Mini-Batch Size

A datastore may return any number of rows (observations) for each call to read. Functions such as trainnet, trainNetwork, predict, classify, and activations, which accept datastores and support the 'MiniBatchSize' option, call read as many times as necessary to form complete mini-batches of data. As these functions form mini-batches, they use internal queues in memory to store read data. For example, if a datastore consistently returns 64 rows per call to read and MiniBatchSize is 128, then forming each mini-batch requires two calls to read.

For best runtime performance, configure datastores so that the number of observations returned by read equals the 'MiniBatchSize'. For datastores that have a 'ReadSize' property, set 'ReadSize' to change the number of observations returned by each call to read.
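
The relationship between read size and mini-batch size can be illustrated with a short sketch. The accumulation loop below only imitates how a mini-batch is assembled from several reads; it is not the internal queueing code of the training functions. The synthetic data and variable names are assumptions.

```matlab
miniBatchSize = 128;
X = rand(1024,10);   % synthetic data (assumption): one observation per row

% Case 1: each read returns 64 rows, so one mini-batch needs two reads.
ds64 = arrayDatastore(X,'ReadSize',64);
batch = {};
numReads = 0;
while size(batch,1) < miniBatchSize && hasdata(ds64)
    batch = [batch; read(ds64)]; %#ok<AGROW>
    numReads = numReads + 1;
end

% Case 2: ReadSize matches the mini-batch size, so a single read is enough.
ds128 = arrayDatastore(X,'ReadSize',miniBatchSize);
batchSingleRead = read(ds128);
```

Here numReads ends at 2 for the first datastore, while the second datastore fills a mini-batch in one call to read.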

Transform and Combine Datastores

Deep learning frequently requires the data to be preprocessed and augmented before data is in an appropriate form to input to a network. The transform and combine functions of datastore are useful in preparing data to be fed into a network.

Transform Datastores

A transformed datastore applies a particular data transformation to an underlying datastore when reading data. To create a transformed datastore, use the transform function and specify the underlying datastore and the transformation.

  • For complex transformations involving several preprocessing operations, define the complete set of transformations in your own function. Then, specify a handle to your function as the @fcn argument of transform. For more information, see Create Functions in Files.

  • For simple transformations that can be expressed in one line of code, you can specify a handle to an anonymous function as the @fcn argument of transform. For more information, see Anonymous Functions.

The function handle provided to transform must accept input data in the same format as returned by the read function of the underlying datastore.
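
As a minimal sketch of a one-line transformation, the following code rescales 8-bit image data to the range [0, 1] while passing the labels through unchanged. The underlying datastore returns a 1-by-2 cell array per read, so the anonymous function accepts and returns a cell array of that shape. The synthetic images and variable names are assumptions.

```matlab
% Synthetic data (assumption): 16 grayscale 28-by-28 uint8 images with labels.
X = randi([0 255],28,28,1,16,'uint8');
Y = categorical(randi(10,16,1));

% Underlying datastore: each read returns {28-by-28 uint8 image, label}.
ds = combine(arrayDatastore(X,'IterationDimension',4),arrayDatastore(Y));

% The function handle receives the same cell array that read(ds) returns.
dsTransformed = transform(ds,@(data) {single(data{1})/255, data{2}});

out = read(dsTransformed);   % {28-by-28 single image in [0,1], label}
```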

Combine Datastores

The combine function associates multiple datastores. Operating on the resulting CombinedDatastore, such as resetting the datastore, performs the same operation on all of the underlying datastores. Calling the read function of a combined datastore reads one batch of data from all of the N underlying datastores, which must return the same number of observations. Reading from a combined datastore returns the horizontally concatenated results in an N-column cell array that is suitable for training and validation. Shuffling a combined datastore results in an identical randomized ordering of files in the underlying datastores.

For example, if you are training an image-to-image regression network, then you can create the training data set by combining two image datastores. This sample code demonstrates combining two image datastores named imdsX and imdsY. The combined datastore imdsTrain returns data as a two-column cell array.

imdsX = imageDatastore(___);
imdsY = imageDatastore(___);
imdsTrain = combine(imdsX,imdsY)
imdsTrain = 

  CombinedDatastore with properties:

    UnderlyingDatastores: {1×2 cell}

If you have Image Processing Toolbox, then the randomPatchExtractionDatastore (Image Processing Toolbox) provides an alternative way to associate image-based data in ImageDatastore, PixelLabelDatastore, and TransformedDatastore objects. A randomPatchExtractionDatastore has several advantages over associating data using the combine function. Specifically, a random patch extraction datastore:

  • Provides an easy way to extract patches from both 2-D and 3-D data without requiring you to implement a custom cropping operation using transform and combine

  • Provides an easy way to generate multiple patches per image per mini-batch without requiring you to define a custom concatenation operation using transform

  • Supports efficient conversion between categorical and numeric data when applying image transforms to categorical data

  • Supports parallel training

  • Improves performance by caching images

Use Datastore for Parallel Training and Background Dispatching

Parallel Training

Specify parallel or multi-GPU training using the ExecutionEnvironment name-value argument of trainingOptions. Training in parallel or using single or multiple GPUs requires Parallel Computing Toolbox™.

Many built-in datastores already support parallel and multi-GPU training. Using the transform and combine functions with built-in datastores frequently maintains support for parallel and multi-GPU training.

If you need to create a custom datastore that supports parallel or multi-GPU training, your datastore should implement the matlab.io.datastore.Subsettable class.

To use a datastore for parallel training or multi-GPU training, it must be subsettable or partitionable. To determine whether a datastore is subsettable or partitionable, use the functions isSubsettable and isPartitionable, respectively.

When training in parallel, datastores do not support specifying the Shuffle name-value argument of trainingOptions as "never".
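
Before configuring parallel or multi-GPU training, you can check a datastore with these functions. The sketch below runs the checks on a plain, a combined, and a transformed datastore built from synthetic data. The data and variable names are assumptions, and the results for combined and transformed datastores depend on the underlying datastores.

```matlab
% Synthetic data (assumption): 16 grayscale images with labels.
dsX = arrayDatastore(rand(28,28,1,16),'IterationDimension',4);
dsY = arrayDatastore(categorical(randi(10,16,1)));

dsCombined = combine(dsX,dsY);
dsTransformed = transform(dsCombined,@(data) {single(data{1}), data{2}});

% Each check returns a logical scalar.
plainIsPartitionable = isPartitionable(dsX);
canPartition = isPartitionable(dsTransformed);
canSubset = isSubsettable(dsTransformed);

% The datastore qualifies if at least one of the two checks passes.
supportsParallel = canPartition || canSubset;
```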

Background Dispatch

Background dispatch uses parallel workers to fetch and preprocess training data from a datastore during training. As shown in the following diagram, fetching, preprocessing, and performing training computations in serial can result in downtime where your GPU (or other hardware) utilization is low. Using parallel workers to fetch and preprocess the next batch of training data while your GPU is processing the current batch can increase hardware utilization, resulting in faster training. Use background dispatch when your training data requires significant preprocessing, for example if you are manipulating large images.

Schematic diagram comparing the GPU utilization when training a network with and without using background dispatching when significant preprocessing of the training data is required. Without background dispatch, the GPU is utilized sporadically and with background dispatch the GPU is utilized more consistently.

To use background dispatch, do one of the following:

  • For built-in training, set the DispatchInBackground option to true using the trainingOptions function.

  • For custom training loops, set the DispatchInBackground property of your minibatchqueue to true.

Background dispatching requires Parallel Computing Toolbox.

Datastores that are subsettable or partitionable support reading training data using background dispatching. Background dispatch is supported for local parallel pools only.
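
For built-in training, enabling background dispatch is a single option. The following sketch shows one possible configuration. The solver and mini-batch size are arbitrary assumptions, and training with these options requires Parallel Computing Toolbox and a local parallel pool.

```matlab
% Illustrative settings (assumptions): SGDM solver and a mini-batch size of 128.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',128, ...
    'DispatchInBackground',true);   % fetch and preprocess data on parallel workers
```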
