

Detect objects using R-CNN deep learning detector


The rcnnObjectDetector object detects objects in an image using an R-CNN (region-based convolutional neural network) object detector. To detect objects in an image, pass the trained detector to the detect function. To classify image regions, pass the detector to the classifyRegions function.

Use of the rcnnObjectDetector requires Statistics and Machine Learning Toolbox™ and Deep Learning Toolbox™.

When using the detect or classifyRegions functions with rcnnObjectDetector, use of a CUDA® enabled NVIDIA® GPU is highly recommended. The GPU reduces computation time significantly. Usage of the GPU requires Parallel Computing Toolbox™. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
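As a rough illustration of the GPU recommendation above, the following sketch picks an execution environment at run time. It assumes the canUseGPU function is available (recent MATLAB releases) and that detect accepts an 'ExecutionEnvironment' option; the variables detector and I in the commented call are placeholders, not defined here.

```matlab
% Sketch: choose an execution environment before calling detect.
% canUseGPU returns true only when a supported GPU is usable.
if canUseGPU
    env = 'gpu';
else
    env = 'cpu';
end
fprintf('Detection would run on: %s\n', env);

% With a trained detector and an image I (both placeholders here), the
% choice could then be passed on, for example:
%   [bboxes, scores, labels] = detect(detector, I, 'ExecutionEnvironment', env);
```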


Create an rcnnObjectDetector object by calling the trainRCNNObjectDetector function with training data (requires Deep Learning Toolbox).

detector = trainRCNNObjectDetector(trainingData,...)



Network: Convolutional neural network (CNN) used within the R-CNN detector, specified as a SeriesNetwork (Deep Learning Toolbox) or DAGNetwork (Deep Learning Toolbox) object.

RegionProposalFcn: Custom region proposal method, specified as a function name. A custom function proposalFcn must have the following functional form:

 [bboxes,scores] = proposalFcn(I)

The input argument I is an image. The function must return rectangular bounding boxes in an M-by-4 array. Each row of bboxes contains a four-element vector, [x,y,width,height], that specifies the upper-left corner and size of a bounding box in pixels. The function must also return a score for each bounding box in an M-by-1 vector. Higher scores indicate that the bounding box is more likely to contain an object.
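The functional form described above can be sketched as follows. This is only an illustration: gridProposalFcn, the window size, and the step are hypothetical, and the uniform scores are placeholders because a plain sliding grid has no notion of objectness.

```matlab
% Call the hypothetical proposal function on a placeholder image.
I = zeros(100, 120, 3, 'uint8');
[bboxes, scores] = gridProposalFcn(I);
fprintf('%d proposals, each [x y width height]\n', size(bboxes, 1));

% In a script, local functions must appear at the end of the file.
function [bboxes, scores] = gridProposalFcn(I)
    % Hypothetical proposal method: fixed-size windows on a regular grid.
    win  = 32;   % window width and height in pixels (assumed value)
    step = 16;   % spacing between window corners in pixels (assumed value)
    [h, w, ~] = size(I);
    [x, y] = meshgrid(1:step:(w - win + 1), 1:step:(h - win + 1));
    % M-by-4 array, one [x y width height] row per proposal.
    bboxes = [x(:), y(:), repmat(win, numel(x), 2)];
    % M-by-1 scores; all equal here because no objectness is computed.
    scores = ones(size(bboxes, 1), 1);
end
```

A function handle such as @gridProposalFcn could then be supplied wherever a region proposal function is expected, in the same way the checkpoint example on this page assigns a handle to RegionProposalFcn.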

ClassNames: Object class names, specified as a cell array. The array contains the names of the object classes the R-CNN detector was trained to find.

This property is read-only.

BoxRegressionLayer: Bounding box regression layer name, specified as a character vector. This property is set during training using the BoxRegressionLayer argument of trainRCNNObjectDetector.

Object Functions

detect            Detect objects using R-CNN deep learning detector
classifyRegions   Classify objects in image regions using R-CNN object detector



Load training data and network layers.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

Add the image directory to the MATLAB path.

imDir = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata',...

Set network training options to use a mini-batch size of 32 to reduce GPU memory usage. Lower the InitialLearnRate to reduce the rate at which network parameters are changed. This is beneficial when fine-tuning a pretrained network and prevents the network from changing too rapidly.

options = trainingOptions('sgdm', ...
  'MiniBatchSize', 32, ...
  'InitialLearnRate', 1e-6, ...
  'MaxEpochs', 10);

Train the R-CNN detector. Training can take a few minutes to complete.

rcnn = trainRCNNObjectDetector(stopSigns, layers, options, 'NegativeOverlapRange', [0 0.3]);
Training an R-CNN Object Detector for the following object classes:

* stopSign

--> Extracting region proposals from 27 training images...done.

--> Training a neural network to classify objects in training data...

Training on single CPU.
Initializing input data normalization.
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|       1 |           1 |       00:00:01 |       96.88% |       0.1651 |      1.0000e-06 |
|       2 |          50 |       00:00:15 |       96.88% |       0.0807 |      1.0000e-06 |
|       3 |         100 |       00:00:27 |       96.88% |       0.1340 |      1.0000e-06 |
|       5 |         150 |       00:00:38 |       96.88% |       0.0225 |      1.0000e-06 |
|       6 |         200 |       00:00:49 |       93.75% |       0.6584 |      1.0000e-06 |
|       8 |         250 |       00:01:00 |       93.75% |       0.5233 |      1.0000e-06 |
|       9 |         300 |       00:01:11 |      100.00% |   2.9456e-05 |      1.0000e-06 |
|      10 |         350 |       00:01:23 |      100.00% |       0.0009 |      1.0000e-06 |
Training finished: Max epochs completed.

Network training complete.

--> Training bounding box regression models for each object class...100.00%...done.

Detector training complete.
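The training call above sets 'NegativeOverlapRange' to [0 0.3]. As a hedged illustration of what that range is compared against, the sketch below computes the overlap ratio between a ground truth box and a region proposal using bboxOverlapRatio. Both boxes are made up for this example, and treating the range bounds as inclusive is an assumption.

```matlab
% Hypothetical boxes in [x y width height] format.
groundTruthBox = [10 10 20 20];
proposalBox    = [20 10 20 20];   % same size, shifted 10 pixels right

% Overlap ratio (intersection over union by default).
overlap = bboxOverlapRatio(groundTruthBox, proposalBox);

% Proposals whose overlap falls in this range would count as negatives.
negativeRange = [0 0.3];
isNegative = overlap >= negativeRange(1) && overlap <= negativeRange(2);
fprintf('Overlap = %.3f, negative sample: %d\n', overlap, isNegative);
```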

Test the R-CNN detector on a test image.

img = imread('stopSignTest.jpg');

[bbox, score, label] = detect(rcnn, img, MiniBatchSize=32);

Display the strongest detection result.

[score, idx] = max(score);

bbox = bbox(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

detectedImg = insertObjectAnnotation(img, 'rectangle', bbox, annotation);
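The selection step above assumes that detect returned at least one box. A minimal sketch of the same logic with a guard for the empty case follows; the bbox, score, and label values are invented stand-ins for the outputs of detect, and bestScore, bestBox, and the fallback text are names chosen only for this illustration.

```matlab
% Hypothetical detection results standing in for the outputs of detect.
bbox  = [10 20 50 50; 40 60 30 30; 5 5 20 20];
score = [0.62; 0.97; 0.31];
label = categorical({'stopSign'; 'stopSign'; 'stopSign'});

if isempty(score)
    % Nothing detected: avoid indexing with an empty index.
    bestBox = zeros(0, 4);
    annotation = 'no detections';
else
    % Keep only the highest-scoring detection.
    [bestScore, idx] = max(score);
    bestBox = bbox(idx, :);
    annotation = sprintf('%s: (Confidence = %f)', char(label(idx)), bestScore);
end
disp(annotation)
```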


Remove the image directory from the path.


Resume training an R-CNN object detector using additional data. To illustrate this procedure, a subset of the ground truth data (the first 10 images) is used to train the detector initially. Training is then resumed using all the data.

Load training data and initialize training options.

load('rcnnStopSigns.mat', 'stopSigns', 'layers')

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 10, ...
    'Verbose', false);

Train the R-CNN detector with a portion of the ground truth.

rcnn = trainRCNNObjectDetector(stopSigns(1:10,:), layers, options, 'NegativeOverlapRange', [0 0.3]);

Get the trained network layers from the detector. When you pass in an array of network layers to trainRCNNObjectDetector, they are used as-is to continue training.

network = rcnn.Network;
layers = network.Layers;

Resume training using all the training data.

rcnnFinal = trainRCNNObjectDetector(stopSigns, layers, options);

Create an R-CNN object detector for two object classes: dogs and cats.

objectClasses = {'dogs','cats'};

The network must be able to classify dogs, cats, and a "background" class in order to be trained using trainRCNNObjectDetector. In this example, one is added to the class count to include the background.

numClassesPlusBackground = numel(objectClasses) + 1;

The final fully connected layer of a network defines the number of classes that the network can classify. Set the final fully connected layer to have an output size equal to the number of classes plus a background class.

layers = [ ...
    imageInputLayer([28 28 1])

These network layers can now be used to train an R-CNN two-class object detector.
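A complete layer array of the kind described above might look like the following sketch. The convolution filter size, the number of filters, and the ReLU layer are illustrative assumptions, not the architecture used on this page. The essential point is that the final fully connected layer has numClassesPlusBackground outputs.

```matlab
objectClasses = {'dogs', 'cats'};
numClassesPlusBackground = numel(objectClasses) + 1;

% Small illustrative network (layer sizes are assumed values).
layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5, 20)
    reluLayer
    fullyConnectedLayer(numClassesPlusBackground)  % classes + background
    softmaxLayer
    classificationLayer];

fprintf('Final fully connected layer output size: %d\n', layers(4).OutputSize);
```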

Create an R-CNN object detector and set it up to use a saved network checkpoint. A network checkpoint is saved every epoch during network training when the trainingOptions 'CheckpointPath' parameter is set. Network checkpoints are useful in case your training session terminates unexpectedly.
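As a minimal sketch of the 'CheckpointPath' setting described above, the following creates training options that save checkpoints to a folder. The variable names checkpointDir and opts and the use of tempdir are assumptions for illustration, and the other option values are arbitrary.

```matlab
% Any existing, writable folder can serve as the checkpoint location.
checkpointDir = tempdir;

opts = trainingOptions('sgdm', ...
    'MaxEpochs', 10, ...
    'Verbose', false, ...
    'CheckpointPath', checkpointDir);

fprintf('Checkpoints would be written to: %s\n', opts.CheckpointPath);
```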

Load the stop sign training data.


Add the full path to the image files.

stopSigns.imageFilename = fullfile(toolboxdir('vision'),'visiondata', ...

Set the 'CheckpointPath' using the trainingOptions function.

checkpointLocation = tempdir;
options = trainingOptions('sgdm','Verbose',false, ...

Train the R-CNN object detector with a few images.

rcnn = trainRCNNObjectDetector(stopSigns(1:3,:),layers,options);

Load a saved network checkpoint.

wildcardFilePath = fullfile(checkpointLocation,'convnet_checkpoint__*.mat');
contents = dir(wildcardFilePath);

Load one of the checkpoint networks.

filepath = fullfile(contents(1).folder,contents(1).name);
checkpoint = load(filepath);
ans = 

  SeriesNetwork with properties:

    Layers: [15×1 nnet.cnn.layer.Layer]
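The example above loads the first file in the listing. If the most recent checkpoint is wanted instead, one option is to sort the listing by modification time, as in the sketch below. It uses dummy MAT-files in a temporary folder as stand-ins for real checkpoints; the folder, the file contents, and the numeric suffixes are hypothetical.

```matlab
% Create two dummy files that mimic the checkpoint naming pattern.
demoDir = tempname;
mkdir(demoDir);
x = 1;  % placeholder content, not a real network
save(fullfile(demoDir, 'convnet_checkpoint__1.mat'), 'x');
pause(1.1);  % ensure the two files get distinct timestamps
save(fullfile(demoDir, 'convnet_checkpoint__2.mat'), 'x');

% List the matching files and pick the one modified most recently.
contents = dir(fullfile(demoDir, 'convnet_checkpoint__*.mat'));
[~, newest] = max([contents.datenum]);
latestFile = fullfile(contents(newest).folder, contents(newest).name);
disp(latestFile)

% Remove the temporary folder and its dummy files.
rmdir(demoDir, 's');
```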

Create a new R-CNN object detector and set it up to use the saved network.

rcnnCheckPoint = rcnnObjectDetector();
rcnnCheckPoint.RegionProposalFcn = @rcnnObjectDetector.proposeRegions;

Set the Network to the saved network checkpoint.

rcnnCheckPoint.Network =
rcnnCheckPoint = 

  rcnnObjectDetector with properties:

              Network: [1×1 SeriesNetwork]
           ClassNames: {'stopSign'  'Background'}
    RegionProposalFcn: @rcnnObjectDetector.proposeRegions


[1] Girshick, Ross, et al. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2014, pp. 580–87.

[2] Girshick, Ross. “Fast R-CNN.” 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, 2015, pp. 1440–48.

Version History

Introduced in R2016b