posemaskrcnn

Predict object pose using Pose Mask R-CNN pose estimation

Since R2024a

Description

The posemaskrcnn object performs pose estimation of objects in an image using a pretrained Pose Mask R-CNN network, a region-based convolutional neural network designed for six degrees-of-freedom (6-DoF) pose estimation.

Note

This functionality requires Deep Learning Toolbox™ and the Computer Vision Toolbox™ Model for Pose Mask R-CNN 6-DoF Object Pose Estimation. You can install the Computer Vision Toolbox Model for Pose Mask R-CNN 6-DoF Object Pose Estimation from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

Creation

Description

net = posemaskrcnn(pretrainedNet) creates a pretrained Pose Mask R-CNN pose estimation network net by using the specified pretrained Pose Mask R-CNN deep learning network.

net = posemaskrcnn(pretrainedNet,classNames) creates a pretrained Pose Mask R-CNN pose estimation network, and configures it to perform transfer learning using a set of object classes specified by the classNames argument. For optimal results, train the network on new training data before performing pose estimation.

net = posemaskrcnn(pretrainedNet,classNames,anchorBoxes) creates a pretrained Pose Mask R-CNN network, and configures it to perform transfer learning using a set of object classes and anchor boxes specified by the classNames and anchorBoxes arguments, respectively. For optimal results, train the network on new training data before performing pose estimation.

net = posemaskrcnn(___,Name=Value) specifies options, such as the ROI pooling sizes, using one or more name-value arguments in addition to any combination of input arguments from previous syntaxes. You can also use name-value arguments to set the ModelName and ImageInputSize properties.

For example, PoolSize=[11 11] specifies the ROI pooling size for the detection head as 11-by-11 pixels.
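The syntaxes above can be combined as in the following minimal sketch. The class names and the model name are hypothetical placeholders, and the PoolSize value is taken from the example above rather than being a recommendation. The returned network is only configured for transfer learning and still needs training before it is used for pose estimation.

```matlab
% Hypothetical class names for a custom data set (illustration only).
classNames = ["bracket" "flange"];

% Configure the COCO-initialized network for the new classes and set
% options at creation time through name-value arguments.
net = posemaskrcnn("resnet50-coco", classNames, ...
    PoolSize=[11 11], ...           % ROI pooling size for the detection head
    ModelName="customPoseNet");     % hypothetical name, set at creation

% Inspect the configured properties.
disp(net.ModelName)
disp(net.ClassNames)
```

Because ModelName and ImageInputSize are read-only after creation, any non-default values must be passed in the constructor call, as shown here.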

Input Arguments

pretrainedNet: Pretrained Pose Mask R-CNN deep learning network, specified as one of these:

  • "resnet50-pvc-parts" — A pretrained Pose Mask R-CNN deep learning network trained on images of PVC pipe connectors in various orientations. You can use the resulting pose estimation network to perform pose estimation on a custom bin-picking data set by using the predictPose object function.

  • "resnet50-coco" — A pretrained Pose Mask R-CNN deep learning network which uses weights from a Mask R-CNN deep learning network with ResNet-50 as a backbone, trained on the COCO data set for instance segmentation. To train a pose estimation network created using this pretrained deep learning network, you must use the trainPoseMaskRCNN function in two stages: first with a trainingMode argument value of "mask", and then with a trainingMode value of "pose-and-mask".

Data Types: char | string
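The two-stage training described for "resnet50-coco" might look like the sketch below. It assumes that trainPoseMaskRCNN accepts the arguments in the order (trainingData, network, trainingMode, options) and that trainingData is a datastore already prepared in the format that function expects; both are assumptions to verify against the trainPoseMaskRCNN reference page. The function name, class names, and training option values are illustrative only.

```matlab
function net = trainFromCocoSketch(trainingData)
% trainingData: caller-supplied datastore for Pose Mask R-CNN training
% (assumed to be in the format that trainPoseMaskRCNN expects).

% Class names for the target data set (illustrative).
classNames = ["I_shape" "X_shape" "L_shape" "T_shape"];
net = posemaskrcnn("resnet50-coco", classNames);

% Illustrative training options, not tuned values.
options = trainingOptions("sgdm", ...
    InitialLearnRate=0.001, ...
    MaxEpochs=5, ...
    MiniBatchSize=2);

% Stage 1: train the detection and mask segmentation heads.
net = trainPoseMaskRCNN(trainingData, net, "mask", options);

% Stage 2: continue training with the pose head enabled.
net = trainPoseMaskRCNN(trainingData, net, "pose-and-mask", options);
end
```

In practice the two stages may warrant different training options; the same options object is reused here only to keep the sketch short.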

classNames: Names of the object classes that the Pose Mask R-CNN network is trained to estimate pose for, specified as a vector of strings, cell array of character vectors, or categorical vector. This argument sets the ClassNames property of the posemaskrcnn object.

Data Types: string | cell | categorical

anchorBoxes: Sizes of the anchor boxes, specified as an M-by-2 matrix, where each row is of the form [height width], in pixels. M is the number of anchor boxes. You can estimate the anchor boxes using the estimateAnchorBoxes function. This argument sets the AnchorBoxes property of the posemaskrcnn object.
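As a rough sketch of estimating anchor boxes and passing them at creation, the example below builds a small box label datastore from made-up bounding boxes. The box values, the class name "part", and the number of anchors are all hypothetical; in a real workflow the boxes come from your labeled training data.

```matlab
% Hypothetical ground-truth boxes, one cell per image.
% Each row has the form [x y width height], in pixels.
boxes = {
    [ 40  60 120  80; 300 200  60  60]
    [500 100 200 150;  20  30  50 110]
    [250 400  90  45; 700 300 160 160]
    };
labelTable = table(boxes, VariableNames="part");
boxDatastore = boxLabelDatastore(labelTable);

% Estimate M anchor boxes; each returned row is [height width].
numAnchors = 3;   % illustrative choice
anchorBoxes = estimateAnchorBoxes(boxDatastore, numAnchors);

% Configure the network with the class name and the estimated anchors.
classNames = "part";
net = posemaskrcnn("resnet50-coco", classNames, anchorBoxes);
```

estimateAnchorBoxes clusters the box sizes, so the estimated values can differ slightly between runs unless the random number generator is seeded.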

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: posemaskrcnn("resnet50-pvc-parts",classNames,anchorBoxes,PoolSize=[11 11]) specifies the ROI pooling size for the detection head as 11-by-11 pixels.

PoolSize: ROI pooling size for the detection head, specified as a 2-element row vector of the form [height width]. Adjust the PoolSize value when your data set contains extreme variation in the scale of objects or images.

MaskPoolSize: ROI pooling size for the mask segmentation head, specified as a 2-element row vector of the form [height width]. Adjust the MaskPoolSize value when your data set contains extreme variation in the scale of objects or images.

Properties

This property is read-only.

ModelName: Name of the pretrained Pose Mask R-CNN pose estimation network, specified as a string scalar or character vector.

To set the value of this property, you must specify it at object creation.

This property is read-only.

AnchorBoxes: Sizes of the anchor boxes, specified as an M-by-2 matrix where each row is of the form [height width], in pixels. The default value of this property depends on the specified value of pretrainedNet:

  • "resnet50-pvc-parts" — An 8-by-2 matrix of anchor box values.

  • "resnet50-coco" — A 15-by-2 matrix of anchor box values.

When you specify anchor boxes, the posemaskrcnn object reinitializes the final convolution layers in the region proposal subnetwork to the correct size based on the number of anchor boxes M.

The anchorBoxes argument sets this property.

This property is read-only.

ClassNames: Names of the object classes that the Pose Mask R-CNN network is trained to estimate pose for, specified as a vector of strings, cell array of character vectors, or categorical vector. The default value of this property depends on the specified value of pretrainedNet:

  • "resnet50-pvc-parts" — Four object class names that correspond to the four PVC pipe connector shapes (I, X, L, and T) in the PVC pipe data set.

  • "resnet50-coco" — Eighty object class names that correspond to various objects detected in the COCO data set.

When you specify class names, the posemaskrcnn object reinitializes the final convolution layers in the detection head and mask segmentation head to the correct size based on the number of classes.

The classNames argument sets this property.

This property is read-only.

ImageInputSize: Image size to use for pose estimation, specified as a 1-by-3 vector of positive integers of the form [height width channels]. The values of height and width specify the image dimensions, in pixels, and channels specifies the number of color channels. The network resizes input images to this size while maintaining the aspect ratio. The default value is the network input size.

To set the value of this property, you must specify it at object creation.

This property is read-only.

DepthImageInputSize: Depth image size used for pose estimation, specified as a 3-element row vector of positive integers of the form [height width 1].

Object Functions

predictPose: Estimate object pose using Pose Mask R-CNN deep learning network
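A call to predictPose might look like the following sketch. It assumes that predictPose takes the network, an RGB image, a registered depth image, and a cameraIntrinsics object, and that it can return labels, scores, and bounding boxes alongside the poses; confirm the exact signature on the predictPose reference page. The wrapper function name and its inputs are hypothetical.

```matlab
function [poses, labels, scores, bboxes] = estimatePartPosesSketch(rgbImage, depthImage, intrinsics)
% rgbImage   : H-by-W-by-3 color image (caller-supplied)
% depthImage : H-by-W depth image registered to rgbImage
% intrinsics : cameraIntrinsics object for the RGB-D camera

% Load the network pretrained on PVC pipe connector images.
net = posemaskrcnn("resnet50-pvc-parts");

% Estimate the 6-DoF pose of each detected object
% (assumed argument order and output list).
[poses, labels, scores, bboxes] = predictPose(net, rgbImage, depthImage, intrinsics);
end
```

The intrinsics input can be built with cameraIntrinsics from the focal length, principal point, and image size of your sensor; those values depend on the camera and are not given in this page.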

Examples

Create a pretrained Pose Mask R-CNN pose estimation network that was trained on images of PVC pipe connectors in various orientations.

net = posemaskrcnn("resnet50-pvc-parts")
net = 
  posemaskrcnn with properties:

              ModelName: 'posemaskrcnn'
             ClassNames: {4×1 cell}
         ImageInputSize: [720 1280 3]
    DepthImageInputSize: [720 1280 1]
            AnchorBoxes: [8×2 double]

Display the names of the object classes.

net.ClassNames
ans = 4×1 cell
    {'I_shape'}
    {'X_shape'}
    {'L_shape'}
    {'T_shape'}

References

[1] Xiang, Yu, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. "PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes." ArXiv abs/1711.00199 (2017). https://api.semanticscholar.org/CorpusID:3440950.

[2] Jiang, Xiaoke, Donghai Li, Hao Chen, Ye Zheng, Rui Zhao, and Liwei Wu. “Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11164–74. New Orleans, LA, USA: IEEE, 2022. doi:10.1109/CVPR52688.2022.01089.

Version History

Introduced in R2024a