unitGenerator

Create unsupervised image-to-image translation (UNIT) generator network

Since R2021a

Syntax

net = unitGenerator(inputSizeSource)

net = unitGenerator(inputSizeSource,Name=Value)

Description

net = unitGenerator(inputSizeSource) creates a UNIT generator network, net, for input of size inputSizeSource. For more information about the network architecture, see UNIT Generator Network. The network has two inputs and four outputs:

The two network inputs are images in the source and target domains. By default, the target image size is the same as the source image size. You can change the number of channels in the target image by specifying the NumTargetChannels name-value argument.
Two of the network outputs are self-reconstructed images, that is, the source-to-source and target-to-target translations. The other two network outputs are the source-to-target and target-to-source translated images.

This function requires Deep Learning Toolbox™.

net = unitGenerator(inputSizeSource,Name=Value) modifies aspects of the UNIT generator network using name-value arguments.

Examples

Create UNIT Generator

Specify the network input size for RGB images of size 128-by-128.

inputSize = [128 128 3];

Create a UNIT generator that generates RGB images of the input size.

net = unitGenerator(inputSize);

Display the network.

analyzeNetwork(net)

Create UNIT Generator with Five Residual Blocks

Specify the network input size for RGB images of size 128-by-128 pixels.

inputSize = [128 128 3];

Create a UNIT generator with five residual blocks, three of which are shared between the encoder and decoder modules.

net = unitGenerator(inputSize,NumResidualBlocks=5, ...
   NumSharedBlocks=3);

Display the network.

analyzeNetwork(net)

Analysis for dlnetwork usage of network net in the Deep Learning Network Analyzer app.

Input Arguments

`inputSizeSource` — Input size of source image
3-element vector of positive integers

Input size of the source image, specified as a 3-element vector of positive integers. inputSizeSource has the form [H W C], where H is the height, W is the width, and C is the number of channels. The length of each dimension must be evenly divisible by 2^NumDownsamplingBlocks.
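
As a quick sanity check of the divisibility rule, the following sketch tests a candidate input size before calling unitGenerator. The variable names are illustrative. The check covers only the height and width, which is an assumption inferred from the [128 128 3] example on this page, whose channel count is not a multiple of 4.

```matlab
% Candidate source image size [H W C] and planned number of downsampling blocks
inputSize = [128 128 3];
numDownsamplingBlocks = 2;   % default value of NumDownsamplingBlocks

% Total downsampling factor applied by the encoder module
factor = 2^numDownsamplingBlocks;

% Check the spatial dimensions (height and width) only; this is an assumption
isValid = all(mod(inputSize(1:2),factor) == 0);

% Spatial size of the encoded features when the input size is valid
encodedSize = inputSize(1:2)/factor;
```

With these values, factor is 4, isValid is true, and encodedSize is [32 32]. A size such as [130 128 3] would fail the check because 130 is not divisible by 4.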

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: net = unitGenerator(inputSizeSource,NumDownsamplingBlocks=3) creates a network with three downsampling blocks.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: net = unitGenerator(inputSizeSource,"NumDownsamplingBlocks",3) creates a network with three downsampling blocks.

`NumDownsamplingBlocks` — Number of downsampling blocks
`2` (default) | positive integer

Number of downsampling blocks in the source encoder and target encoder subnetworks, specified as a positive integer. In total, the encoder module downsamples the source and target input by a factor of 2^NumDownsamplingBlocks. The source decoder and target decoder subnetworks have the same number of upsampling blocks.

`NumResidualBlocks` — Number of residual blocks
`5` (default) | positive integer

Number of residual blocks in the encoder module, specified as a positive integer. The decoder module has the same number of residual blocks.

`NumSharedBlocks` — Number of shared residual blocks
`2` (default) | positive integer

Number of residual blocks in the shared encoder subnetwork, specified as a positive integer. The shared decoder subnetwork has the same number of residual blocks. The network should contain at least one shared residual block.

`NumTargetChannels` — Number of channels in target image
positive integer

Number of channels in the target image, specified as a positive integer. By default, NumTargetChannels is the same as the number of channels in the source image, inputSizeSource.
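
For instance, a generator that translates between RGB source images and single-channel target images might be created as in this sketch. The choice of a grayscale target domain is only an illustration.

```matlab
% Source domain: 128-by-128 RGB images
inputSizeSource = [128 128 3];

% Target domain: single-channel images of the same height and width
net = unitGenerator(inputSizeSource,NumTargetChannels=1);

% The generator is documented to have two inputs and four outputs
numInputs = numel(net.InputNames);
numOutputs = numel(net.OutputNames);
```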

`NumFiltersInFirstBlock` — Number of filters in first convolution layer
`64` (default) | positive even integer

Number of filters in the first convolution layer, specified as a positive even integer.

`FilterSizeInFirstAndLastBlocks` — Filter size in first and last convolution layers
`7` (default) | positive odd integer | 2-element vector of positive odd integers

Filter size in the first and last convolution layers of the network, specified as a positive odd integer or 2-element vector of positive odd integers of the form [height width]. When you specify the filter size as a scalar, the filter has equal height and width.

`FilterSizeInIntermediateBlocks` — Filter size in intermediate convolution layers
`3` (default) | 2-element vector of positive odd integers | positive odd integer

Filter size in intermediate convolution layers, specified as a positive odd integer or 2-element vector of positive odd integers of the form [height width]. The intermediate convolution layers are the convolution layers excluding the first and last convolution layers. When you specify the filter size as a scalar, the filter has equal height and width.

`ConvolutionPaddingValue` — Style of padding
`"symmetric-exclude-edge"` (default) | `"symmetric-include-edge"` | `"replicate"` | numeric scalar

Style of padding used in the network, specified as one of these values.

`ConvolutionPaddingValue`	Description	Example
Numeric scalar	Pad with the specified numeric value	$[\begin{matrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{matrix}] \to [\begin{matrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 3 & 1 & 4 & 2 & 2 \\ 2 & 2 & 1 & 5 & 9 & 2 & 2 \\ 2 & 2 & 2 & 6 & 5 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 \end{matrix}]$
`"symmetric-include-edge"`	Pad using mirrored values of the input, including the edge values	$[\begin{matrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{matrix}] \to [\begin{matrix} 5 & 1 & 1 & 5 & 9 & 9 & 5 \\ 1 & 3 & 3 & 1 & 4 & 4 & 1 \\ 1 & 3 & 3 & 1 & 4 & 4 & 1 \\ 5 & 1 & 1 & 5 & 9 & 9 & 5 \\ 6 & 2 & 2 & 6 & 5 & 5 & 6 \\ 6 & 2 & 2 & 6 & 5 & 5 & 6 \\ 5 & 1 & 1 & 5 & 9 & 9 & 5 \end{matrix}]$
`"symmetric-exclude-edge"`	Pad using mirrored values of the input, excluding the edge values	$[\begin{matrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{matrix}] \to [\begin{matrix} 5 & 6 & 2 & 6 & 5 & 6 & 2 \\ 9 & 5 & 1 & 5 & 9 & 5 & 1 \\ 4 & 1 & 3 & 1 & 4 & 1 & 3 \\ 9 & 5 & 1 & 5 & 9 & 5 & 1 \\ 5 & 6 & 2 & 6 & 5 & 6 & 2 \\ 9 & 5 & 1 & 5 & 9 & 5 & 1 \\ 4 & 1 & 3 & 1 & 4 & 1 & 3 \end{matrix}]$
`"replicate"`	Pad using repeated border elements of the input	$[\begin{matrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{matrix}] \to [\begin{matrix} 3 & 3 & 3 & 1 & 4 & 4 & 4 \\ 3 & 3 & 3 & 1 & 4 & 4 & 4 \\ 3 & 3 & 3 & 1 & 4 & 4 & 4 \\ 1 & 1 & 1 & 5 & 9 & 9 & 9 \\ 2 & 2 & 2 & 6 & 5 & 5 & 5 \\ 2 & 2 & 2 & 6 & 5 & 5 & 5 \\ 2 & 2 & 2 & 6 & 5 & 5 & 5 \end{matrix}]$
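
The padded matrices in the table can be reproduced with plain indexing, as in the sketch below. This is only an illustration of the four padding styles, not how the convolution layers implement them. The padding width of 2 and all variable names are assumptions chosen to match the table examples.

```matlab
A = [3 1 4; 1 5 9; 2 6 5];
padSize = 2;   % assumed padding width, matching the table examples

% Numeric scalar: fill a larger matrix with the value 2, then copy A into the center
padded = 2*ones(size(A) + 2*padSize);
padded(padSize+1:end-padSize,padSize+1:end-padSize) = A;

% The other styles reuse rows and columns of A through an index vector
idxInclude = [2 1 1 2 3 3 2];     % mirror, repeating the edge element
idxExclude = [3 2 1 2 3 2 1];     % mirror, skipping the edge element
idxReplicate = [1 1 1 2 3 3 3];   % repeat the border element

symInclude = A(idxInclude,idxInclude);
symExclude = A(idxExclude,idxExclude);
replicated = A(idxReplicate,idxReplicate);
```

Each result is a 7-by-7 matrix. For example, the first row of symExclude is [5 6 2 6 5 6 2], which matches the "symmetric-exclude-edge" row of the table.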

`UpsampleMethod` — Method used to upsample activations
`"transposedConv"` (default) | `"bilinearResize"` | `"pixelShuffle"`

Method used to upsample activations, specified as one of these values:

"transposedConv" — Use a transposedConv2dLayer (Deep Learning Toolbox) with a stride of [2 2].
"bilinearResize" — Use a convolution2dLayer (Deep Learning Toolbox) with a stride of [1 1] followed by a resize2dLayer with a scale of [2 2].
"pixelShuffle" — Use a convolution2dLayer (Deep Learning Toolbox) with a stride of [1 1] followed by a depthToSpace2dLayer with a block size of [2 2].

Data Types: char | string
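
As a brief sketch, the upsampling method can be switched when creating the network. The input size here is the one used in the examples on this page.

```matlab
inputSize = [128 128 3];

% Upsample with a stride-1 convolution followed by bilinear resizing
% instead of the default transposed convolution
net = unitGenerator(inputSize,UpsampleMethod="bilinearResize");

% Count the layers; use analyzeNetwork(net) to inspect the decoder blocks
numLayers = numel(net.Layers);
```

Calling analyzeNetwork(net) afterward is one way to confirm which layers the upsampling blocks contain.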

`ConvolutionWeightsInitializer` — Weight initialization used in convolution layers
`"he"` (default) | `"glorot"` | `"narrow-normal"` | function

Weight initialization used in convolution layers, specified as "glorot", "he", "narrow-normal", or a function handle. For more information, see Specify Custom Weight Initialization Function (Deep Learning Toolbox).

`ActivationLayer` — Activation function
`"relu"` (default) | `"leakyRelu"` | `"elu"` | layer object

Activation function to use in the network except after the first and final convolution layers, specified as one of these values. The unitGenerator function automatically adds a leaky ReLU layer after the first convolution layer. For more information and a list of available layers, see Activation Layers (Deep Learning Toolbox).

"relu" — Use a reluLayer (Deep Learning Toolbox)
"leakyRelu" — Use a leakyReluLayer (Deep Learning Toolbox) with a scale factor of 0.2
"elu" — Use an eluLayer (Deep Learning Toolbox)
A layer object

`SourceFinalActivationLayer` — Activation function after final convolution in source decoder
`"tanh"` (default) | `"sigmoid"` | `"softmax"` | `"none"` | layer object

Activation function after the final convolution layer in the source decoder, specified as one of these values. For more information and a list of available layers, see Activation Layers (Deep Learning Toolbox).

"tanh" — Use a tanhLayer (Deep Learning Toolbox)
"sigmoid" — Use a sigmoidLayer (Deep Learning Toolbox)
"softmax" — Use a softmaxLayer (Deep Learning Toolbox)
"none" — Do not use a final activation layer
A layer object

`TargetFinalActivationLayer` — Activation function after final convolution in target decoder
`"tanh"` (default) | `"sigmoid"` | `"softmax"` | `"none"` | layer object

Activation function after the final convolution layer in the target decoder, specified as one of these values. For more information and a list of available layers, see Activation Layers (Deep Learning Toolbox).

"tanh" — Use a tanhLayer (Deep Learning Toolbox)
"sigmoid" — Use a sigmoidLayer (Deep Learning Toolbox)
"softmax" — Use a softmaxLayer (Deep Learning Toolbox)
"none" — Do not use a final activation layer
A layer object
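
The final activation determines the range of the generated images: a tanh layer produces values in [-1, 1], and a sigmoid layer produces values in [0, 1]. The sketch below assumes training images scaled to [0, 1] in both domains, which is an illustrative choice rather than a requirement of the function.

```matlab
inputSize = [128 128 3];

% Use sigmoid outputs in both decoders for images scaled to [0,1]
net = unitGenerator(inputSize, ...
    SourceFinalActivationLayer="sigmoid", ...
    TargetFinalActivationLayer="sigmoid");

numOutputs = numel(net.OutputNames);
```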

Output Arguments

`net` — UNIT generator network
`dlnetwork` object

UNIT generator network, returned as a dlnetwork (Deep Learning Toolbox) object.

More About

UNIT Generator Network

A UNIT generator network consists of three subnetworks in an encoder module followed by three subnetworks in a decoder module. The default network follows the architecture proposed by Liu, Breuel, and Kautz [1].

Inputs, encoder modules, decoder modules, and outputs of a UNIT network.

The encoder module downsamples the input by a factor of 2^NumDownsamplingBlocks. The encoder module consists of three subnetworks.

The source encoder subnetwork, called 'encoderSourceBlock', has an initial block of layers that accepts data in the source domain, X_S. The subnetwork then has NumDownsamplingBlocks downsampling blocks that downsample the data and NumResidualBlocks–NumSharedBlocks residual blocks.
The target encoder subnetwork, called 'encoderTargetBlock', has an initial block of layers that accepts data in the target domain, X_T. The subnetwork then has NumDownsamplingBlocks downsampling blocks that downsample the data, and NumResidualBlocks–NumSharedBlocks residual blocks.
The output of the source encoder and target encoder are combined by a concatenationLayer (Deep Learning Toolbox).
The shared residual encoder subnetwork, called 'encoderSharedBlock', accepts the concatenated data and has NumSharedBlocks residual blocks.

The decoder module consists of three subnetworks that perform a total of NumDownsamplingBlocks upsampling operations on the data.

The shared residual decoder subnetwork, called 'decoderSharedBlock', accepts data from the encoder and has NumSharedBlocks residual blocks.
The source decoder subnetwork, called 'decoderSourceBlock', has NumResidualBlocks–NumSharedBlocks residual blocks, NumDownsamplingBlocks upsampling blocks that upsample the data, and a final block of layers that returns the output. This subnetwork returns two outputs in the source domain: X_TS and X_SS. The output X_TS is an image translated from the target domain to the source domain. The output X_SS is a self-reconstructed image from the source domain to the source domain.
The target decoder subnetwork, called 'decoderTargetBlock', has NumResidualBlocks–NumSharedBlocks residual blocks, NumDownsamplingBlocks upsampling blocks that upsample the data, and a final block of layers that returns the output. This subnetwork returns two outputs in the target domain: X_ST and X_TT. The output X_ST is an image translated from the source domain to the target domain. The output X_TT is a self-reconstructed image from the target domain to the target domain.
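
The block counts described above follow from simple arithmetic on the name-value arguments. This sketch uses the default values, and the variable names are illustrative.

```matlab
numResidualBlocks = 5;       % default NumResidualBlocks
numSharedBlocks = 2;         % default NumSharedBlocks
numDownsamplingBlocks = 2;   % default NumDownsamplingBlocks

% Residual blocks in each domain-specific encoder or decoder subnetwork
numDomainBlocks = numResidualBlocks - numSharedBlocks;

% Residual blocks along one path through the generator:
% domain encoder + shared encoder + shared decoder + domain decoder
numBlocksPerPath = 2*numDomainBlocks + 2*numSharedBlocks;

% Downsampling factor of the encoder, undone by the decoder upsampling blocks
scaleFactor = 2^numDownsamplingBlocks;
```

With the defaults, each domain-specific subnetwork has 3 residual blocks, a single translation path passes through 10 residual blocks, and the encoder reduces the height and width by a factor of 4.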

The table describes the blocks of layers that comprise the subnetworks.

Block Type	Layers
Initial block	An `imageInputLayer` (Deep Learning Toolbox). A `convolution2dLayer` (Deep Learning Toolbox) with a stride of [1 1] and a filter size of `FilterSizeInFirstAndLastBlocks`. A `leakyReluLayer` (Deep Learning Toolbox) with a scale factor of 0.2.
Downsampling block	A `convolution2dLayer` (Deep Learning Toolbox) with a stride of [2 2] to perform downsampling. The convolution layer has a filter size of `FilterSizeInIntermediateBlocks`. An `instanceNormalizationLayer` (Deep Learning Toolbox). An activation layer specified by the `ActivationLayer` name-value argument.
Residual block	A `convolution2dLayer` (Deep Learning Toolbox) with a stride of [1 1] and a filter size of `FilterSizeInIntermediateBlocks`. An `instanceNormalizationLayer` (Deep Learning Toolbox). An activation layer specified by the `ActivationLayer` name-value argument. A second `convolution2dLayer` (Deep Learning Toolbox). A second `instanceNormalizationLayer` (Deep Learning Toolbox). An `additionLayer` (Deep Learning Toolbox) that provides a skip connection between every block.
Upsampling block	An upsampling layer that upsamples by a factor of 2 according to the `UpsampleMethod` name-value argument. The convolution layer has a filter size of `FilterSizeInIntermediateBlocks`. An `instanceNormalizationLayer` (Deep Learning Toolbox). An activation layer specified by the `ActivationLayer` name-value argument.
Final block	A `convolution2dLayer` (Deep Learning Toolbox) with a stride of [1 1] and a filter size of `FilterSizeInFirstAndLastBlocks`. An optional activation layer specified by the `SourceFinalActivationLayer` and `TargetFinalActivationLayer` name-value arguments.

Tips

You can create the discriminator network for UNIT by using the patchGANDiscriminator function.
Train the UNIT GAN network using a custom training loop.
To translate a source image to the target domain, or a target image to the source domain, use the unitPredict function.
For shared latent feature encoding, the arguments NumSharedBlocks and NumResidualBlocks must be greater than 0.
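
As a rough sketch of the translation step, the code below passes a placeholder image through an untrained generator with unitPredict. The random input data, the "SSCB" data format, and the OutputType value are assumptions for illustration, so check the unitPredict reference page for the exact requirements. An untrained network produces meaningless images, so in practice you would train the generator first.

```matlab
inputSize = [128 128 3];
net = unitGenerator(inputSize);

% Placeholder source image: random values in [-1,1], formatted as
% spatial, spatial, channel, batch. Replace with real preprocessed data.
X = dlarray(2*rand([inputSize 1],"single") - 1,"SSCB");

% Translate from the source domain to the target domain
Y = unitPredict(net,X,OutputType="SourceToTarget");

% Height, width, and channel count of the translated image
outputSize = [size(Y,1) size(Y,2) size(Y,3)];
```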

References

[1] Liu, Ming-Yu, Thomas Breuel, and Jan Kautz. "Unsupervised Image-to-Image Translation Networks." Advances in Neural Information Processing Systems 30 (NIPS 2017). Long Beach, CA: 2017. https://arxiv.org/abs/1703.00848.

Version History

Introduced in R2021a
