Gated recurrent unit

The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.

This function applies the deep learning GRU operation to `dlarray` data. If you want to apply a GRU operation within a `layerGraph` object or `Layer` array, use the `gruLayer` layer.

`dlY = gru(dlX,H0,weights,recurrentWeights,bias)` applies a gated recurrent unit (GRU) calculation to the input data `dlX` using the initial hidden state `H0` and the parameters `weights`, `recurrentWeights`, and `bias`. The input `dlX` is a formatted `dlarray` with dimension labels. The output `dlY` is a formatted `dlarray` with the same dimension labels as `dlX`, except for any `'S'` dimensions.
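As a minimal sketch of the syntax above (the sizes, the `'CBT'` format, and the random initialization are illustrative assumptions, not requirements of the function):

```matlab
% Illustrative sizes (assumptions for this sketch)
numFeatures = 10;
numObservations = 32;
numTimeSteps = 20;
numHiddenUnits = 50;

% Formatted input: 'C' (channel), 'B' (batch), 'T' (time)
dlX = dlarray(randn(numFeatures,numObservations,numTimeSteps),'CBT');

% Initial hidden state and learnable parameters.
% The weights, recurrent weights, and bias stack the three gate
% blocks along the first dimension, hence the factor of 3.
H0 = zeros(numHiddenUnits,numObservations);
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(zeros(3*numHiddenUnits,1));

dlY = gru(dlX,H0,weights,recurrentWeights,bias);
```

Here `dlY` carries the same `'CBT'` labels as `dlX`, with the channel dimension sized by the number of hidden units.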

The `gru` function updates the hidden state using the hyperbolic tangent function (tanh) as the state activation function, and uses the sigmoid function given by $$\sigma (x)={(1+{e}^{-x})}^{-1}$$ as the gate activation function.
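For reference, the GRU formulation of Cho et al. (2014) that this description corresponds to can be written as follows, with $W$, $R$, and $b$ denoting the input weights, recurrent weights, and bias for each gate, and $\odot$ elementwise multiplication. Sign and gating conventions vary slightly between implementations, so treat this as a sketch rather than the toolbox's exact definition:

```latex
\begin{aligned}
r_t &= \sigma\!\left(W_r x_t + R_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
z_t &= \sigma\!\left(W_z x_t + R_z h_{t-1} + b_z\right) && \text{(update gate)}\\
\tilde{h}_t &= \tanh\!\left(W_h x_t + r_t \odot \left(R_h h_{t-1}\right) + b_h\right) && \text{(candidate state)}\\
h_t &= \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1} && \text{(hidden state)}
\end{aligned}
```

The sigmoid $\sigma$ gates how much of the previous state is reset and retained, while tanh bounds the candidate state, matching the activation functions described above.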

`[dlY,hiddenState] = gru(dlX,H0,weights,recurrentWeights,bias)` also returns the hidden state after the GRU operation.
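The second output is useful when processing a long sequence in chunks, since the returned state can seed the next call. In this sketch, `dlX`, `dlX2`, and the parameters are assumed to be defined as in the earlier example (`dlX2` being a hypothetical second chunk with the same format):

```matlab
% Process the first chunk and capture the final hidden state
[dlY,hiddenState] = gru(dlX,H0,weights,recurrentWeights,bias);

% Continue from where the first chunk left off by passing the
% returned state in place of the initial state H0
[dlY2,hiddenState] = gru(dlX2,hiddenState,weights,recurrentWeights,bias);
```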

`[___] = gru(___,'DataFormat',FMT)` also specifies the dimension format `FMT` when `dlX` is not a formatted `dlarray`. The output `dlY` is an unformatted `dlarray` with the same dimension order as `dlX`, except for any `'S'` dimensions.
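A minimal sketch of the `'DataFormat'` syntax, again with illustrative sizes (the `'CBT'` format string is an assumption for this example):

```matlab
numFeatures = 10; numObservations = 32; numTimeSteps = 20;
numHiddenUnits = 50;

% Unformatted input: a dlarray created without dimension labels
X = dlarray(randn(numFeatures,numObservations,numTimeSteps));

H0 = zeros(numHiddenUnits,numObservations);
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(zeros(3*numHiddenUnits,1));

% 'DataFormat' tells gru how to interpret the dimensions of X,
% since X itself carries no labels; the output is also unformatted
dlY = gru(X,H0,weights,recurrentWeights,bias,'DataFormat','CBT');
```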

`functionToLayerGraph` does not support the `gru` function. If you use `functionToLayerGraph` with a function that contains the `gru` operation, the resulting `LayerGraph` contains placeholder layers.


`dlarray` | `dlfeval` | `dlgradient` | `fullyconnect` | `lstm` | `softmax`