This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English version of the page.

Note: This page has been translated by MathWorks. Click here to see
To view all translated materials including this page, select Country from the country navigator on the bottom of this page.

wordEmbeddingLayer

Word embedding layer for deep learning networks

Description

A word embedding layer maps word indices to vectors.

Use a word embedding layer in a deep learning long short-term memory (LSTM) network. An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training.

This layer requires Deep Learning Toolbox™.

Creation

Syntax

layer = wordEmbeddingLayer(dimension,numWords)
layer = wordEmbeddingLayer(dimension,numWords,Name,Value)

Description

example

layer = wordEmbeddingLayer(dimension,numWords) creates a word embedding layer and specifies the embedding dimension and vocabulary size.

layer = wordEmbeddingLayer(dimension,numWords,Name,Value) sets optional properties using one or more name-value pairs. Enclose each property name in single quotes.

Properties

expand all

Word Embedding

Dimension of the word embedding, specified as a positive integer.

Example: 300

Number of words in the model, specified as a positive integer. If the number of unique words in the training data is greater than NumWords, then the layer maps the out-of-vocabulary words to the same vector.

Parameters and Initialization

Function to initialize the weights, specified as one of the following:

  • 'narrow-normal' – Initialize the weights by independently sampling from a normal distribution with zero mean and standard deviation 0.01.

  • 'glorot' – Initialize the weights with the Glorot initializer [1] (also known as Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(numIn + numOut), where numIn = NumWords + 1 and numOut = Dimension.

  • 'he' – Initialize the weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and variance 2/numIn, where numIn = NumWords + 1.

  • 'orthogonal' – Initialize the input weights with Q, the orthogonal matrix given by the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution. [3]

  • 'zeros' – Initialize the weights with zeros.

  • 'ones' – Initialize the weights with ones.

  • Function handle – Initialize the weights with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the weights.

The layer only initializes the weights when the Weights property is empty.

Data Types: char | string | function_handle

Layer weights, specified as a Dimension-by-(NumWords+1) array.

For input integers i less than or equal to NumWords, the layer outputs the vector Weights(:,i). Otherwise, the layer maps outputs the vector Weights(:,NumWords+1).

Learn Rate and Regularization

Learning rate factor for the weights, specified as a nonnegative scalar.

The software multiplies this factor by the global learning rate to determine the learning rate for the weights in this layer. For example, if WeightLearnRateFactor is 2, then the learning rate for the weights in this layer is twice the current global learning rate. The software determines the global learning rate based on the settings specified with the trainingOptions function.

Example: 2

L2 regularization factor for the weights, specified as a nonnegative scalar.

The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization for the weights in this layer. For example, if WeightL2Factor is 2, then the L2 regularization for the weights in this layer is twice the global L2 regularization factor. You can specify the global L2 regularization factor using the trainingOptions function.

Example: 2

Layer

Layer name, specified as a character vector or a string scalar. If Name is set to '', then the software automatically assigns a name at training time.

Data Types: char | string

Number of inputs of the layer. This layer accepts a single input only.

Data Types: double

Input names of the layer. This layer accepts a single input only.

Data Types: cell

Number of outputs of the layer. This layer has a single output only.

Data Types: double

Output names of the layer. This layer has a single output only.

Data Types: cell

Examples

collapse all

Create a word embedding layer with embedding dimension 300 and 5000 words.

layer = wordEmbeddingLayer(300,5000)
layer = 
  WordEmbeddingLayer with properties:

         Name: ''

   Hyperparameters
    Dimension: 300
     NumWords: 5000

   Learnable Parameters
      Weights: []

  Show all properties

Include a word embedding layer in an LSTM network.

inputSize = 1;
embeddingDimension = 300;
numWords = 5000;
numHiddenUnits = 200;
numClasses = 10;

layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  6x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 1 dimensions
     2   ''   Word Embedding Layer    Word embedding layer with 300 dimensions and 5000 unique words
     3   ''   LSTM                    LSTM with 200 hidden units
     4   ''   Fully Connected         10 fully connected layer
     5   ''   Softmax                 softmax
     6   ''   Classification Output   crossentropyex

References

[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249-256. 2010.

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." In Proceedings of the IEEE international conference on computer vision, pp. 1026-1034. 2015.

[3] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013).

Introduced in R2018b