Import pre-trained word embeddings (GloVe, skip-gram, etc.) into Deep Neural Network models.

I was going through this page to learn how to classify text using word embeddings and an LSTM. The page describes training the word embeddings within the LSTM architecture, but it does not discuss importing embedding models trained externally, such as GloVe (Global Vectors) or word2vec, which already provide large-scale pre-trained word embeddings. Any ideas on how I can use pre-trained word embeddings in the LSTM architecture?

Accepted Answer

Liliana Agapito de Sousa Medina
You can use a pre-trained embedding model to initialize the Weights property of the wordEmbeddingLayer. For example:
% Import your pretrained word embedding model of choice
emb = readWordEmbedding('existingEmbeddingModel.vec');
embDim = emb.Dimension;
numWords = numel(emb.Vocabulary);
% Initialize the word embedding layer
embLayer = wordEmbeddingLayer(embDim, numWords);
embLayer.Weights = word2vec(emb, emb.Vocabulary)'; % transpose: Weights must be Dimension-by-NumWords
% If you want to keep the pre-trained weights "frozen" during training, uncomment the following line
% embLayer.WeightLearnRateFactor = 0;
The wordEmbeddingLayer with initialized Weights can then be placed in the network before lstmLayer.
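As a rough sketch of how the network might be assembled (numHiddenUnits and numClasses below are placeholder values for your own task, not part of the original answer):
% Possible layer array placing the initialized embedding layer before the LSTM;
% sequenceInputLayer(1) is used because each word is passed in as a single index
numHiddenUnits = 100;   % placeholder value
numClasses = 2;         % placeholder value
layers = [ ...
    sequenceInputLayer(1)
    embLayer
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];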
Also note that the training documents should be mapped to the vocabulary of the pre-trained embedding model before being passed to the network for training, for example:
enc = wordEncoding(tokenizedDocument(emb.Vocabulary,'TokenizeMethod','none'));
XTrain = doc2sequence(enc,documentsTrain,'Length',75);
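With the sequences prepared, a training call might then look like the following sketch (YTrain, the categorical labels for documentsTrain, and the specific training options are assumptions, not part of the original answer):
% Sketch of a possible training call; YTrain and the chosen options are assumed
options = trainingOptions('adam', ...
    'MaxEpochs',10, ...
    'Plots','training-progress');
net = trainNetwork(XTrain, YTrain, layers, options);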

More Answers (2)

CoderTargaryn on 28 Nov 2018
Hi, many thanks for your answer. After posting my question, I read through the MATLAB documentation online and found that it is indeed possible using the approach you suggested.

koosha salehi on 24 Oct 2020
Hi,
  • I am using the Stanford GloVe data set and I want to design a deep network with an LSTM. I use wordEmbeddingLayer, but it doesn't work; I think the sequence input layer is causing the problem. Can anyone help me?
  • I also need a small labelled corpus and its equivalent vectors in GloVe format.
Has anyone done this before?

Release

R2018b
