Use pre-trained word2vec in lstm language model? - tensorflow

I used TensorFlow to train an LSTM language model; the code is from here.
According to the article here, it seems that using pre-trained word2vec makes the model work better.
Using word embeddings such as word2vec and GloVe is a popular method to improve the accuracy of your model. Instead of using one-hot vectors to represent our words, the low-dimensional vectors learned using word2vec or GloVe carry semantic meaning – similar words have similar vectors. Using these vectors is a form of pre-training.
So, I want to use word2vec to redo the training, but I am a little bit confused about how to do this.
The embedding code goes here:
with tf.device("/cpu:0"):
    embedding = tf.get_variable(
        "embedding", [vocab_size, size], dtype=data_type())
    inputs = tf.nn.embedding_lookup(embedding, input_.input_data)
How can I change this code to use pre-trained word2vec?
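One common way to do this (the same placeholder-and-assign pattern used in the answers to the related questions below) is to keep the embedding variable but overwrite its initial value with the pre-trained vectors. The sketch reuses vocab_size, size, data_type() and input_ from the tutorial code; pretrained_w2v is a placeholder for a NumPy matrix of word2vec vectors whose rows are aligned with the tutorial's vocabulary IDs.

with tf.device("/cpu:0"):
    # trainable=False keeps the word2vec vectors frozen; leave it at the
    # default (True) if you want to fine-tune them during training
    embedding = tf.get_variable(
        "embedding", [vocab_size, size], dtype=data_type(), trainable=False)
    inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

# feed the pre-trained word2vec matrix in once, before the training loop
embedding_ph = tf.placeholder(data_type(), [vocab_size, size])
embedding_init = embedding.assign(embedding_ph)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(embedding_init, feed_dict={embedding_ph: pretrained_w2v})
    # ... continue with the tutorial's normal training loop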

Related

TF-Hub Elmo uses which word embedding to concatenate with characters in Highway layer

I understand that ELMo uses a CNN over characters for character embeddings. However, I do not understand how the character embeddings are concatenated with word embeddings in the Highway network. In the ELMo paper most of the evaluations use GloVe word embeddings together with the CNN character embeddings, which makes sense since the paper says which word embeddings are used.
But for pre-trained models like the one on TF-Hub, which word embeddings are concatenated with the character embeddings in the Highway layer?
Please help me understand if you can.
The concatenation happens inside the https://tfhub.dev/google/elmo/3 model. When using the word_emb output, one can get the embedding for each token in the input. These embeddings can be used for classification or other modeling tasks, similar to BERT/transformer-based models. The model also provides direct access to the hidden states of the LSTM layers through lstm_outputs1 and lstm_outputs2.
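For reference, a minimal TF 1.x sketch of reading those outputs from the TF-Hub module (the example sentence is a placeholder; the output keys follow the module's published signature):

import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=False)
outputs = elmo(["the cat is on the mat"], signature="default", as_dict=True)

word_emb = outputs["word_emb"]            # character-CNN token embeddings
lstm_outputs1 = outputs["lstm_outputs1"]  # hidden states of the first LSTM layer
lstm_outputs2 = outputs["lstm_outputs2"]  # hidden states of the second LSTM layer
elmo_vectors = outputs["elmo"]            # weighted sum of the three layers

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(word_emb).shape)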

Use FastText trained embeddings in mxnet symbol embedding layer

How do you run FastText on a corpus and use those embeddings in an MXNet symbol embedding layer?
To do that, you first need to load the matrix that contains the FastText embeddings and then pass it as an initializer to the embedding layer:
embed_layer_3 = mx.sym.Embedding(data=input_x_3, weight=the_emb_3, input_dim=vocab_size, output_dim=embedding_dim, name='vocab_embed')
I took this example from here, where they use GloVe embeddings, but the idea is the same.
I would highly recommend using the Gluon API instead of the Symbol API. In that case it will be much easier for you to use all the goodness of the GluonNLP package, which already ships pretrained FastText embeddings. See this tutorial to learn how to use FastText in GluonNLP.
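For illustration, a minimal Gluon-style sketch of that path, assuming GluonNLP's pretrained FastText vectors (the wiki.simple source and the toy corpus are placeholders for your own choices):

import mxnet as mx
import gluonnlp as nlp

# load pretrained FastText vectors shipped with GluonNLP
fasttext = nlp.embedding.create('fasttext', source='wiki.simple')

# build a vocabulary from your own corpus and attach the pretrained vectors
counter = nlp.data.count_tokens("hello world hello mxnet".split())
vocab = nlp.Vocab(counter)
vocab.set_embedding(fasttext)

# copy the vectors into a Gluon embedding layer (trainable by default)
embed = mx.gluon.nn.Embedding(len(vocab), fasttext.idx_to_vec.shape[1])
embed.initialize()
embed.weight.set_data(vocab.embedding.idx_to_vec)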

Tensorflow embeddings

I know what embeddings are and how they are trained. While going through TensorFlow's documentation, I came across two different articles, and I wish to know what exactly the difference between them is.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they have explicitly trained embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later on save the learnt embeddings as a numpy object and use the
tf.nn.embedding_lookup() function while training an LSTM network.
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable("word_embeddings",
    [vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training-embeddings section. My doubt is: does the gather function train the embeddings automatically? I am not sure, since this op ran very fast on my PC.
Generally: what is the right way to convert words into vectors in TensorFlow (link 1 or link 2) for training a seq2seq model? Also, how do I train the embeddings for a seq2seq dataset, given that my data comes as separate sequences rather than one continuous sequence of words (see the dataset in link 1)?
Alright, anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or any other encoder-decoder model, we use the second approach, where the embedding matrix gets tuned appropriately while the model is trained.
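To make that second approach concrete, here is a minimal TF 1.x sketch; the sizes and the loss are placeholders for your own model. tf.gather / tf.nn.embedding_lookup does not train anything by itself; the embedding variable is updated only because it lies on the path from the loss to the optimizer.

import tensorflow as tf

vocabulary_size, embedding_size = 10000, 128           # example values

word_ids = tf.placeholder(tf.int32, [None, None])      # [batch, time]
word_embeddings = tf.get_variable(
    "word_embeddings", [vocabulary_size, embedding_size])  # trainable by default
embedded_word_ids = tf.nn.embedding_lookup(word_embeddings, word_ids)

# stand-in for the real seq2seq loss; in practice, build your encoder-decoder
# on top of embedded_word_ids and compute the loss from its outputs
loss = tf.reduce_mean(tf.square(embedded_word_ids))

# the embedding matrix receives gradient updates like any other variable
train_op = tf.train.AdamOptimizer().minimize(loss)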

Weights update in Tensorflow embedding layer with pretrained fasttext weights

I'm not sure if my understanding is correct but...
While training a seq2seq model, one of the reasons I want to initialize the embedding layer with a set of pre-trained FastText weights is to reduce the number of unknown words at test time (words that do not appear in the training set). Since the pre-trained FastText model has a larger vocabulary, an unknown word at test time can be represented by a FastText out-of-vocabulary word vector, which is supposed to point in a similar direction to the semantically similar words in the training set.
However, the initial FastText weights in the embedding layer will be updated through the training process (updating the weights gives better results). I am wondering whether the updated embedding weights would distort the semantic-similarity relationships between words and undermine the representation of the FastText out-of-vocabulary word vectors (and the relationship between the updated embedding weights and the vectors of words whose IDs never appeared in the training data).
Would it be a better solution to feed distributed-representation vectors extracted from the pre-trained model as input (with fixed weights during training) and then map these pre-trained word vectors via a lookup table into the embedding layer (whose weights will be updated during training)?
Any suggestions will be appreciated!
You are correct about the problem: when using pre-trained vectors and fine-tuning them in your final model, the words that are infrequent or that do not appear in your training set won't get any updates.
Now, usually one can test how much of an issue this is for your particular case. E.g. if you have a validation set, try both fine-tuning and not fine-tuning the weights and see what the difference in model performance on the validation set is.
If you see a big difference in performance on the validation set when you are not fine-tuning, here are a few ways to handle it:
a) Add a linear transformation layer after the non-trainable embeddings. Fine-tuning embeddings in many cases amounts to an affine transformation of the embedding space, so one can capture this in a separate layer that can also be applied at test time.
E.g., where A is the pre-trained embedding matrix:
embeds = tf.nn.embedding_lookup(A, tokens)
X = tf.get_variable("X", [embed_size, embed_size])
b = tf.get_variable("b", [embed_size])
embeds = tf.matmul(embeds, X) + b  # learned affine transformation of the fixed embeddings
b) Keep the pre-trained embeddings in a non-trainable embedding matrix A. Add a trainable embedding matrix B that covers a smaller vocabulary of the popular words in your training set, with the same embedding size. Look words up in both A and B (and if a word is out of B's vocabulary, use ID=0 for example), concatenate the results and use that as input to your model. This way you will teach your model to use mostly A and sometimes rely on B for the popular words in your training set.
fixed_embeds = tf.nn.embedding_lookup(A, tokens)            # pre-trained, not trainable
B = tf.get_variable("B", [smaller_vocab_size, embed_size])  # trainable, popular words only
# map tokens outside B's vocabulary to ID 0
oov_tokens = tf.where(tf.less(tokens, smaller_vocab_size), tokens, tf.zeros(tf.shape(tokens), dtype=tokens.dtype))
dyn_embeds = tf.nn.embedding_lookup(B, oov_tokens)
embeds = tf.concat([fixed_embeds, dyn_embeds], 1)           # concatenate along the embedding axis

Word Embedding for Convolutional Neural Network

I am trying to apply word2vec to a convolutional neural network. I am new to TensorFlow. Here is my code for the pre-trained embedding layer.
W = tf.Variable(tf.constant(0.0, shape=[vocabulary_size, embedding_size]),
                trainable=False, name="W")
embedding_placeholder = tf.placeholder(tf.float32, [vocabulary_size, embedding_size])
embedding_init = W.assign(embedding_placeholder)
sess = tf.Session()
sess.run(embedding_init, feed_dict={embedding_placeholder: final_embeddings})
I think I should use embedding_lookup but I am not sure how to use it. I would really appreciate it if someone could give some advice.
Thanks
Tensorflow has an example using word2vec-cnn for text classification: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification_cnn.py
You are on the right track. Since embedding_lookup works under the assumption that words are represented as integer IDs, you need to transform your input vectors to comply with that. Furthermore, you need to make sure that the transformed words are correctly indexed into the embedding matrix. What I did was use the index-to-word mapping generated by the embedding model (I used gensim for training my embeddings) to create a word-to-index lookup table, which I subsequently used to transform my input vectors.
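As an illustration of that mapping, a minimal sketch assuming a gensim word2vec model gensim_model (older gensim versions expose the vocabulary order as wv.index2word; newer ones call it wv.index_to_key), a list of tokenized sentences, and the W matrix assigned in the question:

# build a word-to-index table from gensim's index-to-word list
word_to_index = {word: i for i, word in enumerate(gensim_model.wv.index2word)}

# turn token sequences into integer-ID sequences (0 here is an arbitrary stand-in for OOV/padding)
token_ids = [[word_to_index.get(w, 0) for w in sentence] for sentence in sentences]

# look the integer IDs up in the embedding matrix W that was assigned above
input_ids = tf.placeholder(tf.int32, [None, sequence_length])
embedded_inputs = tf.nn.embedding_lookup(W, input_ids)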
I am doing something similar. I stumbled upon this blog, which implements the paper "Convolutional Neural Networks for Sentence Classification". The blog is good: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/