I am studying SyntaxNet in TensorFlow.
However, I'm confused about how to use the --word_embeddings option. Could you show me an example? Thanks a lot.
Not too sure what syntaxnet is but here is how I use word embeddings in my sequence to sequence model.
Here I declare a variable which will store the embedding. It is a matrix with one row per word in the vocabulary (NWORDS) and one column per component of the word vector (WORD_VEC_SIZE). It is randomly initialized and is a trainable parameter of the model.
word_embedding = tf.get_variable('word_embedding', shape=(NWORDS, WORD_VEC_SIZE), initializer=tf.truncated_normal_initializer(0.0, 1.0))
The input to my model is a list of integers which represent the indices of the words in the embedding. It is of dimension BATCH_SIZE x TIME_STEPS.
source_input = tf.placeholder(tf.int32, (BATCH_SIZE, TIME_STEPS), 'source_input')
Using the embedding and the input I can perform a lookup to convert the integer representations of the words into vectors.
y = tf.nn.embedding_lookup([word_embedding], source_input)
Now y has the shape (BATCH_SIZE, TIME_STEPS, WORD_VEC_SIZE) and can be fed into the model for further processing.
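To make the shapes concrete, here is a minimal NumPy sketch of the same lookup (not the TensorFlow code itself). The sizes are hypothetical values chosen for illustration, and integer-array indexing stands in for tf.nn.embedding_lookup.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
NWORDS, WORD_VEC_SIZE = 1000, 8
BATCH_SIZE, TIME_STEPS = 4, 6

rng = np.random.default_rng(0)

# Embedding matrix: one row per vocabulary word.
word_embedding = rng.normal(0.0, 1.0, size=(NWORDS, WORD_VEC_SIZE))

# Integer word indices, one per (sentence, time step).
source_input = rng.integers(0, NWORDS, size=(BATCH_SIZE, TIME_STEPS))

# Integer-array indexing picks one row per index, like an embedding lookup.
y = word_embedding[source_input]
print(y.shape)
```

Each entry y[b, t] is simply the row of word_embedding selected by source_input[b, t].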
I would like to use the Embedding layer before feeding my input data into the LSTM network I am attempting to create. Here is the relevant part of the code:
input_step1 = Input(shape=(SEQ_LENGTH_STEP1, NR_FEATURES_STEP1),
name='input_step1')
step1_lstm = CuDNNLSTM(50,
return_sequences=True,
return_state = True,
name="step1_lstm")
out_step1, state_h_step1, state_c_step1 = step1_lstm(input_step1)
I am a bit confused about how I am supposed to add an Embedding layer here.
Here is the description of the Embedding layer from the documentation:
keras.layers.Embedding(input_dim,
output_dim,
embeddings_initializer='uniform',
embeddings_regularizer=None,
activity_regularizer=None,
embeddings_constraint=None,
mask_zero=False,
input_length=None)
The confusing part is that my defined Input has a sequence length and number of features defined. Writing it here again:
input_step1 = Input(shape=(SEQ_LENGTH_STEP1, NR_FEATURES_STEP1),
name='input_step1')
When defining an Embedding layer, I am confused about which parameters of the Embedding function correspond to the "sequence length" and the "number of features in each time step". Can anyone guide me on how to integrate an Embedding layer into my code above?
ADDENDUM:
If I try the following:
SEQ_LENGTH_STEP1 = 5
NR_FEATURES_STEP1 = 10
input_step1 = Input(shape=(SEQ_LENGTH_STEP1, NR_FEATURES_STEP1),
name='input_step1')
emb = Embedding(input_dim=NR_FEATURES_STEP1,
output_dim=15,
input_length=NR_FEATURES_STEP1)
input_step1_emb = emb(input_step1)
step1_lstm = CuDNNLSTM(50,
return_sequences=True,
return_state = True,
name="step1_lstm")
out_step1, state_h_step1, state_c_step1 = step1_lstm(input_step1_emb)
I get the following error:
ValueError: Input 0 of layer step1_lstm is incompatible with the layer:
expected ndim=3, found ndim=4. Full shape received: [None, 5, 10, 15]
I am obviously not doing the right thing. Is there a way to integrate Embedding into the LSTM network I am trying to build?
From the Keras Embedding documentation:
Arguments
input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
output_dim: int >= 0. Dimension of the dense embedding.
input_length: Length of input sequences, when it is
constant. This argument is required if you are going to connect
Flatten then Dense layers upstream (without it, the shape of the dense
outputs cannot be computed).
Therefore, from your description, I assume that:
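input_dim corresponds to the vocabulary size (number of distinct words) of your dataset. For example, the following dataset contains 4 distinct words, so input_dim is 5 if the word indices start at 1 and index 0 is reserved for padding (input_dim is the maximum integer index + 1):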
data = ["Come back Peter,",
"Come back Paul"]
output_dim is an arbitrary hyperparameter that indicates the dimension of your embedding space. In other words, if you set output_dim=x, each word in the sentence will be characterized with x features.
input_length should be set to SEQ_LENGTH_STEP1 (an integer indicating the length of each sentence), assuming that all the sentences have the same length.
The output shape of an embedding layer is (batch_size, input_length, output_dim).
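As a rough illustration of how input_dim follows from the data, here is a plain-Python sketch. The tokenization rule and the choice to start indices at 1 (leaving 0 free for padding) are assumptions made for this example, not something the Embedding layer requires.

```python
import re

data = ["Come back Peter,",
        "Come back Paul"]

# Assumed tokenization: lowercase and keep alphabetic runs only.
def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

# Assign indices starting at 1; index 0 is kept free for padding.
word_index = {}
for sentence in data:
    for word in tokenize(sentence):
        if word not in word_index:
            word_index[word] = len(word_index) + 1

# Each sentence becomes a list of integers, the form an Embedding layer expects.
encoded = [[word_index[w] for w in tokenize(s)] for s in data]

# input_dim is the maximum integer index + 1.
input_dim = max(word_index.values()) + 1

print(word_index)
print(encoded)
print(input_dim)
```

With these assumptions there are 4 distinct words and input_dim comes out as 5.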
Further notes regarding the addendum:
team_in_step1 is undefined.
Assuming that your first layer is an Embedding layer, the expected shape of the input tensor input_step1 is (batch_size, input_length):
input_step1 = Input(shape=(SEQ_LENGTH_STEP1,),
name='input_step1')
Each integer in this tensor corresponds to a word.
As mentioned above, the embedding layer could be instantiated as follows:
emb = Embedding(input_dim=VOCAB_SIZE,
output_dim=15,
input_length=SEQ_LENGTH_STEP1)
where VOCAB_SIZE is the size of your vocabulary.
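The ndim=4 error in the addendum can be reproduced with shapes alone. The following is a NumPy sketch, not Keras code: an array lookup stands in for the Embedding layer, and VOCAB_SIZE and BATCH are hypothetical values.

```python
import numpy as np

SEQ_LENGTH_STEP1 = 5
NR_FEATURES_STEP1 = 10
VOCAB_SIZE = 20      # hypothetical vocabulary size
EMBEDDING_DIM = 15   # same value as output_dim=15
BATCH = 2            # hypothetical batch size

# Stand-in for the embedding matrix of shape (vocab size, embedding dimension).
table = np.zeros((VOCAB_SIZE, EMBEDDING_DIM))

# Addendum case: a 3D integer input gains a fourth axis after the lookup.
ids_3d = np.zeros((BATCH, SEQ_LENGTH_STEP1, NR_FEATURES_STEP1), dtype=int)
shape_from_3d = table[ids_3d].shape
print(shape_from_3d)

# Fixed case: a 2D integer input (batch, sequence) yields the 3D tensor an LSTM expects.
ids_2d = np.zeros((BATCH, SEQ_LENGTH_STEP1), dtype=int)
shape_from_2d = table[ids_2d].shape
print(shape_from_2d)
```

The lookup always appends one axis of size output_dim, so the input to the embedding must have one axis fewer than the LSTM expects.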
This answer contains a reproducible example that you might find useful.
As the title says: in tf.keras.layers.Embedding, why is it important to know the size of the dictionary as the input dimension?
Because internally, the embedding layer is nothing but a matrix of size vocab_size x embedding_size. It is a simple lookup table: row n of that matrix stores the vector for word n.
So, if you have e.g. 1000 distinct words, your embedding layer needs to know this number in order to store 1000 vectors (as a matrix).
Don't confuse the internal storage of a layer with its input or output shape.
The input shape is (batch_size, sequence_length), where each entry is an integer in the range [0, vocab_size). For each of these integers the layer will return the corresponding row (which is a vector of size embedding_size) of the internal matrix, so that the output shape becomes (batch_size, sequence_length, embedding_size).
In such a setting, the dimensions/shapes of the tensors are the following:
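A small NumPy sketch of why the table size has to be known up front: the matrix is allocated with exactly vocab_size rows, so an index outside that range has no row to return. The sizes are hypothetical, and a NumPy array stands in for the layer's internal matrix.

```python
import numpy as np

vocab_size, embedding_size = 1000, 4   # hypothetical sizes

# Internal storage: one row of length embedding_size per word.
table = np.arange(vocab_size * embedding_size, dtype=float).reshape(
    vocab_size, embedding_size)

ids = np.array([[0, 999], [5, 17]])    # (batch_size, sequence_length)
out = table[ids]                       # (batch_size, sequence_length, embedding_size)
print(out.shape)

# An index equal to vocab_size is out of range: there is no such row.
try:
    table[np.array([vocab_size])]
    out_of_range_ok = True
except IndexError:
    out_of_range_ok = False
print(out_of_range_ok)
```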
The input tensor has size [batch_size, max_time_steps] such that each element of that tensor can have a value in the range 0 to vocab_size-1.
Then, each of the values from the input tensor passes through an embedding layer, which has shape [vocab_size, embedding_size]. The output of the embedding layer is of shape [batch_size, max_time_steps, embedding_size].
Then, in a typical seq2seq scenario, this 3D tensor is the input of a recurrent neural network.
...
Here's how this is implemented in Tensorflow so you can get a better idea:
inputs = tf.placeholder(shape=(batch_size, max_time_steps), ...)
embeddings = tf.Variable(shape=(vocab_size, embedding_size), ...)
inputs_embedded = tf.nn.embedding_lookup(embeddings, inputs)
Now, the output of the embedding lookup table has the [batch_size, max_time_steps, embedding_size] shape.
I am aware that there is a similar topic at LSTM Followed by Mean Pooling, but that is about Keras and I work in pure TensorFlow.
I have an LSTM network where the recurrence is handled by:
outputs, final_state = tf.nn.dynamic_rnn(cell,
embed,
sequence_length=seq_lengths,
initial_state=initial_state)
where I pass the correct sequence lengths for each sample (padding by zeros). In any case, outputs contains irrelevant outputs since some samples produce longer outputs than others, based on sequence lengths.
Right now I'm extracting the last relevant output by means of the following method:
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of tensor.
    :param data: Tensorflow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.reduce_mean(tf.gather_nd(data, indices), axis=0)
    return res
where I pass sequence_length - 1 as indices. In reference to the last topic, I would like to select all relevant outputs followed by average pooling, instead of just the last one.
I tried passing nested lists as indices to extract_axis_1, but tf.stack does not accept this.
Any solution directions for this?
You can exploit the weights parameter of the tf.contrib.seq2seq.sequence_loss function.
From the documentation:
weights: A Tensor of shape [batch_size, sequence_length] and dtype float. weights constitutes the weighting of each prediction in the sequence. When using weights as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by tf.sequence_mask.
You need to compute a binary mask that distinguishes your valid outputs from the invalid ones. Then you can just provide this mask to the weights parameter of the loss function (probably, you will want to use a loss like this one); the function will not consider the outputs with a 0 weight in the computation of the loss.
If you can't or don't need to use a sequence loss, you can do exactly the same thing manually: compute a binary mask, multiply your outputs by this mask, and provide the result as input to your fully connected layer.
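The manual masking route, followed by the average pooling the question asks for, can be sketched in NumPy as below. The outputs and seq_lengths values are made up for illustration; in TensorFlow the mask itself would come from tf.sequence_mask, and the remaining steps map onto elementwise multiply, reduce_sum and a division.

```python
import numpy as np

# Hypothetical RNN outputs: (batch_size, max_time, hidden_size).
outputs = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
seq_lengths = np.array([2, 4])  # number of valid steps per sample

# Binary mask: 1 for valid steps, 0 for padded steps.
steps = np.arange(outputs.shape[1])
mask = (steps[None, :] < seq_lengths[:, None]).astype(outputs.dtype)

# Zero out padded steps, sum over time, then divide by the true length
# so that padding does not dilute the average.
masked = outputs * mask[:, :, None]
mean_pooled = masked.sum(axis=1) / seq_lengths[:, None]
print(mean_pooled)
```

Dividing by seq_lengths rather than by max_time is what makes this a mean over the relevant outputs only.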
I am following this tutorial in order to understand CNNs in NLP. There are a few things which I don't understand despite having the code in front of me. I hope somebody can clear a few things up here.
The first, rather minor, thing is the sequence_length parameter of the TextCNN object. In the example on GitHub this is just 56, which I think is the maximum length of all sentences in the training data. This means that self.input_x is a 56-dimensional vector which contains, for each word of a sentence, its index in the dictionary.
This list goes into tf.nn.embedding_lookup(W, self.input_x), which will return a matrix consisting of the word embeddings of the words given by self.input_x. According to this answer, this operation is similar to indexing with numpy:
import numpy as np
matrix = np.random.random([1024, 64])
ids = np.array([0, 5, 17, 33])
print(matrix[ids])
But the problem here is that self.input_x most of the time looks like [1 3 44 25 64 0 0 0 0 0 0 0 .. 0 0]. So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?
Another thing I don't get is how tf.nn.embedding_lookup is working here:
# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
I assume that self.embedded_chars is the matrix which is the actual input to the CNN, where each row represents the word embedding of one word. But how can tf.nn.embedding_lookup know about those indices given by self.input_x?
The last thing which I don't understand here is
W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].
Does this mean that we are actually learning the word embeddings here? The tutorial states at the beginning:
We will not used pre-trained word2vec vectors for our word embeddings. Instead, we learn embeddings from scratch.
But I don't see a line of code where this is actually happening. The code of the embedding layer does not look as if anything is being trained or learned, so where is it happening?
Answer to ques 1 (So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?) :
The 0s in the input vector are indices of the 0th symbol in the vocabulary, which is the PAD symbol. I don't think they are ignored when the lookup is performed; the 0th row of the embedding matrix will be returned.
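A quick NumPy check of this behaviour, using array indexing as a stand-in for tf.nn.embedding_lookup (the matrix size and the sentence are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(100, 4))   # hypothetical [vocab_size, embedding_size]

input_x = np.array([1, 3, 44, 25, 64, 0, 0, 0])  # zero-padded sentence
embedded = W[input_x]

# Every padded position receives row 0 of W; nothing is skipped.
pad_rows_match = all(np.array_equal(embedded[i], W[0]) for i in (5, 6, 7))
print(embedded.shape)
print(pad_rows_match)
```

The padded positions therefore contribute a real (learned) vector unless they are masked out later in the model.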
Answer to ques 2 (But how can tf.nn.embedding_lookup know about those indices given by self.input_x?) :
The size of the embedding matrix is [V x E], where V is the size of the vocabulary and E is the dimension of the embedding vector. The 0th row of the matrix is the embedding vector for the 0th element of the vocabulary, the 1st row is the embedding vector for the 1st element, and so on.
From the input vector x, we get the indices of words in vocabulary, which are used for indexing the embedding matrix.
Answer to ques 3 (Does this mean that we are actually learning the word embeddings here?).
Yes, we are actually learning the embedding matrix.
In the embedding layer, in the line W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W"), W is the embedding matrix, and by default TensorFlow creates variables with trainable=True. So W is also a learned parameter. To use pre-trained embeddings and keep them fixed, initialize W from them and set trainable=False.
For a detailed explanation of the code, you can follow this blog post: https://agarnitin86.github.io/blog/2016/12/23/text-classification-cnn
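To see what "learning W" means mechanically, here is a hand-written NumPy sketch of a single gradient descent step through a lookup. The toy loss, sizes and learning rate are assumptions for illustration; in TensorFlow the optimizer computes and applies this update automatically for any trainable variable.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_size = 10, 3          # hypothetical sizes
W = rng.uniform(-1.0, 1.0, size=(vocab_size, embedding_size))
W_before = W.copy()

input_x = np.array([1, 3, 3, 0])            # word indices of one sentence
embedded = W[input_x]                       # forward pass: the lookup

# Toy loss: 0.5 * sum(embedded ** 2), so d(loss)/d(embedded) = embedded.
grad_embedded = embedded

# Backward pass: scatter-add each position's gradient into the matching row of W.
grad_W = np.zeros_like(W)
np.add.at(grad_W, input_x, grad_embedded)

learning_rate = 0.1
W -= learning_rate * grad_W                 # one gradient descent step

# Only rows that were looked up receive a gradient and change.
changed_rows = np.where(np.any(W != W_before, axis=1))[0]
print(changed_rows)
```

Rows for words that did not occur in the batch are left untouched, which is why embeddings of rare words are updated less often.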
I understand how word2vec works.
I want to use word2vec (skip-gram) embeddings as input to an RNN. The input is an embedded word vector. The output is also an embedded word vector, generated by the RNN.
Here's my question: how can I convert the output vector to a one-hot word vector? I would need the inverse of the embedding matrix, but I don't have one.
The output of an RNN is not an embedding. We convert the output from the last layer in an RNN cell into a vector of vocabulary_size by multiplying with an appropriate matrix.
Take a look at the PTB Language Model example to get a better idea. Specifically look at lines 133-136:
softmax_w = tf.get_variable("softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b
The above operation gives you logits, which are unnormalized scores over your vocabulary; applying a softmax turns them into a probability distribution. numpy.random.choice can then help you sample a prediction from that distribution.
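A minimal NumPy sketch of the last step, assuming the logits are unnormalized scores that need a softmax before sampling. The vocabulary and logit values are hypothetical, and the one-hot vector is built from the chosen index.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

vocab = ["the", "cat", "sat", "mat"]          # hypothetical vocabulary
logits = np.array([2.0, 1.0, 0.1, -1.0])      # hypothetical scores for one time step

probs = softmax(logits)

rng = np.random.default_rng(0)
sampled_id = int(rng.choice(len(vocab), p=probs))  # sample a word index
greedy_id = int(np.argmax(probs))                  # or take the most likely word

# One-hot vector for the chosen word.
one_hot = np.eye(len(vocab))[greedy_id]

print(vocab[sampled_id], vocab[greedy_id], one_hot)
```

Sampling gives varied output, while argmax always returns the highest-scoring word; which one is preferable depends on the application.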