How exactly is tf.contrib.rnn.AttentionCellWrapper used? Can someone give a piece of example code?
Specifically, I only managed to write the following:
fwd_cell = tf.contrib.rnn.AttentionCellWrapper(tf.contrib.rnn.BasicLSTMCell(hidden_size), 10, 50, state_is_tuple=True)
bwd_cell = tf.contrib.rnn.AttentionCellWrapper(tf.contrib.rnn.BasicLSTMCell(hidden_size), 10, 50, state_is_tuple=True)
However, in Bahdanau et al. 2015, the attention operates on the entire bidirectional RNN. I have no idea how to code that in TensorFlow.
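For anyone landing here, this is a minimal sketch of the idea in plain Python (no TensorFlow), assuming the usual reading of Bahdanau et al. 2015: the annotation for each source position is the concatenation of the forward and backward hidden states, and the decoder scores every annotation with an additive (tanh) score. The function name, argument names and weight shapes are all hypothetical, chosen only for illustration.

```python
import math

def additive_attention(decoder_state, fwd_states, bwd_states, W_s, W_h, v):
    """Additive attention over bidirectional encoder states (illustrative only).

    decoder_state: list of floats, length d_s
    fwd_states, bwd_states: one list of floats per source position, length d_h each
    W_s: matrix [d_a][d_s], W_h: matrix [d_a][2 * d_h], v: vector [d_a]
    Returns (attention weights, context vector).
    """
    # Annotation for position j is [forward_j ; backward_j].
    annotations = [f + b for f, b in zip(fwd_states, bwd_states)]

    def matvec(matrix, vec):
        return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

    proj_s = matvec(W_s, decoder_state)
    scores = []
    for h in annotations:
        proj_h = matvec(W_h, h)
        # e_j = v^T tanh(W_s s + W_h h_j)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, proj_s, proj_h)))

    # Softmax over source positions (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # Context vector is the weighted sum of the annotations.
    dim = len(annotations[0])
    context = [sum(w * h[k] for w, h in zip(weights, annotations)) for k in range(dim)]
    return weights, context
```

The point of the sketch is that attention is computed over the concatenated outputs of both directions, rather than by wrapping each direction's cell separately.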
Related
Why do we have to initialize the weights before calling predict on the model? I can't understand it.
You can refer to: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#checkpoint_the_initial_weights
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)
This tutorial appears to be about imbalanced data. You do not need to provide initial weights to TensorFlow's predict command if you don't want to. See this link describing potential inputs to the command.
Deep learning uses gradient descent and its variants to find optimal weights. If the weights are not initialized well, training may take a long time to converge or may not converge at all.
I have been trying to train a breast cancer segmentation model with Mask R-CNN. I have been able to understand almost all of the hyperparameters, but I just can't wrap my head around the variable TRAIN_ROI_PER_IMAGE, and there is little to no documentation available for it.
If anyone could please explain it to me, it would be super helpful for my research.
TRAIN_ROI_PER_IMAGE is the number of Region of Interest (ROI) proposals per image that will be fed to the classifier and mask heads.
(Image source: Ren et al., 2016)
Concretely, this setting is like the batch size for the second stage of the model.
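To make the "batch size for the second stage" analogy concrete, here is a simplified plain-Python illustration of subsampling proposals for one image. It is not the exact logic of any particular Mask R-CNN implementation; the function name, the `train_rois_per_image` and `positive_ratio` parameters, and the way negatives fill the remaining slots are assumptions made for the example.

```python
import random

def sample_rois(proposals, is_positive, train_rois_per_image=200,
                positive_ratio=0.25, seed=0):
    """Pick a fixed-size subset of proposals for the second-stage heads.

    proposals: list of proposal boxes (any objects)
    is_positive: list of bools, True if the proposal overlaps a ground-truth object
    Returns (sampled positives, sampled negatives).
    """
    rng = random.Random(seed)
    positives = [p for p, flag in zip(proposals, is_positive) if flag]
    negatives = [p for p, flag in zip(proposals, is_positive) if not flag]

    # Cap the positives at a fraction of the per-image budget (assumed behaviour).
    n_pos = min(int(train_rois_per_image * positive_ratio), len(positives))
    # Fill the remaining slots with negatives, if enough are available.
    n_neg = min(train_rois_per_image - n_pos, len(negatives))

    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)
```

With 1000 proposals and a budget of 200, only 200 ROIs from that image reach the heads in a given training step, however many the first stage proposed.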
I am trying to code a simple neural machine translation model using TensorFlow, but I am a little stuck on understanding how embeddings work in TensorFlow.
I do not understand the difference between tf.contrib.layers.embed_sequence(inputs, vocab_size=target_vocab_size, embed_dim=decoding_embedding_size)
and
dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
In which cases should I use one rather than the other?
The second thing I do not understand is tf.contrib.seq2seq.TrainingHelper and tf.contrib.seq2seq.GreedyEmbeddingHelper. I know that for translation we mainly use TrainingHelper for the training step (use the previous target to predict the next target) and GreedyEmbeddingHelper for the inference step (use the previous time step's prediction to predict the next target).
But I do not understand how they work, in particular the different parameters they take. For example, why do we need a sequence length for TrainingHelper (why do we not use an EOS token)? And why don't both of them take the output of embedding_lookup or embed_sequence as input?
I suppose that you're coming from this seq2seq tutorial. Even though this question is starting to get old, I'll try to answer for the people passing by like me:
For the first question, I looked at the source code behind tf.contrib.layers.embed_sequence, and it actually uses tf.nn.embedding_lookup. So it just wraps it and creates the embedding matrix (tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))) for you. Although this is convenient and less verbose, with embed_sequence there doesn't seem to be a direct way to access the embeddings. If you want to, you have to query for the internal variable used as the embedding matrix by using the same name scope. I have to admit that the code in the tutorial above is confusing. I even suspect the author is using different embeddings in the encoder and the decoder.
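A plain-Python analogy may help here (this is not TensorFlow code, and every name in it is made up for illustration). An embedding lookup is just row indexing into a matrix; the wrapper-style function creates the matrix internally and hands back only the embedded inputs, which is why the matrix is harder to get at afterwards.

```python
import random

def make_embedding_matrix(vocab_size, embed_dim, seed=0):
    """Random [vocab_size][embed_dim] matrix, like the tf.Variable in the question."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)]

def embedding_lookup(matrix, ids):
    """ids is [batch][time] of ints; result is [batch][time][embed_dim]."""
    return [[matrix[i] for i in seq] for seq in ids]

def embed_sequence(ids, vocab_size, embed_dim, seed=0):
    """Wrapper style: builds the matrix itself and returns only the embeddings."""
    matrix = make_embedding_matrix(vocab_size, embed_dim, seed)
    return embedding_lookup(matrix, ids)

# Manual style: the caller keeps a handle on the matrix and can reuse it,
# for example to embed the decoder's own predictions at inference time.
dec_embeddings = make_embedding_matrix(vocab_size=5, embed_dim=3)
dec_embed_input = embedding_lookup(dec_embeddings, [[0, 4], [2, 2]])
```

In the manual style the same `dec_embeddings` object can be passed to anything else that needs it; in the wrapper style it stays hidden inside the function.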
For the second question:
I guess it is equivalent to use a sequence length or an embedding.
The TrainingHelper doesn't need the embedding_lookup because it only forwards the (already embedded) inputs to the decoder. GreedyEmbeddingHelper does take the embedding as its first argument, as mentioned in the documentation, because it has to embed its own predictions at each step.
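The difference between the two helpers can be sketched in plain Python. This is only a toy model of the control flow, not the tf.contrib.seq2seq implementation: `toy_step` is a hypothetical stand-in for "run the decoder cell and take the argmax", and the function names are invented. The training-style loop reads its inputs from the known (padded) target sequence and needs a length to know where to stop; the greedy loop feeds back its own output and stops when it produces the end token.

```python
def toy_step(prev_token):
    """Hypothetical decoder step: predicts the next token id from the previous one."""
    return (prev_token + 1) % 5

def decode_with_teacher_forcing(step_fn, start_id, target_ids, sequence_length):
    """Training style: the input at step t is the ground-truth token from step t - 1.

    target_ids may be padded, so sequence_length says how many steps are real.
    """
    inputs = [start_id] + list(target_ids[:sequence_length - 1])
    return [step_fn(tok) for tok in inputs]

def decode_greedy(step_fn, start_id, end_id, max_steps):
    """Inference style: the input at step t is the model's own output from step t - 1."""
    outputs = []
    tok = start_id
    for _ in range(max_steps):
        tok = step_fn(tok)
        outputs.append(tok)
        if tok == end_id:  # stop once the end token is produced
            break
    return outputs
```

During training the targets are already known, so a per-example length is enough to ignore the padding; at inference nothing is known in advance, so the loop needs a start token, an end token and a way to embed each predicted id.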
If I understand you correctly, the first question is about the differences between tf.contrib.layers.embed_sequence and tf.nn.embedding_lookup.
According to the official docs (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence),
Typical use case would be reusing embeddings between an encoder and decoder.
I think tf.contrib.layers.embed_sequence is designed for seq2seq models.
I found the following post:
https://github.com/tensorflow/tensorflow/issues/17417
where @ispirmustafa mentioned:
embedding_lookup doesn't support invalid ids.
Also, in another post: tf.contrib.layers.embed_sequence() is for what?
@user1930402 said:
When building a neural network model that has multiple gates that take features as input, by using tensorflow.contrib.layers.embed_sequence, you can reduce the number of parameters in your network while preserving depth. For example, it eliminates the need for each gate of the LSTM to perform its own linear projection of features.
It allows for arbitrary input shapes, which helps the implementation be simple and flexible.
For the second question, sorry, I haven't used TrainingHelper, so I can't answer that part.
I want to implement the sentence-level log-likelihood as described in
Collobert et al., p. 14.
To compute transition scores, I could use a CRF, but I don't know how to integrate it in TensorFlow. I thought about using
tf.contrib.crf.CrfForwardRnnCell to compute transition scores, but this class returns a pair of [batch_size, num_tags] matrices containing the new alpha values, and not one [batch_size, num_tags, num_tags] tensor as I would expect.
Does anyone have an example of how to use a CRF in TensorFlow? Thank you!
A good example of using contrib.crf in TensorFlow is given here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/crf
It's worth noting that the SLL objective described in Collobert et al. 2011 differs slightly from the CRF objective in that SLL lacks normalization (see Remark 4 on p. 16), but this shouldn't really matter in practice (I'd just use the CRF).
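For intuition about what the forward recursion computes, here is a minimal plain-Python sketch of a linear-chain log-likelihood for a single sentence: the score of the given tag path minus the log-sum-exp over all tag paths. It is a generic textbook-style formulation, not the tf.contrib.crf code, and it leaves out initial-tag transition scores as a simplification. All function names are hypothetical. Note that the recursion only carries one alpha vector of length num_tags per step; the [num_tags][num_tags] transition matrix is a shared parameter.

```python
import math

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def path_score(emissions, transitions, tags):
    """Unnormalized score of one tag path.

    emissions: [time][num_tags] per-word tag scores from the network
    transitions: [num_tags][num_tags], transitions[i][j] = score of tag i -> tag j
    """
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score

def log_partition(emissions, transitions):
    """Log-sum-exp of the scores of all tag paths, via the forward recursion."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])  # one value per tag
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)

def sequence_log_likelihood(emissions, transitions, tags):
    return path_score(emissions, transitions, tags) - log_partition(emissions, transitions)
```

Training would maximize `sequence_log_likelihood` of the gold tags with respect to both the network's emission scores and the transition matrix.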
I'd like to build a conversational model with TensorFlow LSTMs that can predict a sentence from the previous sentences. The example provided in the TensorFlow tutorial can be used to predict the next word in a sentence.
https://www.tensorflow.org/versions/v0.6.0/tutorials/recurrent/index.html
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])
loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)
    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)
Can I use the same technique to predict the next sentence? Is there a working example of how to do this?
You want to use the Sequence-to-sequence model. Instead of having it learn to translate sentences from a source language to a target language you have it learn responses to previous utterances in the conversation.
You can adapt the example seq2seq model in TensorFlow by using the analogy that the source language 'English' is your set of previous sentences and the target language 'French' is your set of response sentences.
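As a rough sketch of that analogy, the helper below turns a conversation into (source, target) pairs that a translation-style model could train on. The function name and the `context_size` parameter are hypothetical, and the third utterance in the usage line is made-up toy data.

```python
def make_pairs(utterances, context_size=1):
    """Build (previous utterances, response) pairs from an ordered conversation.

    utterances: list of tokenized-and-joined strings, in the order they were said
    context_size: how many preceding utterances form the "source sentence"
    """
    pairs = []
    for i in range(context_size, len(utterances)):
        source = " ".join(utterances[i - context_size:i])
        target = utterances[i]
        pairs.append((source, target))
    return pairs

# Toy conversation, invented for illustration.
conversation = ["hello there !", "hi , how can i help ?", "i need a refund ."]
pairs = make_pairs(conversation)
```

Each pair plays the role of one (English, French) sentence pair in the translation setup.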
In theory you could use the basic LSTM you were looking at by concatenating your training examples with a special symbol like this:
hello there ! __RESPONSE hi , how can i help ?
Then during testing you run it forward with a sequence up to and including the __RESPONSE symbol and the LSTM can carry it the rest of the way.
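A small plain-Python sketch of that concatenation trick, under the assumption of whitespace tokenization. The helper names and the `next_token_fn` callback (a stand-in for "run the trained LSTM one step and pick a word") are hypothetical, as is the end-of-sequence token passed in by the caller.

```python
RESPONSE_TOKEN = "__RESPONSE"

def make_training_sequence(prompt, response):
    """One training example: prompt tokens, the separator, then response tokens."""
    return prompt.split() + [RESPONSE_TOKEN] + response.split()

def make_test_prefix(prompt):
    """At test time, feed everything up to and including the separator."""
    return prompt.split() + [RESPONSE_TOKEN]

def generate_response(next_token_fn, prompt, end_token, max_len=20):
    """Let the model continue the sequence one token at a time after the separator."""
    prefix = make_test_prefix(prompt)
    response = []
    for _ in range(max_len):
        token = next_token_fn(prefix + response)
        if token == end_token:
            break
        response.append(token)
    return response
```

The language model itself is unchanged; only the way examples are laid out as a single token stream differs.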
However, the seq2seq model above should be much more accurate and powerful because it has a separate encoder and decoder and includes an attention mechanism.
A sentence is composed of words, so you can indeed predict the next sentence by predicting words sequentially. There are models, such as the one described in this paper, that build embeddings for entire paragraphs, which can be useful for your purpose. There is also the Neural Conversational Model work, which probably fits your need directly. TensorFlow doesn't ship with working examples of these models, but the recurrent models that come with TensorFlow should give you a good starting point for implementing them.