What are the parameters of TensorFlow's dynamic_rnn for this simple data set? - tensorflow

I want to train an RNN language model using TensorFlow.
My training data is a sequence of 5 tokens represented with integers like so
x = [0, 1, 2, 3, 4]
I want the unrolled length of the RNN to be 4, and the training batch size to be 2. (I chose these values in order to require padding.)
Each token has an embedding of length 3 like so
0 -> [0, 0, 0]
1 -> [10, 10, 10]
2 -> [20, 20, 20]
3 -> [30, 30, 30]
4 -> [40, 40, 40]
What should I pass as parameters to tf.nn.dynamic_rnn?
This is mostly a repost of "How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?".
That was helpfully answered by Eugene Brevdo. However, he slightly misunderstood my question because I didn't have enough TensorFlow knowledge to ask it clearly. (Specifically, he thought I meant the batch size to be 1.) Rather than risk additional confusion by editing the original question, I think it is clearest if I just rephrase it here.
I'm trying to figure this out for myself by writing an Example TensorFlow RNN Language Model.

Most RNN cells require floating-point inputs, so you should first do an embedding lookup on your integer tensor to go from the categorical values to the floating-point vectors in your dictionary/embedding. I believe the function is tf.nn.embedding_lookup. The output of that should be a 3-tensor shaped batch x time x embedding_depth (in your case, embedding depth is 3).
You can feed embedding_lookup an integer tensor shaped batch_size x time.
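Putting the two steps together for the exact numbers in the question, a minimal sketch (the cell choice, the hidden size, and the way the 5-token corpus is split into two padded rows of length 4 are assumptions, not from the question):
import tensorflow as tf

# The fixed embedding table from the question: token i -> [10*i, 10*i, 10*i].
embeddings = tf.constant([[0., 0., 0.],
                          [10., 10., 10.],
                          [20., 20., 20.],
                          [30., 30., 30.],
                          [40., 40., 40.]])

# Integer inputs shaped [batch_size, time] = [2, 4]: one plausible batching
# of the corpus [0, 1, 2, 3, 4], with the second row zero-padded.
token_ids = tf.constant([[0, 1, 2, 3],
                         [4, 0, 0, 0]])
seq_lens = tf.constant([4, 1])  # true (unpadded) lengths of the two rows

inputs = tf.nn.embedding_lookup(embeddings, token_ids)  # [2, 4, 3]

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=8)  # hidden size is arbitrary
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=seq_lens,
                                   dtype=tf.float32)
# outputs is shaped [2, 4, 8]: batch x time x hidden.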

Related

Tensorflow custom Estimator with Dataset API: embedding lookup (feature_column) NMT task

My question is close in nature to Feature Columns Embedding lookup; however, I was unable to comment on the answer given there (not enough rep), and I think the answerer either did not fully understand the question or the answer was not exactly what was asked.
GOAL
To serve a custom Estimator which uses Dataset API to feed in data. The task is NMT (seq2seq).
Issue
Estimator requires feature_columns as input for serving. My NMT task has only one feature, the input sentence to translate (or possibly each word in the sentence is a feature?). I am therefore unsure how to build a feature_column (and thus an embedding_column and finally an input_layer) using my input sentences as a feature that can be fed into an RNN (which expects an embedding lookup of shape [batch_size, max_sequence_len, embedding_dim]), which would finally allow me to serve the Estimator.
Background
I am attempting to utilize a custom estimator to feed a seq2seq style NMT implementation. I need to be able to serve the model via tf-serving, which estimators seem to make relatively easy.
However, I hit a roadblock with 'how' to serve the model. From what I can tell, I need 'feature_columns' that will serve as the input into the model.
https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_basic_serving.md
This shows that you need an export_input_fn which uses a feature_spec, which in turn needs feature_column(s) as input. This makes sense; however, for my use case I do not have a bunch of (different) features. Instead I have input sentences (where each word is a feature) that need to be looked up via embeddings and used as features...
So I know I need the input into my model to be feature column(s). My input for NMT is simply a tensor of [batch_size, max_sequence_len] which is filled with the indices of the words from the sentences (e.g., for batch_size=1: [3, 17, 132, 2, 1, 0, ...], where each index should map to an embedding vector). Typically I would feed this into an embedding lookup via
embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
tf.nn.embedding_lookup(embs, inputs)
and I would be good to go, I could feed this to an RNN as inputs and the rest is history, not a problem.
BUT, this is where I hit the issue: I need to use feature_columns (so I can serve the model). The answer given to the question I mentioned at the beginning shows how to use embedding_column, but it suggests that the embedding should look up an entire sentence as one single feature, whereas traditionally you would look up each word in the sentence and get its embedding.
Similarly, https://www.damienpontifex.com/2018/01/02/using-tensorflow-feature-columns-in-your-custom-estimator-model/
This shows 'how to implement a feature-column in a custom estimator', and indeed his 'before' code is exactly right (as I wrote out): a tf.get_variable into a tf.nn.embedding_lookup. But his 'after' code, again, only takes in one feature (the entire sentence?).
I have verified this by using their code and feeding my data, shaped [batch_size, max_seq_len], into tf.feature_column.categorical_column_with_identity, and the output tensor is [batch_size, embedding_dim].
Is the sequence information lost, or does it simply get flattened? When I print the output, its size is (?, embedding_dim), where ? is typically my batch_size.
EDIT: I have verified the shape is [batch_size, embedding_dim]; it is not just flattened, so the sequence info is indeed lost.
I'm guessing it must be treating the input as one single input feature (so the batch_size=1 example [3, 17, 132, 2, 1, 0, ...], where each index maps to an embedding vector, would map to a single feature), which is not what is wanted. We want each index to map to its own embedding, and the needed output is [batch_size, max_seq_len, embedding_dim].
It sounds like what I need instead is not one categorical_column_with_*, but max_seq_len of them (one for each word in my sequence). Does this sound right? Each word would be a feature for my model, so I am leaning toward this being the correct approach, but it also has issues. I am using the Dataset API, so in my input_train_fn() I load my data from a file and then use tf.data.Dataset.from_tensor_slices((data, labels)) to split the data into tensors, which I can then dataset.batch(batch_size).make_one_shot_iterator().get_next() to feed into my Estimator. I cannot iterate over each batch (tensors are not iterable), so I cannot simply make 100 feature_columns for each input batch...
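For concreteness, here is a minimal sketch of the input_train_fn I described (the 'sentence' feature key is just illustrative):
import tensorflow as tf

def input_train_fn(data, labels, batch_size):
    # data: [num_examples, max_seq_len] word ids; labels: [num_examples, ...]
    dataset = tf.data.Dataset.from_tensor_slices(({'sentence': data}, labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    # Returns a (features, labels) pair of tensors for the Estimator.
    return dataset.make_one_shot_iterator().get_next()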
Does anyone have any idea how to do this? This embedding lookup is a very straightforward thing to do with simple placeholders or variables (and a common approach in NLP tasks), but when I venture into the Dataset API and Estimators I run into a wall, with very little information available beyond basic examples.
I admit I may have gaps in my understanding; custom Estimators and the Dataset API are new to me, and finding information on them can be difficult at times. So feel free to fire off information at me.
Thanks for reading my wall of text, and hopefully helping me (and the others I've seen ask a similar question but get no answer: https://groups.google.com/a/tensorflow.org/forum/#!searchin/discuss/embeddings$20in$20custom$20estimator/discuss/U3vFQF_jeaY/EjgwRQ3RDQAJ). I feel bad for this guy; his question was not really answered (for the same reason outlined here), and his thread got hijacked...
If I understand correctly, you want to use the Estimator API to build a seq2seq model. A good place to start is here; look into the Problems-Solutions/text folder.
To answer your question on how to use the embedding lookup, here is one example:
import tensorflow as tf
from tensorflow.contrib import lookup

# `features`, `commons`, and `params` come from the linked repo.
# Map each word to an integer id via the vocabulary file; unknown words
# fall into the single OOV bucket.
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
word_ids = vocab_table.lookup(dense_words)
# Pad all the word_ids entries to the maximum document length, then slice
# so every row is exactly MAX_DOCUMENT_LENGTH long.
padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
word_id_vector = {commons.FEATURE_COL: word_id_vector}
# Embed the ids; the 'sqrtn' combiner collapses the per-word embeddings
# into one bag-of-words vector per document.
bow_column = tf.feature_column.categorical_column_with_identity(commons.FEATURE_COL, num_buckets=params.N_WORDS)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50, combiner='sqrtn')
bow = tf.feature_column.input_layer(word_id_vector, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 2, activation=None)
The above code can be wrapped in an Estimator model_fn. The repo mentioned above contains this code; please take a look at it.
So the way I ended up making this work: I made each word an input feature, did the wrd_2_idx conversion, passed the ids in as features in numerical_columns (you have max_seq_len of these), and then passed those columns to input_layer. Then in my graph I use these features and look up the embedding as normal, basically circumventing the embedding_column lookup, since I can't figure out how to make it act the way I want. This is probably not optimal, but it works and trains...
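A minimal sketch of what I mean (the sizes and feature names here are illustrative, not my actual values):
import tensorflow as tf

MAX_SEQ_LEN = 50     # hypothetical
VOCAB_SIZE = 10000   # hypothetical
EMBEDDING_DIM = 128  # hypothetical

# One numeric column per word position; zero-padded names matter because
# input_layer orders columns by name.
feature_columns = [tf.feature_column.numeric_column('w%03d' % i)
                   for i in range(MAX_SEQ_LEN)]

def build_inputs(features):
    # input_layer concatenates the columns into [batch_size, max_seq_len].
    word_ids = tf.cast(
        tf.feature_column.input_layer(features, feature_columns), tf.int32)
    # Then look up the embedding as normal, bypassing embedding_column.
    embs = tf.get_variable('embedding', [VOCAB_SIZE, EMBEDDING_DIM])
    return tf.nn.embedding_lookup(embs, word_ids)  # [batch, seq_len, dim]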
I'll leave this as the accepted answer and hope sometime in the future either I figure out a better way to do it, or someone else can enlighten me to the best way to approach this.
I managed to get this working... I also got derailed by the fact that the RNN did not consume an embedding.
What I did to get this working (in the simplest case):
# features[VALUES_FEATURE_NAME] is shape (?, 200), i.e. 200 words per row
inputs = tf.contrib.layers.embed_sequence(
    features[VALUES_FEATURE_NAME], vocab_size=3000, embed_dim=5)
# create an LSTM cell of size 200
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)
# run the LSTM over the whole sequence; keep only the final state
_, final_states = tf.nn.dynamic_rnn(
    lstm_cell, inputs, dtype=tf.float32)
# the hidden part of the final LSTM state is the sequence representation
outputs = final_states.h
So I guess the answer lies in the TensorFlow docs for dynamic_rnn:
Creates a recurrent neural network specified by RNNCell cell.
Performs fully dynamic unrolling of inputs.
So the unrolling here means that the RNN consumes [batch, time_steps, values] as its input.
Best,
You can use tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list and tf.contrib.feature_column.sequence_input_layer to solve it.
The demo code is as follows:
import tensorflow as tf

if __name__ == "__main__":
    # tf.enable_eager_execution()
    feature = {
        'aa': [['1', '2', '3'],
               ['-1', '4', '-1'],
               ['2', '-1', '-1'],
               ['4', '5', '6']]
    }
    # feature1 was not defined in the original post; a single-token batch of
    # this shape reproduces the (3, 1, 2) output below.
    feature1 = {
        'aa': [['1'], ['2'], ['4']]
    }
    aa_id = tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list(
        'aa', ['1', '2', '3', '4', '5'])
    seq_emb_matrix = tf.feature_column.embedding_column(aa_id, 2)
    seq_tensor, seq_length = tf.contrib.feature_column.sequence_input_layer(
        feature, [seq_emb_matrix])
    seq_tensor1, seq_length1 = tf.contrib.feature_column.sequence_input_layer(
        feature1, [seq_emb_matrix])
    seq_tensor2 = tf.squeeze(seq_tensor1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        a, a_len = sess.run([seq_tensor, seq_length])
        b, b_len = sess.run([seq_tensor1, seq_length1])
        print(a)
        print('a_len', a_len)
        print(a.shape)
        print('-' * 50)
        print(b)
        print('b_len', b_len)
        print(b.shape)
        print(sess.run([seq_tensor2]))
The printed results are as follows:
[[[ 0.5333682 -0.39895234]
[ 0.5335079 0.64998794]
[-1.0432893 -0.8391434 ]]
[[ 0. 0. ]
[-0.29623085 -0.17570129]
[ 0. 0. ]]
[[ 0.5335079 0.64998794]
[ 0. 0. ]
[ 0. 0. ]]
[[-0.29623085 -0.17570129]
[ 0.7100604 0.9935588 ]
[ 0. 0. ]]]
('a_len', array([3, 3, 3, 3]))
(4, 3, 2)
--------------------------------------------------
[[[-0.24147142 -0.37740025]]
[[-0.6222648 1.3862932 ]]
[[ 1.2226609 -0.68292266]]]
('b_len', array([1, 1, 1]))
(3, 1, 2)
[array([[-0.24147142, -0.37740025],
[-0.6222648 , 1.3862932 ],
[ 1.2226609 , -0.68292266]], dtype=float32)]

Why does TensorFlow seq2seq pad all sequences to the same fixed length?

I am trying to implement a Chatbot using Tensorflow and its implementation of seq2seq.
After reading different tutorials (Chatbots with Seq2Seq, Neural Machine Translation (seq2seq) Tutorial, Unsupervised Deep Learning for Vertical Conversational Chatbots), and the original paper Sequence to Sequence Learning with Neural Networks, I could not find an explanation as to why the Tensorflow seq2seq implementation pads all sequences (both input and output) to the same fixed length.
Example:
Input data consists of sequences of integers:
x = [[5, 7, 8], [6, 3], [3], [1]]
RNNs need a different layout. Sequences shorter than the longest one are padded with zeros towards the end. This layout is called time-major.
x is now array([[5, 6, 3, 1],
[7, 3, 0, 0],
[8, 0, 0, 0]])
Why is this padding required?
Source of this tutorial.
If I am missing something, please let me know.
You need to pad the sequences (with some id, in your case 0) to the maximum sequence length. The reason you want to do this is so that all sequences fit in a single rectangular array (tensor) and can be processed in the same step.
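For illustration, a minimal sketch of that padding step on the example data (assuming 0 is the pad id):
import numpy as np

x = [[5, 7, 8], [6, 3], [3], [1]]
max_len = max(len(seq) for seq in x)

padded = np.zeros((len(x), max_len), dtype=np.int32)  # batch-major [4, 3]
for i, seq in enumerate(x):
    padded[i, :len(seq)] = seq

print(padded.T)  # transpose to the time-major layout shown in the tutorial
# [[5 6 3 1]
#  [7 3 0 0]
#  [8 0 0 0]]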

Retrain tensorflow model for additional categories

I'm building a program using TensorFlow image classification. I got TensorFlow from GitHub, and what I know is pretty much how to run classify_image.py!
What I want to do is have an option to train the model in a simple manner. For example, the model knows "keys", but I want to train it for "HouseKeys" which have a fancy keyfob or something. Is there some sort of script I can use to say "take these 20 images and learn HouseKeys" so the model can distinguish "keys" from "HouseKeys"?
Excuse my noobness, and thank you in advance!
Edit: Obviously, it is very important that the model keeps its knowledge of all the other categories it knew previously, since being able to recognize only "HouseKeys" is absolutely useless.
You can do this. However, it will probably need some adjustments.
I don't know exactly which script you are referring to, but I'm going to assume you have at least two Python files: one is the actual neural network, and the other handles training and evaluation.
The first thing you need to do is make sure the neural network can handle new classes. Look for something like this:
input_y = tf.placeholder(tf.float32, [None, classes], name="input_y")
A lot of the time, tensors whose names contain x (input_x, for example) refer to the data, the training input.
Tensors that have y in their name, like the example above, usually refer to the labels.
The code above says input_y is a tensor (think array, for the moment) of type float32, of variable length (the None in [None, classes]) but with each element of dimension classes.
If classes was 3, input_y could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0]]
Just as well, it could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
Although the length can vary, each element must always have size 3 (classes).
As for the meaning, [0, 0, 1] for example is a label for class 2, because we have a 1 at index 2 (look up one-hot notation).
The point of this is that a neural network with this sort of input can learn up to 3 classes. Each input in the tensor x has an associated label from the tensor y, and the labels from y can be 0, 1, or 2 in one-hot notation.
With something like this, you can learn for example "keys", "HouseKeys" and "CarKeys" but you will not be able to add "OfficeKeys" for example.
So, the first step is to make sure your network can learn the maximum number of labels you want.
It does not have to learn them all at once. This brings us to point 2:
Take a look here. This is the documentation for the TensorFlow Saver class. It will allow you to save and load models.
For your problem, this translates into training the model on a 2-class data set, saving it, generating a 3-class data set, loading the previously saved model, and training on the new data set. It will have the same "knowledge" (weights) as the model you saved, but it will start to adjust them to fit the third class.
But for this, you will need to make sure the network can, from the beginning, handle 3 classes.
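A minimal sketch of that save/restore cycle (the checkpoint path and the training loops are placeholders):
import tensorflow as tf

# ... build the 3-class-capable network here ...

saver = tf.train.Saver()

# Phase 1: train on the 2-class data set, then save the weights.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps on the 2-class data ...
    saver.save(sess, './model.ckpt')

# Phase 2: restore the saved weights into the same graph and keep training,
# now on the 3-class data set.
with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')
    # ... run training steps on the 3-class data ...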
Hope this helps!

How to understand the convolution parameters in tensorflow?

While reading the "Deep MNIST for Experts" chapter of the TensorFlow tutorial, I came across the function below for the weights of the first layer. I can't understand why the patch size is 5*5 and why the number of features is 32. Are they arbitrary numbers that you can pick freely, or are there rules that must be followed? And is the feature number "32" the number of convolution kernels?
W_conv1 = weight_variable([5, 5, 1, 32])
First Convolutional Layer
We can now implement our first layer. It will consist of convolution,
followed by max pooling. The convolution will compute 32 features
for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1,
32]. The first two dimensions are the patch size, the next is the
number of input channels, and the last is the number of output
channels. We will also have a bias vector with a component for each
output channel.
The patch size and the number of features are network hyperparameters, and are therefore completely arbitrary.
There are, however, rules of thumb to follow in order to define a working, performant network.
The kernel size should be small, due to the equivalence between applying multiple small kernels and applying a smaller number of big kernels (this is an image-processing topic, and it's well explained in the VGG paper). In addition, operations with small filters are way faster to execute.
The number of features to extract (32 in your example) is completely arbitrary, and finding the right number is somewhat of an art.
Yes, both of them are hyperparameters, selected mostly arbitrarily for this tutorial. A lot of effort currently goes into finding appropriate kernel sizes, but for this tutorial it is not important.
The tutorial tells:
The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]
The documentation for tf.nn.conv2d() says that the second parameter represents your filter and consists of [filter_height, filter_width, in_channels, out_channels]. So [5, 5, 1, 32] means that your in_channels is 1: you have a greyscale image, so no surprises here.
32 means that during the learning phase, the network will try to learn 32 different kernels which will be used during prediction. You can change this number to any other, as it is a hyperparameter that you can tune.
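To make the shapes concrete, here is a minimal sketch of that first layer as in the tutorial (the initialization constants follow the tutorial's conventions):
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # [batch, height, width, channels]

# [filter_height, filter_width, in_channels, out_channels]
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))

# Output shape: [batch, 28, 28, 32] -- one feature map per learned kernel.
h_conv1 = tf.nn.relu(
    tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)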

How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?

I am trying to write a language model using word embeddings and recurrent neural networks in TensorFlow 0.9.0 using the tf.nn.dynamic_rnn graph operation, but I don't understand how the input tensor is structured.
Let's say I have a corpus of n words. I embed each word in a vector of length e, and I want my RNN to unroll to t time steps. Assuming I use the default time_major = False parameter, what shape would my input tensor [batch_size, max_time, input_size] have?
Maybe a specific tiny example will make this question clearer. Say I have a corpus consisting of n=8 words that looks like this.
1, 2, 3, 3, 2, 1, 1, 2
Say I embed each word in a vector of size e=3 with the embeddings 1 -> [10, 10, 10], 2 -> [20, 20, 20], and 3 -> [30, 30, 30]. What would my input tensor look like?
I've read the TensorFlow Recurrent Neural Network tutorial, but that doesn't use tf.nn.dynamic_rnn. I've also read the documentation for tf.nn.dynamic_rnn, but find it confusing. In particular I'm not sure what "max_time" and "input_size" mean here.
Can anyone give the shape of the input tensor in terms of n, t, and e, and/or an example of what that tensor would look like initialized with data from the small corpus I describe?
TensorFlow 0.9.0, Python 3.5.1, OS X 10.11.5
In your case, it looks like batch_size = 1, since you're looking at a single example. So max_time is n=8 and input_size is the input depth, in your case e=3. So you would want to construct an input tensor shaped [1, 8, 3]. It's batch-major, so the first dimension (the batch dimension) is 1. If, say, you had another input at the same time with n=6 words, you would combine the two by padding the second example to 8 words (zeros for the last 2 word embeddings), and you would have an inputs size of [2, 8, 3].
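For concreteness, a minimal sketch of building that [1, 8, 3] tensor from the example corpus and embeddings:
import numpy as np

emb = {1: [10, 10, 10], 2: [20, 20, 20], 3: [30, 30, 30]}
corpus = [1, 2, 3, 3, 2, 1, 1, 2]

# One batch row of 8 time steps, each an embedding of depth 3.
inputs = np.array([[emb[w] for w in corpus]], dtype=np.float32)
print(inputs.shape)  # (1, 8, 3) -- [batch_size, max_time, input_size]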