I am trying to implement a chatbot using TensorFlow and its seq2seq implementation.
After reading different tutorials (Chatbots with Seq2Seq, Neural Machine Translation (seq2seq) Tutorial, Unsupervised Deep Learning for Vertical Conversational Chatbots) and the original paper Sequence to Sequence Learning with Neural Networks, I could not find an explanation for why the TensorFlow seq2seq implementation pads all sequences (both input and output) to the same fixed length.
Example:
Input data consists of sequences of integers:
x = [[5, 7, 8], [6, 3], [3], [1]]
RNNs need a different layout. Sequences shorter than the longest one are padded with zeros at the end, and the result is transposed so that rows are time steps. This layout is called time-major.
x is now:

array([[5, 6, 3, 1],
       [7, 3, 0, 0],
       [8, 0, 0, 0]])
Why is this padding required?
If I am missing something, please let me know.
You need to pad the sequences (with some padding id, in your case 0) to the maximum sequence length. The reason is that the sequences must fit into a single rectangular array (tensor), so that a whole batch can be processed in one step.
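As a minimal sketch (plain NumPy, nothing TensorFlow-specific), the transformation shown in the question could be done like this:

import numpy as np

x = [[5, 7, 8], [6, 3], [3], [1]]

# Pad each sequence with zeros up to the longest length (batch-major),
# then transpose so that rows become time steps (time-major).
max_len = max(len(seq) for seq in x)
padded = np.zeros((len(x), max_len), dtype=np.int32)  # shape [4, 3]
for i, seq in enumerate(x):
    padded[i, :len(seq)] = seq
time_major = padded.T                                 # shape [3, 4]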
I'm building a program using TensorFlow image classification. I got TensorFlow from GitHub, and pretty much all I know is how to run classify_image.py!
What I want to do is have an option to train the model in a simple manner. For example, the model knows "keys", but I want to train it for "HouseKeys" which have a fancy keyfob or something. Is there some sort of script I can use to say "take these 20 images and learn HouseKeys" so the model can distinguish "keys" from "HouseKeys"?
Excuse my noobness, and thank you in advance!
Edit: Obviously, it is very important that the model keeps its knowledge of all the other categories it knew previously, since being able to recognize only "HouseKeys" is absolutely useless.
You can do this, though it will probably need some adjustments.
I don't know exactly which script you are referring to, but I'm going to assume you have at least two Python files: one for the actual neural network, and one that handles training and evaluation.
The first thing you need to do is make sure the neural network can handle new classes. Look for something like this:
input_y = tf.placeholder(tf.float32, [None, classes], name="input_y")
A lot of the time, tensors whose names contain x (input_x, for example) refer to the data, the training input.
Tensors that have y in their name, like the example above, usually refer to the labels.
The code above says input_y is a tensor (think array for the moment) of type float32, with a variable first dimension (the None in [None, classes]) and with each element of size classes.
If classes was 3, input_y could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0]]
Just as well, it could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
Although the length can vary, each element must always have size 3 (classes).
As for the meaning, [0, 0, 1], for example, is a label for class 2, because the 1 is at index 2 (look up one-hot notation).
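As an illustration, here is a minimal NumPy sketch of building such one-hot labels from plain class indices (the variable names are mine):

import numpy as np

class_indices = np.array([2, 0, 1])  # the classes of three examples
one_hot = np.eye(3)[class_indices]   # row i of the identity matrix is the one-hot vector for class i
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]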
The point is that a neural network with this sort of input can learn up to 3 classes. Each input in the tensor x has an associated label in the tensor y, and the labels in y can be 0, 1, or 2 in one-hot notation.
With something like this, you can learn, for example, "keys", "HouseKeys" and "CarKeys", but you will not be able to add a fourth class like "OfficeKeys".
So, the first step is to make sure your network can handle the maximum number of classes you will ever want.
It does not have to learn them all at once. This brings us to point 2:
Take a look at the documentation for the TensorFlow Saver class (tf.train.Saver); it will allow you to save and load models.
For your problem, this translates into training the model on a 2-class data set, saving it, generating a 3-class data set, loading the previously saved model, and training on the new data set. It will have the same "knowledge" (weights) as the model you saved, but it will start to adjust them to fit the third class.
But for this, you will need to make sure the network can, from the beginning, handle 3 classes.
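A minimal sketch of that save/restore cycle (the checkpoint path and training loops are placeholders; on older TensorFlow versions the initializer is tf.initialize_all_variables() instead):

import tensorflow as tf

# ... build the network here, sized for 3 classes from the start ...

saver = tf.train.Saver()

# Phase 1: train on the 2-class data set and save the weights.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop on the 2-class data ...
    saver.save(sess, "/tmp/model.ckpt")

# Phase 2: restore the saved weights and continue on the 3-class data set.
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    # ... training loop on the 3-class data ...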
Hope this helps!
I'm trying to use the tf.image_summary function in TensorFlow to visualize a convolutional layer's filter. The filter is defined as tf.Variable(tf.constant(0.1, shape=[5, 5, 16, 32])).
But here, since I only want to see the final filters, I want to find a way to get a filter of size [5, 5, 32] by just taking the first index of the dimension that was 16. If I use [:, :, 0, :] then I assume I would get a [5, 5, 1, 32] filter instead of the [5, 5, 32] I want.
What should I do?
tf.image_summary takes a batch of images as input (shaped [batch, height, width, channels]); however, it expects the channels dimension to be 1, 3, or 4.
So you'd have to pass tf.image_summary slices of at most 3 channels at a time, something like this:
# filter has shape [5, 5, 16, 32]; reorder it to [32, 5, 5, 16] so that each
# of the 32 output filters becomes one image in the batch, then log 3 input
# channels at a time (any leftover channels are simply dropped).
filt = tf.transpose(filter, [3, 0, 1, 2])
for i in range(filt.get_shape()[3].value // 3):
    tf.image_summary('filter_slice_%d' % i, filt[:, :, :, i * 3:(i + 1) * 3])
While reading the chapter "Deep MNIST for Experts" in the TensorFlow tutorial,
I came across the function below for the weights of the first layer. I can't understand why the patch size is 5*5 and why the number of features is 32. Are these arbitrary numbers that you can pick freely, or are there rules that must be followed? And is the feature number "32" the number of convolution kernels?
W_conv1 = weight_variable([5, 5, 1, 32])
First Convolutional Layer
We can now implement our first layer. It will consist of convolution, followed by max pooling. The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32]. The first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. We will also have a bias vector with a component for each output channel.
The patch size and the number of features are network hyper-parameters, therefore they are essentially arbitrary choices.
There are, however, rules of thumb to follow in order to define a network that works and performs well.
The kernel size should be small, because stacking several small kernels covers the same receptive field as a single larger kernel, with fewer parameters (this is an image-processing topic and it's well explained in the VGG paper). In addition, operations with small filters are much faster to execute.
The number of features to extract (32 in your example) is completely arbitrary, and finding the right number is something of an art.
Yes, both of them are hyperparameters, selected mostly arbitrarily for this tutorial. A lot of research effort currently goes into finding appropriate kernel sizes, but for this tutorial it is not important.
The tutorial says:
The convolution will compute 32 features for each 5x5 patch. Its weight tensor will have a shape of [5, 5, 1, 32].
The documentation for tf.nn.conv2d() says that the second parameter represents your filter, with shape [filter_height, filter_width, in_channels, out_channels]. So [5, 5, 1, 32] means that your in_channels is 1: you have a greyscale image, so no surprises here.
32 means that during the learning phase, the network will try to learn 32 different kernels, which will then be used during prediction. You can change this number to any other, as it is a hyperparameter that you can tune.
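As a sketch of how those dimensions fit together (the batch size, image size, and initializer here are my own choices):

import tensorflow as tf

# A batch of 8 greyscale 28x28 images: [batch, height, width, in_channels].
x = tf.placeholder(tf.float32, [8, 28, 28, 1])

# 32 kernels of size 5x5 over 1 input channel:
# [filter_height, filter_width, in_channels, out_channels].
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))

# One feature map per kernel: output shape [8, 28, 28, 32] with 'SAME' padding.
h_conv1 = tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME')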
I want to train an RNN language model using TensorFlow.
My training data is a sequence of 5 tokens represented with integers like so
x = [0, 1, 2, 3, 4]
I want the unrolled length of the RNN to be 4, and the training batch size to be 2. (I chose these values in order to require padding.)
Each token has an embedding of length 3 like so
0 -> [0, 0 ,0]
1 -> [10, 10, 10]
2 -> [20, 20, 20]
3 -> [30, 30, 30]
4 -> [40, 40, 40]
What should I pass as parameters to tf.nn.dynamic_rnn?
This is mostly a repost of "How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?".
That was helpfully answered by Eugene Brevdo. However he slightly misunderstood my question because I didn't have enough TensorFlow knowledge to ask it clearly. (Specifically he thought I meant the batch size to be 1.) Rather than risk additional confusion by editing the original question, I think it is clearest if I just rephrase it here.
I'm trying to figure this out for myself by writing an Example TensorFlow RNN Language Model.
Most RNN cells require floating-point inputs, so you should first do an embedding lookup on your integer tensor to go from the categorical values to the floating-point vectors in your dictionary/embedding. I believe the function is tf.nn.embedding_lookup. The output of that should be a 3-tensor shaped batch x time x embedding_depth (in your case, the embedding depth is 3).
You can feed embedding_lookup an integer tensor shaped batch_size x time.
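Putting both steps together for the numbers in the question (batch size 2, unrolled length 4, embedding depth 3), a minimal sketch might look like this (the cell size and the way I split and pad the 5-token corpus are my own choices):

import tensorflow as tf

# Embedding table from the question: token i maps to [10*i, 10*i, 10*i].
embeddings = tf.constant(
    [[0, 0, 0], [10, 10, 10], [20, 20, 20], [30, 30, 30], [40, 40, 40]],
    dtype=tf.float32)

# The 5-token corpus split into 2 rows of 4 time steps; the second row is
# padded with id 0 (illustrative only, since 0 is also a real token here).
token_ids = tf.constant([[0, 1, 2, 3], [4, 0, 0, 0]], dtype=tf.int32)
seq_len = tf.constant([4, 1], dtype=tf.int32)

inputs = tf.nn.embedding_lookup(embeddings, token_ids)  # shape [2, 4, 3]

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=5)
outputs, state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=seq_len, dtype=tf.float32)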
I am trying to write a language model using word embeddings and recurrent neural networks in TensorFlow 0.9.0 with the tf.nn.dynamic_rnn graph operation, but I don't understand how the input tensor is structured.
Let's say I have a corpus of n words. I embed each word in a vector of length e, and I want my RNN to unroll to t time steps. Assuming I use the default time_major = False parameter, what shape would my input tensor [batch_size, max_time, input_size] have?
Maybe a specific tiny example will make this question clearer. Say I have a corpus consisting of n=8 words that looks like this.
1, 2, 3, 3, 2, 1, 1, 2
Say I embed it in a vector of size e=3 with the embeddings 1 -> [10, 10, 10], 2 -> [20, 20, 20], and 3 -> [30, 30, 30], what would my input tensor look like?
I've read the TensorFlow Recurrent Neural Network tutorial, but that doesn't use tf.nn.dynamic_rnn. I've also read the documentation for tf.nn.dynamic_rnn, but find it confusing. In particular I'm not sure what "max_time" and "input_size" mean here.
Can anyone give the shape of the input tensor in terms of n, t, and e, and/or an example of what that tensor would look like initialized with data from the small corpus I describe?
TensorFlow 0.9.0, Python 3.5.1, OS X 10.11.5
In your case, it looks like batch_size = 1, since you're looking at a single example. So max_time is n=8 and input_size is the input depth, in your case e=3. You would therefore want to construct an input tensor shaped [1, 8, 3]. It's batch-major, so the first dimension (the batch dimension) is 1. If, say, you had another input at the same time with n=6 words, then you would combine the two by padding the second example to 8 words (zeros for the last 2 word embeddings), giving an input of shape [2, 8, 3].
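Concretely, for the 8-word corpus and the embeddings above, the [1, 8, 3] input tensor could be built like this (a plain NumPy sketch):

import numpy as np

corpus = [1, 2, 3, 3, 2, 1, 1, 2]  # n = 8 words
embedding = {1: [10, 10, 10], 2: [20, 20, 20], 3: [30, 30, 30]}

# [batch_size, max_time, input_size] = [1, 8, 3]
inputs = np.array([[embedding[w] for w in corpus]], dtype=np.float32)
print(inputs.shape)  # (1, 8, 3)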