I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some of the example models from the TensorFlow TF-RNN page.
As advised, I took some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with a two-dimensional array in the TF-RNN ptb_word_lm.py, at which point it no longer made sense.
What I thought tf.nn.embedding_lookup does:
Given a 2-D array params and a 1-D array ids, the function tf.nn.embedding_lookup fetches the rows of params corresponding to the indices given in ids, which matches the dimensionality of the output it returns.
What I am confused about:
When I tried it with the same params and a 2-D array ids, tf.nn.embedding_lookup returned a 3-D array instead of a 2-D one, and I do not understand why.
I looked up the manual for Embedding Lookup, but I still find it difficult to understand how the partitioning works and what result is returned. I recently tried a simple example with tf.nn.embedding_lookup and it appears to return different values each time. Is this behaviour due to the randomness involved in partitioning?
Please help me understand how tf.nn.embedding_lookup works, and why it is used in both word2vec_basic.py and ptb_word_lm.py, i.e., what is the purpose of using it at all?
There is already an answer here on what tf.nn.embedding_lookup does.
When tried with the same params and a 2-D array ids, tf.nn.embedding_lookup returns a 3-D array instead of a 2-D one, which I do not understand.
When you had a 1-D list of ids [0, 1], the function returned a list of embeddings [embedding_0, embedding_1], where embedding_0 is an array of shape [embedding_size]. For instance, the list of ids could be a batch of words.
Now, you have a matrix of ids, or a list of lists of ids. For instance, you now have a batch of sentences, i.e. a batch of lists of words, i.e. a list of lists of words.
If your list of sentences is [[0, 1], [0, 3]] (sentence 1 is [0, 1], sentence 2 is [0, 3]), the function will compute a matrix of embeddings of shape [2, 2, embedding_size], which will look like:
[[embedding_0, embedding_1],
[embedding_0, embedding_3]]
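For instance, here is a minimal sketch (TF 1.x style; the values in params are made up) showing how a 1-D ids argument yields a 2-D result while a 2-D ids argument yields a 3-D result:

import tensorflow as tf
import numpy as np

# Toy vocabulary of 4 words, embedding_size = 3
params = tf.constant(np.arange(12, dtype=np.float32).reshape(4, 3))

ids_1d = tf.constant([0, 1])             # a batch of words
ids_2d = tf.constant([[0, 1], [0, 3]])   # a batch of sentences (lists of words)

out_1d = tf.nn.embedding_lookup(params, ids_1d)   # shape [2, 3]
out_2d = tf.nn.embedding_lookup(params, ids_2d)   # shape [2, 2, 3]

with tf.Session() as sess:
    print(sess.run(out_1d).shape)   # (2, 3)
    print(sess.run(out_2d).shape)   # (2, 2, 3)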
Concerning the partition_strategy argument, you don't have to worry about it. Basically, it allows you to pass a list of embedding matrices as params instead of a single matrix, if you have computational limitations.
So, you could split your embedding matrix of shape [1000, embedding_size] into ten matrices of shape [100, embedding_size] and pass this list of Variables as params. The partition_strategy argument controls how the vocabulary (the 1000 words) is distributed among the 10 matrices, as sketched below.
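As a hedged sketch (the shard sizes and the "mod" strategy below are just illustrative assumptions), splitting the embedding matrix could look like this:

import tensorflow as tf
import numpy as np

embedding_size = 8
# Ten shards of shape [100, embedding_size] instead of one [1000, embedding_size] matrix
shards = [tf.Variable(np.random.randn(100, embedding_size).astype(np.float32))
          for _ in range(10)]

ids = tf.constant([3, 157, 999])
# With partition_strategy="mod" (the default), word id i lives in shard i % 10
vectors = tf.nn.embedding_lookup(shards, ids, partition_strategy="mod")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(vectors).shape)   # (3, 8)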
Related
I have a tf.data.Dataset object d, where each element is an integer tf.sparse.SparseTensor, and I would like to sum them, returning a sparse tensor. One way I see is the following:
d.reduce(tf.sparse.SparseTensor(tf.zeros([0, 1], tf.int64),
                                tf.zeros([0], tf.int32),
                                dense_shape),
         tf.sparse.add)
Problem:
How do I construct the zero for the reduce operation if I do not know the dense_shape ahead of time? I know all the sparse tensors in the dataset will have the same shape, but it is not statically known; it may depend on the data from which the sparse tensors are constructed, and setting it interactively in eager mode is not a viable option.
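One possible workaround, sketched below under the assumption that the dataset can be iterated eagerly and that peeking at a single element is acceptable, is to read the dense_shape from the first element at runtime (the helper name sparse_sum is made up):

import tensorflow as tf

def sparse_sum(d):
    # Peek at one element to discover the runtime dense_shape (assumes eager mode)
    first = next(iter(d))
    rank = first.dense_shape.shape[0]            # static rank of every element
    zero = tf.sparse.SparseTensor(
        indices=tf.zeros([0, rank], tf.int64),   # empty: no stored entries
        values=tf.zeros([0], tf.int32),
        dense_shape=first.dense_shape)
    return d.reduce(zero, tf.sparse.add)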
While preparing the input for a word-prediction RNN model in TensorFlow,
why do we need a 3-D tensor?
Please look at the code below.
Why do we need that extra 1 here?
x = tf.placeholder("float", [None, n_input, 1])
Emm, I guess you are using a simple prediction model, maybe just for demonstration. We use an RNN for prediction with the basic idea that each word in a sentence is affected by the preceding or following words (that is what we call context), so we use the sequential input of the words in a sentence to represent the words appearing one by one as time passes.
So we need a tensor of shape [batch_size, words_counts, words_represent]; in your situation, words_counts is n_input, which represents the time steps, and the per-word representation words_represent has shape (1,).
BUT, in real practice, we do not just turn each word into a (1,) tuple; we usually use word embeddings to create a meaningful and useful tensor representation of a word. So my guess is that you have tried a simple demo, or perhaps I am mistaken.
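To make the contrast concrete, here is a minimal sketch (TF 1.x style; vocab_size and embedding_size are assumed values) of the demo-style input next to an embedding-based input:

import tensorflow as tf

n_input = 3            # time steps (words per example)
vocab_size = 10000     # assumed vocabulary size
embedding_size = 64    # assumed embedding width

# Demo-style input from the question: one number per word, so feature size is 1
x_simple = tf.placeholder(tf.float32, [None, n_input, 1])

# More realistic input: feed word ids and look up learned embeddings,
# so each time step carries a vector of length embedding_size
word_ids = tf.placeholder(tf.int32, [None, n_input])
embeddings = tf.get_variable("embeddings", [vocab_size, embedding_size])
x_embedded = tf.nn.embedding_lookup(embeddings, word_ids)   # [None, n_input, embedding_size]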
I've tried to compare the tutorial code for text classification from TFLearn: https://github.com/tflearn/tflearn/blob/master/examples/nlp/cnn_sentence_classification.py
And the one from dennybritz:
https://github.com/dennybritz/cnn-text-classification-tf
These two examples show different results. I understand that this can be because the TFLearn tutorial uses 1-D convolution, but there is one line of code that I don't understand:
network = global_max_pool(network)
What is the difference between global_max_pool and max_pool_2d?
Looking at the code, they make different calls to the TensorFlow library:
max_pool_2d
Does broadly what you would expect and returns (as well as doing some other things):
tf.nn.max_pool(incoming, kernel, strides, padding)
with the specified arguments. The result is a 4-D tensor, similar in shape to the input.
global_max_pool
Actually performs a pretty drastic reduction of the input tensor. The input tensor has dimensions:
[batch, height, width, in_channels]
The function global_max_pool then returns (as well as doing some other things):
tf.reduce_max(incoming, [1, 2])
which, I think, gives the maximum value over the height and width dimensions for each of the in_channels, so the output has shape [batch, in_channels].
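A small sketch of the two underlying ops (TF 1.x style; the tensor sizes are arbitrary) makes the shape difference visible:

import tensorflow as tf
import numpy as np

x = tf.constant(np.random.randn(8, 32, 32, 16).astype(np.float32))  # [batch, height, width, in_channels]

# max_pool_2d keeps all four dimensions, only shrinking height and width
pooled_2d = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

# global_max_pool collapses height and width, keeping one maximum per channel
pooled_global = tf.reduce_max(x, [1, 2])

with tf.Session() as sess:
    print(sess.run(tf.shape(pooled_2d)))      # [ 8 16 16 16]
    print(sess.run(tf.shape(pooled_global)))  # [ 8 16]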
I thought that TFLearn's evaluate method returns the accuracy of the model (0 to 1), but after training my model, model.evaluate(test_x, test_y) returns a value greater than 1 (1.003626), so now I'm not sure I understand exactly what it returns.
Can anyone explain?
The evaluate method returns a dict, so the call would be
model.evaluate(test_x, test_y)['accuracy']
but I'm guessing that's not the problem. If you are doing classification, the test labels have to be integers for this to work. Other than that, without seeing more of your code, it's hard to debug.
Comments from the source code for evaluate:
Args:
  x: Matrix of shape [n_samples, n_features...] or dictionary of many matrices
    containing the input samples for fitting the model. Can be iterator that
    returns arrays of features or dictionary of array of features. If set,
    input_fn must be None.
  y: Vector or matrix [n_samples] or [n_samples, n_outputs] containing the
    label values (class labels in classification, real numbers in
    regression) or dictionary of multiple vectors/matrices. Can be iterator
    that returns array of targets or dictionary of array of targets. If set,
    input_fn must be None. Note: For classification, label values must
    be integers representing the class index (i.e. values from 0 to
    n_classes-1).
At each iteration I want to dynamically specify how many placeholders I need and then feed data to them. Is that possible, and how? I tried to create the whole model (placeholders, loss, optimizer) inside the epoch loop, but that gave an uninitialised-variables error.
At present I have n = 5 placeholders, each of shape (1, k), in a list, and I feed data to them. But n needs to be defined dynamically during data feeding inside the epoch loop.
Maybe you misunderstood what a tensor is.
If you think of a tensor as a multi-dimensional list, you can see that having a dynamic number of placeholders, each with shape [1, k], makes no sense.
Instead, you have to use a single tensor.
Thus, define your input placeholder as a tensor with shape [None, 1, k]:
placeholder_ = tf.placeholder(tf.float32, [None, 1, k])
With this statement you define a placeholder of type tf.float32 holding an undefined number of elements (the None part), each of shape [1, k].
In every iteration, you have to feed the placeholder with the right values, e.g. running:
result = sess.run(defined_op, feed_dict={
    placeholder_: numpy_ndarray_with_N_elements_with_shape_1_k
})
That way you don't need to define new variables in the computational graph (which simply doesn't work); you just feed it the desired values.
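For example, a minimal sketch where n changes at every epoch (the reduce_sum op is just a stand-in for the real model):

import tensorflow as tf
import numpy as np

k = 4
placeholder_ = tf.placeholder(tf.float32, [None, 1, k])
defined_op = tf.reduce_sum(placeholder_)      # stand-in for the real model

with tf.Session() as sess:
    for n in [5, 3, 8]:                       # n can change at every epoch
        batch = np.random.randn(n, 1, k).astype(np.float32)
        result = sess.run(defined_op, feed_dict={placeholder_: batch})
        print(n, result)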