I am trying to create the inputs to an RNN cell which has takes
An input tensor x of dimensions batch_size x time_step x n_classes
An initial hidden state tensor h of dimensions batch_size x hidden_state_size
The problem now is that I am unsure how to use tf.get_variable(...) to create h such that it has a shape of ? x hidden_state_size where ? is the current batch size.
I don't have this problem with x as I am able to define x like so:
x = tf.nn.embedding_lookup(self.pretrained_embeddings, self.input_placeholder)
x = tf.reshape(x, (-1, self.max_length, Config.n_features * Config.embed_size))
So the batch_size is automatically inferred from tf.reshape
So x.get_shape() would give me (?, max_length, n_features * embed_size)
The only workaround I have to create h is the following:
# x_ is of shape (max_length, batch_size, n_features * embed_size)
x_ = tf.unstack(x, axis = 1)
# h is of shape (batch_size, n_features)
h = tf.get_variable("h", tf.shape(x_[0]), initializer =
tf.constant_initializer(0.0))
Then h.get_shape() would give the desired (?, n_features * embed_size) which by chance is equal to my hidden_state_size. The problem with this workaround is that this only works if the hidden_state_size is equal to n_features * embed_size which may not always be the case.
Is there a way such that I can define the hidden tensor h so that it can have a shape of (?, hidden_state_size) without the error:
ValueError: Shape of a new variable (pred/h) must be fully defined,
but instead was (?, 300)
Silly me I could have just done something like:
h = tf.zeros(shape = (tf.shape(x)[0], n_features * embed_size))
Related
I am trying to do word prediction using basic RNN. I need to provide input to the RNN cell; I am trying following code
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, sequence_length, 1)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
output = tf.transpose(output, (1,0,2))
output = tf.reshape(output, (sequence_length*num_samples,hidden_layer_size))
I am getting error ValueError: Layer gru_cell_2 expects 1 inputs, but it received 39 input tensors. I think this error is due to the embedding as that is not giving a tensor of dimension which can be input to the GRUCell. So, How to provide the input to GRU Cell?
The way you're initializing X_input is probably wrong. That extra one dimension is causing the problem. If you remove that then there's no need to use unstack. This following code would work.
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
##shape of output here is (None,sequence_length,hidden_layer_size)
But if you really need to use that dimension then you need to make a small modification in unstack. You're unstacking it along axis=1 into sequence_length number of tensors, which again doesn't seem right. So do this:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, 1, 2)
output, states = tf.nn.dynamic_rnn(rnn, x[0], dtype = tf.float32)
##shape of output here is again same (None,sequence_length,hidden_layer_size)
Lastly if you really really need to unstack it in sequence_length number of tensors then replace unstack with tf.map_fn() and do this:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.transpose(x,[1,0,2,3])
##tf.map_fn unstacks a tensor along the first dimension only so we need to make seq_len as first dimension by taking transpose
output,states = tf.map_fn(lambda x: tf.nn.dynamic_rnn(rnn,x,dtype=tf.float32),x,dtype=(tf.float32, tf.float32))
##shape of output here is (sequence_length,None,1,hidden_layer_size)
A warning: Notice the shape of the output in each solution. be wary of what type of shape you want.
EDIT:
To answer your question about when to use what type of inputs:
Suppose you have 25 sentences, each has 15 words and you divided it into 5 batches of size 5 each. Also, suppose you're using word embedding of 50 dimensions(let's say u are using word2vec), then your input shape would be (batch_size=5,time_step=15, features=50). In this case, you don't need to use unstacking or any kind of mapping.
Next, suppose you have 30 documents, each has 25 sentences, each sentence 15 words long, and you divided documents into 6 batches of size 5 each. Again, suppose you're using word embedding of 50 dimensions, then your input shape has now one extra dimension. Here batch_size=5,time_step=15 and features=50 but what about number of sentences? Now your input is (batch_size=5,num_sentences=25,time_step=15, features=50) which is a invalid shape for any type of RNNs. In that case, you need to unstack it along the sentence dimension to make 25 tensors, each will have the shape (5,15,50). To make that work, I used tf.map_fn.
I'm trying to build an RNN with LSTM on TensorFlow. Both the input and output are 5000 by 2 matrices, where the columns represent the features. Those matrices are then fed to the batchX and batchY placeholders which enable the backpropagation. The main definition of the code is at the bottom. I am getting the following error :
"Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2)."
I have checked both logits_series and labels_series and they seem to both contain backpropagation amount of tensors of the shape of [batch_size, num_features]
The thing I am confused about is the following: since logits are predictions of labels, shouldn't they have the same dimensions?
'''
RNN definitions
input_dimensions = [batch_size, truncated_backprop_length, num_features_input]
output_dimensions = [batch_size, truncated_backprop_length, num_features_output]
state_dimensions = [batch_size, state_size]
'''
batchX_placeholder = tf.placeholder(tf.float32, (batch_size, truncated_backprop_length, num_features_input))
batchY_placeholder = tf.placeholder(tf.int32, (batch_size, truncated_backprop_length, num_features_output))
init_state = tf.placeholder(tf.float32, (batch_size, state_size))
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)
w = tf.Variable(np.random.rand(num_features_input+state_size,state_size), dtype = tf.float32)
b = tf.Variable(np.zeros((batch_size, state_size)), dtype = tf.float32)
w2 = tf.Variable(np.random.rand(state_size, num_features_output), dtype = tf.float32)
b2 = tf.Variable(np.zeros((batch_size, num_features_output)), dtype=tf.float32)
#calculate state and output variables
state_series = []
output_series = []
current_state = init_state
#iterate over each truncated_backprop_length
for current_input in inputs_series:
current_input = tf.reshape(current_input,[batch_size, num_features_input])
input_and_state_concatenated = tf.concat([current_input,current_state], 1)
next_state = tf.tanh(tf.matmul(input_and_state_concatenated, w) + b)
state_series.append(next_state)
current_state = next_state
output = tf.matmul(current_state, w2)+b2
output_series.append(output)
#calculate expected output for each state
logits_series = [tf.matmul(state, w2) + b2 for state in state_series]
#print(logits_series)
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]
'''
batchY_placeholder = np.zeros((batch_size,truncated_backprop_length))
for i in range(batch_size):
for j in range(truncated_backprop_length):
batchY_placeholder[i,j] = batchY1_placeholder[j, i, 0]+batchY1_placeholder[j, i, 1]
'''
print("logits_series", logits_series)
print("labels_series", labels_series)
#calculate losses given each actual and calculated output
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits = logits, labels = labels) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)
Thanks to Maosi Chen, I found the issue. It was because the
tf.nn.sparse_softmax_cross_entropy_with_logits
Requires labels to have one less dimension than logits. Specifically, the labels argument takes values of the shape [batch_size] and the dtype int32 or int64
I solved the issue by enumerating the one hot encoded labels I had, reducing the dimension
However, it was also possible to use
tf.nn.softmax_cross_entropy_with_logits
Which does not have the dimension reduction requirement, as it takes labels values with shape [batch_size, num_classes] and dtype float32 or float64.
I have lately been vexed by the following error message:
ValueError: Cannot feed value of shape (2455040,) for Tensor 'Placeholder:0', which has shape '(2455040, ?)'
Which is being produced from running the following code:
NUMCLASSES=16
NUMPIXELS=959*640*4
# set up to feed an array of images [images, size_of_image]
x = tf.placeholder(tf.float32, [NUMPIXELS,None])
....deletia....
# Define loss and optimizer..why is this 2d?
y_ = tf.placeholder(tf.float32, [None,NUMCLASSES])
sess = tf.InteractiveSession()
tf.global_variables_initializer().run(session=sess)
tl = get_tensor_list()
for f, n in tl:
str = '/users/me/downloads/train/' + f
mm = Image.open(str)
mm = mm.convert('F')
mma=np.array(mm)
i = mma.flatten() #now this is an array of floats of size NUMPIXELS
sess.run(train_step, feed_dict={x: i, y_: n}) # <<DEATH
Somehow, that array is getting a shape that tf does not like [(x,) when it wants (x,?)]. How to satisfy the tensorgods in this case? The tensor must be what it must be for other mathematical reasons not discussed.
reshaping the array might help.
i = mma.flatten().reshape((NUMPIXELS,1))
The error happens because the two tensors have different ranks: tensor with shape (2455040,) has rank 1, while tensor with shape (2455040,?) has rank 2.
You can do this:
x = tf.placeholder(tf.float32, [None])
x = tf.reshape(x, [NUMPIXELS,-1])
Below codes show me "inputs must be a list". at this.
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
When I define placeholder for input x. I have already set a shape as [None,None]. I think this shape is 2-dimensional array. However, the code continuously requires list type of x.
Below, I have attached all of my codes before training. And this codes are inserted into function of class.
x = tf.placeholder("float",[None,None])
y = tf.placeholder("float",[None])
lstm_cell = rnn_cell.BasicLSTMCell(self.n_hidden, forget_bias=1.0)
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
pred = tf.matmul(outpus[-1], self.weights['out']) + self.biases['out']
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.initialize_all_variables()
self.sess = tf.Session()
self.sess.run(init)
Additionally, practical inputs will be float of word sequence and float of label formed as x=[["aaa","aaa","aaa"],["bbb","bbb"]], y=["c1","c2"].
At that, the first element array of x is labeled with "c1" and the second is "c2". Especially, size of each element array of x cannot be deterministic.
As stated by the documentation, the argument inputs of the function tf.nn.rnn() is:
inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
In your code, the argument inputs is x, a Tensor placeholder of shape [None, None]. In order for your code to work, x must be a list of T tensors of shape [None, input_lenght].
The following code generates a list of tensors as inputs and therefore the function tf.nn.rnn works.
import tensorflow as tf
x = tf.placeholder("float",[None,16])
y = tf.placeholder("float",[None])
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256, forget_bias=1.0)
inputs = []
for t in range(10):
inputs.append(x)
print(len(inputs))
outputs, states = tf.nn.rnn(lstm_cell, inputs, dtype=tf.float32)
pred = tf.matmul(outputs[-1], self.weights['out']) + self.biases['out']
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.initialize_all_variables()
self.sess = tf.Session()
self.sess.run(init)
Note how the placeholder x has a defined shape of [None, input_shape]. It won't work with a shape [None, None] because the first dimensions is the batch_size, which can be None, but the second dimension is the size of each item in the input sequence, and that value can't be None.
assuming the input to the network is a placeholder with variable batch size, i.e.:
x = tf.placeholder(..., shape=[None, ...])
is it possible to get the shape of x after it has been fed? tf.shape(x)[0] still returns None.
If x has a variable batch size, the only way to get the actual shape is to use the tf.shape() operator. This operator returns a symbolic value in a tf.Tensor, so it can be used as the input to other TensorFlow operations, but to get a concrete Python value for the shape, you need to pass it to Session.run().
x = tf.placeholder(..., shape=[None, ...])
batch_size = tf.shape(x)[0] # Returns a scalar `tf.Tensor`
print x.get_shape()[0] # ==> "?"
# You can use `batch_size` as an argument to other operators.
some_other_tensor = ...
some_other_tensor_reshaped = tf.reshape(some_other_tensor, [batch_size, 32, 32])
# To get the value, however, you need to call `Session.run()`.
sess = tf.Session()
x_val = np.random.rand(37, 100, 100)
batch_size_val = sess.run(batch_size, {x: x_val})
print x_val # ==> "37"
You can get the shape of the tensor x using x.get_shape().as_list(). For getting the first dimension (batch size) you can use x.get_shape().as_list()[0].