Tensorflow MultiRNNcell - tensorflow

I have the following problem with MultiRNNCell, but first things first.
My data looks like this:
[[a1, b2,..., x200], [b1, b2, ..., b200], ...]
The relevant code is here:
rows, row_size = 20, 10
num_classes = 3
batch_size = 128
hidden_layer_size = 256
n_layers = 4
tf_x = tf.placeholder(tf.float32, [None, rows, row_size])
tf_y = tf.placeholder(tf.float32, [None, num_classes])
in_x = tf.unstack(tf_x, axis=1)
network = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(hidden_layer_size, state_is_tuple=True)
     for _ in range(n_layers)],
    state_is_tuple=True)
outputs, states = rnn.dynamic_rnn(cell=network, inputs=in_x, dtype=tf.float32)
outputs = tf.matmul(outputs[-1], layer["weights"]) + layer["biases"]
...
...
x_feed = np.array(x_feed.reshape((batch_size, rows, row_size)))
_, c = sess.run([optimizer, loss_fn], feed_dict={tf_x: x_feed, tf_y: y_feed})
I am getting the error ValueError: Shape (10, ?) must have rank at least 3, and the traceback points to this line:
outputs, states = rnn.dynamic_rnn(cell=network, inputs=in_x, dtype=tf.float32)
When I use static_rnn instead of dynamic_rnn, everything runs fine:
outputs, states = rnn.static_rnn(cell=network, inputs=x3, dtype=tf.float32)
I don't know what I am doing wrong. How do I use dynamic_rnn in this case?
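The error message is consistent with dynamic_rnn being handed the unstacked list instead of a single rank-3 tensor. The following is a shape-only sketch in NumPy rather than TensorFlow; the names `batch` and `steps` are illustrative and not from the question. It shows that unstacking along axis 1 produces a list of rank-2 arrays, which is the form static_rnn takes, while dynamic_rnn (with its default time_major=False) takes the original [batch, time, features] tensor.

```python
import numpy as np

rows, row_size, batch_size = 20, 10, 128

# Stand-in for one fed batch: [batch, time steps, features per step].
batch = np.zeros((batch_size, rows, row_size), dtype=np.float32)

# NumPy equivalent of tf.unstack(tf_x, axis=1): one rank-2 array per time step.
steps = [batch[:, t, :] for t in range(rows)]

# Rank-3 tensor: the layout dynamic_rnn expects.
print(batch.ndim, batch.shape)

# List of rank-2 arrays: the layout static_rnn expects.
print(len(steps), steps[0].shape)
```

Under that reading, the change to try would be passing tf_x itself to dynamic_rnn and selecting the last time step with outputs[:, -1, :]. This is an inference from the shapes, not something confirmed in the question.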

Related

How to do word embedding to provide input to RNN?

I am trying to do word prediction using a basic RNN. I need to provide input to the RNN cell, and I am trying the following code:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, sequence_length, 1)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
output = tf.transpose(output, (1,0,2))
output = tf.reshape(output, (sequence_length*num_samples,hidden_layer_size))
I am getting the error ValueError: Layer gru_cell_2 expects 1 inputs, but it received 39 input tensors. I think this error is due to the embedding, which does not produce a tensor with dimensions that the GRUCell accepts. So how do I provide the input to the GRU cell?
The way you're initializing X_input is probably wrong. That extra dimension of size one is causing the problem. If you remove it, there is no need to use unstack. The following code would work:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
##shape of output here is (None,sequence_length,hidden_layer_size)
But if you really need to use that dimension then you need to make a small modification in unstack. You're unstacking it along axis=1 into sequence_length number of tensors, which again doesn't seem right. So do this:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, 1, 2)
output, states = tf.nn.dynamic_rnn(rnn, x[0], dtype = tf.float32)
##shape of output here is again same (None,sequence_length,hidden_layer_size)
Lastly, if you really need to unstack it into sequence_length tensors, replace unstack with tf.map_fn() and do this:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.transpose(x,[1,0,2,3])
##tf.map_fn unstacks a tensor along the first dimension only so we need to make seq_len as first dimension by taking transpose
output,states = tf.map_fn(lambda x: tf.nn.dynamic_rnn(rnn,x,dtype=tf.float32),x,dtype=(tf.float32, tf.float32))
##shape of output here is (sequence_length,None,1,hidden_layer_size)
A warning: notice the shape of the output in each solution, and be careful about which shape you actually need.
EDIT:
To answer your question about when to use what type of inputs:
Suppose you have 25 sentences of 15 words each, divided into 5 batches of 5 sentences. Also suppose you're using 50-dimensional word embeddings (say, word2vec). Then your input shape is (batch_size=5, time_step=15, features=50). In this case, you don't need unstacking or any kind of mapping.
Next, suppose you have 30 documents, each with 25 sentences of 15 words, and you divide the documents into 6 batches of 5. Again using 50-dimensional word embeddings, your input now has one extra dimension: batch_size=5, time_step=15 and features=50, plus the number of sentences. The input is (batch_size=5, num_sentences=25, time_step=15, features=50), which is an invalid shape for any type of RNN. In that case, you need to unstack it along the sentence dimension to get 25 tensors, each of shape (5, 15, 50). To make that work, I used tf.map_fn.
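The two cases can be checked with a shape-only NumPy sketch. It uses zero-filled arrays in place of real embeddings, and the variable names are illustrative. Splitting along the sentence axis here plays the role that tf.map_fn plays in the answer.

```python
import numpy as np

batch_size, num_sentences, time_step, features = 5, 25, 15, 50

# Sentence-level batch: already the rank-3 layout an RNN consumes directly.
sentence_batch = np.zeros((batch_size, time_step, features), dtype=np.float32)

# Document-level batch: one extra axis for the sentences in each document.
document_batch = np.zeros(
    (batch_size, num_sentences, time_step, features), dtype=np.float32)

# Move the sentence axis to the front, then split along it.
per_sentence = list(np.transpose(document_batch, (1, 0, 2, 3)))

print(sentence_batch.shape)
print(len(per_sentence), per_sentence[0].shape)
```

Each element of `per_sentence` has the same rank-3 layout as `sentence_batch`, so it can be fed to the RNN on its own.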

TensorFlow sparse_softmax_cross_entropy rank error

I'm trying to build an RNN with LSTM on TensorFlow. Both the input and output are 5000 by 2 matrices, where the columns represent the features. Those matrices are then fed to the batchX and batchY placeholders which enable the backpropagation. The main definition of the code is at the bottom. I am getting the following error :
"Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2)."
I have checked both logits_series and labels_series, and both appear to contain truncated_backprop_length tensors of shape [batch_size, num_features].
The thing I am confused about is the following: since logits are predictions of labels, shouldn't they have the same dimensions?
'''
RNN definitions
input_dimensions = [batch_size, truncated_backprop_length, num_features_input]
output_dimensions = [batch_size, truncated_backprop_length, num_features_output]
state_dimensions = [batch_size, state_size]
'''
batchX_placeholder = tf.placeholder(tf.float32, (batch_size, truncated_backprop_length, num_features_input))
batchY_placeholder = tf.placeholder(tf.int32, (batch_size, truncated_backprop_length, num_features_output))
init_state = tf.placeholder(tf.float32, (batch_size, state_size))
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)
w = tf.Variable(np.random.rand(num_features_input+state_size,state_size), dtype = tf.float32)
b = tf.Variable(np.zeros((batch_size, state_size)), dtype = tf.float32)
w2 = tf.Variable(np.random.rand(state_size, num_features_output), dtype = tf.float32)
b2 = tf.Variable(np.zeros((batch_size, num_features_output)), dtype=tf.float32)
#calculate state and output variables
state_series = []
output_series = []
current_state = init_state
#iterate over each truncated_backprop_length
for current_input in inputs_series:
    current_input = tf.reshape(current_input,[batch_size, num_features_input])
    input_and_state_concatenated = tf.concat([current_input,current_state], 1)
    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, w) + b)
    state_series.append(next_state)
    current_state = next_state
    output = tf.matmul(current_state, w2)+b2
    output_series.append(output)
#calculate expected output for each state
logits_series = [tf.matmul(state, w2) + b2 for state in state_series]
#print(logits_series)
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]
'''
batchY_placeholder = np.zeros((batch_size,truncated_backprop_length))
for i in range(batch_size):
    for j in range(truncated_backprop_length):
        batchY_placeholder[i,j] = batchY1_placeholder[j, i, 0]+batchY1_placeholder[j, i, 1]
'''
print("logits_series", logits_series)
print("labels_series", labels_series)
#calculate losses given each actual and calculated output
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits = logits, labels = labels) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)
Thanks to Maosi Chen, I found the issue. tf.nn.sparse_softmax_cross_entropy_with_logits requires labels to have one less dimension than logits. Specifically, the labels argument takes values of shape [batch_size] and dtype int32 or int64.
I solved the issue by converting my one-hot encoded labels to integer class indices, which removes one dimension.
However, it was also possible to use tf.nn.softmax_cross_entropy_with_logits, which does not have the dimension reduction requirement, as it takes labels of shape [batch_size, num_classes] and dtype float32 or float64.
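The rank rule can be illustrated without TensorFlow. This is a minimal pure-Python sketch with hypothetical helper names and made-up logits: the dense variant takes one label row per logits row, the sparse variant takes one integer per logits row, and both give the same loss once the one-hot rows are converted to indices.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense_cross_entropy(logits, one_hot):
    """Loss from a one-hot label row: labels have the same rank as logits."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def sparse_cross_entropy(logits, class_index):
    """Loss from an integer class index: labels have one rank less than logits."""
    return -math.log(softmax(logits)[class_index])

# Illustrative batch of 2 examples with 3 classes (values are made up).
logits_batch = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
one_hot_batch = [[1, 0, 0], [0, 0, 1]]

# Convert each one-hot row to the position of its 1.
index_batch = [row.index(1) for row in one_hot_batch]

dense = [dense_cross_entropy(l, y) for l, y in zip(logits_batch, one_hot_batch)]
sparse = [sparse_cross_entropy(l, i) for l, i in zip(logits_batch, index_batch)]

print(index_batch)
print(all(math.isclose(d, s) for d, s in zip(dense, sparse)))
```

In TensorFlow the same conversion is usually written with tf.argmax over the class axis.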

Why when I changed the test batch size in tensorflow, result was different

Here is my train code:
x = tf.placeholder(tf.float32, [None, 2, 3])
cell = tf.nn.rnn_cell.GRUCell(10)
_, state = tf.nn.dynamic_rnn(
    cell = cell,
    inputs = x,
    dtype = tf.float32)
# train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_ = np.ones([2,2,3],np.float32)
    output = sess.run(state, feed_dict= {x: x_})
    print(output)
    saver = tf.train.Saver()
    saver.save(sess,'./model')
The result is:
[[ 0.12851571 -0.23994535 0.23123585 -0.00047993 -0.02450397
-0.21048039 -0.18786618 0.04458345 -0.08603278 -0.08259721]
[ 0.12851571 -0.23994535 0.23123585 -0.00047993 -0.02450397
-0.21048039 -0.18786618 0.04458345 -0.08603278 -0.08259721]]
Here is my test code:
x = tf.placeholder(tf.float32, [None, 2, 3])
cell = tf.nn.rnn_cell.GRUCell(10)
_, state = tf.nn.dynamic_rnn(
    cell = cell,
    inputs = x,
    dtype = tf.float32)
with tf.Session() as sess:
    x_ = np.ones([1,2,3],np.float32)
    saver = tf.train.Saver()
    saver.restore(sess,'./model')
    output = sess.run(state, feed_dict= {x: x_})
    print(output)
Then I get:
[[ 0.12851571 -0.23994535 0.2312358 -0.00047993 -0.02450397
-0.21048039 -0.18786621 0.04458345 -0.08603278 -0.08259721]]
As you can see, the result has changed slightly. When I set the test batch size to 2, the result is the same as the training result. So what's wrong? My TensorFlow version is 0.12.
An update (not an answer)
The tf.nn.rnn_cell.GRUCell and tf.nn.dynamic_rnn are both deprecated and replaced with tf.keras.layers.GRU.
Using the deprecated functions, it appears you don't even need to save and restore the model or even run it multiple times. All you need is to run it on an odd batch size and use tf.float32 as the dtype and the last result will be slightly off.
import tensorflow as tf
import numpy as np
x = tf.placeholder(tf.float32, [None, 2, 3])
cell = tf.nn.rnn_cell.GRUCell(10)
_, state = tf.nn.dynamic_rnn(
    cell = cell,
    inputs = x,
    dtype = tf.float32)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
x_ = np.ones([3,2,3],np.float32)
output = sess.run(state, feed_dict= {x: x_})
print(output)
Returns results like this
[[ 0.03649516 -0.08052824 -0.0539998 0.2995336 -0.12542574 -0.04339318
0.3872745 0.08844283 -0.14555818 -0.4216033 ]
[ 0.03649516 -0.08052824 -0.0539998 0.2995336 -0.12542574 -0.04339318
0.3872745 0.08844283 -0.14555818 -0.4216033 ]
[ 0.03649516 -0.08052824 -0.05399981 0.2995336 -0.12542574 -0.04339318
0.38727456 0.08844285 -0.14555818 -0.4216033 ]]
The anomaly only seems to appear in the last row for odd length batches.
An alternative view is that a batch of size one is correct, all even-sized batches are off, and every row other than the last in odd-sized batches is off.
It does not seem to happen for dtype=float64 or dtype=float16, both of which seem stable.
Furthermore, this issue is only in the hidden state and does not seem to appear in the regular output.
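The size of the discrepancy can be measured directly from the rows printed above. This pure-Python sketch does not explain the cause. It only shows that the first and last rows differ by less than one part in a million, so a tolerant comparison treats them as equal. The tolerance of 1e-6 is an assumption, chosen to sit slightly above float32 resolution (about 1.2e-7).

```python
import math

# First and last rows of the batch-of-3 output printed above.
first_row = [0.03649516, -0.08052824, -0.0539998, 0.2995336, -0.12542574,
             -0.04339318, 0.3872745, 0.08844283, -0.14555818, -0.4216033]
last_row = [0.03649516, -0.08052824, -0.05399981, 0.2995336, -0.12542574,
            -0.04339318, 0.38727456, 0.08844285, -0.14555818, -0.4216033]

# Exact comparison fails because three entries differ in the last digit.
exactly_equal = first_row == last_row

# Tolerant comparison; rel_tol=1e-6 is an assumed threshold, not a TensorFlow default.
close_enough = all(
    math.isclose(a, b, rel_tol=1e-6) for a, b in zip(first_row, last_row))

# Largest relative gap between corresponding entries.
max_rel_gap = max(abs(a - b) / abs(a) for a, b in zip(first_row, last_row))

print(exactly_equal, close_enough, max_rel_gap)
```

When comparing restored-model outputs across batch sizes, a check of this kind is usually more meaningful than exact equality of float32 values.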

LSTM model error is percent of one output class

I'm having a rough time trying to figure out what's wrong with my LSTM model. I have 11 inputs, and 2 output classes (one-hot encoded) and very quickly, like within 1 batch or so, the error just goes to the % of one of the output classes and stays there.
I tried printing weights and biases, but they seem to all be full of NaN.
If I decrease the learning rate, or change the number of layers or units, I can get it to arrive at the % of one class error more slowly, but it always seems to get to that point.
Here's the code:
num_units = 30
num_layers = 50
dropout_rate = 0.80
learning_rate=0.0001
batch_size = 180
epoch = 1
input_classes = len(train_input[0])
output_classes = len(train_output[0])
data = tf.placeholder(tf.float32, [None, input_classes, 1]) #Number of examples, number of input, dimension of each input
target = tf.placeholder(tf.float32, [None, output_classes]) #one-hot encoded: [1,0] = bad, [0,1] = good
dropout = tf.placeholder(tf.float32)
cell = tf.contrib.rnn.LSTMCell(num_units, state_is_tuple=True)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
#Input shape [batch_size, max_time, depth], output shape: [batch_size, max_time, cell.output_size]
val, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
val = tf.transpose(val, [1, 0, 2]) #reshapes it to [sequence_size, batch_size, depth]
#get last entry as it includes previous results
last = tf.gather(val, int(val.get_shape()[0]) - 1)
weight = tf.get_variable("W", shape=[num_units, output_classes], initializer=tf.contrib.layers.xavier_initializer())
bias = tf.get_variable("B", shape=[output_classes], initializer=tf.contrib.layers.xavier_initializer())
logits = tf.matmul(last, weight) + bias
prediction = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=target)
prediction = tf.clip_by_value(prediction, 1e-10,100.0)
cost = tf.reduce_mean(prediction)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
minimize = optimizer.minimize(cost)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(logits, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init_op)
no_of_batches = int((len(train_input)) / batch_size)
for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):
        inp, out = train_input[ptr:ptr+batch_size], train_output[ptr:ptr+batch_size]
        ptr+=batch_size
        sess.run(minimize,{data: inp, target: out, dropout: dropout_rate })
sess.close()
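Your labels are one-hot encoded, so keep in mind that sparse_softmax_cross_entropy_with_logits takes integer class indices rather than one-hot vectors. If you switch to it from tf.nn.softmax_cross_entropy_with_logits, convert the targets to indices first, for example with tf.argmax(target, 1).
Refer to this Stack Overflow answer to understand the difference between the two functions.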

Tensorflow: variable batch_size gives error when trying to predict with eval (Dimensions of inputs should match)

I am training a model with variable batch_size (first batches are 200). So I have used batch_size None to make it variable (I couldn't do that for init_state because it gave an error).
x = tf.placeholder(tf.int32, [None, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [None, num_steps], name='labels_placeholder')
init_state = tf.zeros([batch_size, state_size])
rnn_inputs = tf.one_hot(x, num_classes)
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
logits = tf.reshape(
    tf.matmul(tf.reshape(rnn_outputs, [-1, state_size]), W) + b,
    [batch_size, num_steps, num_classes])
predictions = tf.nn.softmax(logits)
Training the model goes well.
Then I try to predict probabilities with an x of shape (1, 10) instead of (200, 10):
I tried:
test = np.array([[1, 2 , 3 , 4, 5, 6 ,7, 8, 9 ,10]], dtype=np.int32)
print (predictions.eval(feed_dict = {x:test}))
I also tried in a slightly different way with:
preds, state = sess.run([g['preds'],g['final_state']], feed_dict)
Same error:
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,1058] vs. shape[1] = [200,4]
[[Node: rnn/while/basic_rnn_cell/basic_rnn_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](rnn/while/TensorArrayReadV3, rnn/while/Identity_2, rnn/while/basic_rnn_cell/basic_rnn_cell/concat/axis)]]
So 1058 is my num_classes, 200 is the (initial) batch_size and 4 is the width of the tensor.
I think I am not using the variable batch_size correctly. Any ideas on what to change?
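The shapes in the error message suggest that the inputs carry the test batch size (1) while the state still carries the training batch size (200). The NumPy sketch below reproduces that mismatch and shows one way around it. It assumes that 4 is state_size and that the cell concatenates inputs and state along the feature axis; `rnn_step_input` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

num_classes, state_size, train_batch_size = 1058, 4, 200

def rnn_step_input(inputs, state):
    """Concatenate inputs and state along the feature axis (assumed cell behaviour)."""
    return np.concatenate([inputs, state], axis=1)

# One time step of a single test example after one-hot encoding: [1, num_classes].
test_inputs = np.zeros((1, num_classes), dtype=np.float32)

# State fixed to the training batch size, like tf.zeros([batch_size, state_size]).
fixed_state = np.zeros((train_batch_size, state_size), dtype=np.float32)
try:
    rnn_step_input(test_inputs, fixed_state)
    mismatch = False
except ValueError:
    # The batch axes differ (1 vs 200), so concatenation is rejected.
    mismatch = True

# State sized from the inputs that are actually fed.
dynamic_state = np.zeros((test_inputs.shape[0], state_size), dtype=np.float32)
combined = rnn_step_input(test_inputs, dynamic_state)

print(mismatch, combined.shape)
```

In TensorFlow 1.x, the analogous change would be to derive the batch dimension of init_state from tf.shape(x)[0] instead of the Python constant batch_size. The reshape that builds logits also uses batch_size and would likely need the same treatment. Both points are inferred from the error message rather than confirmed by the question.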