Error in Dimension for LSTM in tflearn - tensorflow

I am training a character-level LSTM on the PTB dataset to predict characters.
The training data has shape [len(dataset) x vocabulary_size]. Here, vocabulary_size = 27 (26 letters + 1 for unk tokens, spaces, and full stops).
This is the code that converts both the input (arrX) and the labels (arrY) to one-hot:
arrX = np.zeros((len(train_data), vocabulary_size), dtype=np.float32)
arrY = np.zeros((len(train_data)-1, vocabulary_size), dtype=np.float32)
for i, x in enumerate(train_data):
    arrX[i, x] = 1
arrY = arrX[1:, :]
I create placeholders for the input (X) and the labels (Y) in the graph and pass them to the tflearn LSTM. The following is the code for the graph and the session.
batch_size = 256
with tf.Graph().as_default():
    X = tf.placeholder(shape=(None, vocabulary_size), dtype=tf.float32)
    Y = tf.placeholder(shape=(None, vocabulary_size), dtype=tf.float32)
    print (utils.get_incoming_shape(tf.concat(0, Y)))
    print (utils.get_incoming_shape(X))
    net = tflearn.lstm(X, 512, return_seq=True)
    print (utils.get_incoming_shape(net))
    net = tflearn.dropout(net, 0.5)
    print (utils.get_incoming_shape(net))
    net = tflearn.lstm(net, 256)
    net = tflearn.fully_connected(net, vocabulary_size, activation='softmax')
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(net, Y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
    init = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init)
        avg_cost = 0
        total_batch = (train_length-1) / 256
        print ("No. of batches:", '%d' %total_batch)
        for i in range(total_batch) :
            batch_xs, batch_ys = trainX[offset : batch_size + offset], trainY[offset : batch_size + offset]
            sess.run(optimizer, feed_dict={X: batch_xs, Y: batch_ys})
            cost = sess.run(loss, feed_dict={X: batch_xs, Y: batch_ys})
            avg_cost += cost/total_batch
            if i % 20 == 0:
                print("Step:", '%03d' % i, "Loss:", str(cost))
            offset += batch_size
So I get the following error:
assert ndim >= 3, "Input dim should be at least 3."
AssertionError: Input dim should be at least 3.
How can I resolve this error? Is there an alternative solution?
Should I write a separate LSTM definition?

I'm not used to this kind of dataset, but have you tried using tflearn.input_data(shape) with the tflearn.embedding layer? If you use the embedding, I suppose you won't have to reshape your data into 3 dimensions.

The lstm layer takes a 3-D tensor of shape [samples, timesteps, input dim] as input, so you can reshape your input data to 3-D. In your problem, the shape of trainX is [len(dataset) x vocabulary_size]. With trainX = trainX.reshape(trainX.shape + (1,)), the shape becomes [len(dataset), vocabulary_size, 1]. This data can be passed to lstm after a simple change to the input placeholder X: add one more dimension with X = tf.placeholder(shape=(None, vocabulary_size, 1), dtype=tf.float32).
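
As a quick shape check, the reshape described above can be tried in NumPy alone. This is a minimal sketch with a made-up dataset length of 1000; `train_x` is a stand-in for the question's `trainX`. The last two lines show an alternative layout that is my own assumption, not part of the answer: treating each character as a single timestep with `vocabulary_size` features.

```python
import numpy as np

vocabulary_size = 27
num_samples = 1000  # hypothetical dataset length, for illustration only

# Stand-in for trainX: one one-hot row per character.
ids = np.random.randint(0, vocabulary_size, size=num_samples)
train_x = np.zeros((num_samples, vocabulary_size), dtype=np.float32)
train_x[np.arange(num_samples), ids] = 1.0

# Append a trailing axis, as suggested above: [samples, timesteps, input dim].
train_x_3d = train_x.reshape(train_x.shape + (1,))
print(train_x_3d.shape)  # (1000, 27, 1)

# Alternative layout (an assumption): one timestep of 27 features per sample.
train_x_alt = train_x.reshape(num_samples, 1, vocabulary_size)
print(train_x_alt.shape)  # (1000, 1, 27)
```

Either array has three dimensions, which is what the `ndim >= 3` assertion checks; which axis should play the role of time depends on how the sequences are meant to be fed to the network.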


Tensorflow error in feed_dict

I have the following error.
Cannot feed value of shape (525, 3) for Tensor 'Placeholder_31:0', which has shape '(?, 2)'
Here is my code:
data['K'] = data['K'].map({2: s, 4: ve})
#training data
#test data
#placeholders and variables. input has 4 features and output has 3 classes
#weight and bias
# model
#softmax function for multiclass classification
y = tf.nn.softmax(tf.matmul(x, W) + b)
#loss function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
#optimiser -
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
#calculating accuracy of our model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
#session parameters
sess = tf.InteractiveSession()
#initialising variables
init = tf.global_variables_initializer()
#number of interations
for step in range(epoch):
_,[train_step,cross_entropy], feed_dict={x: x_input, y_:[t for t in y_input.as_matrix()]})
if step%500==0 :
As I am new to TensorFlow, I can't figure out what the mistake is. Can anyone help me sort it out?
You are declaring the placeholder with the shape (?, 2) in the following line:
However, your problem is a classification with 3 classes, so you should change your y_, W and b to the following:
Actually, you defined the wrong shape for your weight and bias, so change the dimensions of the weight and bias according to your network architecture. It should look like this:
y = tf.placeholder(tf.float32,shape=[None,number_of_classes])
w = tf.Variable(tf.zeros([input_tensor_shape,output_tensor_shape]))
b = tf.Variable(tf.zeros([number_of_classes]))
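
The shape rule behind both answers can be checked without TensorFlow. The sketch below uses NumPy with the sizes mentioned in the question (525 rows, 4 features, 3 classes); the random data and the variable names are placeholders of mine, not the asker's data.

```python
import numpy as np

num_rows, num_features, num_classes = 525, 4, 3

# Placeholder inputs and one-hot labels with the shapes from the question.
x_input = np.random.rand(num_rows, num_features).astype(np.float32)
class_ids = np.random.randint(0, num_classes, size=num_rows)
labels = np.eye(num_classes, dtype=np.float32)[class_ids]  # shape (525, 3)

W = np.zeros((num_features, num_classes), dtype=np.float32)
b = np.zeros(num_classes, dtype=np.float32)

# (525, 4) @ (4, 3) + (3,) -> (525, 3)
logits = x_input @ W + b

# The label placeholder needs the same second dimension as the logits.
print(logits.shape, labels.shape)
```

A label placeholder declared as (?, 2) cannot accept the (525, 3) label array, which is exactly what the error message reports.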

Error: Tensorflow BRNN logits and labels must be same size

I have an error like this:
InvalidArgumentError (see above for traceback): logits and labels must
be same size: logits_size=[10,9] labels_size=[7040,9] [[Node:
SoftmaxCrossEntropyWithLogits =
_device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, Reshape_1)]]
But I can't find the tensor that causes this error. I think it comes from a size mismatch.
My input size is batch_size * n_steps * n_input,
so it will be 10*704*100, and I want the output of the bidirectional RNN to be
batch_size * n_steps * n_classes, which will be 10*704*9.
How should I change this code to fix the error?
batch_size is the number of data items, like this:
n_steps is the length of each data item (the data was padded with 'O' to fix the length of each item): 704
n_input is how each letter in each data item is represented, like this:
A - [1, 2, 1, -1, ..., -1]
And the output of the learning should be like this:
output of data 1 : XYZYXYZYYXY ...
output of data 10 : ZXYYRZYZZ ...
Each letter of the output is affected by the surrounding letters and by the sequence of letters in the input.
learning_rate = 0.001
training_iters = 100000
batch_size = 10
display_step = 10
# Network Parameters
n_input = 100
n_steps = 704 # timesteps
n_hidden = 50 # hidden layer num of features
n_classes = 9
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_steps, n_classes])
weights = {
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
def BiRNN(x, weights, biases):
    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
    # Forward direction cell
    lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Get lstm cell output
    try:
        outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                     dtype=tf.float32)
    except Exception: # Old TensorFlow version only returns outputs not states
        outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                               dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
pred = BiRNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    while step * batch_size < training_iters:
        batch_x, batch_y = next_batch(batch_size, r_big_d, y_r_big_d)
        #batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    test_x, test_y = next_batch(batch_size, v_big_d, y_v_big_d)
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={x: test_x, y: test_y}))
The first return value of static_bidirectional_rnn is a list of tensors, one for each RNN step. By using only the last one in your tf.matmul, you lose all the rest. Instead, stack them into a single tensor of the appropriate shape, reshape it for the matmul, then reshape it back. Note that each step's output concatenates the forward and backward states, so its last dimension is 2*n_hidden.
outputs = tf.stack(outputs, axis=1)
outputs = tf.reshape(outputs, (batch_size*n_steps, 2*n_hidden))
outputs = tf.matmul(outputs, weights['out']) + biases['out']
outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
Alternatively, you could use tf.einsum:
outputs = tf.stack(outputs, axis=1)
outputs = tf.einsum('ijk,kl->ijl', outputs, weights['out']) + biases['out']
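
To see that the two variants compute the same thing, here is a NumPy sketch using the sizes from the question. The random arrays stand in for the RNN outputs and the `weights['out']` / `biases['out']` variables; the feature width is 2*n_hidden because the question's output weights have shape [2*n_hidden, n_classes].

```python
import numpy as np

batch_size, n_steps, n_hidden, n_classes = 10, 704, 50, 9
rng = np.random.default_rng(0)

# One (batch_size, 2*n_hidden) array per timestep, like the RNN output list.
step_outputs = [rng.standard_normal((batch_size, 2 * n_hidden))
                for _ in range(n_steps)]
w_out = rng.standard_normal((2 * n_hidden, n_classes))
b_out = rng.standard_normal(n_classes)

stacked = np.stack(step_outputs, axis=1)  # (10, 704, 100)

# Variant 1: flatten batch and time, matmul, then restore the 3-D shape.
flat = stacked.reshape(batch_size * n_steps, 2 * n_hidden)
logits_a = (flat @ w_out + b_out).reshape(batch_size, n_steps, n_classes)

# Variant 2: let einsum contract the feature axis directly.
logits_b = np.einsum('ijk,kl->ijl', stacked, w_out) + b_out

print(logits_a.shape)                   # (10, 704, 9)
print(np.allclose(logits_a, logits_b))  # True
```

Both results have one row of class scores per timestep, so flattening them gives 7040 rows, the same count as the labels in the error message.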

Outputting sequence in TensorFlow RNN

I created a simple TensorFlow program that tries to predict the next character using the previous 3 characters in a body of text.
A single input could look like this:
with the target output being
I'm trying to expand this to output the next, say, 4 characters rather than just the next character. To do this, I tried feeding a longer array to y,
in addition to changing y to
y = tf.placeholder(dtype=tf.int32, shape=[None, n_steps])
However, this yields the error:
Rank mismatch: Rank of labels (received 2) should equal rank of logits
minus 1 (received 2).
Here's the full code
n_neurons = 200
n_output = vocab_size
learning_rate = 0.001
with tf.Graph().as_default():
    x = tf.placeholder(dtype=tf.int32, shape=[None, n_steps])
    y = tf.placeholder(dtype=tf.int32, shape=[None])
    seq_length = tf.placeholder(tf.int32, [None])
    # Let's set up the embedding converting words to vectors
    embeddings = tf.Variable(tf.random_uniform(shape=[vocab_size, embedding_size], minval=-1, maxval=1))
    train_input = tf.nn.embedding_lookup(embeddings, x)
    basic_cell = tf.nn.rnn_cell.GRUCell(num_units=n_neurons)
    outputs, states = tf.nn.dynamic_rnn(basic_cell, train_input, sequence_length=seq_length, dtype=tf.float32)
    logits = tf.layers.dense(states, units=vocab_size, activation=None)
    predictions = tf.nn.softmax(logits)
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for r in range(1000):
            x_batch, y_batch, seq_length_batch = input_fn()
            feed_dict = {x: x_batch, y: y_batch, seq_length: seq_length_batch}
            _, loss_out = sess.run([training_op, loss], feed_dict=feed_dict)
            if r % 1000 == 0:
                print("loss_out", loss_out)
                sample_text = "for th"
                sample_text_ids = np.expand_dims(np.array([w_to_id[c] for c in sample_text]+[0, 0], dtype=np.int32), 0)
                prediction_out = sess.run(predictions, feed_dict={x: sample_text_ids, seq_length: np.array([len(sample_text)])})
                print("Result:", id_to_w[np.argmax(prediction_out)])
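In the case of a many-to-many RNN, you should use tf.contrib.seq2seq.sequence_loss to calculate the per-time-step loss. The logits must then be computed from the per-step outputs (shape [batch, n_steps, n_neurons]) rather than from the final state, and the weights must be floats. Your code should look like this:
logits = tf.layers.dense(outputs, units=vocab_size, activation=None)
weights = tf.sequence_mask(seq_length, n_steps, dtype=tf.float32)
xentropy = tf.contrib.seq2seq.sequence_loss(logits, y, weights)
See the documentation of tf.contrib.seq2seq.sequence_loss for more details.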
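
To make the idea of a per-time-step loss with a length mask concrete, here is a NumPy sketch. It is an illustration of the technique under my own assumptions (the helper name `masked_sequence_loss` is hypothetical, and the loss is averaged over all non-padded steps); it is not TensorFlow's implementation.

```python
import numpy as np

def masked_sequence_loss(logits, targets, seq_length):
    """Mean cross-entropy over the non-padded timesteps.

    logits: (batch, n_steps, vocab) floats
    targets: (batch, n_steps) integer class ids
    seq_length: (batch,) valid length of each sequence
    """
    n_steps = logits.shape[1]
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability assigned to the target id at each step.
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # 1.0 for real steps, 0.0 for padding (the same idea as tf.sequence_mask).
    mask = (np.arange(n_steps)[None, :] < seq_length[:, None]).astype(logits.dtype)
    return -(picked * mask).sum() / mask.sum()

logits = np.zeros((2, 4, 5))  # uniform predictions over 5 symbols
targets = np.array([[1, 2, 3, 0], [4, 0, 0, 0]])
seq_length = np.array([3, 1])
print(masked_sequence_loss(logits, targets, seq_length))  # log(5), about 1.609
```

The key point is the shapes: logits carry a time axis, targets have one id per step, and the mask removes padded steps from the average.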

TensorFlow character recognition with softmax results in accuracy 1 due to [NaN...NaN] prediction

I am trying to use the softmax regression method to recognize characters.
My code is as follows.
train_data = pd.read_csv('CharDataSet/train.csv')
x = tf.placeholder(tf.float32, [None, 130])
W = tf.Variable(tf.zeros([130, 26]))
b = tf.Variable(tf.zeros([26]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 26])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(10):
    batch_xs = train_data.iloc[:, 2:]
    batch_ys = getencodedbatch(train_data.iloc[:, 1])
    print(batch_ys)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
However, I am getting an accuracy of 1, which shouldn't be the case.
The reason is that my y tensor comes out as an array like
[nan, ..., nan]
Can anyone explain to me what is wrong in my code?
I converted each character to a one-hot encoding using the method below:
def getencodedbatch(param):
    s = (param.shape[0],26)
    y_encode = np.zeros(s)
    # print(y_encode)
    row = 0
    for val in param:
        # map 'a'..'z' to columns 0..25
        col = ord(val)-97
        y_encode[row, col] = 1
        row += 1
    return pd.DataFrame(y_encode)
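Here is the problem you are having:
1. You set your initial weights and biases to 0. A single softmax layer can still train from there, but random initial values are the more common choice.
2. Entries of y can end up as exactly 0. (With all-zero weights the softmax starts out uniform, at 1/26 per class, so the zeros appear once some outputs saturate during training.)
3. You take the log of y, and the log of 0 is not defined, hence the NaN.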
Good luck!
Edit to tell you how to fix it: look for an example on classifying MNIST characters and see what they do. You probably want to initialise your weights to be random normals ;)
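
The NaN mechanism can be reproduced in NumPy. The sketch below is an illustration under my own assumptions (hand-picked logits and hypothetical `softmax` / `log_softmax` helpers), not the asker's data. It shows that all-zero logits give a uniform softmax, that a saturated softmax contains exact zeros which turn `y_ * log(y)` into NaN, and that computing the log-softmax directly from the logits stays finite.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(z):
    # Stable form: never takes the log of an underflowed probability.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

num_classes = 26
y_true = np.eye(num_classes)[[0]]  # one-hot label for class 0, shape (1, 26)

# All-zero weights give all-zero logits, and softmax of zeros is uniform.
uniform = softmax(np.zeros((1, num_classes)))
print(uniform[0, 0])  # 1/26

# One very large logit saturates the softmax: every other entry underflows to 0.
logits = np.zeros((1, num_classes))
logits[0, 1] = 1000.0
saturated = softmax(logits)

# Naive loss: log(0) is -inf, and 0 * -inf is NaN.
with np.errstate(divide='ignore', invalid='ignore'):
    naive_loss = -np.sum(y_true * np.log(saturated), axis=1)
print(naive_loss)  # [nan]

# Stable loss computed from the logits directly.
stable_loss = -np.sum(y_true * log_softmax(logits), axis=1)
print(stable_loss)  # [1000.]
```

In TensorFlow 1.x the same idea is available as tf.nn.softmax_cross_entropy_with_logits, which takes the pre-softmax logits instead of the softmax output.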

How to use a TensorFlow RNN to learn one-dimensional data? AttributeError: 'numpy.ndarray' object has no attribute 'batch'

The 1-D data consists of 80 samples, each of length 1089. I want to use 70 samples for training and 10 samples for testing.
I am a total beginner in Python and TensorFlow, so I am using code written for processing images (which are two-dimensional). Here is the code I use (all the parameters are set pretty low because I just want to test the code):
import tensorflow as tf
import scipy.io as sc
from tensorflow.python.ops import rnn, rnn_cell
# data read
feature_training = sc.loadmat("feature_training.mat")
feature_training = feature_training['feature_training']
print (feature_training.shape)
feature_testing = sc.loadmat("feature_testing.mat")
feature_testing = feature_testing['feature_testing']
print (feature_testing.shape)
label_training = sc.loadmat("label_training.mat")
label_training = label_training['label_training']
print (label_training.shape)
label_testing = sc.loadmat("label_testing.mat")
label_testing = label_testing['label_testing']
print (label_testing.shape)
# parameters
learning_rate = 0.1
training_iters = 100
batch_size = 70
display_step = 10
# network parameters
n_input = 70 # MNIST data input (img shape: 28*28)
n_steps = 100 # timesteps
n_hidden = 10 # hidden layer num of features
n_classes = 2 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)
    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
pred = RNN(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = feature_training.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            # loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Training Accuracy= " +
                   "{:.5f}".format(acc))
        step += 1
    print ("Optimization Finished!")
    # Calculate accuracy for 10 testing data
    test_len = 10
    test_data = feature_testing[:test_len].reshape((-1, n_steps, n_input))
    test_label = label_testing[:test_len]
    print ("Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
In the end, I get this error:
Traceback (most recent call last):
File "/home/xiangzhang/MNIST data", line 92, in <module>
batch_x, batch_y = feature_training.batch(batch_size)
AttributeError: 'numpy.ndarray' object has no attribute 'next_batch'
I thought it must be related to the dimensions of the data, but I do not know how to fix it. Please help me, thank you very much.
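
The traceback says that a plain NumPy array has no next_batch method; in the MNIST example that method comes from the tutorial's dataset helper object, not from the array itself. As a minimal sketch, a hypothetical stand-in could slice the arrays directly. The function name `iterate_batches`, the zero-filled placeholder arrays, and the label shape (70, 2) are assumptions of mine for illustration.

```python
import numpy as np

def iterate_batches(features, labels, batch_size, rng):
    """Yield shuffled (features, labels) mini-batches from two aligned arrays."""
    order = rng.permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

# Placeholder arrays with the sizes from the question (70 samples of length 1089).
feature_training = np.zeros((70, 1089), dtype=np.float32)
label_training = np.zeros((70, 2), dtype=np.float32)  # assumed one-hot, 2 classes

rng = np.random.default_rng(0)
batches = list(iterate_batches(feature_training, label_training, 32, rng))
print([bx.shape for bx, _ in batches])  # [(32, 1089), (32, 1089), (6, 1089)]
```

Each batch would still have to be reshaped to (batch, n_steps, n_input) before feeding, which only works when n_steps * n_input equals the sample length.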