I'm new to Python and deep learning. I was trying to build an RNN on some data and I don't know where I'm going wrong.
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
raw = pd.read_excel('Online Retail.xlsx',index_col='InvoiceDate')
sales = raw.drop(['InvoiceNo','StockCode','Country','Description'],axis=1)
sales.index = pd.to_datetime(sales.index)
train_set = sales.head(50000)
test_set = sales.tail(41909)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
training = np.nan_to_num(train_set)
testing = np.nan_to_num(test_set)
train_scaled = scaler.fit_transform(training)
test_scaled = scaler.transform(testing)  # reuse the scaling fitted on the training data
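def next_batch(training_data,batch_size,steps):
    rand_start = np.random.randint(0,len(training_data)-steps)
    y_batch = np.array(training_data[rand_start:rand_start+steps+1].reshape(26,steps+1))
    return y_batch[:,:-1].reshape(-1,steps,1),y_batch[:,1:].reshape(-1,steps,1)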
import tensorflow as tf
num_inputs = 1
num_time_steps = 10
num_neurons = 100
num_outputs = 1
learning_rate = 0.03
num_train_iterations = 4000
batch_size = 1
X = tf.placeholder(tf.float32,[None,num_time_steps,num_inputs])
y = tf.placeholder(tf.float32,[None,num_time_steps,num_outputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
loss = tf.reduce_mean(tf.square(outputs - y)) # MSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session(config=tf.ConfigProto()) as sess:
    sess.run(init)
    for iteration in range(num_train_iterations):
        X_batch, y_batch = next_batch(train_scaled,batch_size,num_time_steps)
        sess.run(train, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE:", mse)
    # Save Model for Later
    saver.save(sess, "./ex_time_series_model")
The output:
ValueError Traceback (most recent call last)
<ipython-input-36-f2f7c66a33df> in <module>()
4 for iteration in range(num_train_iterations):
----> 6 X_batch, y_batch = next_batch(train_scaled,batch_size,num_time_steps)
7 sess.run(train, feed_dict={X: X_batch, y: y_batch})
<ipython-input-26-f673a469c67d> in next_batch(training_data, batch_size, steps)
1 def next_batch(training_data,batch_size,steps):
2 rand_start = np.random.randint(0,len(training_data)-steps)
----> 3 y_batch = np.array(training_data[rand_start:rand_start+steps+1].reshape(26,steps+1))
4 return y_batch[:,:-1].reshape(-1,steps,1),y_batch[:,1:].reshape(-1,steps,1)
ValueError: cannot reshape array of size 33 into shape (26,11)

I'm not sure where the number 26 came from, but it doesn't match your data dimensions. After you dropped four columns, the training_data array has shape (50000, 3), from which you take slices of shape (11, 3), i.e. 33 elements. An array of 33 elements can't be reshaped to (26, 11), which needs 286.
What you probably meant is this (in next_batch function):
y_batch = np.array(training_data[rand_start:rand_start+steps+1].reshape(3,steps+1))
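To see which target shapes are possible, here is a minimal NumPy sketch. It assumes a stand-in array with the same (50000, 3) shape as training_data; the values and the names `data`, `window`, `mixed`, and `as_rows` are made up for illustration.

```python
import numpy as np

steps = 10
# Stand-in for the scaled training data: 50000 rows, 3 columns (assumed values).
data = np.arange(50000 * 3, dtype=float).reshape(50000, 3)

rand_start = 0  # fixed here so the result is reproducible
window = data[rand_start:rand_start + steps + 1]
print(window.shape, window.size)  # (11, 3) 33

# 33 elements cannot fill 26 * 11 = 286 slots, so this raises ValueError.
try:
    window.reshape(26, steps + 1)
except ValueError as err:
    print("reshape failed:", err)

# reshape(3, 11) succeeds, but it refills the array in row-major order,
# so each row mixes values from different columns.
mixed = window.reshape(3, steps + 1)

# Transposing instead keeps each original column as one row of 11 time steps.
as_rows = window.T
x_batch = as_rows[:, :-1].reshape(-1, steps, 1)
y_batch = as_rows[:, 1:].reshape(-1, steps, 1)
print(x_batch.shape, y_batch.shape)  # (3, 10, 1) (3, 10, 1)
```

Whether treating the three columns as three separate sequences is what you want depends on the model. With num_inputs = 1, selecting a single column before slicing may be closer to the intent.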

The error says that you are trying to reshape an array of size 33 into shape 26x11, which is impossible: a 26x11 array needs 286 elements.
Try to debug the next_batch function by printing the shape of the slice before the reshape, using print(training_data[rand_start:rand_start+steps+1].shape), and check whether it holds 286 elements.
One thing I didn't understand: why do you fetch each batch from a random position? Why not read the input data in order?
It would also help if you fixed the indentation when posting your code; as it stands, it is hard to follow.


Learning a simple pattern with RNN

I am trying to make RNN in tensorflow capture a basic pattern in a simple time series in hours. I am trying to solve a bigger problem involving count time series of customer demand.
The simple time series is as follows:
Every 24 hours (1 day) there is a small integer, either 1 or 2, drawn from a uniform random distribution.
Between these daily values, the series is zero.
Every 168 hours (7 days) there is a high integer (5, 6, 7, 8 or 9) drawn from a uniform random distribution.
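For concreteness, the series described above could be generated along these lines. This is only a sketch: the function name `make_series`, the seed, and the choice that the weekly value overrides the daily one when both fall on the same hour are assumptions, not taken from the original data generator.

```python
import numpy as np

def make_series(n_hours, seed=0):
    """Hourly series: 1 or 2 every 24 h, 5..9 every 168 h, zeros elsewhere."""
    rng = np.random.default_rng(seed)
    series = np.zeros(n_hours, dtype=np.int64)
    daily = np.arange(0, n_hours, 24)
    weekly = np.arange(0, n_hours, 168)
    series[daily] = rng.integers(1, 3, size=daily.size)     # 1 or 2
    # Assumption: the weekly value replaces the daily one on shared hours.
    series[weekly] = rng.integers(5, 10, size=weekly.size)  # 5..9
    return series

series = make_series(1576)
print(series[:26])
```

With 11 possible values (0 to 10) this matches a classification setup with num_classes = 11, where the values serve directly as class ids.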
I tried following the code at using dynamic_rnn.
Is my test data correct? How can I feed the batches of output from the previous time step as input to the next time step? I have 5 hyperparameters to play with:
batch_size = 8, num_steps = 192, state_size = 5, learning_rate = 0.00001
However, training repeatedly with the same hyperparameters gives different results each time, even though the training error is always very small. The differences look quite random (local minima, probably?). In the plot, orange is actual and blue is predicted.
Can my test batch start at any point in the sequence? Does the RNN learn the number of zeros in between the non-zero values? If the test batch starts with a small non-zero number, the RNN should know to output 23 zeros after it, and then a high non-zero value after 167 steps. If I start my test sequence at 0, should it wait 23 more zero steps before outputting a small non-zero value, and output a high non-zero value after 167 steps?
Or does it learn another pattern? I am not sure whether my method of testing is correct.
Is it better to pass just one time step's integer value and let the network generate the remaining time steps by feeding the current time step's output as input to the next time step?
Currently, I just take a random sequence X generated by the same method as the training data and check whether my output Y is X shifted by 1 time step. Could you please explain?
My code is given below. you can just copy and paste and it should run. Basically, I just generate the data, build the model, train the network and test it.
from data_generator import gen_data
import tensorflow as tf
import numpy as np
import time
import matplotlib.pyplot as plt
num_classes = 11
batch_size = 8
num_steps = 192
state_size = 5
learning_rate = 0.00001
dem = gen_data(len=1576)
def gen_batch(dem, batch_size, num_steps):
raw_x = dem[:-1]
raw_y = dem[1:]
data_length = len(raw_x)
num_of_win = data_length - num_steps - 1 # 1382 windows
batch_partition_length = num_of_win // batch_size # 172 batches
data_x = []
data_y = []
for i in range(batch_partition_length):
windows_x = []
windows_y = []
windows_x.append( raw_x[ j:num_steps + j] )
windows_y.append( raw_y[ j:num_steps + j] )
data_x.append(np.array(windows_x)) # each batch is stacked horizontally.
for windows_x, windows_y in zip(data_x,data_x):
x = windows_x
y = windows_y
z = x.shape
z = y.shape
yield (x, y)
def gen_epoch(num_epochs,batch_size, num_steps):
for n in range(num_epochs):
yield gen_batch(dem, batch_size, num_steps)
def reset_graph():
# if 'sess' in globals() and sess:
# sess.close()
def build_RNN_model(batch_size, num_classes,state_size,num_steps,learning_rate):
x = tf.compat.v1.placeholder(dtype=tf.int32, shape=(batch_size,num_steps))
y = tf.compat.v1.placeholder(dtype=tf.int32, shape=(batch_size,num_steps))
init_state = tf.zeros([batch_size, state_size])
# with tf.compat.v1.variable_scope('rnn_cell'):
# W = tf.compat.v1.get_variable('inp_state_w', shape=(num_classes+state_size,state_size),initializer=tf.compat.v1.initializers.glorot_uniform(10) )
# b = tf.compat.v1.get_variable('inp_state_b', shape=(state_size),initializer=tf.compat.v1.initializers.constant(0.0) )
# def rnn_cell(rnn_input,state):
# with tf.compat.v1.variable_scope('rnn_cell', reuse=True):
# W = tf.compat.v1.get_variable('inp_state_w', shape=(num_classes+state_size,state_size),initializer=tf.compat.v1.initializers.glorot_uniform(10) )
# b = tf.compat.v1.get_variable('inp_state_b', shape=(state_size),initializer=tf.compat.v1.initializers.constant(0.0) )
# return tf.tanh( tf.matmul( tf.concat([rnn_input,state], axis=1),W) + b )
#cell = tf.compat.v1.nn.rnn_cell.BasicRNNCell(state_size, reuse=True, name='rnn_cell' )
rnn_inputs = tf.one_hot(x, num_classes)
cell = tf.compat.v1.nn.rnn_cell.BasicRNNCell(state_size)
rnn_outputs, final_state = tf.compat.v1.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)
with tf.compat.v1.variable_scope('output'):
W = tf.compat.v1.get_variable('out_state_w', shape=(state_size,num_classes),initializer=tf.compat.v1.initializers.glorot_uniform(10) )
b = tf.compat.v1.get_variable('out_state_b', shape=(num_classes),initializer=tf.compat.v1.initializers.constant(0.0) )
logits = tf.reshape( tf.compat.v1.matmul(tf.reshape(rnn_outputs, [-1, state_size]), W) + b, [batch_size, num_steps, num_classes])
predictions = tf.compat.v1.nn.softmax(logits)
tru_labels = y
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
total_loss = tf.reduce_mean(losses)
train_step = tf.compat.v1.train.AdagradOptimizer(learning_rate).minimize(total_loss)
return dict(
final_state = final_state,
total_loss = total_loss,
train_step = train_step,
init_state = init_state,
predictions = predictions,
tru_labels = tru_labels,
saver = tf.compat.v1.train.Saver()
def train_network(g,num_epochs, batch_size,num_steps, dem,save=' '):
with tf.compat.v1.Session() as sess:
training_losses = []
for idx, epoch in enumerate(gen_epoch(num_epochs,batch_size, num_steps)):
training_loss = 0
steps=0 # number of batches
training_state = None
for X,Y in epoch:
feed_dict = {g['x'] : X, g['y'] : Y}
if training_state is not None:
feed_dict[g['init_state']] = training_state
training_loss_, training_state, train_step = sess.run([g['total_loss'], g['final_state'], g['train_step']], feed_dict)
print("Average training loss for Epoch", idx, ":", training_loss/steps)
if isinstance(save, str):
g['saver'].save(sess, save)
e = gen_batch(dem, batch_size, num_steps)
e = gen_batch(dem, batch_size, num_steps)
for X,Y in e:
tru_labels, predictions = sess.run([g['tru_labels'], g['predictions']], feed_dict={g['x'] : X, g['y'] : Y, g['init_state'] : training_state})
pred = np.argmax(predictions, axis=2)
pred = pred[0]
tru_labels = tru_labels[0]
print('tru_labels',tru_labels )
return training_loss
g = build_RNN_model(batch_size, num_classes,state_size,num_steps,learning_rate)
t = time.time()
train_network(g, num_epochs,batch_size,num_steps, dem,save='saver' )
print("It took", time.time() - t, "seconds to train for 3 epochs.")
I have written some Keras code with a single RNN layer and a dense layer to capture the following two patterns, which are similar to the two patterns above. However, the distribution of magnitudes of the high and low vehicle counts, which are drawn from the categorical distribution below, is not reproduced in the test output.
Categorical Random Variable, x = {0,1,2} and p(x) = {0.6,0.3,0.1}
low vehicles = 1 + x , every 4 hours
high vehicles = 6 + x , every 8 hours
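As a reference point for what "capturing the distribution" would mean, the sketch below samples the categorical variable with NumPy and checks the empirical frequencies. The generator seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])

# Draw x in {0, 1, 2} with probabilities p, then shift by the base level.
x = rng.choice(len(p), size=100_000, p=p)
low_vehicles = 1 + x    # values 1, 2, 3
high_vehicles = 6 + x   # values 6, 7, 8

freq = np.bincount(x, minlength=len(p)) / x.size
print(freq)  # should be close to p
```

Applying the same np.bincount check to the rounded predictions at the spike positions is one way to compare the model's output distribution against p. Note that a network trained with mean absolute error is pushed toward the conditional median, which here is x = 0, so a point forecast may legitimately collapse to the most common magnitude.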
I managed to get the results like the following
with this code
from copyreg import pickle
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras as keras
import sys
#### for reproducible results
from numpy.random import seed
import tensorflow
n_steps = 12
batch_size = 32
lay1_state_size = 64
lay2_state_size = 0
dense_state_size = 1
num_epochs = 25
horizon = 24
loss_function_type = 'sparse_categorical_crossentropy or mse or rmse'
num_layers = 1
optimizer_type = 'Adam'
metrics = 'rmse'
# spikes at regular intervals
dem = np.load('const_dem_2_freq_stoch.npy')
dem_len = len(dem)
def gen_batch(dem, batch_size, n_steps):
n = n_steps + 1
raw_x = dem[:-1]
data_length = len(raw_x)
num_of_win = data_length - n - 1 # 1382 windows
batch_partition_length = num_of_win // batch_size # 172 batches
data_x = []
for i in range(batch_partition_length):
windows_x = []
windows_x.append( raw_x[ j:n + j] )
data_x.append(np.array(windows_x)) # each batch is stacked horizontally.
data_x = np.array(data_x)
data_x = np.reshape(data_x,(-1,n)) # 224 x 13
return data_x,batch_partition_length
data_x,batch_partition_length = gen_batch(dem, batch_size, n_steps)
data_x = np.expand_dims(data_x,axis=-1)
tr = int(0.7*dem_len)
val = int(0.2*dem_len)
x_train, y_train = data_x[:tr,:n_steps], data_x[:tr,-1]
x_valid, y_valid = data_x[tr:tr+val,:n_steps], data_x[tr:tr+val,-1]
x_test, y_test = data_x[tr+val:,:n_steps], data_x[tr+val:,-1]
model = keras.models.Sequential([keras.layers.SimpleRNN(lay1_state_size,input_shape=[None,1]), keras.layers.Dense(dense_state_size)])
# model = keras.models.Sequential([keras.layers.SimpleRNN(lay1_state_size,return_sequences=True,input_shape=[None,1]),keras.layers.SimpleRNN(lay2_state_size),
# keras.layers.Dense(dense_state_size)])
model.compile(optimizer='Adam',loss=keras.losses.mean_absolute_error ,metrics=[tf.keras.metrics.RootMeanSquaredError()] )
model.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs,validation_data=(x_valid,y_valid))
print('Model Evaluation on test set:\n')
model.evaluate(x_test, y_test,batch_size=batch_size)
y_tru = np.array([])
for step_ahead in range(horizon):
# tru label
y = np.append(data_x[step_ahead+1:,n_steps ], np.array([[0]*(step_ahead+1)]))
y_tru = np.append(y_tru,y)
# prediction
y_pred_one = model.predict(data_x[:,step_ahead:])[:,np.newaxis,:]
data_x = np.concatenate([data_x,y_pred_one ],axis=1)
y_tru = np.reshape(y_tru,(batch_partition_length*batch_size,horizon),order='F')
y_pred_horizon = data_x[:,n_steps+1:]
y_pred_horizon = np.squeeze(y_pred_horizon)
print(' RNN prediction on all data MSE',np.mean(keras.losses.mean_squared_error(y_tru,y_pred_horizon )) )
print(' RNN prediction on all data MAE',np.mean(keras.losses.mean_absolute_error(y_tru,y_pred_horizon )) )
for i in range(10):
The data generation code is given below
from copyreg import pickle
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow.keras as keras
dem_len = 1240
def categorical(p):
return (p.cumsum(-1) >= np.random.uniform(size=p.shape[:-1])[..., None]).argmax(-1)
p = np.array([0.6, 0.3, 0.1])
def dem_hr(hr, lo_veh, hi_veh,len):
dem_hrs = np.array([])
for i in range(10000):
#d = np.random.randint(lo_veh,hi_veh)
d = lo_veh + categorical(p)
z = np.array([0]*(hr-1))
dem_hrs = np.append(dem_hrs, d)
dem_hrs = np.append(dem_hrs, z)
dem_hrs = dem_hrs[:len]
return dem_hrs
def gen_data(len):
dzero = np.zeros(len)
# for hr,lo_veh, hi_veh in zip([4, 8],[1, 6],[3,9]):
# d = dem_hr(hr, lo_veh, hi_veh,len)
# dem = dem + d
# dem = np.array(dem,dtype=np.float32)
d4 = dem_hr(4, 1, 3,len)
d8 = dem_hr(8, 6, 9,len)
dall = dzero + d8
dsub = dall - d4
dem = np.where(dsub>=0,d8,d4)
# plt.plot(dem)
# plt.plot(d4)
# plt.plot(d8)
return dem
dem = gen_data(len=dem_len)
np.save('const_dem_2_freq_stoch_cat',dem)
I think increasing the number of steps may help capture the distribution of magnitudes at different periods. Does increasing the number of layers also help capture the magnitude distribution?

Tensorflow Neural Network for Regression

I am using the TensorFlow library to build a pretty simple 2-layer artificial neural network to perform linear regression.
My problem is that the results seem to be far from what I expect. I've been trying to spot my mistake for hours without success. I am new to TensorFlow and neural networks, so it could be a trivial mistake. Does anyone have an idea what I am doing wrong?
from __future__ import print_function
import tensorflow as tf
import numpy as np
# Python optimisation variables
learning_rate = 0.02
train_input=10* np.random.rand(data_size,data_length);
test_input= np.random.rand(data_size,data_length);
x = tf.placeholder(tf.float32, [data_size, data_length])
y = tf.placeholder(tf.float32, [data_size, 1])
W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([data_size, 1]), name='b1')
y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y-y_))
init_op = tf.global_variables_initializer()
correct_prediction = tf.reduce_mean(tf.square(y-y_))
accuracy = tf.cast(correct_prediction, tf.float32)
with tf.Session() as sess:
    sess.run(init_op)
    _, c = sess.run([optimiser, cost],
                    feed_dict={x:train_input , y:train_label})
    print(sess.run(accuracy, feed_dict={x: test_input, y: test_label}))
Thanks for your help!
There are a number of changes you have to make in your code.
First of all, you have to train for a number of epochs and feed the optimizer the training data in batches. Your learning rate was very high. The bias should be one value per output unit of a dense (fully connected) layer, not one per training sample. You can plot the cost (loss) value to see how your network is converging.
In order to feed data in batches, I have also changed the placeholders. Check the full modified code:
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Python optimisation variables
learning_rate = 0.001
data_size=1000 # Had to change these value to fit in my memory
train_input=10* np.random.rand(data_size,data_length);
test_input= np.random.rand(data_size,data_length);
x = tf.placeholder(tf.float32, [None, data_length])
y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([1, 1]), name='b1')
y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y-y_))
init_op = tf.global_variables_initializer()
EPOCHS = 500
with tf.Session() as sess:
    sess.run(init_op)
    loss_history = []
    for epoch_no in range(EPOCHS):
        for offset in range(0, data_size, BATCH_SIZE):
            batch_x = train_input[offset: offset + BATCH_SIZE]
            batch_y = train_label[offset: offset + BATCH_SIZE]
            _, c = sess.run([optimiser, cost],
                            feed_dict={x:batch_x , y:batch_y})
            loss_history.append(c)
    plt.plot(range(len(loss_history)), loss_history)
    # For running test dataset
    results, test_cost = sess.run([y_, cost], feed_dict={x: test_input, y: test_label})
    print('test cost: {:.3f}'.format(test_cost))
    for t1, t2 in zip(results, test_label):
        print('Prediction: {:.3f}, actual: {:.3f}'.format(t1[0], t2[0]))
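As a framework-free sketch of the two structural points in this answer (mini-batches, and a single bias that broadcasts across the batch), the following uses plain NumPy on made-up data. The target function and the names `true_w`, `lr`, and `batch` are assumptions for illustration, not the asker's data or settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.random((n, d))
true_w = np.arange(1, d + 1, dtype=float).reshape(d, 1)  # assumed target weights
y = X @ true_w + 3.0                                      # assumed target bias 3.0

W = np.zeros((d, 1))
b = np.zeros((1, 1))   # one bias for the single output unit, broadcast over rows
lr, batch = 0.1, 50
loss_history = []

for epoch in range(200):
    for off in range(0, n, batch):
        xb, yb = X[off:off + batch], y[off:off + batch]
        err = xb @ W + b - yb                  # shape (batch, 1)
        loss_history.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error with respect to W and b.
        W -= lr * 2.0 * xb.T @ err / len(xb)
        b -= lr * 2.0 * err.mean(axis=0, keepdims=True)

print(W.ravel(), b.ravel())
```

Because b has shape (1, 1), the same code works for any batch size, which is the reason the placeholders in the answer use None for the first dimension.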

TensorFlow Linear Regression gives 'NaN' result

I am currently running the TensorFlow model with Linear Regression. However, I don't understand why, even when I decrease the learning_rate from 0.01 to 0.001 and increase the training iterations from 1000 to 50000, I still obtain the 'nan' result for the cost function, as well as the two coefficients. Could anyone please help me detect the problem in the following code?
from __future__ import print_function
import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
import random
rng = numpy.random
# Parameters
learning_rate = 0.001
training_epochs = 20000 #number of iterations
display_step = 400
#read csv file
datapath = [directory path]
Ha_Noi = pd.read_csv(datapath+"HaNoi_1month_LW_WeatherTest.csv")
#Add an additional column into the table
sLength = len(Ha_Noi['accept_rate'])
Ha_Noi['accept_rate_timeT'] = pd.Series(Ha_Noi['accept_rate'], index=Ha_Noi.index)
#Shift the entries in the accept_rate column upward
Ha_Noi.accept_rate = Ha_Noi.accept_rate.shift(-1)
Ha_Noi = Ha_Noi.dropna(subset = ["longwait_percent4"])
Ha_Noi = Ha_Noi.dropna(subset=["accept_rate"])
Ha_Noi = Ha_Noi.dropna(subset = ["longwait_percent2"])
df2 = pd.DataFrame(Ha_Noi)
#split the dataset into training and testing sets
train_set, test_set = train_test_split(Ha_Noi, test_size=0.2, random_state = random.randint(20, 200))
Xtrain = train_set['longwait_percent2'].reshape(-1,1)
Ytrain = train_set['accept_rate'].reshape(-1,1)
Xtrain2 = train_set['Weather Weight_Longwait_percent2'].reshape(-1,1)
Xtest2 = test_set['Weather Weight_Longwait_percent2'].reshape(-1,1)
# Xtest = test_set['longwait_percent2'].reshape(-1,1)
# Ytest = test_set['accept_rate'].reshape(-1,1)
# Training Data
train_X = Xtrain
train_Y = Ytrain
n_samples = train_X.shape[0]
#Testing Data
Xtest = np.asarray(test_set['longwait_percent2'])
Ytest = np.asarray(test_set['accept_rate'])
# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)
# Mean squared error
cost = tf.sqrt(tf.reduce_sum(tf.pow(pred-Y, 2))/(n_samples))
# Gradient descent method
# Note, minimize() knows to modify W and b because Variable objects are "trained" (trainable=True by default)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
saver = tf.train.Saver() #save all the initialized data
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        # Display logs per epoch step
        if (epoch+1) % display_step == 0: # log every display_step epochs
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    testing_cost = sess.run(
        tf.reduce_sum(tf.pow(pred - Y, 2)) / (Xtest.shape[0]),
        feed_dict={X: Xtest, Y: Ytest}) # square root of function cost above
    print("Root Mean Square Error =", numpy.sqrt(testing_cost))
    print("Absolute mean square loss difference:", abs(
        training_cost - testing_cost))
    plt.plot(Xtest, Ytest, 'bo', label='Testing data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
I don't have your data, so it's hard to tell whether the problem is caused by the data or by the training. You can make the learning rate and the number of training iterations much smaller, such as 0.00005 and 100, to see whether the NaN still appears.
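To separate the two suspects (bad data versus diverging training), a small NumPy experiment can help. The sketch below uses made-up data and a hypothetical helper `fit_line`; it first checks the inputs for NaN/inf, then shows that plain gradient descent on an unscaled feature can blow up to NaN at a learning rate that works fine once the feature is standardized.

```python
import numpy as np

def fit_line(x, y, lr, steps):
    """Full-batch gradient descent on mean squared error for y ~ w*x + b."""
    w, b = 0.0, 0.0
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            err = w * x + b - y
            w -= lr * 2.0 * np.mean(err * x)
            b -= lr * 2.0 * np.mean(err)
    return w, b

x = np.linspace(0.0, 100.0, 200)   # unscaled feature (made-up data)
y = 2.0 * x + 1.0

# Both inputs should be free of NaN/inf before training even starts.
print(np.isfinite(x).all(), np.isfinite(y).all())

w_raw, b_raw = fit_line(x, y, lr=0.001, steps=2000)
print(w_raw, b_raw)   # not finite: the updates overshoot and blow up

# Standardizing x makes an ordinary learning rate stable.
mu, sigma = x.mean(), x.std()
w_std, b_std = fit_line((x - mu) / sigma, y, lr=0.1, steps=500)
w, b = w_std / sigma, b_std - w_std * mu / sigma
print(w, b)           # close to 2.0 and 1.0
```

If the np.isfinite check fails on the real columns, the data is the problem; if it passes, scaling the inputs or lowering the learning rate is the next thing to try.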

Cannot load int variable from previous session in tensorflow 1.1

I have read many similar questions and just cannot get this to work properly.
My model trains well and checkpoint files are written every epoch. I want the program to continue from epoch x once reloaded, and to print the current epoch with every iteration. I could simply save that number outside the checkpoint file, but I also wanted to do it this way to give me confidence that everything else is being stored properly.
Unfortunately the value in the epoch/global_step variable is always still 0 when I restart.
import tensorflow as tf
import numpy as np
# more imports
def extract_number(f): # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+)\.ckpt",f)
    return (int(s[0]) if s else -1,f)
def restore(init_op, sess, saver): # called to restore or just initialise model
list = glob(os.path.join("./params/e*"))
if list:
file = max(list,key=extract_number)
saver.restore(sess, file[:-5])
with tf.Graph().as_default() as g:
# build models
total_batch = data.train.num_examples / batch_size
epochLimit = 51
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
restore(init_op, sess, saver)
epoch = global_step.eval()
while epoch < epochLimit:
total_batch = data.train.num_examples / batch_size
for i in range(int(total_batch)):
voxels = newData.eval()
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32), feed_dict={z:batch_z, train:True}), feed_dict={input:voxels, z:batch_z, train:True})
with open("out/loss.csv", 'a') as f:
batch_loss_G =, feed_dict={z:batch_z, train:False})
batch_loss_D =, feed_dict={input:voxels, z:batch_z, train:False})
msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
epoch=epoch+1, "params/epoch{0}.ckpt".format(epoch))
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
voxels =, feed_dict={z:batch_z})
v = voxels[0].reshape([32, 32, 32]) > 0
util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
I also update the global step variable using assign at the bottom. Any ideas? Any help would be greatly appreciated.
When you call sess.run(init_op) after restoring, this resets all variables to their initial values. Comment that line out and things should work.
My original code was wrong for several reasons because I was trying so many things. The first responder Alexandre Passos gives a valid point, but I believe what changed the game was also the use of scopes (maybe?).
Below is the working updated code if it helps anyone:
import tensorflow as tf
import numpy as np
# more imports
def extract_number(f): # used to get the latest checkpoint file
    s = re.findall(r"epoch(\d+)\.ckpt",f)
    return (int(s[0]) if s else -1,f)
def restore(sess, saver): # called to restore or just initialise model
list = glob(os.path.join("./params/e*"))
if list:
file = max(list,key=extract_number)
saver.restore(sess, file[:-5])
return saver, True, sess
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
return saver, False , sess
batch_size = 100
learning_rate = 0.0001
beta1 = 0.5
z_size = 100
save_interval = 1
data =
total_batch = data.train.num_examples / batch_size
def fill_queue():
for i in range(int(total_batch*epochLimit)):, feed_dict={batch: data.train.next_batch(batch_size)}) # runnig in seperate thread to feed a FIFOqueue
with tf.variable_scope("glob"):
global_step = tf.get_variable(name='global_step', initializer=0,trainable=False)
# build models
epochLimit = 51
saver = tf.train.Saver()
with tf.Session() as sess:
saver,rstr,sess = restore(sess, saver)
with tf.variable_scope("glob", reuse=True):
epocht = tf.get_variable(name='global_step', trainable=False, dtype=tf.int32)
epoch = epocht.eval()
while epoch < epochLimit:
total_batch = data.train.num_examples / batch_size
for i in range(int(total_batch)):
voxels = newData.eval()
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32), feed_dict={z:batch_z, train:True}), feed_dict={input:voxels, z:batch_z, train:True})
with open("out/loss.csv", 'a') as f:
batch_loss_G =, feed_dict={z:batch_z, train:False})
batch_loss_D =, feed_dict={input:voxels, z:batch_z, train:False})
msgOut = "Epoch: [{0}], i: [{1}], G_Loss[{2:.8f}], D_Loss[{3:.8f}]".format(epoch, i, batch_loss_G, batch_loss_D)
epoch=epoch+1, "params/epoch{0}.ckpt".format(epoch))
batch_z = np.random.uniform(-1, 1, [batch_size, z_size]).astype(np.float32)
voxels =, feed_dict={z:batch_z})
v = voxels[0].reshape([32, 32, 32]) > 0
util.save_binvox(v, "out/epoch{0}.vox".format(epoch), 32)
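One fragile spot in the restore helper is file[:-5]: it recovers the checkpoint prefix only when the selected file name ends in a five-character suffix such as ".meta". A hedged alternative is to cut the name at ".ckpt" explicitly. The helper names `checkpoint_prefix` and `latest_prefix` and the file listing below are made up for illustration.

```python
import re

def checkpoint_prefix(path):
    """Return (epoch, prefix) for a checkpoint file, or None if it does not match."""
    m = re.search(r"^(.*epoch(\d+)\.ckpt)", path)
    return (int(m.group(2)), m.group(1)) if m else None

def latest_prefix(paths):
    """Pick the prefix with the highest epoch number among the given file names."""
    found = [c for c in map(checkpoint_prefix, paths) if c is not None]
    return max(found)[1] if found else None

# Made-up file listing, in the layout a TF1 Saver typically writes.
files = [
    "./params/epoch2.ckpt.index",
    "./params/epoch2.ckpt.meta",
    "./params/epoch10.ckpt.data-00000-of-00001",
    "./params/epoch10.ckpt.index",
    "./params/epoch10.ckpt.meta",
]
print(latest_prefix(files))  # ./params/epoch10.ckpt
```

The returned prefix is what saver.restore expects as its path argument. TensorFlow's own tf.train.latest_checkpoint(checkpoint_dir) serves a similar purpose when the "checkpoint" state file is present in the directory.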

Tensorflow does not train CIFAR - 100 data

I am trying to build a linear classifier for CIFAR-100 using TensorFlow. I took the code from Martin Gorner's MNIST tutorial and changed it a bit. When I run this code, TensorFlow does not train: the code runs, but the accuracy remains 1.0 and the loss (cross-entropy) remains 4605.17. I don't know what is wrong. I am a newbie to TF, so any help is appreciated.
import pickle
import numpy as np
import os
import tensorflow as tf
from tensorflow.python.framework import tensor_util
import math
#imports data
def unpickle(file):
import pickle
with open(file, 'rb') as fo:
dict = pickle.load(fo, encoding='bytes')
return dict
cifar100_test = {}
cifar100_train = {}
labelMap = {}
labelNames = {}
# Load the raw CIFAR-100 data.
cifar100_test = unpickle('dataset/cifar-100-python/test')
cifar100_train = unpickle('dataset/cifar-100-python/train')
labelMap = unpickle('dataset/cifar-100-python/meta')
#tr for training data and te for testing data, X is data, Y is label
Xtr = cifar100_train[b'data']
Yr = cifar100_train[b'fine_labels']
Xte = cifar100_test[b'data']
Ye = cifar100_test[b'fine_labels']
classNames = labelMap[b'fine_label_names']
num_train = Xtr.shape[0]
num_test = Xte.shape[0]
num_class = len(classNames)
Ytr = np.zeros([num_train, num_class])
Yte = np.zeros([num_test, num_class])
Ytr[0:num_train, Yr[0:num_train]] = 1
Yte[0:num_test, Ye[0:num_test]] = 1
# As a sanity check, we print out the size of the training and test data.
print('Train data shape:', Xtr.shape)
print('Train Label shape:', Ytr.shape)
print('Test data shape:', Xte.shape)
print('Test Label shape:', Yte.shape)
print('Name of Predicted Class:', classNames[0]) #indice of the label name is the indice of the class.
Xtrain = Xtr#[:1000]
Xtest = Xte#[:100]
Ytrain = Ytr#[:1000]
Ytest = Yte#[:100]
print('Train data shape:', Xtrain.shape)
print('Train Label shape:', Ytrain.shape)
print('Test data shape:', Xtest.shape)
print('Test Label shape:', Ytest.shape)
Xtrain = np.reshape(Xtrain,(50000, 32, 32, 3)).transpose(0,1,2,3).astype(float)
Xtest = np.reshape(Xtest,(10000, 32, 32, 3)).transpose(0,1,2,3).astype(float)
Xbatches = np.split(Xtrain, 500); #second number is # of batches
Ybatches = np.split(np.asarray(Ytrain), 500);
XtestB = np.split(Xtest, 100);
YtestB = np.split(Ytest, 100);
print('X # of batches:', len(Xbatches))
print('Y # of batches:', len(Ybatches))
# input X: 32x32 RGB images, the first dimension is the mini-batch size (fixed at 100 here)
X = tf.placeholder(tf.float32, [100, 32, 32, 3])
# correct answers will go here
Y_ = tf.placeholder(tf.float32, [100, 100])
# weights W[3072, 100] 3072=32*32*3
W = tf.Variable(tf.zeros([3072, 100]))
# biases b[100]
b = tf.Variable(tf.zeros([100]))
# flatten the images into a single line of pixels
# -1 in the shape definition means "the only possible dimension that will preserve the number of elements"
XX = tf.reshape(X, [-1, 3072])
# The model
Y = tf.nn.softmax(tf.matmul(XX, W) + b)
# loss function: cross-entropy = - sum( Y_i * log(Yi) )
# Y: the computed output vector
# Y_: the desired output vector
# cross-entropy
# log takes the log of each element, * multiplies the tensors element by element
# reduce_mean will add all the components in the tensor
# so here we end up with the total cross-entropy for all images in the batch
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0 # normalized for batches of 100 images,
# *10 because "mean" included an unwanted division by 10
# accuracy of the trained model, between 0 (worst) and 1 (best)
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# training, learning rate = 0.01
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
# init
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(500):
    # the backpropagation training step
    t, Loss = sess.run([train_step, cross_entropy], feed_dict={X: Xbatches[i], Y_: Ybatches[i]})
for i in range(100):
    print('accuracy:', sess.run(accuracy, feed_dict={X: XtestB[i], Y_: YtestB[i]}))
You compute the accuracy a hundred times after the training process has completed, so nothing will change there. You should place your print('accuracy:', ...) within the for loop in which you perform the backpropagation:
for i in range(500):
    # the backpropagation training step
    t, Loss = sess.run([train_step, cross_entropy], feed_dict={X: Xbatches[i], Y_: Ybatches[i]})
    # cycle through the 100 test batches so the index stays in range
    print('accuracy:', sess.run(accuracy, feed_dict={X: XtestB[i % 100], Y_: YtestB[i % 100]}))
Sorry for the post; it turns out to be a basic mistake.
I changed the following:
Ytr[0:num_train, Yr[0:num_train]] = 1
Yte[0:num_test, Ye[0:num_test]] = 1
to:
Ytr[range(num_train), Yr_temp[range(num_train)]] = 1
Yte[range(num_test), Ye_temp[range(num_test)]] = 1
The first version sets all values to 1, but I just wanted to set the index of the true class to 1 and leave the other elements at 0. Thanks for your time.
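The difference between the two indexing forms can be reproduced on a tiny made-up example. The sketch below also checks, under the assumption of zero-initialized weights (so the softmax output is uniform), that all-ones labels give exactly the loss value reported in the question.

```python
import numpy as np

labels = np.array([2, 0, 1, 2])   # made-up class ids
num_class = 3
n = len(labels)

# Slice + index array: every row gets a 1 in every column listed in labels.
wrong = np.zeros((n, num_class))
wrong[0:n, labels] = 1

# Two index arrays of equal length are paired element by element.
right = np.zeros((n, num_class))
right[np.arange(n), labels] = 1

print(wrong.sum(axis=1))  # [3. 3. 3. 3.]
print(right.sum(axis=1))  # [1. 1. 1. 1.]

# With all-ones labels over 100 classes and a uniform softmax output
# (0.01 per class), the cross-entropy expression from the question gives:
loss = -np.mean(np.ones((100, 100)) * np.log(np.full((100, 100), 0.01))) * 1000.0
print(round(loss, 2))  # 4605.17
```

The constant accuracy of 1.0 has the same cause: argmax returns index 0 both for a uniform prediction and for an all-ones label row, so every comparison counts as correct.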