I am training a GAN model. To load the dataset, I am using TensorFlow's Dataset API.
# train_dataset yields (image, label) pairs; z_train yields noise (z).
train_dataset = ...
z_train = tf.data.Dataset.from_tensor_slices(
    tf.random_uniform([total_training_samples, seq_length, z_dim],
                      minval=0, maxval=1, dtype=tf.float32))
train_dataset = tf.data.Dataset.zip((train_dataset, z_train))
Creating Iterator:
iter = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
Using the iterator:
(img, label), z = iter.get_next()
train_init_op = iter.make_initializer(train_dataset)
While training the GAN in session:
Training the discriminator first:
_, disc_loss = sess.run([disc_optim, disc_loss])
then training the generator:
_, gen_loss = sess.run([gen_optim, gen_loss])
Here is the catch. Since I am using the label as the condition (CGAN) in both the discriminator and generator graphs, the two sess.run calls produce two different batches of labels within the same batch iteration.
for epoch in range(num_of_epochs):
    sess.run([tf.global_variables_initializer(), train_init_op.initializer])
    for batch in range(num_of_batches):
        _, disc_loss = sess.run([disc_optim, disc_loss])
        _, gen_loss = sess.run([gen_optim, gen_loss])
Since I have to feed the same batch of labels in the generator's session run as in the discriminator's, how can I prevent the Dataset API from producing two different batches within the same loop iteration?
Note: I am using TensorFlow v1.9
Thanks in advance.

You can create 2 iterators for the same dataset. If you need to shuffle the dataset, you can even do that by specifying the seed as a tensor. See example below.
import tensorflow as tf
seed_ts = tf.placeholder(tf.int64)
ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5]).shuffle(5, seed=seed_ts, reshuffle_each_iteration=True)
it1 = ds.make_initializable_iterator()
it2 = ds.make_initializable_iterator()
input1 = it1.get_next()
input2 = it2.get_next()
with tf.Session() as sess:
    for ep in range(10):
        # Initialize both iterators with the same seed so they shuffle identically.
        sess.run(it1.initializer, feed_dict={seed_ts: ep})
        sess.run(it2.initializer, feed_dict={seed_ts: ep})
        print("Epoch" + str(ep))
        for i in range(5):
            x = sess.run(input1)
            y = sess.run(input2)
            print([x, y])
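The idea behind the two-iterator trick is simply that two shuffles driven by the same seed visit the data in the same order. Here is a minimal plain-Python sketch of that principle (no TensorFlow involved; shuffled_iterator is a hypothetical helper, not part of any library):

```python
import random

def shuffled_iterator(items, seed):
    """Return an iterator over a seeded shuffle of `items` (hypothetical helper)."""
    order = list(items)
    random.Random(seed).shuffle(order)  # private RNG, so the seed alone fixes the order
    return iter(order)

data = [1, 2, 3, 4, 5]
pairs_per_epoch = []
for epoch in range(3):
    # Re-create both iterators with the same per-epoch seed, much like
    # re-initializing it1 and it2 with the same seed_ts value.
    it1 = shuffled_iterator(data, seed=epoch)
    it2 = shuffled_iterator(data, seed=epoch)
    pairs = list(zip(it1, it2))
    pairs_per_epoch.append(pairs)
    print("Epoch", epoch, pairs)
```

Every printed pair holds the same element twice, which is the property the discriminator and generator steps rely on when each reads from its own iterator.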


In tensorflow 1, when the loss function is defined with operations on Tensors, is the model really trained?

First, I'm sorry, but it's not possible to reproduce this problem in a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.repeat(nb_epochs).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    yy = iterator.get_next()
    return tf.cast(yy, tf.float32)
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    y_pred = complex_model.autoencode(train)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    nb_epochs = 10
    batch_size = 64
    y_real = return_iterator(train, nb_epochs, batch_size)
    y_pred = return_iterator(y_pred, nb_epochs, batch_size)
    res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1, 2, 3])
    loss = 1 - tf.reduce_sum(res_equal, axis=0)
    opt = tf.train.AdamOptimizer().minimize(loss)
    for epoch in range(0, nb_epochs):
        _, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum, and these operations only accept Tensors as input.
My question is: with this code, will the complex_model autoencoder be trained during training (even though here it is only used to output the predictions from which the loss is computed)?
Thank you
P.S. I am using TF 1.15 (and I cannot use another version)

How to load MNIST via TensorFlow (including download)?

The TensorFlow documentation for MNIST recommends multiple different ways to load the MNIST dataset:
All ways described in the documentation throw many deprecated warnings with TensorFlow 1.8.
The way I'm currently loading MNIST and creating batches for training:
class MNIST:
    def __init__(self, optimizer):
        self.mnist_dataset = input_data.read_data_sets("/tmp/data/", one_hot=True)
        self.test_data = self.mnist_dataset.test.images.reshape((-1, self.timesteps, self.num_input))
        self.test_label = self.mnist_dataset.test.labels
    def train_run(self, sess):
        batch_input, batch_output = self.mnist_dataset.train.next_batch(self.batch_size, shuffle=True)
        batch_input = batch_input.reshape((self.batch_size, self.timesteps, self.num_input))
        _, loss = sess.run([self.train_step, self.loss], feed_dict={self.input_placeholder: batch_input, self.output_placeholder: batch_output})
    def test_run(self, sess):
        loss = sess.run([self.loss], feed_dict={self.input_placeholder: self.test_data, self.output_placeholder: self.test_label})
How could I do exactly the same thing, just with the current method of doing this?
I couldn't find any documentation on this.
It seems to me that the new way is something along the lines of:
train, test = tf.keras.datasets.mnist.load_data()
self.mnist_train_ds = tf.data.Dataset.from_tensor_slices(train)
self.mnist_test_ds = tf.data.Dataset.from_tensor_slices(test)
But how can I use these datasets in my train_run and test_run method?
An example of loading the MNIST dataset using TF dataset API:
Create a mnist dataset to load train, valid and test images:
You can create a dataset for numpy inputs, either using Dataset.from_tensor_slices or Dataset.from_generator. Dataset.from_tensor_slices adds the whole dataset to the computational graph, so we will use Dataset.from_generator instead.
# load mnist data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
def create_mnist_dataset(data, labels, batch_size):
    def gen():
        for image, label in zip(data, labels):
            yield image, label
    ds = tf.data.Dataset.from_generator(gen, (tf.float32, tf.int32), ((28, 28), ()))
    return ds.repeat().batch(batch_size)
# train and validation datasets with different batch sizes
train_dataset = create_mnist_dataset(x_train, y_train, 10)
valid_dataset = create_mnist_dataset(x_test, y_test, 20)
A feedable iterator that can toggle between training and validation:
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, train_dataset.output_types, train_dataset.output_shapes)
image, label = iterator.get_next()
train_iterator = train_dataset.make_one_shot_iterator()
valid_iterator = valid_dataset.make_one_shot_iterator()
A sample run:
# A toy network
y = tf.layers.dense(tf.layers.flatten(image), 1, activation=tf.nn.relu)
loss = tf.losses.mean_squared_error(tf.squeeze(y), label)
with tf.Session() as sess:
    # Dense layer weights must be initialized before the first run.
    sess.run(tf.global_variables_initializer())
    # The `Iterator.string_handle()` method returns a tensor that can be evaluated
    # and used to feed the `handle` placeholder.
    train_handle = sess.run(train_iterator.string_handle())
    valid_handle = sess.run(valid_iterator.string_handle())
    # Run training
    train_loss, train_img, train_label = sess.run([loss, image, label],
                                                  feed_dict={handle: train_handle})
    # train_img.shape == (10, 28, 28); flattening happens inside the network
    # Run validation
    valid_pred, valid_img = sess.run([y, image],
                                     feed_dict={handle: valid_handle})
    # valid_img.shape == (20, 28, 28)
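To make the behaviour of the generator-backed dataset with repeat() and batch() more concrete, here is a rough plain-Python sketch of the same pipeline shape. It does not use TensorFlow at all; make_batches and the toy data are hypothetical names introduced only for illustration.

```python
import itertools

def make_batches(data, labels, batch_size):
    """Yield (images, labels) batches forever, cycling over the data (hypothetical helper)."""
    def gen():
        for image, label in zip(data, labels):
            yield image, label

    def repeated():
        while True:           # plays the role of .repeat()
            yield from gen()

    stream = repeated()
    while True:               # plays the role of .batch(batch_size)
        chunk = list(itertools.islice(stream, batch_size))
        images, lbls = zip(*chunk)
        yield list(images), list(lbls)

toy_images = [[i] * 4 for i in range(5)]  # five fake "images" of four pixels each
toy_labels = [0, 1, 2, 3, 4]

batches = make_batches(toy_images, toy_labels, batch_size=2)
first = next(batches)
second = next(batches)
third = next(batches)  # wraps around: last example of pass 1, first of pass 2
print(first[1], second[1], third[1])  # [0, 1] [2, 3] [4, 0]
```

Because the repeat happens before the batching, a batch can straddle two passes over the data, as the third batch does here. If that matters for validation, batching before repeating is the usual alternative.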

Model evaluation from a checkpoint with Multi GPU

I know how to train a network on a single GPU, save a checkpoint, load that checkpoint later, and run benchmarks.
I can't figure out how to do the same when I train on multiple GPUs using the new Data API.
Here is the 'normal' training code:
import tensorflow as tf
images_placeholder = tf.placeholder(tf.float32, shape=(None, image_size,
image_size, 1), name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None))
embeddings = build_graph(images_placeholder)
loss = add_loss(embeddings, labels_placeholder)
embeddings = tf.identity(embeddings, 'embeddings')
Later on, when I want to benchmark:
with tf.Graph().as_default():
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        load_graph_def(model_path)  # for example: d:\model.ckpt-0
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        images = benchmark_utils.load_data(paths_batch, image_size)
        feed_dict = {images_placeholder: images}
        predictions = sess.run(embeddings, feed_dict=feed_dict)
So now I want to train with multiple GPUs like so:
with tf.Graph().as_default(), tf.device('/cpu:0'):
    dataset = tf.data.Dataset.from_tensor_slices((..., labels_list))
    dataset = ...
    dataset = dataset.shuffle(buffer_size=100)
    dataset = dataset.batch(128)
    dataset = dataset.repeat()
    opt = tf.train.MomentumOptimizer(0.01, momentum=0.9, use_nesterov=True)
    tower_grads = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
                    image_batch, label_batch = dataset.iterator.get_next()
                    loss = tower_loss(scope, image_batch, label_batch)
What I can't figure out is how to get the 'input' and 'embeddings' tensors when I want to benchmark the checkpoint.
How do I define, for example, the tensor called 'input' that should receive the images to be evaluated?
I'm guessing that somewhere in the multi-gpu code, I should define this images_placeholder like I defined in the single-gpu training.
Thanks for any advice!

Tensorflow shuffle_batch speed

I noticed a big difference in speed between loading my training data into memory and feeding it into the graph as a numpy array, versus using a shuffle batch of the same size. My data has ~1000 instances.
With the in-memory approach, 1000 iterations take less than a few seconds, but with a shuffle batch they take almost 10 minutes. I understand that the shuffle batch should be a bit slower, but this seems far too slow. Why is this?
Added a bounty. Any suggestions on how to make shuffled mini-batches faster?
Here is the training data: Link to bounty_training.csv (pastebin)
Here is my code:
import numpy as np
import tensorflow as tf
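data = np.loadtxt('bounty_training.csv',
                  delimiter=',', skiprows=1,
                  usecols=(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14))
filename = "test.tfrecords"
with tf.python_io.TFRecordWriter(filename) as writer:
    for row in data:
        features, label = row[:-1], row[-1]
        example = tf.train.Example()
        # Field names match the parsing spec used when reading the records back.
        example.features.feature['features'].float_list.value.extend(features)
        example.features.feature['label'].float_list.value.append(label)
        writer.write(example.SerializeToString())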
def read_and_decode_single_example(filename):
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], np.float32),
            'features': tf.FixedLenFeature([14], np.float32)})
    pdiff = features['label']
    avgs = features['features']
    return avgs, pdiff
avgs, pdiff = read_and_decode_single_example(filename)
n_features = 14
batch_size = 1000
hidden_units = 7
lr = .001
avgs_batch, pdiff_batch = tf.train.shuffle_batch(
    [avgs, pdiff], batch_size=batch_size,
    capacity=..., min_after_dequeue=...)
X = tf.placeholder(tf.float32,[None,n_features])
Y = tf.placeholder(tf.float32,[None,1])
W = tf.Variable(tf.truncated_normal([n_features,hidden_units]))
b = tf.Variable(tf.zeros([hidden_units]))
Wout = tf.Variable(tf.truncated_normal([hidden_units,1]))
bout = tf.Variable(tf.zeros([1]))
hidden1 = tf.matmul(X,W) + b
pred = tf.matmul(hidden1,Wout) + bout
loss = tf.reduce_mean(tf.squared_difference(pred,Y))
optimizer = tf.train.AdamOptimizer(lr).minimize(loss)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for step in range(1000):
        x_, y_ = sess.run([avgs_batch, pdiff_batch])
        _, loss_val = sess.run([optimizer, loss],
                               feed_dict={X: x_, Y: y_.reshape(batch_size, 1)})
        if step % 100 == 0:
            print(loss_val)
    # Stop the queue-runner threads cleanly.
    coord.request_stop()
    coord.join(threads)
Full batch via numpy array
avgs and pdiff loaded into numpy arrays first...
Same model as above
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for step in range(1000):
        _, loss_value = sess.run([optimizer, loss],
                                 feed_dict={X: avgs, Y: pdiff.reshape(n_instances, 1)})
In this case, you're running a session 3 times per step - once in avgs_batch.eval, once for pdiff_batch.eval, and once for the actual call. That doesn't explain the magnitude of the slow down, but it's definitely something you should keep in mind. At the very least the first two eval calls should be combined to one call.
I suspect most of the slow-down is coming from use of TFRecordReader. I don't pretend to understand the inner workings of tensorflow, but you might find my answer here helpful.
create minimal data associated with each example, i.e. image filenames, ids rather than entire images;
convert to tensorflow ops with tensorflow.python.framework.ops.convert_to_tensor;
use tf.train.slice_input_producer to get a tensor for a single example;
do some preprocessing on individual examples - e.g. load images from filenames;
batch them together using tf.train.batch to group them up.
The trick is instead of feeding single examples into shuffle_batch you feed an n+1 dimensional tensor of examples to it with enqueue_many=True. I found this thread that was very helpful:
TFRecordReader seems extremely slow , and multi-threads reading not working
def get_batch(batch_size):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    batch_list = []
    for i in range(batch_size):
        batch_list.append(serialized_example)
    return [batch_list]
batch_serialized_example = tf.train.shuffle_batch(
    get_batch(batch_size), batch_size=batch_size,
    capacity=..., min_after_dequeue=..., enqueue_many=True)
features = tf.parse_example(
    batch_serialized_example,
    features={
        'label': tf.FixedLenFeature([], np.float32),
        'features': tf.FixedLenFeature([14], np.float32)})
batch_pdiff = features['label']
batch_avgs = features['features']
When using queues to get the data, you shouldn't use feed_dict. Instead, make your graph depend directly on the input data, that is:
remove the X and Y placeholders
use your feature batch directly:
hidden1 = tf.matmul(avgs_batch, W) + b
similarly, use the label batch (pdiff_batch, reshaped to [batch_size, 1] so it matches pred) instead of Y when computing the loss
finally, keep only the second sess.run call, which now computes the loss directly without feed_dict:
# x_, y_ = sess.run([avgs_batch, pdiff_batch])
# _, loss_val = sess.run([optimizer, loss],
#                        feed_dict={X: x_, Y: y_.reshape(batch_size, 1)})
_, loss_val = sess.run([optimizer, loss])

Add a summary of accuracy of the whole train/test dataset in Tensorflow

I am trying to use Tensorboard to visualize my training procedure. After every epoch completes, I would like to test the network's accuracy on the whole validation dataset and store the result in a summary file, so that I can visualize it in Tensorboard.
I know Tensorflow has summary_op for this, but it seems to work for only one batch per run. I need to calculate the accuracy for the whole dataset. How?
Is there any example to do it?
Define a tf.scalar_summary that accepts a placeholder:
accuracy_value_ = tf.placeholder(tf.float32, shape=())
accuracy_summary = tf.scalar_summary('accuracy', accuracy_value_)
Then calculate the accuracy for the whole dataset (define a routine that calculates the accuracy for every batch in the dataset and extract the mean value) and save it into a python variable, let's call it va.
Once you have the value of va, just run the accuracy_summary op, feeding the accuracy_value_ placeholder:
sess.run(accuracy_summary, feed_dict={accuracy_value_: va})
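The routine that produces va could look roughly like the following. This is only a sketch in plain Python with toy stand-ins (dataset_accuracy, toy_predict and toy_batches are hypothetical names); in real code the predict step would be a sess.run on the prediction op for each batch. It counts correct predictions across batches rather than averaging per-batch accuracies, so a smaller final batch does not skew the result.

```python
def dataset_accuracy(batches, predict):
    """Accuracy over all batches; `predict` maps a list of inputs to predicted labels."""
    correct = 0
    total = 0
    for inputs, labels in batches:
        predictions = predict(inputs)
        correct += sum(int(p == t) for p, t in zip(predictions, labels))
        total += len(labels)
    return correct / total

# Toy stand-ins (hypothetical): the "model" predicts 1 for positive inputs.
def toy_predict(xs):
    return [1 if x > 0 else 0 for x in xs]

toy_batches = [
    ([1.0, -2.0, 3.0], [1, 0, 0]),  # 2 of 3 correct
    ([-1.0], [0]),                  # 1 of 1 correct (smaller final batch)
]
va = dataset_accuracy(toy_batches, toy_predict)
print(va)  # 0.75
```

A plain mean of the two per-batch accuracies would give about 0.83 here, which is why the counts are accumulated instead.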
I implemented a naive one-layer model as an example that classifies the MNIST dataset and visualizes validation accuracy in Tensorboard; it works for me.
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
import os
# number of epoch
num_epoch = 1000
model_dir = '/tmp/tf/onelayer_model/accu_info'
# mnist dataset location, change if you need
data_dir = '../data/mnist'
# load MNIST dataset without one hot
dataset = read_data_sets(data_dir, one_hot=False)
# Create placeholder for input images X and labels y
X = tf.placeholder(tf.float32, [None, 784])
# one_hot = False
y = tf.placeholder(tf.int32)
# One layer model graph
W = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[10]))
logits = tf.nn.relu(tf.matmul(X, W) + b)
init = tf.initialize_all_variables()
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, y)
# loss function
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
_, top_1_op = tf.nn.top_k(logits)
top_1 = tf.reshape(top_1_op, shape=[-1])
correct_classification = tf.cast(tf.equal(top_1, y), tf.float32)
# accuracy function
acc = tf.reduce_mean(correct_classification)
# define info that is used in SummaryWritter
acc_summary = tf.scalar_summary('valid_accuracy', acc)
valid_summary_op = tf.merge_summary([acc_summary])
with tf.Session() as sess:
    # initialize all the variables
    sess.run(init)
    print("Writing Summaries to %s" % model_dir)
    train_summary_writer = tf.train.SummaryWriter(model_dir, sess.graph)
    # load validation dataset
    valid_x = dataset.validation.images
    valid_y = dataset.validation.labels
    for epoch in xrange(num_epoch):
        batch_x, batch_y = dataset.train.next_batch(100)
        feed_dict = {X: batch_x, y: batch_y}
        _, acc_value, loss_value = sess.run(
            [train_op, acc, loss], feed_dict=feed_dict)
        vsummary = sess.run(valid_summary_op,
                            feed_dict={X: valid_x,
                                       y: valid_y})
        # Write validation accuracy summary
        train_summary_writer.add_summary(vsummary, epoch)
Using batching with your validation set is possible in case you are using tf.metrics ops, which use internal counters. Here is a simplified example:
model = create_model()
tf.summary.scalar('cost', model.cost_op)
acc_value_op, acc_update_op = tf.metrics.accuracy(labels,predictions)
summary_common = tf.summary.merge_all()
summary_valid = tf.summary.merge([
    tf.summary.scalar('accuracy', acc_value_op),
    # other metrics here...
])
with tf.Session() as sess:
    train_writer = tf.summary.FileWriter(logs_path + '/train',
                                         sess.graph)
    valid_writer = tf.summary.FileWriter(logs_path + '/valid')
While training, only write the common summary using your train-writer:
summary = sess.run(summary_common)
train_writer.add_summary(summary, tf.train.global_step(sess, gstep_op))
After every validation, write both summaries using the valid-writer:
gstep, summaryc, summaryv = sess.run([gstep_op, summary_common, summary_valid])
valid_writer.add_summary(summaryc, gstep)
valid_writer.add_summary(summaryv, gstep)
When using tf.metrics, don't forget to reset the internal counters (local variables) before every validation step.
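To see why the reset matters, here is a small plain-Python stand-in for a streaming accuracy metric. StreamingAccuracy is a hypothetical class written for illustration, not a TensorFlow API; it only mimics the idea of an update step that accumulates counters and a value that is read from them. In TensorFlow 1.x the counters are local variables, so the reset usually amounts to re-running tf.local_variables_initializer(), which resets every local variable in the graph.

```python
class StreamingAccuracy:
    """Running accuracy held in two counters (hypothetical stand-in, not a TensorFlow class)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Plays the role of re-initializing the metric's local variables.
        self.correct = 0
        self.seen = 0

    def update(self, labels, predictions):
        # Plays the role of running the update op on one batch.
        self.correct += sum(1 for l, p in zip(labels, predictions) if l == p)
        self.seen += len(labels)

    def value(self):
        # Plays the role of reading the value op.
        return self.correct / self.seen

metric = StreamingAccuracy()

# First validation pass: 1 of 2 correct.
metric.update(labels=[0, 1], predictions=[0, 0])
first_pass = metric.value()

# Second pass WITHOUT reset: the old counts leak into the new result.
metric.update(labels=[1, 1], predictions=[1, 1])
stale = metric.value()

# Second pass WITH reset: only the new batch is counted.
metric.reset()
metric.update(labels=[1, 1], predictions=[1, 1])
fresh = metric.value()

print(first_pass, stale, fresh)  # 0.5 0.75 1.0
```

Without the reset, the second validation pass reports 0.75 instead of 1.0 because the first pass is still included in the counters.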