tensorflow Dataset order undefined?

If I use multiple elements from a tf.data.Dataset to build the graph, and then evaluate the graph later, it seems the order of the elements from the Dataset is undefined. As an example, the following code snippet
import tensorflow as tf

dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()

print 'build graph and then eval'
keep = []
for i in range(5):
    keep.append(iterator.get_next())
with tf.Session() as sess:
    keep_eval = sess.run(keep)
    print keep_eval

print 'eval each element'
with tf.Session() as sess:
    for i in range(5):
        print sess.run(iterator.get_next()),
will result in output like:
build graph and then eval
[3 0 1 4 2]
eval each element
0 1 2 3 4
Also, each run yields a different ordering for "build graph and then eval".
I would expect "build graph and then eval" to be ordered as well like "eval each element". Can anyone explain why this happens?

The order of a tf.data.Dataset is defined and deterministic (unless you add a non-deterministic Dataset.shuffle()).
However, your two loops build different graphs, which accounts for the difference:
The "build graph and then eval" part creates a list of five iterator.get_next() operations and runs the five operations in parallel. Because these operations run in parallel, they may produce results in different order.
The "eval each element" part also creates five iterator.get_next() operations, but it runs them sequentially, so you always get the results in the expected order.
Note that we do not recommend calling iterator.get_next() in a loop, because it creates a new operation on each call, which gets added to the graph, and consumes memory. Instead, when you loop over a Dataset, try to use the following pattern:
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()

# Call `iterator.get_next()` once and use the result in each iteration.
next_element = iterator.get_next()

with tf.Session() as sess:
    for i in range(5):
        print sess.run(next_element)

From the TensorFlow FAQs:
The individual ops have parallel implementations, using multiple cores in a CPU, or multiple threads in a GPU.
So your "build graph then eval" call runs in parallel for each element in the list, which is why the numbers are in random order, while the for loop forces one call to be run after another, so its serial. You can verify by timing both, the first one should be fast, the for loop will be slower.

Related

How to get the same data batch multiple times using TensorFlow's `tf.data` API

Is there a way to evaluate a tensor that depends on an tf.data iterator but temporarily pause the iterator so that it returns the previous batch?
Imagine snippet below:
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
train_op = next_batch * 10
Every time I evaluate train_op it does so by fetching a new batch of data – which is what I want. However, every N steps I'd like to do some additional things for debugging, like evaluating accuracy on the training batch, creating a checkpoint, running things with dropout disabled, etc. I'd like these operations to happen on the same data batch I have just used, but I haven't found a way to pause the tf.data iterator for one or multiple steps.
The obvious solution is to use placeholders instead of directly using next_batch. This means I have to evaluate next_batch first and then feed it back to the session using feed_dict to evaluate train_op. I believe this is not recommended due to the performance penalty. Is that still the case? If so, what is the recommended way to deal with these cases?
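For reference, a minimal sketch of that placeholder workaround (batch_ph and acc_op are illustrative names, and num_steps and N are as in the pseudocode below); the extra sess.run round-trip through Python is exactly the performance concern being asked about:
# Placeholder workaround: materialize the batch once, then feed it back to
# every op that should see the same data. batch_ph and acc_op are illustrative.
batch_ph = tf.placeholder(tf.int64, shape=())
train_op = batch_ph * 10          # train_op rebuilt on the placeholder
acc_op = batch_ph + 1             # stand-in for an accuracy/debug op

with tf.Session() as sess:
    for step in range(num_steps):
        batch_value = sess.run(next_batch)                        # fetch the batch once
        sess.run(train_op, feed_dict={batch_ph: batch_value})     # train on it
        if step % N == 0:
            sess.run(acc_op, feed_dict={batch_ph: batch_value})   # same batch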
Edit: adding pseudo code for what I'm after:
for step in range(num_steps):
    sess.run(train_op)  # train_op depends on next_batch and therefore fetches a new batch
    if step % N == 0:
        # I want the line below to run on the same batch as above, but acc_op also
        # depends on next_batch and therefore fetches a new batch
        acc = sess.run([acc_op, saver_op], feed_dict={keep_drop: 1})
Doesn't it work in the following way?
dataset = tf.data.Dataset.range(5)
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
train_op = next_batch * 10
other_ops = do_other_stuff(next_batch)

num_train_batch = 50
for ep in range(num_train_batch):
    if ep % N == 0:
        _, other_stuffs = sess.run([train_op, other_ops])
    else:
        _ = sess.run(train_op)
and you can feed the dropout value differently each time.
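To spell out why this works: because train_op and other_ops are fetched in the same sess.run() call, iterator.get_next() is evaluated only once for that call, so both ops see the same batch. A minimal sketch, assuming a dropout placeholder named keep_prob (not part of the snippet above):
# keep_prob is an assumed dropout placeholder used inside the model.
keep_prob = tf.placeholder(tf.float32, shape=())

for step in range(num_steps):
    if step % N == 0:
        # One run call: get_next() fires once, so both ops share the same batch.
        _, other_stuffs = sess.run([train_op, other_ops],
                                   feed_dict={keep_prob: 1.0})
    else:
        sess.run(train_op, feed_dict={keep_prob: 0.5})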

Creating an image summary only for a subset of validation set images using Tensorflow Estimator API

I'm trying to add image summary operations to visualize how well my network manages to reconstruct inputs from the validation set. However, since there are too many images in the validation set, I would like to plot only a small subset of them.
I managed to achieve this with a manual training loop, but I struggle to achieve the same with the new TensorFlow Estimator/Experiment/Datasets API. Has anyone done something like this?
The Experiment and Estimator are high-level TensorFlow APIs. Although you could probably solve your issue with a hook, if you want more control over what's happening during the training process, it may be easier not to use these APIs.
That said, you can still use the Dataset API which will bring you a lot of useful features.
To solve your problem with the Dataset API, you will need to switch between train and validation datasets in your training loop.
One way to do that is to use a feedable iterator. See here for more details:
https://www.tensorflow.org/programmers_guide/datasets
You can also see a full example switching between training and validation with the Dataset API in this notebook.
In brief, after having created your train_dataset and your val_dataset, your training loop could be something like this:
# create TensorFlow Iterator objects
training_iterator = train_dataset.make_initializable_iterator()
val_iterator = val_dataset.make_initializable_iterator()

# feedable iterator: the `handle` placeholder fed below selects which
# dataset the next element comes from
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, train_dataset.output_types, train_dataset.output_shapes)
next_element = iterator.get_next()  # feed this into your model ops

with tf.Session() as sess:
    # Initialize variables
    init = tf.global_variables_initializer()
    sess.run(init)

    # Create training data and validation data handles
    training_handle = sess.run(training_iterator.string_handle())
    validation_handle = sess.run(val_iterator.string_handle())

    for epoch in range(number_of_epochs):
        # Tell the training iterator to go back to the beginning of the dataset
        sess.run(training_iterator.initializer)
        print("Starting epoch: ", epoch)

        # iterate over the training dataset and train
        while True:
            try:
                sess.run(train_op, feed_dict={handle: training_handle})
            except tf.errors.OutOfRangeError:
                # End of epoch
                break

        # Tell the validation iterator to go back to the beginning of the dataset
        sess.run(val_iterator.initializer)

        # run validation on only 10 examples
        for i in range(10):
            my_value = sess.run(my_validation_op,
                                feed_dict={handle: validation_handle})
            # Do whatever you want with my_value
            ...
I figured out a solution that uses Estimator/Experiment API.
First you need to modify your Dataset input to provide not only labels and features, but also some form of identifier for each sample (in my case it was a filename). Then, in the hyperparameters dictionary (the params argument), you need to specify which of the validation samples you want to plot. You will also have to pass model_dir in those parameters. For example:
params = tf.contrib.training.HParams(
    model_dir=model_dir,
    images_to_plot=["100307_EMOTION.nii.gz", "100307_FACE-SHAPE.nii.gz",
                    "100307_GAMBLING.nii.gz", "100307_RELATIONAL.nii.gz",
                    "100307_SOCIAL.nii.gz"]
)

learn_runner.run(
    experiment_fn=experiment_fn,
    run_config=run_config,
    schedule="train_and_evaluate",
    hparams=params
)
Having set this up, you can create conditional summary operations in your model_fn and an evaluation hook to include them in your outputs.
if mode == tf.contrib.learn.ModeKeys.EVAL:
    summaries = []
    for image_to_plot in params.images_to_plot:
        is_to_plot = tf.equal(tf.squeeze(filenames), image_to_plot)
        summary = tf.cond(is_to_plot,
                          lambda: tf.summary.image('predicted', predictions),
                          lambda: tf.summary.histogram("ignore_me", [0]),
                          name="%s_predicted" % image_to_plot)
        summaries.append(summary)

    evaluation_hooks = [tf.train.SummarySaverHook(
        save_steps=1,
        output_dir=os.path.join(params.model_dir, "eval"),
        summary_op=tf.summary.merge(summaries))]
else:
    evaluation_hooks = None
Note that the summaries have to be conditional - we are either plotting an image (computationally expensive) or saving a constant (computationally cheap). I opted for a histogram rather than a scalar for the dummy summaries to avoid cluttering my TensorBoard dashboard.
Finally, you need to pass the hook in the return value of your model_fn:
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=loss,
    train_op=train_op,
    evaluation_hooks=evaluation_hooks
)
Please note that this only works when your batch size is 1 when evaluating the model (which should not be a problem).

How does one move data to multiple GPU towers using Tensorflow's Dataset API

We are running multi-GPU jobs on TensorFlow and evaluating a migration from the queue-based model (using the string_input_producer interface) to the new TensorFlow Dataset API. The latter appears to offer an easier way to switch between training and validation concurrently.
A snippet of code below shows how we are doing this.
train_dataset, train_iterator = get_dataset(train_files, batch_size, epochs)
val_dataset, val_iterator = get_dataset(val_files, batch_size, epochs)

is_validating = tf.placeholder(dtype=bool, shape=())
next_batch = tf.cond(is_validating,
                     lambda: val_iterator.get_next(),
                     lambda: train_iterator.get_next())

validation_tower = self.num_gpus - 1
tower_grads = []

for i in range(self.num_gpus):
    with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        with tf.device('/gpu:%d' % i), tf.name_scope('%s_%d' % ('gpu_', i)) as scope:
            if i == validation_tower:
                images, labels = next_batch
                # Loss funcs snipped out
            else:
                images, labels = next_batch
                # Loss funcs snipped out
The get_dataset function builds a dataset, sets a map function and a batch size. It also builds an iterator, but doesn't initialize it. Initialization of the iterator occurs before the session starts.
The is_validating boolean is supplied while the session is running, and every few steps we pass is_validating as True via a feed_dict to use the validation dataset.
The question I have is:
Let's say I have 8 GPUs, so we run training on 7 of them. Does the iterator advance from the same point for each of these 7 GPUs, hence supplying all 7 GPUs with the same data?
At present there are three main options, which have different usability and performance trade-offs:
In the Dataset.batch() transform, create a single large batch containing examples for all of your GPUs. Then use tf.split(..., self.num_gpus) on the output of Iterator.get_next() to create sub-batches for each GPU. This is probably the easiest approach (sketched after these options), but it does place the splitting on the critical path.
In the Dataset.batch() transform, create a mini-batch that is sized for a single GPU. Then call Iterator.get_next() once per GPU to get multiple different batches. (By contrast, in your current code, the same value of next_batch is sent to each GPU, which is probably not what you wanted to happen.)
Create multiple iterators, one per GPU. Shard the data using Dataset.shard() early in the pipeline (e.g. on the list of files if your dataset is sharded). Note that this approach will consume more resources on the host, so you may need to dial down any buffer sizes and/or degrees of parallelism.
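A minimal sketch of the first option (splitting a large batch on the host); build_tower, batch_size_per_gpu, and the assumption that the dataset yields (image, label) pairs are illustrative, not part of the code above:
# Option 1 sketch: batch for all GPUs at once, then split on the host.
batch_size_per_gpu = 32                                   # illustrative value
dataset = dataset.batch(batch_size_per_gpu * self.num_gpus)
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()                      # assumes (image, label) pairs

image_splits = tf.split(images, self.num_gpus)
label_splits = tf.split(labels, self.num_gpus)

for i in range(self.num_gpus):
    with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        with tf.device('/gpu:%d' % i):
            # build_tower stands in for your per-GPU model/loss construction
            tower_loss = build_tower(image_splits[i], label_splits[i])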
Note that the current tf.data pipelines run on the CPU only, and an important aspect of an efficient pipeline is staging your training input to the GPU while the previous step is still running. See the TensorFlow CNN benchmarks for example code that shows how to stage data to GPUs efficiently. We are currently working on adding this support to the tf.data API directly.
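For the staging point, here is a rough sketch of the manual approach used in the benchmarks code at the time, assuming TF 1.x and tf.contrib.staging (the device, the warm-up step, and names like num_steps are illustrative):
# Copy the next CPU batch to the GPU while the current step is computing.
images, labels = iterator.get_next()                      # produced on the CPU

with tf.device('/gpu:0'):
    staging_area = tf.contrib.staging.StagingArea(
        dtypes=[images.dtype, labels.dtype])
    stage_op = staging_area.put([images, labels])         # copy the batch to the GPU
    staged_images, staged_labels = staging_area.get()
    # build the model / train_op on staged_images and staged_labels ...

# Warm up the staging area once, then keep it one batch ahead of the model.
sess.run(stage_op)
for _ in range(num_steps):
    sess.run([train_op, stage_op])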

Outputting batch/epoch training loss during `tf.train.MonitoredTrainingSession`

I would like to output my loss with MonitoredTrainingSession every epoch or batch.
Ideally I would love to get a flag that the epoch has ended, or be able to provide a callback like in Keras. I see that I can also do it by manually counting steps, but I want to use the tf functionality, which still seems poorly documented.
From what I could find in their documentation, one can use tf.train.LoggingTensorHook to print the tensors every n steps.
The problem, however, is that it prints at a frequency different from what I request. When I run the following with every_n_iter=4 I get output every 2nd iteration:
tf.reset_default_graph()
with g.as_default():
    loghook = tf.train.LoggingTensorHook(
        [tf.reduce_mean(loss, name='m_loss')],
        every_n_iter=4,
        formatter=lambda x: "LOSS\t%.4f" % [tt for kk, tt in x.items()
                                            if kk.name.startswith('m_loss')][-1]
    )
    optimizer = get_optimizer(lr=lr, opt_name=opt_name)
    training_op = optimizer.minimize(loss)
    init_op = tf.global_variables_initializer()

with tf.Session(graph=g) as sess:
    sess.run(init_op)
    with tf.train.MonitoredTrainingSession(log_step_count_steps=1,
                                           hooks=[loghook]) as sess:
        losslist = []
        while not sess.should_stop():
            print('.')
            loss_ = sess.run(loss, feed_dict={K.learning_phase(): 1})
            sess.run(training_op)
            losslist.append(np.mean(loss_))
I am getting output like:
.
INFO:tensorflow:LOSS 2.2416
.
.
INFO:tensorflow:LOSS 2.1547
.
.
INFO:tensorflow:LOSS 2.1186
.
.
etc. That is, it outputs every 2nd step, not every 4th.
The documentation says:
every_n_iter: `int`, print the values of `tensors` once every N local steps taken on the current worker.
I am running it on one local machine. Why does one "local step" equal two Python loop iterations? Why two and not five?
Looking at the Python source does not seem to help. Are any Google folks aware of what it is doing?
"local step" is incremented on every call to sess.run(). You are calling sess.run() twice within your while loop.
Here are some pointers to relevant code:
https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/python/training/basic_session_run_hooks.py#L255 - increment _iter_count after every call to sess.run().
https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/python/training/basic_session_run_hooks.py#L228 - If _iter_count should trigger logging, add the current tensors to be run in the following call to sess.run() so that their values can be logged next.
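In other words, to make the hook's count line up with your loop iterations, fetch the loss and the train op in a single sess.run() call (a minimal sketch based on the snippet above):
# One sess.run per loop iteration, so _iter_count advances once per iteration
# and LoggingTensorHook fires every 4th iteration, as requested.
while not sess.should_stop():
    loss_, _ = sess.run([loss, training_op], feed_dict={K.learning_phase(): 1})
    losslist.append(np.mean(loss_))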

Run train op multiple times in tensorflow

I have some fairly large batch sizes on which I'd like to take multiple gradient steps. While I could easily do this with a python for loop, I imagine that there might be a more efficient method that doesn't involve transferring the data to gpu on each iteration. I've tried putting the train op in the fetch list multiple times, but I'm not sure that it's actually being run more than once (the runtime is exactly the same).
If you have variable-sized batches, then a variable is a bad fit for saving them; you could instead persist this data between run calls using persistent tensors. Here's a toy example:
dt = tf.int32
params = tf.Variable(tf.ones_initializer((), dtype=dt))
data_batches = [[1], [2, 3], [4, 5, 6]]

# op that uploads data to TF and saves it as a persistent Tensor
data_saver_placeholder = tf.placeholder(dt)
tensor_handle_op = tf.get_session_handle(data_saver_placeholder)

data_placeholder, data = tf.get_session_tensor(dt)
train_op = tf.assign_add(params, tf.reduce_prod(data))

init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
for batch in data_batches:
    # upload tensor to TF runtime and save its handle
    tensor_handle = sess.run(tensor_handle_op,
                             feed_dict={data_saver_placeholder: batch})
    # run train op several times reusing same data
    for i in range(3):
        sess.run(train_op, feed_dict={data_placeholder: tensor_handle.handle})

assert sess.run(params) == 382
If you do sess.run([myop,myop]) that'll only run myop once.
If you want to run the op but not fetch its result into the Python runtime, you could use a control dependency. A simple way to do this is with a group op, i.e.
sess.run(tf.group(myop))
sess.run(tf.group(myop))
If your graph is large, you may get extra overhead from constructing the group op between run calls (maybe 10-100 ms for a >10k-node graph), so you could construct it ahead of time:
myop_nooutput = tf.group(myop)
sess.run(myop_nooutput)
sess.run(myop_nooutput)
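As a quick self-contained check of both points above (a sketch using a counter variable in place of your real train op):
import tensorflow as tf

counter = tf.Variable(0)
increment = tf.assign_add(counter, 1)
increment_nooutput = tf.group(increment)   # built once, ahead of time

sess = tf.Session()
sess.run(tf.global_variables_initializer())

sess.run([increment, increment])           # duplicate fetches are deduplicated
print(sess.run(counter))                   # -> 1, not 2

sess.run(increment_nooutput)               # runs the op without fetching its value
sess.run(increment_nooutput)
print(sess.run(counter))                   # -> 3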