TensorFlow: working tf.while_loop does not work as part of Dataset API input pipeline

My problem is an image keypoint recognition task on images of snails. I have found that although there are many prewritten image augmentation functions for classification tasks (such as Keras' ImageDataGenerator), there are none that I can find suitable for this problem, which requires changes to the output keypoints to match the random transformations of the image. Hence I am writing my own to be mapped onto the dataset as it is read from TFRecord.
The logic I am using involves a while loop which keeps generating random transformations (rotation + shift + zoom etc.) and applying them to the real keypoints until it finds a set of transformations under which the keypoints still fit inside the image. This is to avoid transformations that leave part of the snail outside the image. It would then apply those same transformations to the image and return both.
My problem is that, although I have successfully got this augmentation function to work on a single test set of keypoints, it does not work when used as part of my input pipeline, throwing the following error: 'Merge can not have more than one valid input' (full trace included at the end). I have not been able to find an explanation anywhere.
# Defining the cond argument to the while loop. 'ph' are placeholders to match
# the number of arguments expected by tf.while_loop.
def not_fit_in_image(landmarks, ph2, ph3, ph4, ph5, ph6):
    # tf logical operators to find whether the landmarks fit in the image
    return landmarks_not_fit_in_image

def augmentation_function(image, original_landmarks):
    def body(ph1, ph2, ph3, ph4, ph5, ph6):
        shift = tf.random_uniform([1, 2], -shift_max, shift_max, tf.float32)
        landmarks = original_landmarks + shift
        # More random transformations generated and applied
        return landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear

    # placeholder values to match the number of loop variables
    ph_a = tf.constant(0, dtype=tf.float32)
    ph_b = tf.constant(0, dtype=tf.float32)
    landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear = tf.while_loop(
        not_fit_in_image, body, [original_landmarks, ph_a, ph_b, ph_a, ph_a, ph_a])
    # In future, the same transformations would also be applied to the image.
    return image, landmarks
# Setting up the input data pipeline using the Dataset API
train = tf.data.TFRecordDataset(train_data_tfrecords).map(parse_function)
train = train.map(augmentation_function)  # Using the above augmentation function
train = train.repeat().shuffle(buffer_size).batch(batch_size)
# ... Set up handle, iterator, init ops ... all works ...

with tf.Session() as sess:
    train_handle = sess.run(train_iterator.string_handle())
    sess.run(train_init_op)
    train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
The following error occurs:
2017-11-10 13:08:14.449612: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Internal: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
Traceback (most recent call last):
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
status, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/hanne/Documents/Tensorflow Projects/Snails/random_rotations_working_while_loop_experiments.py", line 143, in <module>
train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
This is my first time asking a question on Stack Overflow, so any comments about how to write better questions are also very welcome! I have tried to strip the code above down as much as I can for brevity, so it is minimal but NOT complete or verifiable - let me know if I should include more code.
EDIT
I was able to figure out what was wrong! tf.while_loop acts like a Python while loop, checking the condition before each run of 'body', including the very first run. The argument 'loop_vars' supplies the variables for that first check. I had passed placeholder values of the wrong format to 'loop_vars', which caused the error above. A good way around this, which worked for me, is to pass the result of a first run of 'body' as loop_vars, since that is guaranteed to have the right form.
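A minimal sketch of that workaround, reusing body, not_fit_in_image and original_landmarks from the code above (the zero constants are just stand-ins for the arguments that body ignores):
# Run 'body' once up front so every loop variable already has the right
# dtype and shape, then pass that tuple to tf.while_loop as loop_vars.
initial_vars = body(original_landmarks,
                    tf.constant(0.0), tf.constant(0.0),
                    tf.constant(0.0), tf.constant(0.0), tf.constant(0.0))
landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear = tf.while_loop(
    not_fit_in_image, body, list(initial_vars))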

Related

Graph disconnected: cannot obtain value for tensor Tensor() at layer "input_1"

The code for this problem is quite complex because I'm trying to implement FractalNet but with the convolution base block changed to just a dense layer. I'm trying to build two FractalNets separately (one after the other, so I don't think they should be interfering): one for the policy and one for the value function.
There are also a number of issues I have seen so far that may or may not be related. One is that I can't import numpy as np and use np, which is why I've been forced to use .numpy(). The other is that my code seems to be working on tensors of type tf.Tensor[stuff] as well as Tensor[stuff] in different sections at the same time. The _build_model function below outputs Tensor[stuff] from the Input call, whereas the neural network builder code uses tf.Tensor[stuff]. I tried, to no avail, to stick to one type.
Here is the complete error that keeps killing the code:
/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py:190: UserWarning: Model inputs must come from `keras.layers.Input` (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to your model was not an Input tensor, it was generated by layer activation_1.
Note that input tensors are instantiated via `tensor = keras.layers.Input(shape)`.
The tensor that caused the issue was: activation_1/Relu:0
str(x.name))
Traceback (most recent call last):
File "train.py", line 355, in <module>
main(**vars(args))
File "train.py", line 302, in main
val_func = NNValueFunction(bl,c,layersizes,dropout,deepest,obs_dim) # Initialize the value function
File "/home/ryan/trpo_fractalNN/trpo/value.py", line 37, in __init__
self.model = self._build_model()
File "/home/ryan/trpo_fractalNN/trpo/value.py", line 56, in _build_model
model = Model(inputs=obs_input, outputs=outputs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 94, in __init__
self._init_graph_network(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 241, in _init_graph_network
self.inputs, self.outputs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1511, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(None, 29), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
So here is the part of the code that I'm suspicious of at the moment, since somehow it is breaking at the very beginning, on the value function's neural net.
def _build_model(self):
    """ Construct TensorFlow graph, including loss function, init op and train op """
    # hid1 layer size is 10x obs_dim, hid3 size is 10, and hid2 is geometric mean
    # hid3_units = 5  # 5 chosen empirically on 'Hopper-v1'
    # hid2_units = int(np.sqrt(hid1_units * hid3_units))
    # heuristic to set learning rate based on NN size (tuned on 'Hopper-v1')
    obs = keras.layers.Input(shape=(self.obs_dim,))
    # I'm not sure why it won't work with np?
    # Initial fully-connected layer that brings the obs count up to a length
    # that will work with the fractal architecture
    obs_input = Dense(int(self.layersizes[0][0].numpy()))(obs)
    obs_input = Activation('relu')(obs_input)
    self.lr = 1e-2 / np.sqrt(self.layersizes[2][0])  # 1e-2 empirically determined
    print('Value Params -- lr: {:.3g}'.format(self.lr))
    outputs = fractal_net(self, bl=self.bl, c=self.c, layersizes=self.layersizes,
                          drop_path=0.15, dropout=self.dropout,
                          deepest=self.deepest)(obs_input)
    model = Model(inputs=obs_input, outputs=outputs)
    optimizer = Adam(self.lr)
    model.compile(optimizer=optimizer, loss='mse')
    return model
I found the issue. The problem was that, since I was trying to combine multiple files, I had a 'Dense' call to bring obs_len up to the desired size and then plugged that into the fractalNet code. I didn't realize that this would break things. I solved the issue by removing the initial Dense call and placing it inside the fractalNet code itself.
So, moral of the story: don't try to split the NN layers across separate files. Just as a side comment, the current fractalNet code calls fractal_net and then a Dense layer afterwards, and apparently this still works, but I think reversing that order breaks things. I hope this helps someone else.
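Not the exact refactor described above, but a minimal sketch of the general pattern that avoids the disconnect: the tensor returned by keras.layers.Input must itself be passed as inputs= to Model, with every Dense/Activation sitting between it and the outputs (names and hyperparameters borrowed from the code above):
def _build_model(self):
    """ Sketch: the Input tensor is the model input; all layers sit between it and the output """
    obs = keras.layers.Input(shape=(self.obs_dim,))
    # width-matching layer stays connected to obs instead of becoming the model input
    x = Dense(int(self.layersizes[0][0].numpy()))(obs)
    x = Activation('relu')(x)
    self.lr = 1e-2 / np.sqrt(self.layersizes[2][0])  # as in the original code
    outputs = fractal_net(self, bl=self.bl, c=self.c, layersizes=self.layersizes,
                          drop_path=0.15, dropout=self.dropout,
                          deepest=self.deepest)(x)
    model = Model(inputs=obs, outputs=outputs)  # inputs is the Input tensor, not an intermediate one
    model.compile(optimizer=Adam(self.lr), loss='mse')
    return model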

Invalid Argument Error Tensorflow Object Detection Training

I am training a TensorFlow object detection model following the TensorFlow Object Detection API. I have trained many models in the past using the exact same steps. This model, however, keeps giving me the error message below. The error message references
InvalidArgumentError: image_size must contain 3 elements[4]
I searched the error and found
InvalidArgumentError: image_size must contain 3 elements[4] #3349
which shows the error and gives the solution of checking that all images are RGB. I used the code provided in that thread to check all images and found about 15 that were not RGB, so I removed those images and the corresponding XML files, recompiled the CSV files and the TFRecord files, and restarted the training. I received the error message again. I then tried to start the training over without resuming from the last checkpoint and still received the error. The error does not happen on a regular basis; sometimes the model will go for several thousand steps before a failure. I have also tried removing the random crop parameter from the pipeline.config file, which had no effect.
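For reference, that kind of RGB check amounts to something like the following (a minimal sketch using Pillow; the 'images' directory and '*.jpg' pattern are placeholders for your own dataset):
import glob
import os
from PIL import Image

# Report every training image whose mode is not RGB so it can be removed
# or converted before the TFRecords are rebuilt.
for path in glob.glob(os.path.join('images', '*.jpg')):
    with Image.open(path) as img:
        if img.mode != 'RGB':
            print(path, img.mode)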
Any help is appreciated.
Error Message:
INFO:tensorflow:global_step/sec: 2.03361
INFO:tensorflow:global step 4039: loss = 6.2836 (0.512 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
INFO:tensorflow:Recording summary at step 4039.
INFO:tensorflow:global step 4040: loss = 4.6984 (0.880 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "/floyd/object_detection/legacy/train.py", line 184, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, inrun
_sys.exit(main(argv))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/floyd/object_detection/legacy/train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/floyd/object_detection/legacy/trainer.py", line 415, in train
saver=saver)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 833, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1244,in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409,in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
Thanks in advance.
So it was the RGB image problem after all. I had checked the images, removed the non-RGB ones, and recreated the records, but the model was still pointing to the old records; the paths were so similar that I did not notice.

slim.dataset_data_provider.DatasetDataProvider with num_epochs=1 throws error

I am using the relatively new tf.slim Dataset, DatasetDataProvider pattern. The following code shows the key fragments:
with tf.Graph().as_default():
    # get the dataset split
    dataset = util.get_split(train_or_eval,
                             args.tfrecord_folder,
                             0,
                             args.eval_set_size,
                             crop_size,
                             file_pattern=file_pattern)
    features, labels = util.load_batch(dataset,
                                       batch_size=args.eval_batch_size,
                                       num_readers=10,
                                       num_epochs=1,
                                       is_training=True)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        # start the queue runners
        with slim.queues.QueueRunners(sess):
            ...run some ops...
Here's the definition of load_batch:
def load_batch(dataset, batch_size=64, is_training=False,
               num_epochs=None, common_queue_capacity=256,
               common_queue_min=32, num_readers=None):
    shuffle = True
    # create the data provider
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset,
        num_readers=num_readers,
        shuffle=shuffle,
        num_epochs=num_epochs,
        common_queue_capacity=common_queue_capacity,
        common_queue_min=common_queue_min,
        seed=5)
    # get the tensors from the data provider
    image_raw, label = data_provider.get(['image_raw', 'label'])
    # batch up some training data
    images, labels = tf.train.batch([image_raw, label],
                                    batch_size=batch_size,
                                    num_threads=5,
                                    allow_smaller_final_batch=True,
                                    capacity=2 * batch_size)
    return images, labels
This works fine when num_epochs=None (which according to the comments in the source means that a file of tfrecords can be read an infinite number of times), but fails when num_epochs=1. Here's the error message:
Out of range: FIFOQueue '_9_batch/fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
Obviously, I need to be able to run an eval step without repeating the examples to get good accuracy and confusion matrix numbers. Any thoughts would be appreciated...
Per the request in the comments I am adding the stack trace. I am running this job in Google Cloud ML, so it's easiest to show it this way. The logs have a series of paired messages as follows:
Out of range: FIFOQueue '_6_batch/fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
[[Node: batch = QueueDequeueUpToV2[component_types=[DT_UINT8, DT_INT64, DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
[[Node: batch = QueueDequeueUpToV2[component_types=[DT_UINT8, DT_INT64, DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
The final stack trace is:
The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last): [...]
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 509, in <module>
    main()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 505, in main
    run()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 113, in run
    run_eval(args)
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 285, in run_eval
    is_training=True)
  File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 210, in load_batch
    capacity=3 * batch_size)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 872, in batch
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 665, in _batch
    dequeued = queue.dequeue_up_to(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 499, in dequeue_up_to
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1402, in _queue_dequeue_up_to_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_6_batch/fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueUpToV2[component_types=[DT_UINT8, DT_INT64, DT_STRING, DT_STRING], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
To find out more about why your job exited please check the logs:
https://console.cloud.google.com/logs/viewer?...
After extensive study and reading on GitHub, many reported that eliminating this issue was a matter of making sure that the initializers for local and global variables are run at the top of the session, using something like the following:
tf.group(tf.local_variables_initializer(), tf.global_variables_initializer())
However, that did not fix the issue for many (including me), and I suspect that for those for whom it did work, there were other problems leading to an empty FIFO queue.
After much reading, it appears that this is a defect for which there is not an obvious fix. Several workarounds have been proposed. I was running a full cycle of train, eval, and predict. Here is the approach which worked for me:
1) On training, I set num_epochs=None. This cycles through the data an infinite number of times and, if the documentation is correct, each example is presented only once per epoch. I did spot checking to confirm this, but my dataset was too large to guarantee the docs are correct. That said, my model did not overfit. Train, test, and validation were all reasonably close in terms of accuracy.
2) On eval, I was building a 15-model ensemble and I wanted to compare the proposal selection to ground truth before submitting unlabeled data for validation. I kept an extra hold-out set from a k-fold cross-validation run and needed to be sure that each example in the hold-out set was predicted once and only once. So to make that work, I: a) set num_epochs=1, b) eliminated all calculations from the eval graph except the prediction, c) reduced the size of the eval set to ~3000 examples, d) set shuffle_batch=False, e) set the batch size so that the queue would have a few extra examples (a sketch of these settings is included at the end of this answer).
With these conditions, the queue runners did not run out of examples before my graph completed, and I got my test set.
3) On predict, I used the same technique again as for eval except that I chose a batch size and number of train steps that was exactly equal to the number of predict records. Since there was no gradient back prop, the predicts were fast enough to finish before the queue runner could kill my job.
Problem solved. Jury rigged. But, it worked. Desperation is the mother of ingenuity or something like that!
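For what it's worth, the eval settings from (2) amount to something like the following sketch; note that it assumes a load_batch variant that passes shuffle through to the provider, since the version shown earlier hard-codes shuffle = True:
# Evaluation input: see each hold-out example exactly once, in order.
features, labels = util.load_batch(dataset,
                                   batch_size=args.eval_batch_size,  # sized so the queue keeps a little slack
                                   num_readers=10,
                                   num_epochs=1,
                                   shuffle=False,       # hypothetical extra argument, not in the version above
                                   is_training=False)
with tf.Session() as sess:
    # num_epochs creates a local epoch counter, so both initializers must run
    sess.run(tf.group(tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
    with slim.queues.QueueRunners(sess):
        ...run the prediction ops once over the hold-out set...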

Receiving Negative Input Dimensions with TensorFlow MonitoredTrainingSession

I'm attempting to switch from a tf.Session() to a tf.train.MonitoredTrainingSession (all on one machine, no fancy distributed computing), but I'm getting an error that I don't fully understand.
W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [16,-1,4] has negative dimensions
E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [16,-1,4] has negative dimensions
[[Node: define_inputs/Placeholder = Placeholder[dtype=DT_FLOAT, shape=[16,?,4], _device="/job:local/replica:0/task:0/cpu:0"]()]]
Further down, I receive a little more information about the error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
I'm using tf.contrib.seq2seq and my input and output sequences have variable lengths e.g. x_placeholder = tf.placeholder(tf.float32, [batch_size, None, 4]).
I suspect that the queues that I'm using to read data and bucket data by sequence length are somehow failing or getting interrupted by the MonitoredTrainingSession, as I don't have this problem with a vanilla Session.
Here's the code that sets up the MonitoredTrainingSession:
# create a global step
global_step = tf.contrib.framework.get_or_create_global_step()
# define graph
model = import_model(global_step)
# create a one-process cluster with an in-process server
server = tf.train.Server.create_local_server()
# define hooks for writing summaries and model variables to disk
hooks = construct_training_hooks(model.summary_op)

with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=True,
                                       hooks=hooks) as monitored_sess:
    # create coordinator to handle threading
    coord = tf.train.Coordinator()
    # start threads to enqueue input minibatches for training
    threads = tf.train.start_queue_runners(sess=monitored_sess, coord=coord)
    # train
    while not monitored_sess.should_stop():
        train_op(monitored_sess, model, x_train, y_train, y_lengths_train)
    # when done, ask the threads to stop
    coord.request_stop()
    # wait for threads to finish
    coord.join(threads)
Here is how I'm creating my training hooks:
def construct_training_hooks(summary_op):
    hooks = [tf.train.StopAtStepHook(last_step=tf.flags.FLAGS.training_steps),
             tf.train.CheckpointSaverHook(checkpoint_dir=tf.flags.FLAGS.log_dir,
                                          saver=tf.train.Saver(),
                                          save_steps=10),
             tf.train.SummarySaverHook(output_dir=tf.flags.FLAGS.log_dir,
                                       summary_op=summary_op,
                                       save_steps=10)]
    return hooks

error while merging summaries for tensorboard

I am trying to generate the graph for the MNIST beginner tutorial but am getting the following error. For some reason, the merged_summary_op object is None.
Traceback (most recent call last):
File "mnist1.py", line 48, in <module>
summary_str = sess.run(merged_summary_op)
File "/home/vagrant/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 307, in run
% (subfetch, fetch, type(subfetch), e.message))
TypeError: Fetch argument None of None has invalid type <type 'NoneType'>, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
I think I am missing a step here. I launched the session first and then ran the statement:
merged_summary_op = tf.merge_all_summaries()
I had the same error.
In my case, adding at least one tf.scalar_summary() before calling tf.merge_all_summaries() solved the problem.
For example,
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
tf.scalar_summary("cross_entropy", cross_entropy)
merged_summary_op = tf.merge_all_summaries()
I hope this snippet helps you.
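For completeness, here is the same ordering as a self-contained sketch in the pre-1.0 API used above (the log directory and the dummy loss variable are placeholders):
import tensorflow as tf

# Summaries must be created BEFORE tf.merge_all_summaries(); if none exist yet
# it returns None, and sess.run(None) raises exactly this TypeError.
loss = tf.Variable(0.0, name="dummy_loss")      # stand-in for cross_entropy
tf.scalar_summary("loss", loss)                 # register at least one summary
merged_summary_op = tf.merge_all_summaries()    # now a real op, not None

with tf.Session() as sess:
    writer = tf.train.SummaryWriter("/tmp/summary_demo", sess.graph)
    sess.run(tf.initialize_all_variables())
    summary_str = sess.run(merged_summary_op)
    writer.add_summary(summary_str, 0)
    writer.close()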