error while merging summaries for tensorboard - tensorflow

I am trying to generate the graph for the MNIST beginner tutorial but am getting the following error. For some reason, the merged_summary_op object is None.
Traceback (most recent call last):
File "mnist1.py", line 48, in <module>
summary_str = sess.run(merged_summary_op)
File "/home/vagrant/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 307, in run
% (subfetch, fetch, type(subfetch), e.message))
TypeError: Fetch argument None of None has invalid type <type 'NoneType'>, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
I think I am missing a step here. I launched the session first and then ran the statement:
merged_summary_op = tf.merge_all_summaries()

I had the same error.
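In my case, adding at least one tf.scalar_summary() before calling tf.merge_all_summaries() solved the problem. tf.merge_all_summaries() returns None when no summaries have been collected in the graph, which is why sess.run() then complains about a NoneType fetch.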
For example,
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
tf.scalar_summary("cross_entropy", cross_entropy)
merged_summary_op = tf.merge_all_summaries()
I hope this snippet helps you.


Graph disconnected: cannot obtain value for tensor Tensor() at layer "input_1"

The code for this problem is quite complex because I'm trying to implement fractalNet but changing the convolution base block to just a dense layer. I'm trying to separately build two fractalNets (one after the other so I don't think they should be interfering). One for the policy and one for the value function.
There are also a number of issues I have seen so far that may or may not be related. One is that I can't import numpy as np and use np, which is why I've been forced to use numpy(). The other is that my code seems to be working with both tf.Tensor[stuff] and Tensor[stuff] objects in different sections at the same time. The build_model function below outputs Tensor[stuff] from the Input call, whereas the neural network builder code uses tf.Tensor[stuff]. I tried to stick to one type, but to no avail.
Here is the complete error that keeps killing the code:
/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py:190: UserWarning: Model inputs must come from `keras.layers.Input` (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to your model was not an Input tensor, it was generated by layer activation_1.
Note that input tensors are instantiated via `tensor = keras.layers.Input(shape)`.
The tensor that caused the issue was: activation_1/Relu:0
str(x.name))
Traceback (most recent call last):
File "train.py", line 355, in <module>
main(**vars(args))
File "train.py", line 302, in main
val_func = NNValueFunction(bl,c,layersizes,dropout,deepest,obs_dim) # Initialize the value function
File "/home/ryan/trpo_fractalNN/trpo/value.py", line 37, in __init__
self.model = self._build_model()
File "/home/ryan/trpo_fractalNN/trpo/value.py", line 56, in _build_model
model = Model(inputs=obs_input, outputs=outputs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 94, in __init__
self._init_graph_network(*args, **kwargs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 241, in _init_graph_network
self.inputs, self.outputs)
File "/home/ryan/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1511, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(None, 29), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
Here is the part of the code I currently suspect, because it breaks at the very beginning, while building the value function's neural net.
def _build_model(self):
    """ Construct TensorFlow graph, including loss function, init op and train op """
    # hid1 layer size is 10x obs_dim, hid3 size is 10, and hid2 is geometric mean
    # hid3_units = 5  # 5 chosen empirically on 'Hopper-v1'
    # hid2_units = int(np.sqrt(hid1_units * hid3_units))
    # heuristic to set learning rate based on NN size (tuned on 'Hopper-v1')
    obs = keras.layers.Input(shape=(self.obs_dim,))
    # I'm not sure why it won't work with np?
    obs_input = Dense(int(self.layersizes[0][0].numpy()))(obs)  # Initial fully-connected layer that brings obs number up to a len that will work with fractal architecture
    obs_input = Activation('relu')(obs_input)
    self.lr = 1e-2 / np.sqrt(self.layersizes[2][0])  # 1e-2 empirically determined
    print('Value Params -- lr: {:.3g}'
          .format(self.lr))
    outputs = fractal_net(self, bl=self.bl, c=self.c, layersizes=self.layersizes,
                          drop_path=0.15, dropout=self.dropout,
                          deepest=self.deepest)(obs_input)
    model = Model(inputs=obs_input, outputs=outputs)
    optimizer = Adam(self.lr)
    model.compile(optimizer=optimizer, loss='mse')
    return model
I found the issue. Because I was combining code from multiple files, I had a Dense call that brought obs_len up to the desired size, and I then plugged its output into the fractalNet code. I didn't realize that this would break things. I solved the issue by removing the initial Dense call and placing it inside the fractalNet code itself.
So the moral of the story: don't try to split different parts of the NN layers across separate files. As a side comment, the current fractalNN code calls fractal_net and then a Dense layer afterwards, and apparently this still works, but I think reversing that order breaks things. I hope this helps someone else.
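The UserWarning in the traceback points to a narrower cause than the file layout: Model was given obs_input, the output of activation_1, as its inputs, whereas Keras expects the tensor returned by Input (obs in the question). A minimal sketch of that alternative fix follows. It assumes tf.keras is installed, uses two plain Dense layers as a hypothetical stand-in for fractal_net, and picks the layer sizes (64, 32, 1) arbitrarily for illustration.

```python
from tensorflow import keras

obs_dim = 29  # matches shape=(None, 29) in the error message

# The tensor returned by Input is what Model(inputs=...) must receive.
obs = keras.layers.Input(shape=(obs_dim,))

# Initial fully-connected layer, kept outside the main block as in the question.
x = keras.layers.Dense(64)(obs)
x = keras.layers.Activation("relu")(x)

# Hypothetical stand-in for fractal_net(...)(x); NOT the real fractalNet code.
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)

# Passing `obs` (not the activation output) keeps the graph connected.
model = keras.Model(inputs=obs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(1e-2), loss="mse")
```

With this arrangement the extra Dense layer can stay outside the main block, as long as the model is anchored on the Input tensor.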

TypeError: can’t convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

I am using a modified predict.py for testing a pruned SqueezeNet Model
[phung#archlinux SqueezeNet-Pruning]$ python predict.py --image 3_100.jpg --model model_prunned --num_class 2
prediction in progress
Traceback (most recent call last):
  File "predict.py", line 66, in <module>
    prediction = predict_image(imagepath)
  File "predict.py", line 52, in predict_image
    index = output.data.numpy().argmax()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[phung#archlinux SqueezeNet-Pruning]$
I understand that numpy does not support GPU yet.
How should I modify the code to avoid this error without invoking the tensor copy operation, tensor.cpu()?
Change
index = output.data.numpy().argmax()
to
index = output.cpu().data.numpy().argmax()
This first moves the data to the CPU and then converts it to a NumPy array.
I found out that I can just use
output.argmax()
You can use the torch.max function as follows:
value, index = torch.max(output,1)
I am facing the same error with the following lines of code:
with torch.no_grad():
    for idx, (inputs, targets) in enumerate(ploader):
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = net(inputs)
        scores, predicted = outputs.max(1)
        # save top1 confidence score
        outputs = F.normalize(outputs, dim=1)
        probs = F.softmax(outputs, dim=1)
        top1_scores.append(probs[0][predicted.item()])
        progress_bar(idx, len(ploader))
idx = np.argsort(top1_scores) # Error for this line
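The suggestions in this thread can be compared side by side in a small sketch. It assumes PyTorch and NumPy are installed, falls back to the CPU when no GPU is present, and uses a hand-written tensor as a hypothetical stand-in for the network output. It uses detach() rather than .data, which is a substitution and not what the original answer wrote. For the follow-up loop, one possible fix (an assumption, not something confirmed in the thread) is to store Python floats via .item() so that np.argsort never sees a CUDA tensor.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the network output: one row of class scores.
output = torch.tensor([[0.1, 2.5, 0.3]], device=device)

# Option 1: copy the whole tensor to host memory, then use NumPy.
index_via_numpy = output.detach().cpu().numpy().argmax()

# Option 2: compute the argmax on the device; only one scalar is copied back.
index_on_device = output.argmax().item()

# Option 3: torch.max along dim 1 returns (values, indices).
value, index = torch.max(output, 1)

# Follow-up case: append plain floats instead of (possibly CUDA) tensors.
top1_scores = []
probs = torch.softmax(output, dim=1)
predicted = output.argmax(dim=1)
top1_scores.append(probs[0][predicted.item()].item())
order = np.argsort(top1_scores)
```

Option 2 comes closest to what the question asked for, since the full tensor never leaves the device.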

Trying to restore model, but tf.train.import_meta_graph(meta_path) raises error

I downloaded pretrained MobileNetV2 models from TensorFlow models and tried to restore the graph, but got an unexpected error.
The code to reproduce the error is pretty concise:
import tensorflow as tf
meta_path = 'path/to/mobilenet_v2_0.35_224/mobilenet_v2_0.35_224.ckpt.meta'
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
saver = tf.train.import_meta_graph(meta_path)
The last line then raises an error:
Traceback (most recent call last):
File "/home/CVAR/study/codes/languages/python/pycharm/learn_tensorflow/train_mobileNet_v2/test_of_functions/saver_test.py", line 21, in <module>
saver = tf.train.import_meta_graph(meta_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 391, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 158, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'InfeedEnqueueTuple'
My system information is:
ubuntu 16.04
python 3.5
tensorflow-gpu 1.9
Any idea?
I recently ran into the same problem. The likely reason is that the TensorFlow version used to train the model differs from the version used to read the graph description proto. The fix is to install the TensorFlow version that was used for training. Alternatively, retraining the model with your current version would work.
FYI, I trained with TensorFlow 1.12.0 but loaded the graph with 1.13.1. Reinstalling solved the problem.
Some ops are not defined. Adding from conv_blocks import * fixes this error, but I then got another problem: "ValueError: NodeDef expected inputs 'float, int32' do not match 1 inputs specified;". I am still debugging, but I hope this tip solves your problem.

Tensorflow: working tf.while_loop does not work as part of Dataset API input pipeline

My problem is an image keypoint recognition task on images of snails. I have found that although there are many prewritten image augmentation functions for classification tasks (such as Keras' ImageDataGenerator), there are none that I can find suitable for this problem, which requires changes to the output keypoints to match the random transformations of the image. Hence I am writing my own to be mapped onto the dataset as it is read from TFRecord.
The logic I am using involves a while loop which continues to generate random transformations (rotation + shift + zoom etc.) and apply them to the real keypoints until it finds a set of transformations where the keypoints fit into the image. This is to avoid transformations that leave part of the snail outside the image. It would then apply those same transformations to the image and return them.
My problem is that, while I have successfully got this augmentation function to work on a single test set of keypoints, when I use the same function as part of my input pipeline, it does not work, throwing the following error: 'Merge can not have more than one valid input' (full trace included at end). I have not been able to find an explanation anywhere.
# Defining cond argument to while loop. 'ph' are placeholders to match numbers of arguments for tf.while_loop
def not_fit_in_image(landmarks, ph2, ph3, ph4, ph5, ph6):
    # tf logical operators to find if landmarks fit in image
    return landmarks_not_fit_in_image
def augmentation_function(image, original_landmarks):
    def body(ph1, ph2, ph3, ph4, ph5, ph6):
        shift = tf.random_uniform([1, 2], -shift_max, shift_max, tf.float32)
        landmarks = original_landmarks + shift
        # More random transformations generated and applied
        return landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear
    # placeholders to match number of arguments
    ph_a = tf.constant(0, dtype=tf.float32)
    landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear = tf.while_loop(not_fit_in_image, body, [original_landmarks, ph_a, ph_b, ph_a, ph_a, ph_a])
    # In future, would now apply these same transformations to image.
    return image, landmarks
# Setting up input data pipeline using Dataset API
train = tf.data.TFRecordDataset(train_data_tfrecords).map(parse_function)
train = train.map(augmentation_function) # Using the above augmentation function
train = train.repeat().shuffle(buffer_size).batch(batch_size)
# ... Set up handle, iterator, init ops ... all works ...
with tf.Session() as sess:
    train_handle = sess.run(train_iterator.string_handle())
    sess.run(train_init_op)
    train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
The following error occurs:
2017-11-10 13:08:14.449612: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Internal: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
Traceback (most recent call last):
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
status, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/hanne/Documents/Tensorflow Projects/Snails/random_rotations_working_while_loop_experiments.py", line 143, in <module>
train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
This is my first time asking a question on stack overflow, so any comments about how to write better questions are also very welcome! I have tried to strip down the code above as much as I can for brevity and it is hence minimal but NOT complete or verifiable - let me know if I should include more code.
EDIT
I was able to figure out what was wrong! tf.while_loop acts like a python while loop, checking the condition before each run of 'body', which includes THE VERY FIRST RUN. The argument 'loop_vars' takes the variables for this first check. I had entered placeholder values of the wrong format to 'loop_vars', which caused the error above. A good way around this, which worked for me, is to enter the result of a first run of 'body' to the loop_vars variable, as this is assured of being of the right form.
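The control flow described in the edit can be sketched in plain Python, without TensorFlow. The while_loop helper below is a hypothetical stand-in that only mirrors the cond-before-body ordering of tf.while_loop; the image size, shift range, and landmark coordinates are made-up values for illustration. The point is that loop_vars is seeded with the result of one call to body, so the very first cond check already sees values of the right form.

```python
import random

def while_loop(cond, body, loop_vars):
    """Plain-Python stand-in for tf.while_loop's control flow (hypothetical helper)."""
    # cond is evaluated before every run of body, including the very first one.
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

IMAGE_SIZE = 384.0   # assumed image side length
SHIFT_MAX = 200.0    # assumed maximum random shift
original_landmarks = [(100.0, 120.0), (300.0, 350.0)]  # made-up keypoints
rng = random.Random(0)

def not_fit_in_image(landmarks, shift):
    # True while at least one landmark lies outside the image.
    return any(not (0.0 <= x < IMAGE_SIZE and 0.0 <= y < IMAGE_SIZE)
               for x, y in landmarks)

def body(_landmarks, _shift):
    # Draw a fresh random shift and apply it to the ORIGINAL landmarks.
    shift = (rng.uniform(-SHIFT_MAX, SHIFT_MAX), rng.uniform(-SHIFT_MAX, SHIFT_MAX))
    landmarks = [(x + shift[0], y + shift[1]) for x, y in original_landmarks]
    return landmarks, shift

# Seed loop_vars with one run of body so its structure matches later iterations.
initial_vars = body(original_landmarks, (0.0, 0.0))
landmarks, shift = while_loop(not_fit_in_image, body, initial_vars)
```

In real TensorFlow code the same idea would mean calling body once while building the graph and passing its outputs as loop_vars, so shapes and dtypes match on every iteration.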

tensorflow MNIST fully_connected_feed.py fails: range() takes at least 2 arguments (1 given)

I'm having trouble running the example in one of the TensorFlow tutorials. The tutorial says that to run it I just need to type python fully_connected_feed.py. When I do this, it gets through fetching the input data but then fails, like so:
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Traceback (most recent call last):
File "fully_connected_feed.py", line 225, in <module>
tf.app.run()
File "/Users/me/anaconda/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 11, in run
sys.exit(main(sys.argv))
File "fully_connected_feed.py", line 221, in main
run_training()
File "fully_connected_feed.py", line 141, in run_training
loss = mnist.loss(logits, labels_placeholder)
File "/Users/me/tftmp/mnist.py", line 96, in loss
indices = tf.expand_dims(tf.range(batch_size), 1)
TypeError: range() takes at least 2 arguments (1 given)
I think this error is caused because there is some problem with session setup and/or tensor evaluation. This is the function in mnist.py causing the problem:
def loss(logits, labels):
    """Calculates the loss from the logits and the labels.
    Args:
        logits: Logits tensor, float - [batch_size, NUM_CLASSES].
        labels: Labels tensor, int32 - [batch_size].
    Returns:
        loss: Loss tensor of type float.
    """
    # Convert from sparse integer labels in the range [0, NUM_CLASSSES)
    # to 1-hot dense float vectors (that is we will have batch_size vectors,
    # each with NUM_CLASSES values, all of which are 0.0 except there will
    # be a 1.0 in the entry corresponding to the label).
    batch_size = tf.size(labels)
    labels = tf.expand_dims(labels, 1)
    indices = tf.expand_dims(tf.range(batch_size), 1)
    concated = tf.concat(1, [indices, labels])
    onehot_labels = tf.sparse_to_dense(
        concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, onehot_labels,
                                                            name='xentropy')
    loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
    return loss
If I put all the code in the loss function inside a with tf.Session(): block, it gets past this error. However, I get other errors later about uninitialised variables, so I'm guessing something major is going wrong with session setup or initialisation. Being new to TensorFlow, I'm a little at a loss. Any ideas?
[NB: I haven't edited the code at all, just downloaded it from the TensorFlow tutorials and tried to run it as instructed, with python fully_connected_feed.py]
This issue arises because in the latest version of the TensorFlow source on GitHub, tf.range() has been updated to be more permissive with its arguments (previously it required two arguments; now it has the same semantics as Python's range() built-in function), and the fully_connected_feed.py example has been updated to exploit this.
However, if you try to run this version against the binary distribution of TensorFlow, you will get this error because the change to tf.range() has not been incorporated into the binary package.
The easiest solution is to download the old version of mnist.py. Alternatively, you could build from source to use the latest version of the tutorial.
You can get the right result by fixing the mnist.py code like this:
indices = tf.expand_dims(tf.range(0,batch_size),1)
TypeError: range() takes at least 2 arguments (1 given)
That's the error.
Looking at the tensorflow docs for range, we can see that range has a function signature of start, limit, delta=1, name='range'. This means that at least two arguments are required for function invocation. Your example only shows one argument provided.
An example can be found in the docs:
# 'start' is 3
# 'limit' is 18
# 'delta' is 3
tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
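Since an earlier answer notes that the updated tf.range() follows the semantics of Python's built-in range(), the documented example and the one-argument versus two-argument forms can be checked with the built-in alone. This is only an illustration in plain Python, not TensorFlow code, and batch_size = 5 is an arbitrary value chosen for the example.

```python
# Values from the documentation example quoted above.
start, limit, delta = 3, 18, 3
stepped = list(range(start, limit, delta))

# The one-argument form that the old binary release rejected, and the
# explicit two-argument form suggested as a workaround.
batch_size = 5  # arbitrary example value
implicit_start = list(range(batch_size))
explicit_start = list(range(0, batch_size))

print(stepped)                           # [3, 6, 9, 12, 15]
print(implicit_start == explicit_start)  # True
```

Under these semantics, writing tf.range(0, batch_size) is equivalent to the one-argument call and also works on releases that require both start and limit.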