Restore checkpoint in Tensorflow - tensor name not found

Restore checkpoint in Tensorflow - tensor name not found - tensorflow

Trying to run the Inceptionv3 Tensorflow model with the architecture and the checkpoint provided by Google here.
My issue is that my script crashes on saver.restore(sess, "./inception_v3.ckpt") with the following error:
tensorflow.python.framework.errors.NotFoundError: Tensor name "InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/biases" not found in checkpoint files ./inception_v3.ckpt
Here is my code:
import tensorflow as tf
import inception_v3
with tf.Session() as sess:
image = tf.read_file('./file.jpg')
# code to decode, crop, convert jpeg
eval_inputs = tf.pack([image])
logits, _ = inception_v3.inception_v3(eval_inputs, num_classes=1001, is_training=False)
sess.run(tf.initialize_all_variables())
saver = tf.train.Saver()
saver.restore(sess, "./inception_v3.ckpt")
I get the same errors with the other checkpoint/model combinations so this must be an issue with my code. Not sure what I am doing wrong though.
Thank you

Indeed the checkpoint file does not contain this tensor. Can you file a bug on github?

You need to call inception_v3() within the arg_scope() returned by inception_v3_arg_scope() like this:
import tensorflow as tf
import tensorflow.contrib.slim as slim
from nets.inception_v3 import inception_v3, inception_v3_arg_scope
height = 299
width = 299
channels = 3
# Create graph
X = tf.placeholder(tf.float32, shape=[None, height, width, channels])
with slim.arg_scope(inception_v3_arg_scope()):
logits, end_points = inception_v3(X, num_classes=1001,
is_training=False)
predictions = end_points["Predictions"]
saver = tf.train.Saver()
X_test = ... # your images, shape [batch_size, 299, 299, 3]
# Execute graph
with tf.Session() as sess:
saver.restore(sess, "./inception_v3.ckpt")
predictions_val = predictions.eval(feed_dict={X: X_test})
predicted_classes = np.argmax(predictions_val, axis=1)
I recommend clearly separating the construction phase and the execution phase. Just tested on a random photo on the web, and it worked fine. :)

Related

how to restore variables in fully_connected function

In my training file(train.py), I write:
def deep_part(self):
with tf.variable_scope("deep-part"):
y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.factor_size]) # None * (F*K)
# self.deep_layers = 2
for i in range(0,len(self.deep_layers)):
y_deep = tf.contrib.layers.fully_connected(y_deep, self.deep_layers[i], \
activation_fn=self.deep_layers_activation, scope = 'fc%d' % i)
return y_deep
now in predict file(predict.py), I restore the checkpoint, but I dont know how to reload the "deep-part" network's weights and biases.Because I think the "fully_conncted" function might hide the weights and biases.

I wrote a lengthy explanation here. A short summary:
By saver.save(sess, '/tmp/my_model') Tensorflow produces multiple files:
checkpoint
my_model.data-00000-of-00001
my_model.index
my_model.meta
The checkpoint file checkpoint is just a pointer to the latest version of our model-weights and it is simply a plain text file containing
$ !cat /tmp/model/checkpoint
model_checkpoint_path: "/tmp/my_model"
all_model_checkpoint_paths: "/tmp/my_model"
The others are binary files containing the graph (.meta) and weights (.data*).
You can help yourself by running
import tensorflow as tf
import numpy as np
data = np.arange(9 * 1).reshape(1, 9).astype(np.float32)
plhdr = tf.placeholder(tf.float32, shape=[1, 9], name='input')
print plhdr.name
activation = tf.layers.dense(plhdr, 10, name='fc')
print activation.name
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
expected = sess.run(activation, {plhdr: data})
print expected
saver = tf.train.Saver(tf.global_variables())
saver.save(sess, '/tmp/my_model')
tf.reset_default_graph()
with tf.Session() as sess:
# load the computation graph (the fully connected + placeholder)
loader = tf.train.import_meta_graph('/tmp/my_model.meta')
sess.run(tf.global_variables_initializer())
plhdr = tf.get_default_graph().get_tensor_by_name('input:0')
activation = tf.get_default_graph().get_tensor_by_name('fc/BiasAdd:0')
actual = sess.run(activation, {plhdr: data})
assert np.allclose(actual, expected) is False
# now load the weights
loader = loader.restore(sess, '/tmp/my_model')
actual = sess.run(activation, {plhdr: data})
assert np.allclose(actual, expected) is True

multiple runs of inference model using saved checkpoint produces stochastic errors - Tensorflow

I am running Tensorflow 0.12.1 on a GPU. I have a trained Deep CNN model whose weights I've saved using a checkpoint file. During inference, I reload the saved checkpoint using restorer.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir)). The code seems to run without issues, but everytime I re-run the script, I'm getting screwed up outputs. AFAIK, I do not shuffle my test set inputs. The inputs are being loaded and fed to the network properly. It is just the output of different runs of the CNN on the same test set using the same order is producing very different outputs. I'm perplexed! Also, how do I execute a graph loaded with saved checkpoint without running an init_op during inference? It seems my code requires all global and local variables to be initialized before execution. (I initialize first, and then only restore the checkpoint!).Here's a snippet of my code:
import tensorflow as tf
import numpy as np
import os
import os.path
from datetime import datetime
import time
import random
import json
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes
from modelFCNN3 import model
def read_input(inp_queue,height=224,width=224,channels=3, mask=False):
value = tf.read_file(inp_queue)
image = tf.image.decode_png(value)
image = tf.image.resize_images(image, [height, width],method=2)
image = tf.cast(image, tf.uint8)
image.set_shape([height,width,channels])
image = tf.reshape(image,[height,width,channels])
if mask:
image = tf.to_float(tf.greater_equal(image,128))
image = tf.cast(image,tf.float32)
else:
image = tf.image.per_image_standardization(image)
image = tf.cast(image,tf.float32)
return image
if __name__ == '__main__':
tf.reset_default_graph()
with open('X_test.json', 'r') as infile:
X_test = json.load(infile)
with open('y_test.json', 'r') as infile:
y_test = json.load(infile)
imagelist = ops.convert_to_tensor(X_test, dtype=dtypes.string)
labellist = ops.convert_to_tensor(y_test, dtype=dtypes.string)
input_queue = tf.train.slice_input_producer([imagelist, labellist],
num_epochs=1,
shuffle=False)
image = read_input(input_queue[0],height=224,width=224,channels=3, mask=False)
label = read_input(input_queue[1],height=224,width=224,channels=1, mask=True)
images_batch, labels_batch = tf.train.batch([image, label], batch_size=FLAGS.batch_size,
enqueue_many=False,shapes=None, allow_smaller_final_batch=True)
global_step = tf.Variable(0, trainable=False)
images = tf.placeholder_with_default(images_batch, shape=[None, 224,224,3])
labels = tf.placeholder_with_default(labels_batch, shape=[None, 224,224,1])
restorer = tf.train.Saver()
logits = model(images).logits
labels = tf.cast(labels,tf.int32)
labels.set_shape([FLAGS.batch_size,224,224,1])
valid_prediction = tf.argmax(tf.nn.softmax(logits), dimension=3)
valid_prediction.set_shape([FLAGS.batch_size,224,224])
meanIOU,update_op_mIOU= tf.contrib.metrics.streaming_mean_iou(tf.cast(valid_prediction,tf.int32), tf.squeeze(labels),FLAGS.num_classes)
init = tf.global_variables_initializer()
init_locals = tf.local_variables_initializer()
with tf.Session() as sess:
sess.run([init, init_locals])
restorer.restore(sess, tf.train.latest_checkpoint(FLAGS.train_dir))
print("Model restored.")
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord,sess=sess)
summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)
try:
step = 0
avg = []
while not coord.should_stop():
myimg, predimg, mylbl= sess.run([images,valid_prediction,labels])
mIOU,_ = sess.run([meanIOU,update_op_mIOU])
avg.append(mIOU)
step += 1
except tf.errors.OutOfRangeError:
print('Done training -- epoch limit reached')
finally:
coord.request_stop()
coord.join(threads)
sess.close()

Are you running on same machine or different machine
#saver = tf.train.Saver()
The following comment is in tensorflow docs
#NOTE: Restarting training from saved meta_graph only works if the device assignments have not changed.
#saver = tf.train.import_meta_graph(metafile)

Tensorboard embeddings does not display tensors?

Tensorboard provides embedding visualization of tensorflow variables by using tf.train.Saver(). The following is a working example (from this answer)
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.tensorboard.plugins import projector
LOG_DIR = '/tmp/emb_logs/'
metadata = os.path.join(LOG_DIR, 'metadata.tsv')
mnist = input_data.read_data_sets('MNIST_data')
#Variables
images = tf.Variable(mnist.test.images, name='images')
weights=tf.Variable(tf.random_normal([3,3,1,16]))
biases=tf.Variable(tf.zeros([16]))
#Tensor from the variables
x = tf.reshape(images, shape=[-1, 28, 28, 1])
conv_layer=tf.nn.conv2d(x, weights, [1,1,1,1], padding="SAME")
conv_layer=tf.add(conv_layer, biases)
y = tf.reshape(conv_layer, shape=[-1, 28*28*16])
with open(metadata, 'wb') as metadata_file:
for row in mnist.test.labels:
metadata_file.write('%d
' % row)
with tf.Session() as sess:
saver = tf.train.Saver([images])
sess.run(images.initializer)
saver.save(sess, os.path.join(LOG_DIR, 'images.ckpt'))
config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = images.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = metadata
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)
How can I visualize embeddings from a tensorflow tensor, like y in the code above?
Simply replacing
saver = tf.train.Saver([images])
with
saver = tf.train.Saver([y])
doesn't work, because of the following error:
474 var = ops.convert_to_tensor(var, as_ref=True)
475 if not BaseSaverBuilder._IsVariable(var):
476 raise TypeError("Variable to save is not a Variable: %s" % var)
477 name = var.op.name
478 if name in names_to_saveables:
TypeError: Variable to save is not a Variable: Tensor("Reshape_11:0", shape=(10000, 12544), dtype=float32)
Does anyone know of an alternative to generate tensorboard embedding visualizations of a tf.tensor?

You could create a new variable you can assign the value of y to.
y_var = tf.Variable(tf.shape(y))
saver = tf.train.Saver([y_var])
assign_op = y_var.assign(y)
...
# Training
sess.run((loss, assign_op))
saver.save(...)

tf.summary.tensor_summary will save a summary for an arbitrary Tensor.

TFSlim - problems loading saved checkpoint for VGG16

(1) I'm trying to fine-tune a VGG-16 network using TFSlim by loading pretrained weights into all layers except thefc8 layer. I achieved this by using the TF-SLIm function as follows:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
vgg = nets.vgg
# Specify where the Model, trained on ImageNet, was saved.
model_path = 'path/to/vgg_16.ckpt'
# Specify where the new model will live:
log_dir = 'path/to/log/'
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = vgg.vgg_16(images)
variables_to_restore = slim.get_variables_to_restore(exclude=['fc8'])
restorer = tf.train.Saver(variables_to_restore)
init = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
restorer.restore(sess,model_path)
print "model restored"
This works fine as long as I do not change the num_classes for the VGG16 model. What I would like to do is to change the num_classes from 1000 to 200. I was under the impression that if I did this modification by defining a new vgg16-modified class that replaces the fc8 to produce 200 outputs, (along with a variables_to_restore = slim.get_variables_to_restore(exclude=['fc8']) that everything will be fine and dandy. However, tensorflow complains of a dimensions mismatch:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,4096,200] rhs shape= [1,1,4096,1000]
So, how does one really go about doing this ? The documentation for TFSlim is really patchy and there are several versions scattered on Github - so not getting much help there.

You can try using slim's way of restoring — slim.assign_from_checkpoint.
There is related documentation in the slim sources:
https://github.com/tensorflow/tensorflow/blob/129665119ea60640f7ed921f36db9b5c23455224/tensorflow/contrib/slim/python/slim/learning.py
Corresponding part:
*************************************************
* Fine-Tuning Part of a model from a checkpoint *
*************************************************
Rather than initializing all of the weights of a given model, we sometimes
only want to restore some of the weights from a checkpoint. To do this, one
need only filter those variables to initialize as follows:
...
# Create the train_op
train_op = slim.learning.create_train_op(total_loss, optimizer)
checkpoint_path = '/path/to/old_model_checkpoint'
# Specify the variables to restore via a list of inclusion or exclusion
# patterns:
variables_to_restore = slim.get_variables_to_restore(
include=["conv"], exclude=["fc8", "fc9])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["conv"])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
checkpoint_path, variables_to_restore)
# Create an initial assignment function.
def InitAssignFn(sess):
sess.run(init_assign_op, init_feed_dict)
# Run training.
slim.learning.train(train_op, my_log_dir, init_fn=InitAssignFn)
Update
I tried the following:
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = nets.vgg.vgg_16(images)
print [v.name for v in slim.get_variables_to_restore(exclude=['fc8']) ]
And got this output (shortened):
[u'vgg_16/conv1/conv1_1/weights:0',
u'vgg_16/conv1/conv1_1/biases:0',
…
u'vgg_16/fc6/weights:0',
u'vgg_16/fc6/biases:0',
u'vgg_16/fc7/weights:0',
u'vgg_16/fc7/biases:0',
u'vgg_16/fc8/weights:0',
u'vgg_16/fc8/biases:0']
So it looks like you should prefix scope with vgg_16:
print [v.name for v in slim.get_variables_to_restore(exclude=['vgg_16/fc8']) ]
gives (shortened):
[u'vgg_16/conv1/conv1_1/weights:0',
u'vgg_16/conv1/conv1_1/biases:0',
…
u'vgg_16/fc6/weights:0',
u'vgg_16/fc6/biases:0',
u'vgg_16/fc7/weights:0',
u'vgg_16/fc7/biases:0']
Update 2
Complete example that executes without errors (at my system).
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
s = tf.Session(config=tf.ConfigProto(gpu_options={'allow_growth':True}))
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
predictions = nets.vgg.vgg_16(images, 200)
variables_to_restore = slim.get_variables_to_restore(exclude=['vgg_16/fc8'])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', variables_to_restore)
s.run(init_assign_op, init_feed_dict)
In the example above vgg16.ckpt is a checkpoint saved by tf.train.Saver for 1000 classes VGG16 model.
Using this checkpoint with all variables of 200 classes model (including fc8) gives the following error:
init_assign_op, init_feed_dict = slim.assign_from_checkpoint('./vgg16.ckpt', slim.get_variables_to_restore())
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
1 init_assign_op, init_feed_dict = slim.assign_from_checkpoint(
----> 2 './vgg16.ckpt', slim.get_variables_to_restore())
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/variables.pyc in assign_from_checkpoint(model_path, var_list)
527 assign_ops.append(var.assign(placeholder_value))
528
--> 529 feed_dict[placeholder_value] = var_value.reshape(var.get_shape())
530
531 assign_op = control_flow_ops.group(*assign_ops)
ValueError: total size of new array must be unchanged

Tensorflow inception_v2_resnet inference

With reference to this post:
Using pre-trained inception_resnet_v2 with Tensorflow
i am trying to use the inception_resnet_v2 model to get predictions of images also. Therefore i looked at the snippet and tried to get it running, but it says "input_tensor" is not defined. Is there anything missing in the code mentioned or can anyone get me some hint to get it running / how to define the input_tensor variable?
Here is the snippet again:
import tensorflow as tf
slim = tf.contrib.slim
from PIL import Image
from inception_resnet_v2 import *
import numpy as np
checkpoint_file = 'inception_resnet_v2_2016_08_30.ckpt'
sample_images = ['dog.jpg', 'panda.jpg']
#Load the model
sess = tf.Session()
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
logits, end_points = inception_resnet_v2(input_tensor, is_training=False)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_file)
for image in sample_images:
im = Image.open(image).resize((299,299))
im = np.array(im)
im = im.reshape(-1,299,299,3)
predict_values, logit_values = sess.run([end_points['Predictions'],logits], feed_dict={input_tensor: im})
print (np.max(predict_values), np.max(logit_values))
print (np.argmax(predict_values), np.argmax(logit_values))
Thanks

The code snippet appears to lack any definition for input_tensor. Looking at the definition of the inception_resnet_v2() function, the fact that the tensor is used in a feed_dict, and the fact that the size of your image is 299 x 299, you could define input_tensor as follows:
input_tensor = tf.placeholder(tf.float32, [None, 299, 299, 3])

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Restore checkpoint in Tensorflow - tensor name not found - tensorflow

Indeed the checkpoint file does not contain this tensor. Can you file a bug on github?

Related

how to restore variables in fully_connected function

multiple runs of inference model using saved checkpoint produces stochastic errors - Tensorflow

Tensorboard embeddings does not display tensors?

TFSlim - problems loading saved checkpoint for VGG16

Tensorflow inception_v2_resnet inference

Categories

Resources