My question is about context, default sessions, and the default graph in TensorFlow.
The problem:
TensorFlow is unable to feed a placeholder in the following scenario:
Function test defines a graph.
Function test_once defines a session.
When test calls test_once, feeding fails.
When I change the code so that test declares both the graph and the session, everything works.
Here is the code:
def test_once(g, saver, summary_writer, logits, images, summary_op):
    """Run a session once for a given test image.
    Args:
      g: Graph that holds the model.
      saver: Saver.
      summary_writer: Summary writer.
      logits: Logits tensor to evaluate.
      images: Test image batch (NumPy array).
      summary_op: Summary op.
    """
    with tf.Session(graph=g) as sess:
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            # Restores from checkpoint
            saver.restore(sess, ckpt.model_checkpoint_path)
            # extract global_step from it.
            global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        else:
            print('No checkpoint file found')
            return
        # astype returns a new array, so the result has to be assigned.
        images = images.astype(np.float32)
        predictions = sess.run(logits, feed_dict={'InputPlaceHolder/TestInput:0': images})
        summary = tf.Summary()
        summary.ParseFromString(sess.run(summary_op))
        summary_writer.add_summary(summary, global_step)
        return predictions
def test():
    """Test LCPR with a test image."""
    with tf.Graph().as_default() as g:
        # Get image for testing
        images, labels = lcpr.test_input()
        # Build a Graph that computes the logits predictions from the
        # inference model.
        with tf.name_scope('InputPlaceHolder'):
            test_image_placeholder = tf.placeholder(tf.float32, (None, None, None, 3), 'TestInput')
            # Display the training images in the visualizer.
            # The 'max_outputs' default is 3. Not stated. (Max number of batch elements to generate images for.)
            # tf.summary.image('input_images', test_image_placeholder)
        with tf.name_scope('Inference'):
            logits = lcpr.inference(test_image_placeholder)
        # Restore the moving average version of the learned variables for eval.
        variable_averages = tf.train.ExponentialMovingAverage(
            lcpr.MOVING_AVERAGE_DECAY)
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)
        # Build the summary operation based on the TF collection of Summaries.
        writer = tf.summary.FileWriter("/tmp/lcpr/test")
        writer.add_graph(g)
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(FLAGS.test_dir, g)
        # Sadly, this will not work:
        predictions = test_once(g, saver, summary_writer, logits, images, summary_op)
        '''Alternative working option:
        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
            if ckpt and ckpt.model_checkpoint_path:
                # Restores from checkpoint
                saver.restore(sess, ckpt.model_checkpoint_path)
                # Assuming model_checkpoint_path looks something like:
                # /my-favorite-path/cifar10_train/model.ckpt-0,
                # extract global_step from it.
                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
            else:
                print('No checkpoint file found')
                return
            x = sess.run(logits, feed_dict={'InputPlaceHolder/TestInput:0': images})
            print(x)
        '''
The above code yields an error saying that the placeholder is not fed:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'InputPlaceHolder/TestInput' with dtype float
And it's not that TensorFlow does not recognize the placeholder: if I change the name from 'InputPlaceHolder/TestInput:0' to 'InputPlaceHolder/TestInput:1', I receive a message claiming that 'InputPlaceHolder/TestInput' exists but has only 1 output. This makes sense, and I guess the session runs on my default graph.
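The naming convention behind that message can be made concrete. A tensor name has the form op_name:output_index, so 'InputPlaceHolder/TestInput:0' is output 0 of the op 'InputPlaceHolder/TestInput', and ':1' asks for a second output that a placeholder does not have. Below is a minimal pure-Python sketch of that split; split_tensor_name is a hypothetical helper written for illustration, not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split 'scope/op:index' into (op_name, output_index).

    Hypothetical helper for illustration only; it mirrors the
    'op_name:output_index' convention, not TensorFlow's own parser.
    """
    op_name, sep, index = tensor_name.rpartition(':')
    if not sep:
        # No ':' suffix: treat the whole string as the op name, output 0.
        return tensor_name, 0
    return op_name, int(index)


op_name, output_index = split_tensor_name('InputPlaceHolder/TestInput:0')
print(op_name, output_index)  # InputPlaceHolder/TestInput 0
```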
Things only work for me if I stay within the same def: if I run the commented-out part (starting with 'with tf.Session() as sess:') directly inside test, everything works.
What am I missing?
My guess is that it is context related; maybe the session is not being assigned to the graph?
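Solved. Stupid mistake.
test_once calls sess.run twice. The second call, summary.ParseFromString(sess.run(summary_op)), passes no feed_dict, so the placeholder is indeed not fed there. Passing the same feed_dict to that call, or fetching both in a single call such as sess.run([logits, summary_op], feed_dict=...), avoids the error.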
Related
I am observing a strange behavior where Saver can't restore if the checkpoint was saved earlier in the same Python process. It loads fine if done from a different process. Here's some simple code that will show the problem.
import tensorflow.compat.v1 as tf
def train():
    W = tf.Variable(tf.zeros([1, 1]))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, "./model.ckpt")
def predict():
    W = tf.Variable(tf.zeros([1, 1]))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, "./model.ckpt")
train()
predict()
Here we save and restore immediately after that in the same process. Restoration fails with errors like:
Key Variable_1 not found in checkpoint
But if I run just the predict() code again from a new Python process it works just fine.
#train()
predict()
Am I doing something wrong here?
After predict, if you run:
print([v for v in tf.trainable_variables()])
you will see that two different variables are being created. That's why TF is not able to restore the value of the second one.
In order to link both variables into a single one, you can either:
Pass a dictionary to the var_list argument of tf.train.Saver, in both train() and predict(), so that the checkpoint key no longer depends on the auto-generated variable name. For example:
    saver = tf.train.Saver({'W': W})
Use auto-reusing when creating the variable. For example:
    with tf.variable_scope('', reuse=tf.AUTO_REUSE):
        W = tf.get_variable(initializer=lambda: tf.zeros([1, 1]),
                            name='W')
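Why the second variable ends up as Variable_1 can be sketched with a toy model. This is not TensorFlow code: ToyGraph is a hypothetical stand-in that only imitates how a graph hands out unique names, under the assumption that train() and predict() both add their variable to the same default graph.

```python
class ToyGraph:
    """Toy stand-in for a graph that hands out unique op names.

    Illustration only; it imitates the 'name', 'name_1', 'name_2'
    pattern and is not TensorFlow's implementation.
    """

    def __init__(self):
        self._name_counts = {}

    def unique_name(self, base):
        count = self._name_counts.get(base, 0)
        self._name_counts[base] = count + 1
        return base if count == 0 else "%s_%d" % (base, count)


# Same graph for "train" and "predict", as in the question.
shared = ToyGraph()
saved_key = shared.unique_name("Variable")    # name stored in the checkpoint
restore_key = shared.unique_name("Variable")  # name created later by predict()
print(saved_key, restore_key, saved_key == restore_key)  # Variable Variable_1 False

# A fresh graph starts counting again, so the names match.
fresh = ToyGraph()
print(fresh.unique_name("Variable") == saved_key)  # True
```

The fresh-graph case is what a new Python process gives you for free. Within one process, building predict() in its own graph (for example after tf.reset_default_graph(), or inside with tf.Graph().as_default():) should have the same effect.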
I am trying to convert a Linear Classifier based on this example that works for exporting a DNN Classifier:
print("\n====== classifier model_dir, latest_checkpoint ===========")
print(classifier.model_dir)
print(classifier.latest_checkpoint())
debug = False
with tf.Session() as sess:
    # First let's load meta graph and restore weights
    latest_checkpoint_path = classifier.latest_checkpoint()
    saver = tf.train.import_meta_graph(latest_checkpoint_path + '.meta')
    saver.restore(sess, latest_checkpoint_path)
    # Get the input and output tensors needed for toco.
    # These were determined based on the debugging info printed / saved below.
    input_tensor = sess.graph.get_tensor_by_name("dnn/input_from_feature_columns/input_layer/concat:0")
    input_tensor.set_shape([1, 10])
    out_tensor = sess.graph.get_tensor_by_name("dnn/logits/BiasAdd:0")
    out_tensor.set_shape([1, 5])
    # Pass the output node name we are interested in.
    # Based on the debugging info printed / saved below, pulled out the
    # name of the node for the logits (before the softmax is applied).
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["dnn/logits/BiasAdd"])
    if debug:
        print("\nORIGINAL GRAPH DEF Ops ===========================================")
        ops = sess.graph.get_operations()
        for op in ops:
            if "BiasAdd" in op.name or "input_layer" in op.name:
                print([op.name, op.values()])
        # save original graphdef to text file
        with open("estimator_graph.pbtxt", "w") as fp:
            fp.write(str(sess.graph_def))
        print("\nFROZEN GRAPH DEF Nodes ===========================================")
        for node in frozen_graph_def.node:
            print(node.name)
        # save frozen graph def to text file
        with open("estimator_frozen_graph.pbtxt", "w") as fp:
            fp.write(str(frozen_graph_def))
    tflite_model = tf.contrib.lite.toco_convert(frozen_graph_def, [input_tensor], [out_tensor])
    with open("estimator_model.tflite", "wb") as fp:
        fp.write(tflite_model)
but I don't know which tensor to use in this section:
input_tensor = sess.graph.get_tensor_by_name("dnn/input_from_feature_columns/input_layer/concat:0")
input_tensor.set_shape([1, 10])
out_tensor = sess.graph.get_tensor_by_name("dnn/logits/BiasAdd:0")
out_tensor.set_shape([1, 3])
As the input tensor I have tried:
linear/linear_model/linear_model/weighted_sum:0
with shape 1,5
(because I couldn't find a tensor that works with 1,10)
and as the output tensor: linear/head/predictions/probabilities:0
with shape 1,5
But when I use the model on an Android device, the shape of the output tensor is no longer 1,5 but 1,10.
I don't know how to interpret this result. Maybe the problem is that I don't know which tensor to choose as the input to the toco_convert function.
When freezing a graph and then running it elsewhere (mobile device), the output is of low quality compared to the inference on the server on my semantic segmentation model. It is basically a messy version of what would run on the server. It is executing successfully, but it appears as though something was not initialized prior to freezing, even though the method to load the model between the export script and inference scripts is nearly identical.
The exported model can be run on the same images over and over and produce the same results for a given set of images, as expected.
However, each time the model is frozen, using the exact same script and checkpoint, it creates a different output for a given set of images.
def main():
    args = get_arguments()
    if args.dataset == 'cityscapes':
        num_classes = cityscapes_class
    else:
        num_classes = ADE20k_class
    shape = [320, 320]
    x = tf.placeholder(dtype=tf.float32, shape=(shape[0], shape[1], 3), name="input")
    img_tf = preprocess(x)
    model = model_config[args.model]
    net = model({'data': img_tf}, num_classes=num_classes, filter_scale=args.filter_scale)
    raw_output = net.layers['conv6_cls']
    raw_output_up = tf.image.resize_bilinear(raw_output, size=shape, align_corners=True)
    raw_output_maxed = tf.argmax(raw_output_up, axis=3, name="output")
    # Init tf Session
    config = tf.ConfigProto()
    sess = tf.Session(config=config)
    init = tf.global_variables_initializer()
    sess.run(init)
    model_path = model_paths[args.model]
    ckpt = tf.train.get_checkpoint_state(model_path)
    if ckpt and ckpt.model_checkpoint_path:
        input_checkpoint = ckpt.model_checkpoint_path
        loader = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)
        load(loader, sess, ckpt.model_checkpoint_path)
    else:
        print('No checkpoint file found at %s.' % model_path)
        exit()
    print("Loaded Model")
    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()
    # We use a built-in TF helper to export variables to constants
    output_graph_def = graph_util.convert_variables_to_constants(
        sess,  # The session is used to retrieve the weights
        input_graph_def,  # The graph_def is used to retrieve the nodes
        output_node_names.split(",")  # The output node names are used to select the useful nodes
    )
    # Finally we serialize and dump the output graph to the filesystem
    with tf.gfile.GFile("model/output_graph.pb", "wb") as f:
        f.write(output_graph_def.SerializeToString())
    print("%d ops in the final graph." % len(output_graph_def.node))
In my training file (train.py), I write:
def deep_part(self):
    with tf.variable_scope("deep-part"):
        y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.factor_size])  # None * (F*K)
        # self.deep_layers = 2
        for i in range(0, len(self.deep_layers)):
            y_deep = tf.contrib.layers.fully_connected(y_deep, self.deep_layers[i],
                activation_fn=self.deep_layers_activation, scope='fc%d' % i)
        return y_deep
Now, in the prediction file (predict.py), I restore the checkpoint, but I don't know how to reload the weights and biases of the "deep-part" network, because I think the fully_connected function might hide them.
I wrote a lengthy explanation here. A short summary:
Calling saver.save(sess, '/tmp/my_model') makes TensorFlow produce multiple files:
checkpoint
my_model.data-00000-of-00001
my_model.index
my_model.meta
The file named checkpoint is just a pointer to the latest version of the model weights. It is a plain text file, written next to the other files, containing:
$ cat /tmp/checkpoint
model_checkpoint_path: "/tmp/my_model"
all_model_checkpoint_paths: "/tmp/my_model"
The others are binary files containing the graph (.meta) and weights (.data*).
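To make the file layout concrete, here is a small standard-library sketch. Both helpers are hypothetical names introduced for illustration: read_checkpoint_pointer pulls the model_checkpoint_path entry out of the text shown above with a regular expression, and expected_files lists the file names that go with a prefix, assuming a single data shard as in the listing. In real code, tf.train.latest_checkpoint or tf.train.get_checkpoint_state is the supported way to read this file.

```python
import re


def read_checkpoint_pointer(text):
    """Return the latest checkpoint prefix named in a `checkpoint` file's text."""
    match = re.search(r'^model_checkpoint_path:\s*"([^"]*)"', text, re.MULTILINE)
    return match.group(1) if match else None


def expected_files(prefix, num_shards=1):
    """List the files that accompany a checkpoint prefix (assumed layout)."""
    files = [prefix + ".index", prefix + ".meta"]
    for shard in range(num_shards):
        files.append("%s.data-%05d-of-%05d" % (prefix, shard, num_shards))
    return files


# Contents of the `checkpoint` file as shown above.
sample = (
    'model_checkpoint_path: "/tmp/my_model"\n'
    'all_model_checkpoint_paths: "/tmp/my_model"\n'
)
prefix = read_checkpoint_pointer(sample)
print(prefix)
print(expected_files(prefix))
```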
You can verify this yourself by running:
import tensorflow as tf
import numpy as np
data = np.arange(9 * 1).reshape(1, 9).astype(np.float32)
plhdr = tf.placeholder(tf.float32, shape=[1, 9], name='input')
print(plhdr.name)
activation = tf.layers.dense(plhdr, 10, name='fc')
print(activation.name)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    expected = sess.run(activation, {plhdr: data})
    print(expected)
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, '/tmp/my_model')
tf.reset_default_graph()
with tf.Session() as sess:
    # load the computation graph (the fully connected + placeholder)
    loader = tf.train.import_meta_graph('/tmp/my_model.meta')
    sess.run(tf.global_variables_initializer())
    plhdr = tf.get_default_graph().get_tensor_by_name('input:0')
    activation = tf.get_default_graph().get_tensor_by_name('fc/BiasAdd:0')
    actual = sess.run(activation, {plhdr: data})
    # freshly initialized weights differ from the saved ones
    assert not np.allclose(actual, expected)
    # now load the weights (restore returns None, so do not reassign loader)
    loader.restore(sess, '/tmp/my_model')
    actual = sess.run(activation, {plhdr: data})
    assert np.allclose(actual, expected)
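Applied to the deep_part function from the question, the same lookup-by-name approach needs the variable names of each layer. The sketch below only builds candidate names. It assumes that tf.contrib.layers.fully_connected calls its variables weights and biases under the given scope, and that "deep-part" is not nested inside another scope; deep_part_variable_names is a hypothetical helper. Printing the names from tf.global_variables() after import_meta_graph is the reliable way to confirm them.

```python
def deep_part_variable_names(num_layers, scope="deep-part"):
    """Build candidate tensor names for the fc layers (assumed naming scheme)."""
    names = []
    for i in range(num_layers):
        for kind in ("weights", "biases"):
            # ':0' selects the first (only) output of the variable op.
            names.append("%s/fc%d/%s:0" % (scope, i, kind))
    return names


for name in deep_part_variable_names(2):
    print(name)
```

Each name could then be passed to tf.get_default_graph().get_tensor_by_name(...) once the graph has been imported and the weights restored, in the same way as 'fc/BiasAdd:0' above.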
I am new to TensorFlow. I have trained the inception_v3 model successfully with my training data; now I want to predict the output of several images, but the number of them is different from the batch_size in training. I did it as follows:
import os
import numpy as np
import tensorflow as tf
from tensorflow.contrib.slim.nets import inception_v3 as inception
checkpoint_dir = os.path.join('runs', configure_name, 'checkpoints')
checkpoint_file = tf.train.latest_checkpoint(checkpoint_dir)
graph = tf.Graph()
with graph.as_default():
    session_conf = tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=False)
    sess = tf.Session(config=session_conf)
    with sess.as_default():
        # Load the saved meta graph and restore variables
        saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
        saver.restore(sess, checkpoint_file)
        x = tf.placeholder(tf.float32, [batch_size, input_size, input_size, num_channels], name='images')
        _, end_points = inception.inception_v3(x, num_classes=num_classes, is_training=False)
        outputs = end_points['Predictions']
        scores = sess.run(outputs, feed_dict={x: x_eval})
        predictions = np.argmax(scores, axis=1)
It gave me the errors as follows:
FailedPreconditionError: Attempting to use uninitialized value InceptionV3/Conv2d_1a_3x3/weights_1
It seems that the model parameters in "outputs" are not fed in successfully, but I do not know how to do it. Any ideas? Thanks.
Here you explicitly set the first dimension of the input placeholder x to batch_size, so every array you feed must have exactly that first dimension or the program will fail.
One solution is to set the first dimension of every placeholder (input and label) to None, so that it can take any size and can differ between training and validation.
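The effect of None can be sketched without TensorFlow. The helper below is a hypothetical, simplified version of the shape check applied when an array is fed: a None entry matches any size, while an integer entry must match exactly. The sizes 32 and 299 are made-up values for illustration.

```python
def is_compatible(placeholder_shape, array_shape):
    """True if array_shape can be fed to a placeholder of placeholder_shape.

    Simplified illustration: None in placeholder_shape matches any size.
    """
    if len(placeholder_shape) != len(array_shape):
        return False
    return all(p is None or p == a
               for p, a in zip(placeholder_shape, array_shape))


fixed = (32, 299, 299, 3)       # first dimension pinned to the training batch size
flexible = (None, 299, 299, 3)  # first dimension left open

print(is_compatible(fixed, (5, 299, 299, 3)))     # False
print(is_compatible(flexible, (5, 299, 299, 3)))  # True
```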
UPDATE:
If you have already trained your model with a placeholder (input and label) whose first dimension is fixed, you can replace it when restoring the graph: tf.train.import_meta_graph(your_meta_graph_file, input_map={'your_train_input_placeholder_name': new_placeholder})
Here new_placeholder is a newly created placeholder whose first dimension is None.