tf.train.Saver not loading in same process - tensorflow

I am observing a strange behavior where Saver can't restore if the checkpoint was saved earlier in the same Python process. It loads fine if done from a different process. Here's some simple code that will show the problem.
import tensorflow.compat.v1 as tf
def train():
    W = tf.Variable(tf.zeros([1, 1]))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, "./model.ckpt")
def predict():
    W = tf.Variable(tf.zeros([1, 1]))
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, "./model.ckpt")
train()
predict()
Here we save and then immediately restore in the same process. Restoration fails with errors like:
Key Variable_1 not found in checkpoint
But if I run just the predict() code again from a new Python process it works just fine.
#train()
predict()
Am I doing something wrong here?

After predict, if you run:
print([v for v in tf.trainable_variables()])
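you will see that two different variables have been created in the same default graph. The second one gets the auto-generated name Variable_1, while the checkpoint only contains Variable. That's why TF is not able to restore the value of the second one.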
In order to link both variables into a single one, you can either:
Pass a dictionary to the argument var_list of tf.train.Saver. For example:
saver = tf.train.Saver({'W': W})
Use auto-reusing when creating the variable. For example:
with tf.variable_scope('', reuse=tf.AUTO_REUSE):
    W = tf.get_variable(initializer=lambda: tf.zeros([1, 1]),
                        name='W')
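A minimal sketch of the first option (passing a dictionary as var_list), assuming TensorFlow with the tensorflow.compat.v1 API is installed. The ckpt_path argument, the temporary directory, the tf.ones initial value, and the single tf.Graph() that stands in for the process-wide default graph are assumptions added to make the example self-contained; they are not part of the original code.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf


def train(ckpt_path):
    W = tf.Variable(tf.ones([1, 1]))      # auto-named "Variable"
    saver = tf.train.Saver({"W": W})      # stored in the checkpoint under the key "W"
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, ckpt_path)
    return W.name


def predict(ckpt_path):
    W = tf.Variable(tf.zeros([1, 1]))     # auto-named "Variable_1" in the same graph
    saver = tf.train.Saver({"W": W})      # looked up under the key "W", not "Variable_1"
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        return W.name, sess.run(W)


# Hypothetical location; any writable directory would do.
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# One shared graph plays the role of the default graph from the question.
with tf.Graph().as_default():
    train_name = train(ckpt_path)
    predict_name, restored = predict(ckpt_path)

print(train_name, predict_name, restored)
```

The two variables still have different graph names, but because both savers map the checkpoint key "W" to their own variable, the value saved in train() is the one restored in predict().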

Related

Unable to export saved model using simple_save tensorflow

I am trying to use simple_save for tensorflow, but it isn't working :(
Here is my code:
def export_model(saved_model_dir, final_tensor_name):
    with tf.Session() as sess:
        with sess.graph.as_default() as graph:
            tf.saved_model.simple_save(
                sess,
                saved_model_dir,
                inputs={'image': tf.placeholder(tf.float32)},
                outputs={'prediction': graph.get_tensor_by_name(final_tensor_name + ":0")}
            )
I get the following error:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value final_training_ops/biases/final_biases
[[{{node save/SaveV2}}]]
I am working with the following tutorial: https://github.com/BartyzalRadek/Multi-label-Inception-net
I've spent so many hours trying to find solutions online and I know it can't be that tough. I already have a graph that is being exported and all I need now is that saved_model.pb. Any help is appreciated! Thank you!
NEW UPDATE - CODE BELOW
def export_model(saved_model_dir, final_tensor_name):
    with tf.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        with sess.graph.as_default() as graph:
            tf.saved_model.simple_save(
                sess,
                saved_model_dir,
                inputs={'image': tf.placeholder(tf.string)},
                outputs={'prediction': graph.get_tensor_by_name(final_tensor_name + ":0")}
            )
The code runs now, but when I test the saved model, I always get the same result.
IMAGE_LABELING_CODE
import tensorflow as tf
import sys
image_path = sys.argv[1]
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("labels.txt")]
with tf.gfile.FastGFile("retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')
with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor, \
                           {'DecodeJpeg/contents:0': image_data})
As @giser_yugang said, you should probably create the initializer at the end of the graph-construction code, init = tf.global_variables_initializer(), and then run it right after opening the session: sess.run(init)
However, if it were a local variable, you would have to add the variable to the local-variables collection, create the local initializer, and then run it. For example:
a = tf.Variable(..., collections=[tf.GraphKeys.LOCAL_VARIABLES])
local_init = tf.local_variables_initializer()
...
with tf.Session() as sess:
    sess.run(local_init)
Some parts of the TensorFlow library create local variables directly, for example tf.metrics (if this has not changed). In that case you only have to define and run the initializer: local_init = tf.local_variables_initializer() and sess.run(local_init)
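The difference between the two initializers can be sketched as follows, assuming TensorFlow with the tensorflow.compat.v1 API is installed. The variable names global_var and local_var are illustrative, and tf.report_uninitialized_variables() is used here only to make the effect visible; it does not appear in the original answer.

```python
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # Lands in the GLOBAL_VARIABLES collection by default.
    global_var = tf.Variable(1.0, name="global_var")
    # Explicitly placed in LOCAL_VARIABLES only (hypothetical example variable).
    local_var = tf.Variable(
        2.0,
        name="local_var",
        trainable=False,
        collections=[tf.GraphKeys.LOCAL_VARIABLES],
    )

    global_init = tf.global_variables_initializer()
    local_init = tf.local_variables_initializer()
    report = tf.report_uninitialized_variables()

    with tf.Session() as sess:
        sess.run(global_init)
        # The global initializer does not touch local variables.
        after_global = [name.decode() for name in sess.run(report)]
        sess.run(local_init)
        after_local = [name.decode() for name in sess.run(report)]

print(after_global, after_local)
```

Running only the global initializer leaves the local variable uninitialized, which is the situation the answer warns about.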

How to restore tensorflow graph in different notebook in Jupyter?

I'm trying to use tf.train.Saver() to save and restore a TensorFlow graph. To save the graph, I proceed as follows:
saver = tf.train.Saver()
init_op = tf.global_variables_initializer()
sess.run(init_op)
with sess.as_default():
    for ep in range(epoch):
        train_step.run(feed_dict={X: X_train,
                                  Y: y_train.reshape(-1,1)})
    saver.save(sess, 'my_test_model')
To restore the model, I do this:
with tf.Session() as sess:
    saver.restore(sess, 'my_test_model')
    test_accuracy = sess.run(x,feed_dict={X: X_valid,
                                          Y: y_valid.reshape(-1,1)})
    mse_test=loss.eval(feed_dict={X: X_valid,
                                  Y: y_valid.reshape(-1,1)})
    print(mse_test)
When I do this in the same notebook, everything works perfectly. But when I try to do the same in another notebook, the problems begin: the saver is not defined in the new notebook, so I define it again as tf.train.Saver() and get the error ValueError: No variables to save.
I tried also
saver = tf.train.import_meta_graph('my_test_model.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
but graph variables are not saved in the meta file.
Has anybody faced a similar problem? I will appreciate any help.
Thanks in advance!

Why TensorFlow can not restore a variable initialized by a constant?

I am new to TensorFlow. When I read the TensorFlow manual on saving and restoring variables, I encountered a problem. I saved a variable initialized by a constant, but I cannot restore the variable. The code is as follows:
a = tf.get_variable("name_a", initializer=[1,2,3])
op1 = a.assign(a+1)
saver = tf.train.Saver()
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    op1.op.run()
    print(a.eval())
    saver.save(sess,"log1/model.ckpt")
Then I restore it.
a = tf.get_variable("name_a", shape=[3])
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "log1/model.ckpt")
    print(a.eval())
I want to get output like [2,3,4], but I got [ 2.80259693e-45 4.20389539e-45 5.60519386e-45], which is effectively all zeros.
However, when I modify the first line in the first code snippet to
a = tf.get_variable("name_a", initializer=tf.zeros([3]))
I can get the right restored variable: [ 1. 1. 1.]
I wonder what the reason for this is.
I'm not 100% sure, but it looks like the reason is that your two variables:
tf.get_variable("name_a", initializer=[1,2,3])
tf.get_variable("name_a", shape=[3])
are not equivalent and cannot be used interchangeably (Update: the dtype is different, thanks @BlueSun for noticing this).
You will get stable output if you define the tensors in the restore code exactly as in the saving code: a = tf.get_variable("name_a", initializer=[1,2,3]). However, even better would be to work with the saved graph directly:
saver = tf.train.import_meta_graph('log1/model.ckpt.meta')
with tf.Session() as sess:
    saver.restore(sess, "log1/model.ckpt")
    saved = sess.graph.get_tensor_by_name('name_a:0')
    print(sess.run(saved))
Which works correctly no matter how you define the initializer.
You have to define the variable a with the same data type. If you don't specify it and don't have any initializer, the dtype will be tf.float32 by default and the loading of tf.int32 will fail. Simply setting the data type to int32 will solve the problem:
a = tf.get_variable("name_a", shape=[3], dtype=tf.int32)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "log1/model.ckpt")
    print(a.eval())
Using a = tf.get_variable("name_a", initializer=tf.zeros([3])) worked because tf.zeros([3]) has dtype tf.float32, which matches the default dtype of the variable declared in the restore code. It is safer to always set the dtype explicitly whenever you create a variable.
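The strange values in the question are consistent with the raw int32 bytes of [2, 3, 4] being read back as float32. The sketch below reproduces that reinterpretation with the standard-library struct module only; it illustrates the arithmetic and is not a description of what TensorFlow does internally.

```python
import struct

# Values the question expected to get back from the checkpoint.
saved_values = [2, 3, 4]

# Pack them as three little-endian 32-bit integers...
raw_bytes = struct.pack("<3i", *saved_values)

# ...and read the same 12 bytes back as three 32-bit floats.
reinterpreted = struct.unpack("<3f", raw_bytes)

# Each result is a tiny denormal number, roughly 2.8e-45, 4.2e-45 and 5.6e-45.
print(reinterpreted)
```

Small integers have bit patterns that correspond to float32 denormals, which is why the restored variable looked like it was almost zero rather than raising an obvious error.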

Tensorflow session and graph context

My question is about context and the TensorFlow default sessions and graph.
The problem:
Tensorflow is unable to feed a placeholder in the following scenario:
Function Test defines a graph.
Function Test_Once defines a session.
When Function Test calls Test_Once -> Feeding fails.
When I change the code so function Test declares the graph + the session -> all is working.
Here is the code:
def test_once(g, saver, summary_writer, logits, images, summary_op):
    """Run a session once for a given test image.
    Args:
        saver: Saver.
        summary_writer: Summary writer.
        logits:
        summary_op: Summary op.
    """
    with tf.Session(graph=g) as sess:
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            # Restores from checkpoint
            saver.restore(sess, ckpt.model_checkpoint_path)
            # extract global_step from it.
            global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        else:
            print('No checkpoint file found')
            return
        images.astype(np.float32)
        predictions = sess.run(logits, feed_dict={'InputPlaceHolder/TestInput:0':images})
        summary = tf.Summary()
        summary.ParseFromString(sess.run(summary_op))
        summary_writer.add_summary(summary, global_step)
        return (predictions)
def test():
    """Test LCPR with a test image"""
    with tf.Graph().as_default() as g:
        # Get image for testing
        images, labels = lcpr.test_input()
        # Build a Graph that computes the logits predictions from the
        # inference model.
        with tf.name_scope('InputPlaceHolder'):
            test_image_placeholder = tf.placeholder(tf.float32, (None,None,None,3), 'TestInput')
        # Display the training images in the visualizer.
        # The 'max_outputs' default is 3. Not stated. (Max number of batch elements to generate images for.)
        #tf.summary.image('input_images', test_image_placeholder)
        with tf.name_scope('Inference'):
            logits = lcpr.inference(test_image_placeholder)
        # Restore the moving average version of the learned variables for eval.
        variable_averages = tf.train.ExponentialMovingAverage(
            lcpr.MOVING_AVERAGE_DECAY)
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)
        # Build the summary operation based on the TF collection of Summaries.
        writer = tf.summary.FileWriter("/tmp/lcpr/test")
        writer.add_graph(g)
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(FLAGS.test_dir, g)
        #Sadly, this will not work:
        predictions = test_once(g, saver, summary_writer, logits, images, summary_op)
        '''Alternative working option :
        with tf.Session() as sess:
            ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
            if ckpt and ckpt.model_checkpoint_path:
                # Restores from checkpoint
                saver.restore(sess, ckpt.model_checkpoint_path)
                # Assuming model_checkpoint_path looks something like:
                # /my-favorite-path/cifar10_train/model.ckpt-0,
                # extract global_step from it.
                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
            else:
                print('No checkpoint file found')
                return
            x = sess.run(logits, feed_dict={'InputPlaceHolder/TestInput:0':images})
            print(x)
        '''
The above code yields an error saying that the placeholder is not fed:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'InputPlaceHolder/TestInput' with dtype float
And it's not that TensorFlow does not recognize the placeholder. If I change the name from 'InputPlaceHolder/TestInput:0' to 'InputPlaceHolder/TestInput:1', I receive a message claiming that 'InputPlaceHolder/TestInput' exists but has only 1 output. This makes sense, and I guess the session runs on my default graph.
Things only work for me if I stay within the same def:
If I change the code to run the commented-out part (starting with 'with tf.Session() as sess:') directly from within the first function, everything works.
I wonder what I am missing.
My guess is that it is context related; maybe the session is not being assigned to the graph?
Solved. Stupid mistake.
test_once calls sess.run twice. The second call, summary.ParseFromString(sess.run(summary_op)), indeed feeds no placeholder, which is why it fails.
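One way to avoid the unfed second call is to fetch both tensors in a single sess.run with the same feed_dict. The sketch below assumes TensorFlow with the tensorflow.compat.v1 API and NumPy are installed; the toy graph, the tf.reduce_sum stand-in for the model, and the summary name mean_input are illustrative and not taken from the original code.

```python
import numpy as np
import tensorflow.compat.v1 as tf

with tf.Graph().as_default() as g:
    # Toy stand-ins for the placeholder and the inference output.
    x = tf.placeholder(tf.float32, shape=(None, 3), name="TestInput")
    logits = tf.reduce_sum(x, axis=1)

    # A summary that depends on the placeholder, so it also needs the feed.
    tf.summary.scalar("mean_input", tf.reduce_mean(x))
    summary_op = tf.summary.merge_all()

    images = np.ones((2, 3), dtype=np.float32)

    with tf.Session(graph=g) as sess:
        # Both fetches share one feed_dict, so neither sees an unfed placeholder.
        predictions, summary_bytes = sess.run(
            [logits, summary_op], feed_dict={x: images}
        )

# The serialized summary can be parsed exactly as in the original function.
summary = tf.Summary()
summary.ParseFromString(summary_bytes)
print(predictions, len(summary.value))
```

Passing the same feed_dict to a separate second sess.run call would also work; fetching both at once simply avoids evaluating the graph twice.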

Unsuccessful TensorSliceReader constructor: Failed to find any matching files for

I tried saving my model and then restoring it, but it seems TensorFlow is unable to find the matching files :-
Code to save model output :-
import tensorflow as tf
save_file = 'model.ckpt'
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)
Code to restore model
import tensorflow as tf
save_file = 'model.ckpt'
tf.reset_default_graph()
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'model.ckpt')
I am getting below errors :-
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model.ckpt
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model.ckpt
The saver.restore() method will fail unless you pass a path—and not just a filename—as the second argument. To work around this problem, you can call saver.restore(sess, './model.ckpt') if you are running the script from the directory containing the checkpoint.
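A small helper can apply this workaround automatically. This is only a sketch using the standard library; the function name explicit_checkpoint_path is hypothetical and is not part of TensorFlow.

```python
import os


def explicit_checkpoint_path(save_file):
    """Return save_file unchanged if it has a directory part, else prefix './'."""
    if os.path.dirname(save_file):
        # Already a path such as 'log1/model.ckpt' or '/tmp/model.ckpt'.
        return save_file
    # Bare filename: make the current directory explicit.
    return os.path.join(os.curdir, save_file)


ckpt = explicit_checkpoint_path("model.ckpt")
nested = explicit_checkpoint_path("log1/model.ckpt")
print(ckpt, nested)
```

The result can then be passed as the second argument, for example saver.restore(sess, explicit_checkpoint_path(save_file)).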