TensorFlow: Are saved variables from tf.saved_model and tf.train.Saver not compatible?

I saved a TensorFlow model using tf.saved_model and now I'm trying to load only the variables from that model using a tf.train.Saver, but I get one of the following two errors depending on the path I give it:
DataLossError: Unable to open table file saved_model/variables:
Failed precondition: saved_model/variables: perhaps your file is in a
different file format and you need to use a different restore operator?
or
InvalidArgumentError: Unsuccessful TensorSliceReader constructor:
Failed to get matching files on saved_model/variables/variables:
Not found: saved_model/variables
[[Node: save/RestoreV2_34 = RestoreV2[dtypes=[DT_FLOAT],
_device="/job:localhost/replica:0/task:0/cpu:0"]
(_arg_save/Const_1_0_0,
save/RestoreV2_34/tensor_names, save/RestoreV2_34/shape_and_slices)]]
tf.saved_model, when saving a model, creates a saved_model.pb protocol buffer and a folder named variables that contains two files:
variables.data-00000-of-00001
variables.index
tf.train.Saver.save() creates the following files:
some_name.data-00000-of-00001
some_name.index
some_name.meta
checkpoint
I have always assumed that the two output files ending in .data-00000-of-00001 and .index are compatible between both savers.
Is that not the case?
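The on-disk representation of the two files is in fact the same; what usually trips this up is that a tf.train.Saver expects a checkpoint *prefix*, not a directory or a shard file name. A minimal sketch of the prefix a restore would target, assuming the default SavedModel layout (just path arithmetic, no TensorFlow needed):

```python
import os

# Inside a SavedModel export directory, the variable shards live under
# variables/ with the file prefix "variables". A tf.train.Saver restore
# takes that *prefix* -- not the variables/ directory itself, and not
# one of the shard file names.
export_dir = "saved_model"
ckpt_prefix = os.path.join(export_dir, "variables", "variables")
print(ckpt_prefix)

# A restore would then look like this, assuming a graph whose variables
# match the saved ones:
# saver.restore(sess, ckpt_prefix)
```

Passing the variables/ directory itself would be consistent with the DataLossError above, since the Saver then tries to open the directory as if it were a checkpoint file.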

Related

Restoring a checkpoint if ckpt.index file is missing

Is it possible to restore a checkpoint if the ckpt.index file is missing, and only the ckpt.data, .meta and .pb (the frozen model corresponding to this checkpoint) files are available?
Context: I want to load the model from the checkpoint and resume training.
No, you need to have ckpt.index file as well.

Is it possible to load and train a model, if the only thing we have is checkpoint files?

Is it possible to load and train a model from checkpoint files?
We have information about the input and output tensor shapes.
Yes. You can use tf.keras, following this example:
https://www.tensorflow.org/guide/checkpoint
The following is taken directly from the TensorFlow documentation.
List checkpoints
!ls ./tf_ckpts
which produces
checkpoint
ckpt-8.data-00000-of-00001
ckpt-8.index
ckpt-9.data-00000-of-00001
ckpt-9.index
ckpt-10.data-00000-of-00001
ckpt-10.index
Recover from Checkpoint
Calling restore() on a tf.train.Checkpoint object queues the requested restorations, restoring variable values as soon as there's a matching path from the Checkpoint object. For example we can load just the bias from the model we defined above by reconstructing one path to it through the network and the layer.
to_restore = tf.Variable(tf.zeros([5])) # variables from your model.
print(to_restore.numpy()) # All zeros
fake_layer = tf.train.Checkpoint(bias=to_restore)
fake_net = tf.train.Checkpoint(l1=fake_layer)
new_root = tf.train.Checkpoint(net=fake_net)
status = new_root.restore(tf.train.latest_checkpoint('./tf_ckpts/'))
print(to_restore.numpy()) # We get the restored value now
To double check that it was restored you can type:
status.assert_existing_objects_matched()
and get the following output.
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f1d796da278>
Yes, it is possible if the checkpoint contains the parameters of the model (parameters such as W and b in W*x + b). If you have those, then, as in transfer learning, you can load them based on your files.
# Loads the weights
model.load_weights(checkpoint_path)
You should know the architecture of the model and create the model before using this. Note that some models have their own specific way of loading a checkpoint.
Also, check this out: https://www.tensorflow.org/tutorials/keras/save_and_load
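As a concrete sketch of that load_weights() flow: the architecture, the path, and the weight format below are made-up for illustration; yours must match whatever produced the checkpoint (and the required file extension varies between Keras versions).

```python
import tempfile
import numpy as np
import tensorflow as tf

def build_model():
    # Hypothetical architecture -- load_weights() only works if this
    # matches the model that wrote the checkpoint.
    m = tf.keras.Sequential([
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    m.build((None, 3))  # weights must exist before save/load
    return m

# Stand-in for your existing checkpoint: save weights from one model...
checkpoint_path = tempfile.mkdtemp() + "/demo.weights.h5"
model = build_model()
model.save_weights(checkpoint_path)

# ...then recreate the same architecture and load the weights into it.
restored = build_model()
restored.load_weights(checkpoint_path)

x = np.ones((1, 3), dtype="float32")
same = bool(np.allclose(model(x).numpy(), restored(x).numpy()))
print(same)
```

The key point is the round trip: load_weights() restores values into an already-built model, so the layer structure has to be recreated first.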

Using inception-v3 checkpoint file in tensorflow

In one of my projects, I used a public pre-trained Inception-v3 model available here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz.
I only want to use the last feature vector (the output of pool_3/_reshape:0). By looking at the example script classify_image.py, I can successfully pass an image through the deep network, extract the bottleneck tensor (bottleneck_tensor = sess.graph.get_tensor_by_name('pool_3/_reshape:0')) and use it for further purposes.
I recently saw that there is a more recently trained Inception model. A checkpoint of that training is available here: http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz.
I would like to use this new pretrained model instead of the old one. However, the file format is different. The "old" model uses a graph definition in protocol buffer form (classify_image_graph_def.pb) that is easily reusable. The "new" one only provides a checkpoint, and I'm struggling to plug it into my code.
Is there an easy way to convert a checkpoint file to a ProtocolBuffer file that could be then used to create a graph?
It seems you have to use freeze_graph.py:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
The script converts checkpoint variables into Const ops in a standalone GraphDef file.
This script is designed to take a GraphDef proto, a SaverDef proto, and a set of variable values stored in a checkpoint file, and output a GraphDef with all of the variable ops converted into const ops containing the values of the variables.
It's useful to do this when we need to load a single file in C++, especially in environments like mobile or embedded where we may not have access to the RestoreTensor ops and file loading calls that they rely on.
An example of command-line usage is:
bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=some_graph_def.pb \
--input_checkpoint=model.ckpt-8361242 \
--output_graph=/tmp/frozen_graph.pb --output_node_names=softmax
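Once you have the frozen GraphDef, loading it back is just parsing the proto and importing it. A sketch of that step (using a small in-memory GraphDef in place of reading /tmp/frozen_graph.pb from disk, and a stand-in op in place of the real softmax node):

```python
import tensorflow as tf

# Build a tiny graph and serialize it, standing in for the frozen .pb file;
# with a real file you would read the bytes with tf.io.gfile.GFile(path, "rb").
g = tf.Graph()
with g.as_default():
    x = tf.constant(2.0, name="x")
    tf.multiply(x, 3.0, name="softmax")  # stand-in for the output node
pb_bytes = g.as_graph_def().SerializeToString()

# Parse the GraphDef and import it into a fresh graph.
graph_def = tf.compat.v1.GraphDef()
graph_def.ParseFromString(pb_bytes)
g2 = tf.Graph()
with g2.as_default():
    tf.import_graph_def(graph_def, name="")

# Since all variables were folded into constants, no restore op is needed.
with tf.compat.v1.Session(graph=g2) as sess:
    out = sess.run(g2.get_tensor_by_name("softmax:0"))
print(out)
```

Because freezing converts every variable into a Const op, the imported graph can be run immediately, with no checkpoint files present.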

Tensorflow can't save model

I encountered this weird problem... I use this code to construct a TensorFlow saver:
tf.train.Saver(tf.all_variables(), max_to_keep=FLAGS.keep)
which is supposed to be very standard. However, when I point the saving directory to my custom directory (under my username) instead of "/tmp", all of a sudden, the saved models are files like
translate.ckpt-329.data-00000-of-00001
translate.ckpt-329.index
translate.ckpt-329.meta
I can't find the file "translate.ckpt-329".
The generated checkpoint file is pointing to:
model_checkpoint_path: "/Users/.../train_dir/translate.ckpt-329"
all_model_checkpoint_paths: "/Users/.../train_dir/translate.ckpt-329"
while this file does not exist, which creates problems when restoring my model.
Can someone shed any light on this?? What could possibly be the problem?
Thanks for the first answer! I guess my bigger problem is the restore method:
The original code uses this way to restore a session:
ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
model.saver.restore(session, ckpt.model_checkpoint_path)
Which failed with V2 saving :(
if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
    logging.info("Reading model parameters from %s" % ckpt.model_checkpoint_path)
    model.saver.restore(session, ckpt.model_checkpoint_path)
else:
    logging.info("Created model with fresh parameters.")
    session.run(tf.global_variables_initializer())
TL;DR: In the new checkpoint format, the "filename" that you pass to the saver is actually used as the prefix of several filenames, and no file with that exact name is written. You can use the old checkpoint format by constructing your tf.train.Saver with the optional argument write_version=tf.train.SaverDef.V1.
From the names of the saved files, it appears that you are using the "V2" checkpoint format, which became the default in TensorFlow 0.12. This format stores the checkpoint data in multiple files: one or more data files (e.g. translate.ckpt-329.data-00000-of-00001 in your case) and an index file (translate.ckpt-329.index) that tells TensorFlow where each saved variable is located in the data files. The tf.train.Saver uses the "filename" that you pass as the prefix for these files' names, but doesn't produce a file with that exact name.
Although there is no file with the exact name you gave, you can use the value returned from saver.save() as the argument to a subsequent saver.restore(), and the other checkpoint locating mechanisms should continue to work as before.
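The naming scheme can be sketched with a small, purely illustrative helper (the function is hypothetical; the names mirror the question above):

```python
def v2_checkpoint_files(prefix, step):
    """What a V2 tf.train.Saver writes for save(sess, prefix, global_step=step):
    several files sharing one prefix, and no file named exactly `prefix`."""
    base = "%s-%d" % (prefix, step)
    written = [
        base + ".data-00000-of-00001",  # the variable values (possibly sharded)
        base + ".index",                # where each variable lives in the shards
        base + ".meta",                 # the serialized MetaGraphDef
    ]
    return base, written

# saver.save() returns `base`; that is what you hand to saver.restore().
restore_target, files = v2_checkpoint_files("translate.ckpt", 329)
print(restore_target)
```

So the value in the checkpoint file ("/Users/.../train_dir/translate.ckpt-329") is not broken: it names the prefix, and the restore machinery resolves the actual shard and index files from it.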

Fail to read the new format of tensorflow checkpoint?

I pip installed TensorFlow 0.12. I am able to resume training by loading old checkpoints, which end with .ckpt. However, TensorFlow 0.12 dumps new checkpoints in a different format, consisting of *.index, *.data-00000-of-00001 and *.meta files. Since then, I am not able to restore from the new checkpoints.
What is the proper way of loading the new format? Also, how do I read *.index?
Mostly duplicate of How to restore a model by filename in Tensorflow r12?
Troubleshooting:
Read the common prefix: stop before the first dot after "ckpt".
Check the model path, either absolute:
saver.restore(sess, "/full/path/to/model.ckpt")
or relative:
saver.restore(sess, "./model.ckpt")
Regarding reading the .index file, as the name suggests, it is the first file to be opened by the restore function. No .index file, no restore (you could still restore without a .meta file).
The .index file needs the .data-xxxxx-of-xxxxx shards, so it would be kind of pointless to read only the .index file without any tensor data being restored. What are you trying to achieve?
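If all you have are shard or index file names and you need the prefix to hand to saver.restore(), stripping the suffix can be sketched like this (a hypothetical helper, following the "stop before the first dot after ckpt" rule above):

```python
def checkpoint_prefix(filename):
    """Strip the V2 suffixes (.index, .meta, .data-XXXXX-of-XXXXX) so the
    result is the prefix that tf.train.Saver.restore() expects."""
    for suffix in (".index", ".meta"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    if ".data-" in filename:
        return filename.split(".data-")[0]
    return filename  # already looks like a prefix

print(checkpoint_prefix("model.ckpt-329.data-00000-of-00001"))
print(checkpoint_prefix("/full/path/to/model.ckpt-329.index"))
```

tf.train.latest_checkpoint(checkpoint_dir) does this resolution for you when a `checkpoint` state file is present, so the manual version is only needed for stray files.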