With TensorFlow 1.2.0, I am trying to restore a saved model, but I receive the error:
DataLossError (see above for traceback): Unable to open table file checkpoints/saved_2/saved_2_model_1.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_185 = RestoreV2[dtypes=[DT_INT32], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_185/tensor_names, save/RestoreV2_185/shape_and_slices)]]
I am using the same TensorFlow version for saving and restoring.
For saving:
saver = tf.train.Saver()
ckpt_dir = os.path.join(params['CHK_PATH'], folder)
if not os.path.exists(ckpt_dir):
    os.makedirs(ckpt_dir)
ckpt_file = os.path.join(ckpt_dir, '{}'.format(name))
path = saver.save(sess, ckpt_file)
For restoring:
saver.restore(sess, ckpt_file)
I also tried saving in the old format:
model_saver = tf.train.Saver(write_version=saver_pb2.SaverDef.V1)
but the same problem remains. However, the following works:
saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))
I trained a model using MonitoredTrainingSession() with a checkpoint saver hook tf.train.CheckpointSaverHook() saving checkpoints every 1000 steps. After training the following files were created in the checkpoint directory:
events.out.tfevents.1511969396.cmle-training-master-ef2237c814-0-xn7pp
graph.pbtxt
model.ckpt-1.meta
model.ckpt-1001.meta
model.ckpt-2001.meta
model.ckpt-3001.meta
model.ckpt-4001.meta
model.ckpt-4119.meta
I want to restore the checkpoint but can't. Here is my code (assuming the files above are in the directory checkpoints):
tf.train.import_meta_graph('checkpoints/model.ckpt-4139.meta')
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('./checkpoints/')
    saver.restore(sess, ckpt.model_checkpoint_path)
The problem is that ckpt is None. I think I might be missing a file. What am I doing wrong?
This is how I save the checkpoints:
hooks = []
hooks.append(tf.train.CheckpointSaverHook(checkpoint_dir=checkpoint_dir, save_steps=checkpoint_iterations))
with tf.Graph().as_default():
    with tf.device(tf.train.replica_device_setter()):
        batch = model.input_fn(train_path, batch_size, epochs, 'train_queue')
        tensors = model.model_fn(batch, content_weight, style_weight, tv_weight, vgg_path, style_features,
                                 batch_size, learning_rate)
    with tf.train.MonitoredTrainingSession(master=target,
                                           is_chief=is_chief,
                                           checkpoint_dir=job_dir,
                                           hooks=hooks,
                                           save_checkpoint_secs=None,
                                           save_summaries_steps=None,
                                           log_step_count_steps=10) as sess:
        _ = sess.run(tensors)
        (...)
Restoring the full checkpoint
tf.train.get_checkpoint_state reads the file named checkpoint (no extension) inside the directory you pass as a parameter.
This file usually has content similar to:
model_checkpoint_path: "model.ckpt-1"
all_model_checkpoint_paths: "model.ckpt-1"
If this file is missing, the function will return None.
Add a text file with that name and content to your model folder and you'll be able to restore using the code you already have.
Very important note: To restore this way you need all the checkpoint data, i.e., the three files: .data-*, .meta and .index.
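If the checkpoint state file is missing, it can be recreated by hand or with a few lines of Python. The following is only a minimal sketch using the standard library: the helper name write_checkpoint_state is hypothetical (it is not part of TensorFlow), and it assumes the two-field text layout shown above.

```python
import os


def write_checkpoint_state(ckpt_dir, prefixes):
    """Write a text `checkpoint` state file listing the given prefixes.

    Hypothetical helper, not a TensorFlow API. `prefixes` are checkpoint
    prefixes such as "model.ckpt-1"; the last one is recorded as the
    most recent checkpoint.
    """
    lines = ['model_checkpoint_path: "{}"'.format(prefixes[-1])]
    lines += ['all_model_checkpoint_paths: "{}"'.format(p) for p in prefixes]
    state_path = os.path.join(ckpt_dir, "checkpoint")
    with open(state_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return state_path
```

Calling write_checkpoint_state('checkpoints', ['model.ckpt-1']) would produce a file with the same two lines shown above. This only helps if the .index and .data-* files for that prefix are actually present.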
Restoring just the graph
If, however, you're interested in restoring only the meta-graph, you can do so via import_meta_graph() as detailed in the official TF guide.
Note (from the definition of import_meta_graph()):
This function takes a MetaGraphDef protocol buffer as input. If the
argument is a file containing a MetaGraphDef protocol buffer, it
constructs a protocol buffer from the file content. The function then
adds all the nodes from the graph_def field to the current graph,
recreates all the collections, and returns a saver constructed from
the saver_def field.
Using that saver won't work unless you have the .index and .data-* files in the same directory.
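Before calling restore, it can help to check which of the expected files are actually on disk. Here is a rough standard-library sketch; missing_checkpoint_files is a hypothetical name, and it assumes the .meta file shares the same prefix as the .index and .data-* files (which is not always the case).

```python
import glob
import os


def missing_checkpoint_files(prefix):
    """Return the kinds of checkpoint files that are absent for `prefix`.

    Hypothetical helper: `prefix` is the path without any suffix,
    e.g. "checkpoints/model.ckpt-1001".
    """
    missing = []
    if not os.path.isfile(prefix + ".meta"):
        missing.append(".meta")
    if not os.path.isfile(prefix + ".index"):
        missing.append(".index")
    # The data may be split into several shards, so match any of them.
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        missing.append(".data-*")
    return missing
```

For a directory that only contains .meta files, like the listing in the question, this would report both .index and .data-* as missing for every prefix.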
I'm trying to generate a pb file using the method given in this tutorial,
http://cv-tricks.com/how-to/freeze-tensorflow-models/
import tensorflow as tf
saver = tf.train.import_meta_graph('/Users/pr/tensorflow/dogs-cats-model.meta', clear_devices=True)
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()
sess = tf.Session()
saver.restore(sess, "./dogs-cats-model")
When I try to run this code, I get this error:
DataLossError (see above for traceback): Unable to open table file ./dogs-cats-model: Data loss: file is too short to be an sstable: perhaps your file is in a different file format and you need to use a different restore operator?
When I googled this error, most answers recommended generating the meta file using the version 2 format. Is that the right approach?
TensorFlow version used: 1.3.0
Apparently, you are using both '/Users/pr/tensorflow/dogs-cats-model.meta' and './dogs-cats-model.meta'. Are you sure they point to the same file?
The following code works well on my machine:
import tensorflow as tf
saver = tf.train.import_meta_graph('./dogs-cats-model.meta', clear_devices=True)
graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()
sess = tf.Session()
saver.restore(sess, "./dogs-cats-model")
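One quick way to check whether an absolute path and a relative path refer to the same location is to resolve both before comparing. This is just a sketch with a hypothetical helper name (same_location); it depends on the current working directory of the script.

```python
import os


def same_location(path_a, path_b):
    """Return True if both paths resolve to the same absolute location.

    Hypothetical helper. Relative paths are resolved against the
    current working directory; symlinks are followed.
    """
    return os.path.realpath(path_a) == os.path.realpath(path_b)
```

If same_location('/Users/pr/tensorflow/dogs-cats-model.meta', './dogs-cats-model.meta') is False, the script is probably being run from a different directory than the one holding the model files.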
After upgrading Tensorflow to r1.0, the restore command does not seem to work.
For example, can anyone tell me what is wrong with the following?
def foo():
    v1 = tf.Variable(1., name="v1")
    v2 = tf.Variable(2., name="v2")
    v3 = v1 + v2
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        saver.save(sess, "temp")
        # do something
        saver.restore(sess, "temp")
From the last line, I got an error:
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for temp
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
The TensorFlow documentation still describes the behavior of older versions for this.
TensorFlow 1.0 has a bug where tf.train.Saver.restore() doesn't recognize checkpoint paths that contain only a filename (and no path component). This will be fixed in the next version, but for now you should be able to use the following workaround to add a path component:
saver.restore(sess, "./temp")
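If the checkpoint name comes from a variable, the same workaround can be applied programmatically. A minimal sketch, assuming a hypothetical helper named with_path_component:

```python
import os


def with_path_component(save_path):
    """Prefix a bare filename with the current directory.

    Hypothetical helper: paths that already contain a directory part
    are returned unchanged.
    """
    if os.path.dirname(save_path):
        return save_path
    return os.path.join(".", save_path)
```

The result can then be passed to both saver.save and saver.restore, so the two calls always agree on the path.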
I am trying to import a saved neural network in Tensorflow. I saved it after training with:
saver = tf.train.Saver()
saver.save(sess, filename)
and in the script I use for inference, I restore it with:
sess = tf.Session()
saver = tf.train.import_meta_graph(filename + '.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
But at the import_meta_graph line, I get this error:
KeyError: "The name 'dropout1/cond/dropout/Shape/Switch:1' refers to a Tensor which does not exist. The operation, 'dropout1/cond/dropout/Shape/Switch', does not exist in the graph."
I looked at the names of the tensors and operations in the original notebook in which I trained the model, and the names mentioned in the error message do exist. Moreover, I used the same code for saving and importing other models, and it works. The only difference is that I trained those on an AWS machine with an older version of TensorFlow, while I trained the problematic one on my own computer.
I have run the distributed mnist example:
https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/tools/dist_test/python/mnist_replica.py
I have set:
saver = tf.train.Saver(max_to_keep=0)
In previous releases, such as r11, I was able to loop over each checkpoint and evaluate the precision of the model. This gave me a plot of precision versus global step (or iteration).
Prior to r12, TensorFlow checkpoints were saved in two files, model.ckpt-1234 and model.ckpt-1234.meta. One could restore a model by passing the model.ckpt-1234 filename, like so: saver.restore(sess,'model.ckpt-1234').
However, I've noticed that in r12 there are now three output files: model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta.
I see that the restore documentation says that a path such as /train/path/model.ckpt should be given to restore instead of a filename. Is there any way to load one checkpoint at a time to evaluate it? I have tried passing the model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta files, but I get errors like the ones below:
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
NotFoundError (see above for traceback): Tensor name "hid_b" not found in checkpoint files logdir/2016-12-08-13-54/model.ckpt-0.index
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm running on OSX Sierra with tensorflow r12 installed via pip.
Any guidance would be helpful.
Thank you.
I also used TensorFlow r0.12, and I don't think there is any issue with saving and restoring a model. Here is a simple example you can try:
import tensorflow as tf
# Create some variables.
v1 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v2")
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
Although in r0.12 the checkpoint is stored in multiple files, you can restore it by using the common prefix, which is 'model.ckpt' in the example above (and 'model.ckpt-1234' in your case).
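If you only have the name of one of the files on disk, the common prefix can be recovered by stripping the known suffixes. A small sketch using a regular expression; checkpoint_prefix is a hypothetical helper, and it assumes the suffixes are .meta, .index, or .data-N-of-M as in the file names quoted in the question.

```python
import re

# Suffixes that the new checkpoint format appends to the common prefix.
_SUFFIX = re.compile(r"\.(?:meta|index|data-\d+-of-\d+)$")


def checkpoint_prefix(filename):
    """Strip a trailing .meta, .index, or .data-N-of-M suffix, if present.

    Hypothetical helper; names without such a suffix are returned as-is.
    """
    return _SUFFIX.sub("", filename)
```

The returned prefix is what you would pass to saver.restore, rather than any of the individual file names.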
r0.12 changed the checkpoint format. You can save the model in the old format instead:
import tensorflow as tf
from tensorflow.core.protobuf import saver_pb2
...
saver = tf.train.Saver(write_version = saver_pb2.SaverDef.V1)
saver.save(sess, './model.ckpt', global_step = step)
According to the TensorFlow v0.12.0 RC0 release notes:
New checkpoint format becomes the default in tf.train.Saver. Old V1
checkpoints continue to be readable; controlled by the write_version
argument, tf.train.Saver now by default writes out in the new V2
format. It significantly reduces the peak memory required and latency
incurred during restore.
see details in my blog.
You can restore the model like this:
saver = tf.train.import_meta_graph('./src/models/20170512-110547/model-20170512-110547.meta')
saver.restore(sess, './src/models/20170512-110547/model-20170512-110547.ckpt-250000')
where the directory './src/models/20170512-110547/' contains three files:
model-20170512-110547.meta
model-20170512-110547.ckpt-250000.index
model-20170512-110547.ckpt-250000.data-00000-of-00001
If one directory contains more than one checkpoint, for example the following files in the path
./20170807-231648/:
checkpoint
model-20170807-231648-0.data-00000-of-00001
model-20170807-231648-0.index
model-20170807-231648-0.meta
model-20170807-231648-100000.data-00000-of-00001
model-20170807-231648-100000.index
model-20170807-231648-100000.meta
there are two checkpoints, and you can restore the most recent one like this:
saver = tf.train.import_meta_graph('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/model-20170807-231648-0.meta')
saver.restore(sess,tf.train.latest_checkpoint('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/'))
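To evaluate every checkpoint in a directory rather than only the latest one, you first need the list of prefixes. The following is a sketch only: list_checkpoint_prefixes is a hypothetical helper, and it assumes each checkpoint has an .index file whose name ends in -<step>.index, as in the listing above.

```python
import os
import re

# Matches e.g. "model-20170807-231648-100000.index" and captures the step.
_INDEX = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def list_checkpoint_prefixes(ckpt_dir):
    """Return checkpoint prefixes found in `ckpt_dir`, ordered by step.

    Hypothetical helper: only files ending in "-<step>.index" are
    considered, and sorting is numeric on the step.
    """
    found = []
    for name in os.listdir(ckpt_dir):
        match = _INDEX.match(name)
        if match:
            step = int(match.group("step"))
            found.append((step, os.path.join(ckpt_dir, match.group("prefix"))))
    return [path for _, path in sorted(found)]
```

Each returned prefix could then be passed to saver.restore(sess, prefix) in a loop, with an evaluation step after each restore.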
OK, I can answer my own question. I found that my Python script was adding an extra '/' to the path, so I was executing:
saver.restore(sess,'/path/to/train//model.ckpt-1234')
Somehow that caused a problem with TensorFlow.
When I removed it and called:
saver.restore(sess,'/path/to/train/model.ckpt-1234')
it worked as expected.
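One way to guard against this kind of accidental double separator is to normalize the path before passing it to restore. A minimal sketch with a hypothetical helper name (clean_checkpoint_path), using only os.path.normpath from the standard library:

```python
import os


def clean_checkpoint_path(path):
    """Collapse redundant separators such as 'train//model.ckpt-1234'.

    Hypothetical helper. Note that os.path.normpath also drops a
    leading './', so re-add it afterwards if your TensorFlow version
    needs an explicit path component.
    """
    return os.path.normpath(path)
```

Building the path with os.path.join(train_dir, name) instead of string concatenation is another way to avoid the doubled separator when the directory and file name are kept separately.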
Pass only the prefix model.ckpt-1234, without the .index, .meta, or .data-* suffix.
At least, that works for me.
I'm new to TF and hit the same issue. After reading Yuan Ma's comments, I copied the '.index' file into the same 'train\ckpt' folder as the '.data-00000-of-00001' file. Then it worked!
So the .index file, together with the .data-00000-of-00001 file, is sufficient when restoring the model.
I used TF r12 on Win7.