I trained a model using MonitoredTrainingSession() with a checkpoint saver hook tf.train.CheckpointSaverHook() saving checkpoints every 1000 steps. After training the following files were created in the checkpoint directory:
events.out.tfevents.1511969396.cmle-training-master-ef2237c814-0-xn7pp
graph.pbtxt
model.ckpt-1.meta
model.ckpt-1001.meta
model.ckpt-2001.meta
model.ckpt-3001.meta
model.ckpt-4001.meta
model.ckpt-4119.meta
I want to restore the checkpoint but can't. Here is my code (assuming the files above are in the directory checkpoints):
tf.train.import_meta_graph('checkpoints/model.ckpt-4139.meta')
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('./checkpoints/')
    saver.restore(sess, ckpt.model_checkpoint_path)
The problem is that ckpt is None. I think I might be missing a file... What am I doing wrong?
This is how I save the checkpoints:
hooks = list()
hooks.append(tf.train.CheckpointSaverHook(checkpoint_dir=checkpoint_dir, save_steps=checkpoint_iterations))
with tf.Graph().as_default():
    with tf.device(tf.train.replica_device_setter()):
        batch = model.input_fn(train_path, batch_size, epochs, 'train_queue')
        tensors = model.model_fn(batch, content_weight, style_weight, tv_weight, vgg_path, style_features,
                                 batch_size, learning_rate)
    with tf.train.MonitoredTrainingSession(master=target,
                                           is_chief=is_chief,
                                           checkpoint_dir=job_dir,
                                           hooks=hooks,
                                           save_checkpoint_secs=None,
                                           save_summaries_steps=None,
                                           log_step_count_steps=10) as sess:
        _ = sess.run(tensors)
        (...)
Restoring the full checkpoint
tf.train.get_checkpoint_state reads the file named checkpoint (no extension) inside the directory you pass as a parameter.
This file usually has content similar to:
model_checkpoint_path: "model.ckpt-1"
all_model_checkpoint_paths: "model.ckpt-1"
If this file is missing, the function will return None.
Add a text file with that name to your model folder, with the content pointing at the checkpoint prefix you want to restore, and you'll be able to restore using the code you already have.
Very important note: To restore this way you need all the checkpoint data, i.e., the three files: .data-*, .meta and .index.
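As a minimal sketch of the fix described above, the missing checkpoint file can be written with plain Python. The helper name write_checkpoint_state and the throwaway directory are hypothetical; the prefix "model.ckpt-4119" is taken from the file listing in the question and must match checkpoint files that actually exist in your folder.

```python
import os
import tempfile


def write_checkpoint_state(model_dir, prefix):
    """Write a minimal `checkpoint` file that points at one checkpoint prefix."""
    path = os.path.join(model_dir, "checkpoint")
    with open(path, "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % prefix)
        f.write('all_model_checkpoint_paths: "%s"\n' % prefix)
    return path


# Demo in a temporary directory (assumption: in practice this is your model folder).
demo_dir = tempfile.mkdtemp()
state_path = write_checkpoint_state(demo_dir, "model.ckpt-4119")
with open(state_path) as f:
    state_text = f.read()
print(state_text)
```

Writing the file only tells TensorFlow which prefix to look for; the restore itself still needs the variable files mentioned in the note above.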
Restoring just the graph
If, however, you're interested in restoring only the meta-graph, you can do so via import_meta_graph() as detailed in the official TF guide.
Note (from the definition of import_meta_graph()):
This function takes a MetaGraphDef protocol buffer as input. If the
argument is a file containing a MetaGraphDef protocol buffer, it
constructs a protocol buffer from the file content. The function then
adds all the nodes from the graph_def field to the current graph,
recreates all the collections, and returns a saver constructed from
the saver_def field.
Using that saver won't work unless you have the .index and .data-* files in the same directory.
I have a simple pb file, without any ckpt file. I would like to (randomly) initialize all the weights of the pb file and save the initialized weights as a ckpt file. I could not find any way to do it. The global variables initializer just threw "no variables to save".
The model can be saved in .pb format in three ways: using tf.saved_model.simple_save, tf.saved_model.Builder.save or estimator.export_savedmodel.
To answer your question, you can restore the .pb model using the function tf.saved_model.loader.load(sess, [tag_constants.TRAINING], Export_Dir), passing whichever tags the model was exported with.
Then you can retrieve the weights (you need to remember the name of the weights tensor) using the code below:
with graph2.as_default():
    with tf.Session(graph=graph2) as sess:
        # Restore saved values
        print('\nRestoring...')
        tf.saved_model.loader.load(sess, [tag_constants.SERVING], path)
        # Specify the correct `Weights_Tensor_Name`
        Weights_Var = graph2.get_tensor_by_name('Weights_Tensor_Name:0')
After that, you can save the Weights to .checkpoint files using the below code:
saver = tf.train.Saver({"Weights_Var": Weights_Var})
But if you just want to save a random tensor (named Weights) into a checkpoint file, you can generate it with tf.random.uniform, wrap it in a tf.Variable (a Saver only saves variables), and then save it using the code below:
saver = tf.train.Saver({"Weights_Var": Weights_Var})
Having read the docs, I saved a model in TensorFlow, here is my demo code:
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    ..
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
But after that, I found that there are 3 files:
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
And I can't restore the model by restoring the model.ckpt file, since there is no such file. Here is my code:
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
So, why are there 3 files?
Try this:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/tmp/model.ckpt.meta')
    saver.restore(sess, "/tmp/model.ckpt")
The TensorFlow save method saves three kinds of files because it stores the graph structure separately from the variable values. The .meta file describes the saved graph structure, so you need to import it before restoring the checkpoint (otherwise it doesn't know what variables the saved checkpoint values correspond to).
Alternatively, you could do this:
# Recreate the EXACT SAME variables
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Now load the checkpoint variable values
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "/tmp/model.ckpt")
Even though there is no file named model.ckpt, you still refer to the saved checkpoint by that name when restoring it. From the saver.py source code:
Users only need to interact with the user-specified prefix... instead
of any physical pathname.
meta file: describes the saved graph structure, including the GraphDef, SaverDef, and so on. Calling tf.train.import_meta_graph('/tmp/model.ckpt.meta') restores the Saver and the Graph.
index file: a string-to-string immutable table (tensorflow::table::Table). Each key is the name of a tensor and its value is a serialized BundleEntryProto. Each BundleEntryProto describes the metadata of a tensor: which of the "data" files contains the content of the tensor, the offset into that file, a checksum, some auxiliary data, etc.
data file: a TensorBundle collection that stores the values of all variables.
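To make the relationship between the prefix and the physical files concrete, here is a small sketch in plain Python (no TensorFlow needed) that groups a directory listing by checkpoint prefix and reports which prefixes have both the index and data parts. The function names and the sample listing are illustrative assumptions, and the pattern only covers the file naming shown in this thread.

```python
import re
from collections import defaultdict

# Matches "<prefix>.meta", "<prefix>.index" or "<prefix>.data-00000-of-00001".
_PART = re.compile(r"^(?P<prefix>.+)\.(?P<kind>meta|index|data)(?:-\d{5}-of-\d{5})?$")


def group_checkpoint_files(filenames):
    """Map each checkpoint prefix to the set of file kinds found for it."""
    groups = defaultdict(set)
    for name in filenames:
        match = _PART.match(name)
        if match:
            groups[match.group("prefix")].add(match.group("kind"))
    return dict(groups)


def restorable_prefixes(filenames):
    """Prefixes that have both the .index and .data files holding variable values."""
    groups = group_checkpoint_files(filenames)
    return sorted(p for p, kinds in groups.items() if {"index", "data"} <= kinds)


# Hypothetical listing: one complete checkpoint and one with only a .meta file.
files = [
    "checkpoint",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "model.ckpt-1.meta",
]
groups = group_checkpoint_files(files)
ready = restorable_prefixes(files)
print(groups)
print(ready)
```

In practice you would pass os.listdir(checkpoint_dir) as the listing; a prefix that only has a .meta file can rebuild the graph but has no variable values to restore.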
I am restoring trained word embeddings from the Word2Vec TensorFlow tutorial.
In case you have created multiple checkpoints, e.g. the files created look like this:
model.ckpt-55695.data-00000-of-00001
model.ckpt-55695.index
model.ckpt-55695.meta
try this:
def restore_session(self, session):
    saver = tf.train.import_meta_graph('./tmp/model.ckpt-55695.meta')
    saver.restore(session, './tmp/model.ckpt-55695')
when calling restore_session():
def test_word2vec():
    opts = Options()
    with tf.Graph().as_default(), tf.Session() as session:
        with tf.device("/cpu:0"):
            model = Word2Vec(opts, session)
            model.restore_session(session)
            model.get_embedding("assistance")
If you trained a CNN with dropout, for example, you could do this:
def predict(image, model_name):
    """
    image -> single image, (width, height, channels)
    model_name -> model file that was saved without any extensions
    """
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph('./' + model_name + '.meta')
        saver.restore(sess, './' + model_name)
        # Substitute 'logits' with your model
        prediction = tf.argmax(logits, 1)
        # 'x' is what you defined it to be. In my case it is a batch of RGB images, that's why I add the extra dimension
        return prediction.eval(feed_dict={x: image[np.newaxis,:,:,:], keep_prob_dnn: 1.0})
I have run the distributed mnist example:
https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/tools/dist_test/python/mnist_replica.py
Though I have set the
saver = tf.train.Saver(max_to_keep=0)
In previous releases, like r11, I was able to run over each checkpoint model and evaluate the precision of the model. This gave me a plot of the progress of the precision versus global steps (or iterations).
Prior to r12, TensorFlow checkpoint models were saved in two files, model.ckpt-1234 and model.ckpt-1234.meta. One could restore a model by passing the model.ckpt-1234 filename like so: saver.restore(sess,'model.ckpt-1234').
However, I've noticed that in r12 there are now three output files: model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta.
I see that the restore documentation says that a path such as /train/path/model.ckpt should be given to restore instead of a filename. Is there any way to load one checkpoint file at a time to evaluate it? I have tried passing the model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta files, but get errors like the ones below:
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
NotFoundError (see above for traceback): Tensor name "hid_b" not found in checkpoint files logdir/2016-12-08-13-54/model.ckpt-0.index
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm running on OSX Sierra with tensorflow r12 installed via pip.
Any guidance would be helpful.
Thank you.
I also use TensorFlow r0.12 and I don't think there is any issue with saving and restoring a model. The following is some simple code that you can try:
import tensorflow as tf
# Create some variables.
v1 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v2")
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
Although in r0.12 the checkpoint is stored in multiple files, you can restore it by using the common prefix, which is 'model.ckpt' in your case.
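The errors in the question come from passing a physical file name where the common prefix is expected. As a rough sketch, assuming the V2 file naming shown in this thread, a hypothetical helper like to_checkpoint_prefix can turn any of the physical file names back into the prefix that saver.restore() expects:

```python
import re

# Suffixes that appear after the user-specified prefix in the file names above.
_SUFFIX = re.compile(r"\.(?:index|meta|data-\d{5}-of-\d{5})$")


def to_checkpoint_prefix(path):
    """Strip a physical-file suffix; a path that is already a prefix is returned unchanged."""
    return _SUFFIX.sub("", path)


from_data = to_checkpoint_prefix("logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001")
from_index = to_checkpoint_prefix("/tmp/model.ckpt.index")
unchanged = to_checkpoint_prefix("/tmp/model.ckpt")
print(from_data, from_index, unchanged)
```

The result is what you would hand to saver.restore(sess, ...); the helper does not check that the files exist.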
r0.12 changed the checkpoint format. If you want the previous behaviour, you can save the model in the old format:
import tensorflow as tf
from tensorflow.core.protobuf import saver_pb2
...
saver = tf.train.Saver(write_version = saver_pb2.SaverDef.V1)
saver.save(sess, './model.ckpt', global_step = step)
According to the TensorFlow v0.12.0 RC0’s release note:
New checkpoint format becomes the default in tf.train.Saver. Old V1
checkpoints continue to be readable; controlled by the write_version
argument, tf.train.Saver now by default writes out in the new V2
format. It significantly reduces the peak memory required and latency
incurred during restore.
see details in my blog.
You can restore the model like this:
saver = tf.train.import_meta_graph('./src/models/20170512-110547/model-20170512-110547.meta')
saver.restore(sess,'./src/models/20170512-110547/model-20170512-110547.ckpt-250000')
Where the path './src/models/20170512-110547/' contains three files:
model-20170512-110547.meta
model-20170512-110547.ckpt-250000.index
model-20170512-110547.ckpt-250000.data-00000-of-00001
And if there is more than one checkpoint in a directory, e.g. these checkpoint files in the path
./20170807-231648/:
checkpoint
model-20170807-231648-0.data-00000-of-00001
model-20170807-231648-0.index
model-20170807-231648-0.meta
model-20170807-231648-100000.data-00000-of-00001
model-20170807-231648-100000.index
model-20170807-231648-100000.meta
you can see that there are two checkpoints, so you can use this:
saver = tf.train.import_meta_graph('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/model-20170807-231648-0.meta')
saver.restore(sess,tf.train.latest_checkpoint('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/'))
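To see what the lookup of the latest checkpoint relies on, here is a simplified plain-Python sketch that reads the model_checkpoint_path entry from the checkpoint file in a directory. It is not TensorFlow's implementation, only an approximation of the idea; the function name read_latest_prefix and the demo directory are assumptions made for illustration.

```python
import os
import re
import tempfile

# Only the line that starts with "model_checkpoint_path:" names the latest checkpoint.
_LATEST = re.compile(r'^model_checkpoint_path:\s*"(?P<path>[^"]+)"\s*$', re.MULTILINE)


def read_latest_prefix(checkpoint_dir):
    """Return the prefix recorded in `<checkpoint_dir>/checkpoint`, or None if unavailable."""
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file) as f:
        match = _LATEST.search(f.read())
    if match is None:
        return None
    path = match.group("path")
    # Relative entries are resolved against the checkpoint directory.
    return path if os.path.isabs(path) else os.path.join(checkpoint_dir, path)


# Demo: an empty directory first, then one with a state file like the listing above.
demo_dir = tempfile.mkdtemp()
empty_result = read_latest_prefix(demo_dir)
with open(os.path.join(demo_dir, "checkpoint"), "w") as f:
    f.write('model_checkpoint_path: "model-20170807-231648-100000"\n')
    f.write('all_model_checkpoint_paths: "model-20170807-231648-0"\n')
    f.write('all_model_checkpoint_paths: "model-20170807-231648-100000"\n')
latest = read_latest_prefix(demo_dir)
print(empty_result, latest)
```

This also shows why a directory without the checkpoint file yields no result even when the .index and .data files are present.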
OK, I can answer my own question. What I found was that my python script was adding an extra '/' to my path so I was executing:
saver.restore(sess,'/path/to/train//model.ckpt-1234')
Somehow that was causing a problem with TensorFlow.
When I removed it, calling:
saver.restore(sess,'/path/to/train/model.ckpt-1234')
it worked as expected.
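One way to avoid this class of problem is to build the prefix with the path utilities instead of string concatenation. The sketch below uses posixpath so the result is the same on every platform (os.path behaves the same way on Linux and macOS); the helper name checkpoint_path is hypothetical.

```python
import posixpath


def checkpoint_path(train_dir, prefix):
    """Join directory and checkpoint prefix, collapsing duplicate separators."""
    return posixpath.normpath(posixpath.join(train_dir, prefix))


with_trailing_slash = checkpoint_path("/path/to/train/", "model.ckpt-1234")
cleaned = posixpath.normpath("/path/to/train//model.ckpt-1234")
print(with_trailing_slash, cleaned)
```

The cleaned string can then be passed to saver.restore(sess, ...) as in the answer above.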
Use only model.ckpt-1234.
At least it works for me.
I'm new to TF and met the same issue. After reading Yuan Ma's comments, I copied the '.index' file into the same 'train\ckpt' folder as the '.data-00000-of-00001' file. Then it worked!
So the .index file is required alongside the .data file when restoring the models.
I used TF r12 on Win7.
I trained a LSTM classifier, using a BasicLSTMCell. How can I save my model and restore it for use in later classifications?
We found the same issue. We weren't sure if the internal variables were saved. We found out that you must create the saver after the BasicLSTMCell is created/defined. Otherwise it is not saved.
The easiest way to save and restore a model is to use a tf.train.Saver object. The constructor adds save and restore ops to the graph for all, or a specified list, of the variables in the graph. The saver object provides methods to run these ops, specifying paths for the checkpoint files to write to or read from.
Refer to:
https://www.tensorflow.org/versions/r0.11/how_tos/variables/index.html
Checkpoint Files
Variables are saved in binary files that, roughly, contain a map from variable names to tensor values.
When you create a Saver object, you can optionally choose names for the variables in the checkpoint files. By default, it uses the value of the Variable.name property for each variable.
To understand what variables are in a checkpoint, you can use the inspect_checkpoint library, and in particular, the print_tensors_in_checkpoint_file function.
Saving Variables
Create a Saver with tf.train.Saver() to manage all variables in the model.
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    ..
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
Restoring Variables
The same Saver object is used to restore variables. Note that when you restore variables from a file you do not have to initialize them beforehand.
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
    ...
I was wondering this myself. As others pointed out, the usual way to save a model in TensorFlow is to use tf.train.Saver(); however, I believe this saves only the values of tf.Variables.
I'm not exactly sure whether there are tf.Variables inside the BasicLSTMCell implementation which are saved automatically when you do this, or whether there is perhaps another step that needs to be taken, but if all else fails, the BasicLSTMCell can be easily saved and loaded in a pickle file.
Yes, there are weight and bias variables inside the LSTM cell (indeed, all neural network cells have to have weight vars somewhere). As already noted in other answers, using the Saver object appears to be the way to go: it saves your variables and your (meta)graph in a reasonably convenient way. You'll need the metagraph if you want to get the whole model back, not just some tf.Variables sitting there in isolation. The Saver does need to know all the variables it has to save, so create it after creating the graph.
A useful little trick when dealing with any "is there variables?"/"is it properly reusing weights?"/"how can I actually look at the weights in my LSTM, which isn't bound to any python var?"/etc. situation is this little snippet:
for i in tf.global_variables():
    print(i)
for vars and
for i in my_graph.get_operations():
    print(i)
for ops. If you want to view a tensor that isn't bound to a python var,
tf.Graph.get_tensor_by_name('name_of_op:N')
where name_of_op is the name of the operation that generates the tensor, and N is the index of the output tensor (of possibly several) you're after.
TensorBoard's graph display can be helpful for finding op names if your graph has a ton of operations... which most tend to...
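The 'name_of_op:N' convention is easy to get wrong, for example by passing an op name without the output index. A small sketch in plain Python that splits and validates such a name before it is handed to get_tensor_by_name (the helper split_tensor_name is hypothetical and not part of TensorFlow):

```python
def split_tensor_name(tensor_name):
    """Split 'name_of_op:N' into (op_name, output_index)."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not op_name or not index.isdigit():
        raise ValueError("expected '<op_name>:<output_index>', got %r" % tensor_name)
    return op_name, int(index)


first = split_tensor_name("save/RestoreV2_1:0")
try:
    split_tensor_name("hid_b")  # an op name without an output index
    missing_index_rejected = False
except ValueError:
    missing_index_rejected = True
print(first, missing_index_rejected)
```

Op names may contain scope separators such as "save/", so the split is done on the last colon only.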
I've made example code for LSTM save and restore.
It also took me a lot of time to solve this.
Refer to this URL: https://github.com/MareArts/rnn_save_restore_test
I hope this code helps.
You can instantiate a tf.train.Saver object and call save passing the current session and output checkpoint file (*.ckpt) path during training. You can call save whenever you think is appropriate (e.g. every few epochs, when validation error drops):
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    ..
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
During classification/inference you instantiate another tf.train.Saver and call restore passing the current session and the checkpoint file to restore. You can call restore just before you use your model for classification by calling session.run:
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Do some work with the model
    ...
Reference: https://www.tensorflow.org/versions/r0.11/how_tos/variables/index.html#saving-and-restoring
I am a bit of a beginner with tensorflow so please excuse if this is a stupid question and the answer is obvious.
I have created a Tensorflow graph where starting with placeholders for X and y I have optimized some tensors which represent my model. Part of the graph is something where a vector of predictions can be calculated, e.g. for linear regression something like
y_model = tf.add(tf.mul(X,w),d)
y_vals = sess.run(y_model,feed_dict={....})
After training has been completed I have acceptable values for w and d and now I want to save my model for later. Then, in a different python session I want to restore the model so that I can again run
## Starting brand new python session
import tensorflow as tf
## somehow restore the graph and the values here: how????
## so that I can run this:
y_vals = sess.run(y_model,feed_dict={....})
for some different data and get back the y-values.
I want this to work in a way where the graph for calculating the y-values from the placeholders is also stored and restored. As long as the placeholders get fed the correct data, this should work transparently, without the user (the one who applies the model) needing to know what the graph looks like.
As far as I understand tf.train.Saver().save(..) only saves the variables but I also want to save the graph. I think that tf.train.export_meta_graph could be relevant here but I do not understand how to use it correctly, the documentation is a bit cryptic to me and the examples do not even use export_meta_graph anywhere.
From the docs, try this:
# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    ..
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in file: %s" % save_path)
You can specify the path.
And if you want to restore the model, try:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/tmp/model.ckpt.meta')
    saver.restore(sess, "/tmp/model.ckpt")
Saving Graph in Tensorflow:
import tensorflow as tf
# Create some placeholder variables
x_pl = tf.placeholder(..., name="x_pl")
y_pl = tf.placeholder(..., name="y_pl")
# Add some operation to the Graph
add_op = tf.add(x_pl, y_pl)
with tf.Session() as sess:
    # Add variable initializer
    init = tf.global_variables_initializer()
    # Add ops to save variables to checkpoints
    # Unless var_list is specified Saver will save ALL named variables
    # in Graph
    # Optionally set maximum of 3 latest models to be saved
    saver = tf.train.Saver(max_to_keep=3)
    # Run variable initializer
    sess.run(init)
    for i in range(no_steps):
        # Feed placeholders with some data and run operation
        sess.run(add_op, feed_dict={x_pl: i+1, y_pl: i+5})
        saver.save(sess, "path/to/checkpoint/model.ckpt", global_step=i)
This will save the following files:
1) Meta Graph
.meta file:
MetaGraphDef protocol buffer representation of the MetaGraph, which saves the complete TF Graph structure, i.e. the GraphDef that describes the dataflow and all metadata associated with it, e.g. all variables, operations, collections, etc.
Importing the graph structure will recreate the Graph and all its variables; the corresponding values for these variables can then be restored from the checkpoint file.
If you don't want to restore the Graph, however, you can reconstruct all of the information in the MetaGraphDef by re-executing the Python code that builds the model. N.B. you must recreate the EXACT SAME variables first before restoring their values from the checkpoint.
Since the Meta Graph file is not always needed, you can switch off writing the file in saver.save using write_meta_graph=False.
2) Checkpoint files
.data file:
binary file containing the VALUES of all saved variables outlined in tf.train.Saver() (default is all variables)
.index file:
immutable table describing all tensors and their metadata
checkpoint file:
keeps a record of the latest checkpoint files saved
Restoring Graph in Tensorflow:
import tensorflow as tf
latest_checkpoint = tf.train.latest_checkpoint("path/to/checkpoint")
# Load latest checkpoint Graph via import_meta_graph:
# - construct protocol buffer from file content
# - add all nodes to current graph and recreate collections
# - return Saver
saver = tf.train.import_meta_graph(latest_checkpoint + '.meta')
# Start session
with tf.Session() as sess:
    # Restore previously trained variables from disk
    print("Restoring Model: {}".format("path/to/checkpoint"))
    saver.restore(sess, latest_checkpoint)
    # Retrieve protobuf graph definition
    graph = tf.get_default_graph()
    print("Restored Operations from MetaGraph:")
    for op in graph.get_operations():
        print(op.name)
    # Access restored placeholder variables
    x_pl = graph.get_tensor_by_name("x_pl:0")
    y_pl = graph.get_tensor_by_name("y_pl:0")
    # Access restored operation to re run
    accuracy_op = graph.get_tensor_by_name("accuracy_op:0")
This is just a quick example with the basics, for a working implementation see here.
In order to save the graph, you need to freeze the graph.
Here is the python script for freezing the graph : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
Here is a code snippet for freezing graph:
from tensorflow.python.tools import freeze_graph
freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                          input_binary, checkpoint_path, output_node,
                          restore_op_name, filename_tensor_name,
                          output_frozen_graph_name, True, "")
where output_node corresponds to the name of the output tensor, for example:
output = tf.nn.softmax(outer_layer_name,name="output")