Save and load Tensorflow model - tensorflow

I want to save a Tensorflow (0.12.0) model, including graph and variable values, then later load and execute it. I have read the docs and other posts on this but cannot get the basics to work. I am using the technique from this page in the Tensorflow docs. Code:
Save a simple model:
import tensorflow as tf

myVar = tf.Variable(7.1)
tf.add_to_collection('modelVariables', myVar) # why?
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(myVar))
    saver0 = tf.train.Saver()
    saver0.save(sess, './myModel.ckpt')
    saver0.export_meta_graph('./myModel.meta')
Later, load and execute the model:
with tf.Session() as sess:
    saver1 = tf.train.import_meta_graph('./myModel.meta')
    saver1.restore(sess, './myModel.meta')
    print(sess.run(myVar))
Question 1: The saving code seems to work but the loading code produces this error:
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open ./myModel.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
How do I fix this?
Question 2: I included this line to follow the pattern in the TF docs...
tf.add_to_collection('modelVariables', myVar)
... but why is that line necessary? Doesn't export_meta_graph export the entire graph by default? If not, does one need to add every variable in the graph to the collection before saving? Or do we just add to the collection those variables that will be accessed after the restore?
---------------------- Update January 12 2017 -----------------------------
Partial success based on Kashyap's suggestion below but a mystery still exists. The code below works but only if I include the lines containing tf.add_to_collection and tf.get_collection. Without those lines, 'load' mode throws an error in the last line:
NameError: name 'myVar' is not defined. My understanding was that by default Saver.save saves and restores all variables in the graph, so why is it necessary to specify the name of variables that will be used in the collection? I assume this has to do with mapping Tensorflow's variable names to Python names, but what are the rules of the game here? For which variables does this need to be done?
mode = 'load' # or 'save'
if mode == 'save':
    myVar = tf.Variable(7.1)
    init_op = tf.global_variables_initializer()
    saver0 = tf.train.Saver()
    tf.add_to_collection('myVar', myVar) ### WHY NECESSARY?
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(myVar))
        saver0.save(sess, './myModel')
if mode == 'load':
    with tf.Session() as sess:
        saver1 = tf.train.import_meta_graph('./myModel.meta')
        saver1.restore(sess, tf.train.latest_checkpoint('./'))
        myVar = tf.get_collection('myVar')[0] ### WHY NECESSARY?
        print(sess.run(myVar))

Question 1
This question has already been answered thoroughly here. The error occurs because restore was given the .meta file (the graph definition) instead of the checkpoint that holds the variable values. You also don't have to call export_meta_graph explicitly. Just call the save method; it generates the .meta file as well, since save calls export_meta_graph internally.
For example
saver0.save(sess, './myModel.ckpt')
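will produce the myModel.ckpt checkpoint and also the myModel.ckpt.meta file. Depending on the TensorFlow version, the checkpoint itself is either a single myModel.ckpt file or a set of myModel.ckpt.index and myModel.ckpt.data-* files.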
Then you can restore the model using
with tf.Session() as sess:
    saver1 = tf.train.import_meta_graph('./myModel.ckpt.meta')
    # pass the same prefix that was given to save, without the .meta suffix
    saver1.restore(sess, './myModel.ckpt')
    print(sess.run(myVar))
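The relationship between the .meta path and the prefix passed to restore can be sketched without TensorFlow. This is only an illustration: checkpoint_prefix and expected_files are hypothetical helpers, and the file names assume a single-shard checkpoint in the newer .index/.data format.

```python
def checkpoint_prefix(meta_path):
    """Strip the '.meta' suffix to get the prefix that restore expects."""
    suffix = ".meta"
    if not meta_path.endswith(suffix):
        raise ValueError("expected a path ending in .meta: %r" % meta_path)
    return meta_path[:-len(suffix)]


def expected_files(prefix):
    # Assumed file names for a single-shard checkpoint in the newer format
    return [prefix + ".meta", prefix + ".index", prefix + ".data-00000-of-00001"]


prefix = checkpoint_prefix("./myModel.ckpt.meta")
print(prefix)
print(expected_files(prefix))
```

The point is that restore takes the prefix shared by all checkpoint files, never the name of the .meta file itself.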
Question 2
Collections are used to store custom information, such as the learning rate or the regularisation factor that you have used, and this information is stored when you export the graph. Tensorflow itself defines some collections like "TRAINABLE_VARIABLES", which is used to get all the trainable variables of the model you built. You can choose to export all the collections in your graph, or you can specify which collections to export in the export_meta_graph function.
Yes, tensorflow will export all the variables that you define. But if you need any other information to be exported with the graph, it can be added to a collection.
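Why the Python name myVar is lost after loading, while the variable itself survives, can be seen in a toy model of a graph with collections. This is a minimal pure-Python sketch, not TensorFlow's implementation; ToyGraph and its methods are hypothetical stand-ins for the graph, tf.add_to_collection and tf.get_collection.

```python
# Toy model: a graph keeps nodes by name and collections by string key.
# ToyGraph is a hypothetical stand-in, not a TensorFlow class.
class ToyGraph:
    def __init__(self):
        self.nodes = {}          # node name -> value
        self.collections = {}    # collection key -> list of node names

    def add_node(self, name, value):
        self.nodes[name] = value
        return name

    def add_to_collection(self, key, node_name):
        self.collections.setdefault(key, []).append(node_name)

    def get_collection(self, key):
        # Unknown keys give an empty list
        return list(self.collections.get(key, []))


saved = ToyGraph()
my_var = saved.add_node("Variable", 7.1)   # "my_var" is only a Python name
saved.add_to_collection("myVar", my_var)

# "Loading" in another script: only the graph's contents are available,
# the Python name my_var from the saving script is not.
loaded = saved
handle = loaded.get_collection("myVar")[0]
print(handle, loaded.nodes[handle])
```

In this picture the collection key is simply a stable string under which the loading script can find a node again, which is the role tf.get_collection('myVar')[0] plays in the question's updated code.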

I've been trying to figure out the same thing and was able to successfully do it by using Supervisor. It automatically loads all variables and your graph etc. Here is the documentation - https://www.tensorflow.org/programmers_guide/supervisor. Below is my code -
sv = tf.train.Supervisor(logdir="/checkpoint", save_model_secs=60)
with sv.managed_session() as sess:
    if not sv.should_stop():
        # Do run/eval/train ops on sess as needed. The above works for both saving and loading.
        pass
As you can see, this is much simpler than using the Saver object and dealing with individual variables, as long as the graph stays the same (my understanding is that Saver comes in handy when we want to reuse a pre-trained model for a different graph).

Related

How to restore a tensorflow model that only has one file with extension ".model"

I want to use a pretrained tensorflow model provided by an unknown author. I do not know how he/she managed to save the tensorflow model (he/she used tensorflow version >= 1.2) to only one file with the extension '.model', as normally I get either three files '.meta', '.data', '.index' or one file with '.ckpt'.
How can I restore this pretrained model? How can I save a model to this format later?
Thanks.
I have also asked this question on a number of platforms with no assistance yet. So I decided to do some experimental work and this is what I found. This may be long but please bear with me.
To import a model in TensorFlow we use
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))
The .meta file contains the graph definition of the trained model: its variables (without their values), operations, collections, etc. tf.train.latest_checkpoint('./') uses the checkpoint file (which simply keeps a record of the latest checkpoint files saved) to find the most recent checkpoint, whose data lives in xxxx_model.data-00000-of-00001. This .data-00000-of-00001 file contains the values (weights, biases, etc.) that must be loaded into the variables defined in my_test_model-1000.meta.
Summary [Semi-complete code]
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    #new_saver.restore(sess, tf.train.latest_checkpoint('./'))
    tensor_variable = tf.trainable_variables()
    for tensor_var in tensor_variable:
        #print(sess.run(tensor_var))
        print(tensor_var)
This initial code will print out all the trainable variables from .meta. If you try to run print(sess.run(tensor_var)) you will get an error, because the variables have not been initialized. However, if you un-comment new_saver.restore(sess, tf.train.latest_checkpoint('./')) and run print(sess.run(tensor_var)), you will get all the variables along with the values loaded into them.
Now to “.model”
My best guess is that xxxxxx.model works much like xxxx_model.data-00000-of-00001 from tensorflow. It does not contain variable definitions, so if you try to do
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('xxx.model')
you will get an error. Remember, the reason is that this .model file contains neither variable definitions nor an operation graph of any form. If you also try to do
with tf.Session() as sess:
    new_saver = tf.train.Saver()
    new_saver.restore(sess, "xxxx.model")
you will similarly get an error, because there are no corresponding variables to load values into. Therefore, if you ever obtain a xxx.model file, you will have to go through the pain of replicating all the variables and operations before trying to run new_saver.restore(sess, "xxxx.model"). If you are able to replicate the architecture, this will hopefully run smoothly.
I am sorry this was long, but considering that there is almost no answer on the internet, I had to make a lecture out of it. :)

Tensorflow save/restore batch norm

I trained a model with batch norm in Tensorflow. I would like to save the model and restore it for further use. The batch norm is done by
def batch_norm(input, phase):
    return tf.layers.batch_normalization(input, training=phase)
where the phase is True during training and False during testing.
It seems like simply calling
saver = tf.train.Saver()
saver.save(sess, savedir + "ckpt")
does not work well: when I restore the model, it first reports that it was restored successfully, but it then says "Attempting to use uninitialized value batch_normalization_585/beta" if I just run one node in the graph. Is this related to not saving the model properly or something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. This comes from the fact that by declaring the saver with the empty brackets like this:
saver = tf.train.Saver()
The saver will save the variables contained in tf.trainable_variables(), which do not include the moving averages of the batch normalization. To include these variables in the saved ckpt you need to do:
saver = tf.train.Saver(tf.global_variables())
This saves ALL the variables, so it uses a lot of memory. Alternatively, you must identify the variables that hold the moving average or variance and save them by declaring the saver like this:
saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but instead always with the same sequence). If you add two numbers, it will probably be Add, but if you do another addition, since no two nodes can have the same name, it may be something like Add_1. Once a node is created in a graph its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables beta and gamma.
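The deterministic naming described above can be mimicked in a few lines of plain Python. This is a simplified sketch of the rule only (a base name first, then numbered suffixes); NameScope is a hypothetical helper, not part of TensorFlow.

```python
class NameScope:
    """Hypothetical helper that mimics how default node names are made unique."""

    def __init__(self):
        self._used = {}  # base name -> how many times it has been requested

    def unique_name(self, base):
        count = self._used.get(base, 0)
        self._used[base] = count + 1
        # First request keeps the plain name, later ones get a numeric suffix
        return base if count == 0 else "%s_%d" % (base, count)


names = NameScope()
generated = [names.unique_name("Add") for _ in range(3)]
print(generated)
```

Because the sequence depends only on the order in which nodes are created, building the same model twice in a row inside one graph yields different names the second time, which is how a name like batch_normalization_585/beta can come about.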
Saving and restoring works in the following way:
You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
You call save on the saver to, well, save the values of the variables to a file.
Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
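The name-matching requirement can be illustrated with a toy restore over dictionaries. This is only a sketch under the assumption that a checkpoint behaves like a mapping from variable names to values; the restore function here is hypothetical and is not the Saver API.

```python
def restore(checkpoint, graph_variable_names):
    """Return name -> value for each graph variable, or fail on a missing name."""
    missing = [n for n in graph_variable_names if n not in checkpoint]
    if missing:
        raise KeyError("not found in checkpoint: %s" % ", ".join(missing))
    return {n: checkpoint[n] for n in graph_variable_names}


# Values saved from a graph in which the layer was created exactly once
checkpoint = {"batch_normalization/beta": 0.0, "batch_normalization/gamma": 1.0}

# Same construction order, same names: restore succeeds.
values = restore(checkpoint, ["batch_normalization/beta", "batch_normalization/gamma"])

# A graph in which the layer was created a second time has different names.
try:
    restore(checkpoint, ["batch_normalization_1/beta"])
    mismatch_detected = False
except KeyError:
    mismatch_detected = True

print(values, mismatch_detected)
```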
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
import tensorflow as tf

def make_model():
    input = tf.placeholder(...)
    phase = tf.placeholder(...)
    input_norm = tf.layers.batch_normalization(input, training=phase)
    # Do some operations with input_norm
    output = ...
    saver = tf.train.Saver()
    return input, output, phase, saver

# We work with one graph first
g1 = tf.Graph()
with g1.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        # Do your training or whatever...
        saver.save(sess, savedir + "ckpt")

# We work with a second different graph now
g2 = tf.Graph()
with g2.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        saver.restore(sess, savedir + "ckpt")
        # Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.

How to restore my loss from a saved meta graph?

I have built a simple tensorflow model that is working fine.
While training I save the meta_graph and also some parameters at different steps.
After that (in a new script) I want to restore the saved meta_graph and restore variables and operations.
Everything works fine, except that the loss defined by
with tf.name_scope('MSE'):
    error = tf.losses.mean_squared_error(Y, yhat, scope="error")
is not restored. The following line
mse_error = graph.get_tensor_by_name("MSE/error:0")
produces this error message:
"The name 'MSE/error:0' refers to a Tensor which does not exist. The
operation, 'MSE/error', does not exist in the graph."
As I follow exactly the same procedure for other variables and ops, which are restored without any error, I don't know how to deal with that. The only difference is that the tf.losses.mean_squared_error function has a scope argument rather than a name argument.
So how do I restore the loss operation with the scope?
Here is the code I use to save and load the model.
Saving:
# define network ...
saver = tf.train.Saver(max_to_keep=10)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(NUM_EPOCHS):
    # do training ..., save the model every 1000 optimization steps
    if (i + 1) % 1000 == 0:
        saver.save(sess, "L:/model/mlp_model", global_step=(i+1))
Restore:
# start a session
sess=tf.Session()
# load meta graph
saver = tf.train.import_meta_graph('L:\\model\\mlp_model-1000.meta')
# restore weights
saver.restore(sess, tf.train.latest_checkpoint('L:\\model\\'))
# access network nodes
graph = tf.get_default_graph()
X = graph.get_tensor_by_name("Input/X:0")
Y = graph.get_tensor_by_name("Input/Y:0")
# restore output-generating operation used for prediction
yhat_op = graph.get_tensor_by_name("OutputLayer/yhat:0")
mse_error = graph.get_tensor_by_name("MSE/error:0") # this one doesn't work
To get your training step back, the documentation suggests that you add it to a collection before saving, so that you can point to it after restoring your graph.
Saving:
saver = tf.train.Saver(max_to_keep=10)
# put op in collection
tf.add_to_collection('train_op', train_op)
...
Restore:
saver = tf.train.import_meta_graph('L:\\model\\mlp_model-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('L:\\model\\'))
# recover op through collection
train_op = tf.get_collection('train_op')[0]
Why did your attempt at recovering the tensor by name fail?
You can indeed get the tensor by its name -- the catch is that you need the correct name. And notice that your error argument to tf.losses.mean_squared_error is a scope name, not the name of the returned operation. This can be confusing, as other operations, such as tf.nn.l2_loss, accept a name argument.
In the end, the name of your error operation is MSE/error/value:0, which you can use to get it by name.
That is, until it breaks again in the future when you update tensorflow. tf.losses.mean_squared_error does not give you any guarantee on the name of its output, so it very well may change for some reason.
I think this is what motivates the use of collections: the lack of guarantee on the names of the operators you don't control yourself.
Alternatively, if for some reason you really want to use names, you could rename your operator like this:
with tf.name_scope('MSE'):
error = tf.losses.mean_squared_error(Y, yhat, scope='error')
# let me stick my own name on it
error = tf.identity(error, 'my_error')
Then you can rely on graph.get_tensor_by_name('MSE/my_error:0') safely.
tf.losses.mean_squared_error is an operation, not a Tensor, so you should load it with
get_operation_by_name:
mse_error = graph.get_operation_by_name("MSE/error")
That should work. Note that there is no need for ":0".
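The names used in this thread follow the convention that a tensor name is an operation name plus an output index, as in "MSE/error/value:0", while an operation name has no index. A small pure-Python sketch of that convention follows; split_tensor_name is a hypothetical helper, not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:index' into (op_name, index)."""
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError("not a tensor name: %r" % tensor_name)
    return op_name, int(index)


op_name, output_index = split_tensor_name("MSE/error/value:0")
print(op_name, output_index)
```

Under this convention, get_tensor_by_name expects the form with the index and get_operation_by_name the form without it; in both cases the operation must actually exist in the graph under that exact name.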

How to load several identical models from save files into one session in Tensorflow

In my main code I create a model based on a config file like this
with tf.variable_scope('MODEL') as topscope:
    model = create_model(config_file)  # returns input node, output node, and some other placeholders
The name of this scope is the same across all saves.
Then I define an optimizer, a cost function, etc. (they are outside of this scope).
Then I create a saver and save it:
saver = tf.train.Saver(max_to_keep=10)
saver.save(sess, 'unique_name', global_step=t)
Now I've created and saved 10 different models, and I want to load them all at once like this maybe:
models = []
for config, save_path in zip(configs, save_paths):
    models.append(load_model(config, save_path))
and be able to run them and compare their results, mix them, average etc. I don't need optimizer slot variables for these loaded models. I need only those variables that are inside 'MODEL' scope.
Do I need to create multiple sessions?
How can I do it? I don't know where to start. I can create a model from my config file, then load this same model using this same config file and a save like this:
saver.restore(sess, save_path)
But how do I load more than one?
Edit: I didn't know the word. I want to make an ensemble of networks.
Question that asks it and is still not answered: How to create ensemble in tensorflow?
EDIT 2: Okay, so here's my workaround for now:
Here's my main code, it creates a model, trains it and saves it:
import tensorflow as tf
from util import *

OLD_SCOPE_NAME = 'scope1'
sess = tf.Session()
with tf.variable_scope(OLD_SCOPE_NAME) as topscope:
    model = create_model(tf, 6.0, 7.0)
    sc_vars = get_all_variables_from_top_scope(tf, topscope)
print([v.name for v in sc_vars])
sess.run(tf.initialize_all_variables())
print(sess.run(model))
saver = tf.train.Saver()
saver.save(sess, OLD_SCOPE_NAME)
Then I run this code creating the same model, loading its checkpoint save and renaming variables:
#RENAMING PART, different file
#create the same model as above here
import tensorflow as tf
from util import *

OLD_SCOPE_NAME = 'scope1'
NEW_SCOPE_NAME = 'scope2'
sess = tf.Session()
with tf.variable_scope(OLD_SCOPE_NAME) as topscope:
    model = create_model(tf, 6.0, 7.0)
    sc_vars = get_all_variables_from_top_scope(tf, topscope)
print([v.name for v in sc_vars])
saver = tf.train.Saver()
saver.restore(sess, OLD_SCOPE_NAME)
print(sess.run(model))
#assuming that we change top scope, not something in the middle, functionality can be added without much trouble I think
#not sure why I need to remove ':0' part, but it seems to work okay
print([NEW_SCOPE_NAME + v.name[len(OLD_SCOPE_NAME):v.name.rfind(':')] for v in sc_vars])
new_saver = tf.train.Saver(var_list={NEW_SCOPE_NAME + v.name[len(OLD_SCOPE_NAME):v.name.rfind(':')]:v for v in sc_vars})
new_saver.save(sess, NEW_SCOPE_NAME)
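The string manipulation in the renaming step can be checked on its own, without TensorFlow. The sketch below assumes variable names of the form 'scope1/W:0'; renamed_checkpoint_key is a hypothetical helper that reproduces the slicing used above. The ':0' part is dropped because the keys of the var_list dictionary are the names stored in the checkpoint, and those do not carry the output index.

```python
def renamed_checkpoint_key(variable_name, old_scope, new_scope):
    """Map e.g. 'scope1/W:0' to 'scope2/W' (top-level scope only)."""
    if not variable_name.startswith(old_scope):
        raise ValueError("%r is not under scope %r" % (variable_name, old_scope))
    # Drop the ':0' output index, if present
    base = variable_name.rsplit(":", 1)[0]
    # Swap the leading scope name for the new one
    return new_scope + base[len(old_scope):]


old_names = ["scope1/W:0", "scope1/b:0"]
new_keys = [renamed_checkpoint_key(n, "scope1", "scope2") for n in old_names]
print(new_keys)
```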
Then to load this model into a file containing additional variables and with a new name:
import tensorflow as tf
from util import *

NEW_SCOPE_NAME = 'scope2'
sess = tf.Session()
with tf.variable_scope(NEW_SCOPE_NAME) as topscope:
    model = create_model(tf, 5.0, 4.0)
    sc_vars = get_all_variables_from_top_scope(tf, topscope)
q = tf.Variable(tf.constant(0.0, shape=[1]), name='q')
print([v.name for v in sc_vars])
saver = tf.train.Saver(var_list=sc_vars)
saver.restore(sess, NEW_SCOPE_NAME)
print(sess.run(model))
util.py:
def get_all_variables_from_top_scope(tf, scope):
    #scope is a top scope here, otherwise change startswith part
    return [v for v in tf.all_variables() if v.name.startswith(scope.name)]

def create_model(tf, param1, param2):
    w = tf.get_variable('W', shape=[1], initializer=tf.constant_initializer(param1))
    b = tf.get_variable('b', shape=[1], initializer=tf.constant_initializer(param2))
    y = tf.mul(w, b, name='mul_op')#no need to save this
    return y
At the conceptual level:
- there are two separate things: the graph and the session
- the graph is created first. It defines your model. There's no reason why you can't store multiple models in one graph; that's fine. It also defines the Variables, but it doesn't actually contain their state
- a session is created after the graph
  - it is created from a graph
  - you can create as many sessions as you like from a graph
  - it holds the state of the different Variables in the graph, i.e. the weights in your various models
So:
- when you load just the model definition, all you need is one or more graphs. One graph is sufficient
- when you load the actual weights for the model, the learned weights/parameters, you need to have created a session for this, from the graph. A single session is sufficient
Note that variables all have names, and they need to be unique. You can give them unique names in the graph by using variable scopes, like:
with tf.variable_scope("some_scope_name"):
    # create model nodes here...
This will also group your nodes together nicely in the Tensorboard graph.
Ok, rereading your question a bit, it looks like you want to save/load single models at a time.
Saving/loading the parameters/weights of a model happens from the session, which is what contains the weights/parameters of each Variable defined in the graph.
You can refer to these variables by name, e.g. via the scope you created above, and just save a subset of these variables into different files, etc.
By the way, it's also possible to use session.run(...) to get the values of the weights/parameters as NumPy arrays, which you can then pickle, or whatever, if you choose.
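Selecting a subset of values by scope and pickling them can be sketched as follows. This is a stand-alone illustration: the dictionary all_values plays the role of what session.run would return for each variable, with plain lists in place of NumPy arrays, and the names in it are made up for the example.

```python
import pickle


def select_scope(values_by_name, scope):
    """Keep only the entries whose name lies under the given top-level scope."""
    prefix = scope + "/"
    return {n: v for n, v in values_by_name.items() if n.startswith(prefix)}


# Made-up names and values standing in for fetched variable values
all_values = {"MODEL/W": [6.0], "MODEL/b": [7.0], "optimizer/step": [100]}

model_only = select_scope(all_values, "MODEL")
blob = pickle.dumps(model_only)      # bytes that could be written to a file
restored = pickle.loads(blob)
print(sorted(restored))
```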

tensorflow restore only variables

I do some training in Tensorflow and save the whole session using a saver:
# ... define model
# add a saver
saver = tf.train.Saver()
# ... run a session
# ....
# save the model
save_path = saver.save(sess,fileSaver)
It works fine, and I can successfully restore the whole session by using the exact same model and calling:
saver.restore(sess, importSaverPath)
Now I want to modify only the optimizer while keeping the rest of the model constant (the computation graph stays the same apart from the optimizer):
# optimizer used before
# optimizer = tf.train.AdamOptimizer(
#     learning_rate=learningRate).minimize(costPrediction)
# the new optimizer I want to use
optimizer = tf.train.RMSPropOptimizer(
    learning_rate=learningRate, decay=0.9, momentum=0.1,
    epsilon=1e-5).minimize(costPrediction)
I also want to continue the training from the last graph state I saved (i.e., I want to restore the state of my variables and continue with another training algorithm). Of course I cannot use:
saver.restore
any longer, because the graph has changed.
So my question is: is there a way to restore only variables using the saver.restore command (or even, maybe for later use, only a subset of variables), when the whole session has been saved? I looked for such feature in the API documentation and online, but could not find any example / detailed enough explanations that could help me get it to work.
It is possible to restore a subset of variables by passing the list of variables as the var_list argument to the Saver constructor. However, when you change the optimizer, additional variables may have been created (momentum accumulators, for instance), and the variables associated with the previous optimizer, if any, will have been removed from the model. So simply using the old Saver object to restore will not work, especially if you constructed it with the default constructor, which uses tf.all_variables as the argument to the var_list parameter. You have to construct the Saver object on the subset of variables that you created in your model, and then restore will work. Note that this leaves the new variables created by the new optimizer uninitialized, so you have to initialize them explicitly.
I ran into the same problem. Inspired by keveman's answer, my solution is:
1. Define your new graph (here only the variables related to the new optimizer differ from the old graph).
2. Get all variables using tf.global_variables(). This returns a list of variables that I call g_vars.
3. Get all optimizer-related variables using tf.contrib.framework.get_variables_by_suffix('some variable filter'). The filter may be RMSProp or RMSProp_*. This function returns a list of variables that I call exclude_vars.
4. Get the variables that are in g_vars but not in exclude_vars. Simply use
vars = [item for item in g_vars if item not in exclude_vars]
These vars are common to both the new and the old graph, so you can now restore them from the old model.
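The filtering in the steps above can be tried on plain name strings. The sketch below assumes that optimizer slot variables are named after the variable they belong to, followed by the optimizer name (for example 'MODEL/W/RMSProp'); the names and the helper split_restorable are illustrative only.

```python
def split_restorable(all_names, optimizer_markers=("RMSProp",)):
    """Separate model variables from optimizer-created ones by name."""
    restorable, excluded = [], []
    for name in all_names:
        last_part = name.split("/")[-1]
        if any(last_part.startswith(marker) for marker in optimizer_markers):
            excluded.append(name)
        else:
            restorable.append(name)
    return restorable, excluded


# Illustrative names for a model with two variables trained with RMSProp
names = [
    "MODEL/W", "MODEL/b",
    "MODEL/W/RMSProp", "MODEL/W/RMSProp_1",
    "MODEL/b/RMSProp", "MODEL/b/RMSProp_1",
]
restorable, excluded = split_restorable(names)
print(restorable)
```

Only the names in restorable would be passed to a Saver for restoring from the old checkpoint; the excluded ones belong to the new optimizer and still need to be initialized.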
You could recover the original Saver from a MetaGraph protobuf first and then use that saver to restore all old variables safely. For a concrete example, you can take a look at the eval.py script: TensorFlow: How do I release a model without source code?