To run a program with TensorFlow, we must create a session.
So what is the difference between sess = Session() and sess = Session(Graph()) ?
What is this Graph() ?
When designing a model in TensorFlow, there are basically two steps:
building the computational graph: the nodes and operations and how they are connected to each other
evaluating / running this graph on some data
A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated. For example:
import tensorflow as tf
# Build a graph.
c = tf.constant(5.0) * tf.constant(6.0)
# Launch the graph in a session.
sess = tf.Session()
# Evaluate the tensor `c`.
print(sess.run(c))
When you create a Session, you launch a graph in it. If no graph is specified, the Session constructor uses the default graph:
sess = tf.Session()
Otherwise, when constructing the session you can pass in a graph explicitly, like tf.Session(graph=my_graph):
with tf.Session(graph=my_graph) as sess:
    pass  # run operations from my_graph here
https://www.tensorflow.org/api_docs/python/tf/Session
https://www.tensorflow.org/api_docs/python/tf/Graph
https://github.com/Kulbear/tensorflow-for-deep-learning-research/issues/1
Designing a model in TensorFlow assumes these two parts:
Building graph(s), representing the data flow of the computations.
Running a session(s), executing the operations in the graph.
In the general case, there can be multiple graphs and multiple sessions.
But there is always a default graph, and a session can be installed as the default session.
In that context, sess = Session() would use the default graph:
If no graph argument is specified when constructing the session, the default graph will be launched in the session.
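sess = Session(graph=Graph()) would assume you are using more than one graph. (The graph has to be passed by keyword, because the first positional parameter of tf.Session is target.)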
If you are using more than one graph (created with tf.Graph()) in the same process, you will have to use different sessions for each graph, but each graph can be used in multiple sessions. In this case, it is often clearer to pass the graph to be launched explicitly to the session constructor.
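A minimal sketch of the two cases described above. It assumes a TensorFlow build that ships tensorflow.compat.v1 (on TF 1.x, plain `import tensorflow as tf` exposes the same calls); the names my_graph, c and d are made up for illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in TF 1.x

# Case 1: operations go into the default graph, and Session() launches it.
c = tf.constant(5.0) * tf.constant(6.0)
with tf.Session() as sess:
    default_result = sess.run(c)
    uses_default_graph = sess.graph is tf.get_default_graph()

# Case 2: a separate graph gets its own session, passed via graph=...
my_graph = tf.Graph()
with my_graph.as_default():
    d = tf.constant(2.0) + tf.constant(3.0)
with tf.Session(graph=my_graph) as sess:
    explicit_result = sess.run(d)

print(default_result, explicit_result, uses_default_graph)
```

Trying to run d in the first session (or c in the second) would fail, because each tensor belongs to the graph it was created in.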
Related
I am trying to improve the performance of our api which uses a model written in tensorflow. I have identified the line which is taking up to ten seconds to execute when multiple processes are executed. It is:
saver = tf.train.import_meta_graph(meta_path)
I am trying to cache this operation somehow and implemented the following code:
graph = None
In actual class:
global graph
if graph is None:
    print('init graph...')
    graph = tf.train.import_meta_graph(model_path)
saver = copy.deepcopy(graph)
This almost appears to work, but on the third request I get an error:
RuntimeError: The Session graph is empty. Add operations to the graph before calling run().
I am using tensorflow 1.5 and tornado 4.4.3. Updating these components is not possible.
I'm using the DeepLab V3 architecture for an image task, but I made a slight change that adds a channel to the input, so the first CNN layer becomes [7,7,4,64] instead of [7,7,3,64].
I plan to do transfer learning, so I hope to recover all parameters except for the fourth channel of this first CNN layer. However, all four channels are held in a single tf.Variable, so I don't know how to recover them with tf.train.Saver. (I think tf.train.Saver can control which tf.Variable is recovered, but not which values within a tf.Variable.)
Any ideas?
Some related code is shown below:
Load function
def load(saver, sess, ckpt_path):
    saver.restore(sess, ckpt_path)
Part of main function
# All variables need to be restored
restore = [v for v in tf.global_variables()]
# Set up tf session and initialize variables
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config = config)
init = tf.global_variables_initializer()
sess.run(init)
# Load Variables
loader = tf.train.Saver(var_list = restore)
load(loader, sess, args.restore_from)
In the main function, we can see that the recovered variables are controlled by 'restore'. In this case, the first entry of 'restore' is:
<tf.Variable shape=(7,7,4,64) dtype=float32_ref>
But what I hope to recover is only the first three channels, from another model whose variable has size (7,7,3,64), and to initialize the last channel with a zero initializer.
Is there a function that can help with this?
A possible quick hack: instead of creating a variable with the new shape and trying to convert parts of it over, just create a variable for the part that's missing (so shape=[7,7,1,64]), concatenate it with your pretrained variable, and use the result as the convolution kernel.
For transfer learning to work properly, the new part should be zero-initialized rather than randomly initialized (which should be fine because the other values break the symmetry), or initialized with values that are very small compared to the pretrained ones (assuming the new channel has the same range of values); otherwise the later layers won't see the distributions they expect.
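To illustrate why zero initialization keeps the pretrained behaviour, here is a small NumPy sketch rather than TensorFlow code. The random arrays are stand-ins for the pretrained kernel and for one input patch; all names are hypothetical. In the real graph the concatenation would presumably be a tf.concat along axis 2, with only the [7,7,3,64] variable handed to the restoring Saver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained first-layer kernel: [height, width, in, out].
pretrained_kernel = rng.normal(size=(7, 7, 3, 64))
# The missing part only: one extra input channel, zero-initialized.
extra_kernel = np.zeros((7, 7, 1, 64))
# Concatenate along the input-channel axis to get the kernel actually used.
kernel = np.concatenate([pretrained_kernel, extra_kernel], axis=2)

# One 7x7 input patch with 4 channels (RGB plus the new channel).
patch = rng.normal(size=(7, 7, 4))

# Response of the 64 filters at this position, with and without the new channel.
out_4ch = np.einsum("hwc,hwco->o", patch, kernel)
out_3ch = np.einsum("hwc,hwco->o", patch[:, :, :3], pretrained_kernel)

print(kernel.shape, np.allclose(out_4ch, out_3ch))
```

With the extra slice at zero, the first layer initially ignores the new channel, so the later layers see the same activations as in the pretrained model until training moves the new weights away from zero.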
I've trained a model on some data (just a simple classification task). Afterwards, I wish to use this same model to run some predictions via a separate function make_predictions().
So currently my main file is simply something like:
agent.train(data)
agent.make_predictions(new_data)
and in train() I have something like:
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
I don't initialize the variables in my second function, so that session is different from the previous one, but it is surprising to me that I can't simply reopen a previous session. Do I need to checkpoint the model after training and then reload it each time?
Thanks a lot
I trained a model with batch norm in TensorFlow. I would like to save the model and restore it for further use. The batch norm is done by
def batch_norm(input, phase):
    return tf.layers.batch_normalization(input, training=phase)
where phase is True during training and False during testing.
It seems like simply calling
saver = tf.train.Saver()
saver.save(sess, savedir + "ckpt")
does not work well: when I restore the model, it first reports that it was restored successfully, but then it says "Attempting to use uninitialized value batch_normalization_585/beta" if I run just one node in the graph. Is this related to not saving the model properly, or to something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. It typically appears when the checkpoint does not contain every variable the graph needs. Note that a saver declared with empty brackets, like this:
saver = tf.train.Saver()
defaults to all saveable objects, so it already covers the moving mean and variance of the batch normalization. They go missing when the saver is built from tf.trainable_variables() only, because the moving averages are not trainable. To make the full list explicit you can do:
saver = tf.train.Saver(tf.global_variables())
This saves ALL the variables, including optimizer slots, so the checkpoint is larger. Or you can identify the variables that hold the moving mean or variance and save them by declaring them like:
saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
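A rough way to find the list_of_extra_variables is to compare the two collections. The sketch below assumes tensorflow.compat.v1 is available and uses two hand-made variables as stand-ins for what a batch normalization layer creates; the scope and variable names are illustrative only.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

g = tf.Graph()
with g.as_default():
    with tf.variable_scope("bn"):
        # Trainable parameter, like beta or gamma.
        beta = tf.get_variable(
            "beta", shape=[4], initializer=tf.zeros_initializer())
        # Non-trainable statistic, like the moving mean or variance.
        moving_mean = tf.get_variable(
            "moving_mean", shape=[4],
            initializer=tf.zeros_initializer(), trainable=False)

    trainable = {v.op.name for v in tf.trainable_variables()}
    everything = {v.op.name for v in tf.global_variables()}
    # Variables that a Saver(tf.trainable_variables()) would skip.
    extra = sorted(everything - trainable)

print(extra)
```

Anything that shows up in extra has to be passed to the saver explicitly if the var_list is built from tf.trainable_variables().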
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but instead always with the same sequence). If you add two numbers, it will probably be Add, but if you do another addition, since no two nodes can have the same name, it may be something like Add_1. Once a node is created in a graph, its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables beta and gamma.
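The naming rule can be checked with a couple of constants. This is only a sketch, assuming tensorflow.compat.v1 is available; the name "value" is arbitrary.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

g = tf.Graph()
with g.as_default():
    first = tf.constant(1.0, name="value")
    second = tf.constant(2.0, name="value")  # same requested name

# Operation names, as stored in the graph.
op_names = [first.op.name, second.op.name]
print(op_names)
print(first.name)  # tensor name: operation name plus output index
```

The second constant is stored as value_1, which is the same mechanism that produced batch_normalization_585 in the question.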
Saving and restoring works in the following way:
You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
You call save on the saver to, well, save the values of the variables to a file.
Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
import tensorflow as tf
def make_model():
    input = tf.placeholder(...)
    phase = tf.placeholder(...)
    input_norm = tf.layers.batch_normalization(input, training=phase)
    # Do some operations with input_norm
    output = ...
    saver = tf.train.Saver()
    return input, output, phase, saver
# We work with one graph first
g1 = tf.Graph()
with g1.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        # Do your training or whatever...
        saver.save(sess, savedir + "ckpt")
# We work with a second different graph now
g2 = tf.Graph()
with g2.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        saver.restore(sess, savedir + "ckpt")
        # Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.
In my main code I create a model based on a config file like this
with tf.variable_scope('MODEL') as topscope:
    model = create_model(config_file)  # returns input node, output node, and some other placeholders
The name of this scope is the same across all saves.
Then I define an optimizer, a cost function, etc. (they are outside of this scope).
Then I create a saver and save it:
saver = tf.train.Saver(max_to_keep=10)
saver.save(sess, 'unique_name', global_step=t)
Now I've created and saved 10 different models, and I want to load them all at once like this maybe:
models = []
for config, save_path in zip(configs, save_paths):
    models.append(load_model(config, save_path))
and be able to run them and compare their results, mix them, average etc. I don't need optimizer slot variables for these loaded models. I need only those variables that are inside 'MODEL' scope.
Do I need to create multiple sessions?
How can I do it? I don't know where to start. I can create a model from my config file, then load this same model using this same config file and a save like this:
saver.restore(sess, save_path)
But how do I load more than one?
Edit: I didn't know the word. I want to make an ensemble of networks.
Question that asks it and is still not answered: How to create ensemble in tensorflow?
EDIT 2: Okay, so here's my workaround for now:
Here's my main code, it creates a model, trains it and saves it:
import tensorflow as tf
from util import *
OLD_SCOPE_NAME = 'scope1'
sess = tf.Session()
with tf.variable_scope(OLD_SCOPE_NAME) as topscope:
    model = create_model(tf, 6.0, 7.0)
sc_vars = get_all_variables_from_top_scope(tf, topscope)
print([v.name for v in sc_vars])
sess.run(tf.global_variables_initializer())
print(sess.run(model))
saver = tf.train.Saver()
saver.save(sess, OLD_SCOPE_NAME)
Then I run this code creating the same model, loading its checkpoint save and renaming variables:
#RENAMING PART, different file
#create the same model as above here
import tensorflow as tf
from util import *
OLD_SCOPE_NAME = 'scope1'
NEW_SCOPE_NAME = 'scope2'
sess = tf.Session()
with tf.variable_scope(OLD_SCOPE_NAME) as topscope:
    model = create_model(tf, 6.0, 7.0)
sc_vars = get_all_variables_from_top_scope(tf, topscope)
print([v.name for v in sc_vars])
saver = tf.train.Saver()
saver.restore(sess, OLD_SCOPE_NAME)
print(sess.run(model))
#assuming that we change top scope, not something in the middle, functionality can be added without much trouble I think
#not sure why I need to remove ':0' part, but it seems to work okay
print([NEW_SCOPE_NAME + v.name[len(OLD_SCOPE_NAME):v.name.rfind(':')] for v in sc_vars])
new_saver = tf.train.Saver(var_list={NEW_SCOPE_NAME + v.name[len(OLD_SCOPE_NAME):v.name.rfind(':')]:v for v in sc_vars})
new_saver.save(sess, NEW_SCOPE_NAME)
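On the ':0' part: v.name is the name of the variable's output tensor, which has the form operation name plus ':' plus output index, while the keys stored in a checkpoint are the operation names without that suffix (what v.op.name returns). The string manipulation from the listing above can be checked on its own in plain Python; the helper name and the two sample names below are only illustrative.

```python
OLD_SCOPE_NAME = 'scope1'
NEW_SCOPE_NAME = 'scope2'

def renamed_checkpoint_key(tensor_name):
    # Drop the ':<output index>' suffix, then swap the leading scope.
    op_name = tensor_name[:tensor_name.rfind(':')]
    return NEW_SCOPE_NAME + op_name[len(OLD_SCOPE_NAME):]

# Names as v.name would report them for the two variables of the toy model.
tensor_names = ['scope1/W:0', 'scope1/b:0']
new_keys = [renamed_checkpoint_key(n) for n in tensor_names]
print(new_keys)  # ['scope2/W', 'scope2/b']
```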
Then to load this model into a file containing additional variables and with a new name:
import tensorflow as tf
from util import *
NEW_SCOPE_NAME = 'scope2'
sess = tf.Session()
with tf.variable_scope(NEW_SCOPE_NAME) as topscope:
    model = create_model(tf, 5.0, 4.0)
sc_vars = get_all_variables_from_top_scope(tf, topscope)
q = tf.Variable(tf.constant(0.0, shape=[1]), name='q')
print([v.name for v in sc_vars])
saver = tf.train.Saver(var_list=sc_vars)
saver.restore(sess, NEW_SCOPE_NAME)
print(sess.run(model))
util.py:
def get_all_variables_from_top_scope(tf, scope):
    # scope is a top scope here, otherwise change the startswith part
    return [v for v in tf.global_variables() if v.name.startswith(scope.name)]
def create_model(tf, param1, param2):
    w = tf.get_variable('W', shape=[1], initializer=tf.constant_initializer(param1))
    b = tf.get_variable('b', shape=[1], initializer=tf.constant_initializer(param2))
    y = tf.multiply(w, b, name='mul_op')  # no need to save this
    return y
At the conceptual level:
there are two separate things: the graph, and the session
the graph is created first. It defines your model. There's no reason why you can't store multiple models in one graph. That's fine. It also defines the Variables, but it doesn't actually contain their state
a session is created after the graph
it is created from a graph
you can create as many sessions as you like from a graph
it holds the state of the different Variables in the graph, i.e. the weights in your various models
So:
when you load just the model definition, all you need is: one or more graphs. one graph is sufficient
when you load the actual weights for the model, the learned weights/parameters, you need to have created a session for this, from the graph. A single session is sufficient
Note that variables all have names, and they need to be unique. You can give them unique names, in the graph, by using variable scopes, like:
with tf.variable_scope("some_scope_name"):
    # create model nodes here...
This will group your nodes together nicely in the TensorBoard graph.
Ok, rereading your question a bit. It looks like you want to save/load single models at a time.
Saving/loading the parameters/weights of a model happens from the session, which is what contains the weights/parameters of each Variable defined in the graph.
You can refer to these variables by name, eg via the scope you created above, and just save a subset of these variables, into different files, etc.
By the way, it's also possible to use session.run(...) to get the values of the weights/parameters as NumPy arrays, which you can then pickle, or whatever, if you choose.
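Putting these points together, here is a sketch of two models living in one graph and one session, each saved to and restored from its own file by scope. It assumes tensorflow.compat.v1 is available; build_model and the scope names model_a and model_b are hypothetical stand-ins for a real model constructor.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def build_model(scope_name, init_value):
    # Hypothetical one-weight "model"; the scope keeps variable names unique.
    with tf.variable_scope(scope_name):
        w = tf.get_variable(
            "W", shape=[1], initializer=tf.constant_initializer(init_value))
    return w

graph = tf.Graph()
with graph.as_default():
    w_a = build_model("model_a", 6.0)
    w_b = build_model("model_b", 7.0)
    # One saver per scope, so each model gets its own checkpoint file.
    savers = {
        name: tf.train.Saver(
            var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=name))
        for name in ("model_a", "model_b")
    }
    init = tf.global_variables_initializer()

ckpt_dir = tempfile.mkdtemp()
with tf.Session(graph=graph) as sess:
    sess.run(init)
    paths = {name: saver.save(sess, os.path.join(ckpt_dir, name))
             for name, saver in savers.items()}

# A fresh session starts with no variable state; restore fills it in.
with tf.Session(graph=graph) as sess:
    for name, saver in savers.items():
        saver.restore(sess, paths[name])
    values = sess.run([w_a, w_b])  # weights come back as NumPy arrays

print([float(v[0]) for v in values])
```

For an ensemble, the same pattern extends to any number of scopes: build each member under its own scope, restore each from its own checkpoint, and combine the output tensors in the same graph.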