I am trying to restore a graph from a checkpoint. The checkpoint was created by tf.train.Supervisor. Both the meta file and the checkpoint are present.
What I am trying to achieve is to load this graph from a separate application to run some operation (i.e. reuse an existing model).
I do this as follows (as explained here: https://www.tensorflow.org/api_docs/python/tf/train/import_meta_graph):
import os
import tensorflow as tf

meta = 'path/to/file.meta'
my_graph = tf.Graph()
with my_graph.as_default():
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(meta)
        saver.restore(sess, tf.train.latest_checkpoint(os.path.dirname(meta)))
        op = my_graph.get_operation_by_name("op")
        print(sess.run(op))
What I see is None. What I expect to see is a 1-D tensor.
I inspected the my_graph object using get_collection and found that all the variables required for op to run are there, correctly initialized with values restored from the checkpoint.
How can I figure out why the operation is not evaluated correctly? I am really stuck here.
The following code:
print(sess.run(my_graph.get_operation_by_name("Variable_2")))
print(sess.run(my_graph.get_tensor_by_name("Variable_2:0")))
prints
None
4818800
as if there were no connection between an operation and the corresponding variable.
The tf.Graph.get_operation_by_name() method always returns a tf.Operation object. When you pass a tf.Operation object to tf.Session.run(), TensorFlow will execute that operation (and everything on which it depends) and discard its outputs (if any).
If you are interested in the value of a particular output, you have to tell TensorFlow which output (a tf.Tensor) you are interested in. There are two main options:
Get a tf.Operation from the graph and then select one of its outputs:
op = my_graph.get_operation_by_name("op")
output = op.outputs[0]
print(sess.run(output))
Get a tf.Tensor from the graph by calling tf.Graph.get_tensor_by_name(), and appending ":<output index>" to the operation's name:
output = my_graph.get_tensor_by_name("op:0")
print(sess.run(output))
Why does TensorFlow draw this distinction? For one thing, an operation can have multiple outputs, so it is sometimes necessary to be specific about which output you want to fetch. For another, an operation may have a side effect and produce a large output (see tf.assign() for an example), and it is often more efficient to pass the tf.Operation to sess.run() so that the value is not copied back into the Python program.
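To make the naming convention concrete, here is a minimal sketch in plain Python. The helper split_tensor_name is hypothetical and not part of TensorFlow; it only illustrates that a tensor name is the name of the producing operation plus the index of one of its outputs.

```python
def split_tensor_name(tensor_name):
    """Split a tensor name such as "op:0" into (operation name, output index).

    Hypothetical helper for illustration only; it is not a TensorFlow API.
    """
    op_name, sep, index = tensor_name.rpartition(":")
    if not sep or not index.isdigit():
        raise ValueError(
            "expected '<op name>:<output index>', got %r" % tensor_name)
    return op_name, int(index)


print(split_tensor_name("Variable_2:0"))  # ('Variable_2', 0)
```

So "Variable_2" on its own names the operation (fetching it gives None), while "Variable_2:0" names its first output, which is the value you actually want back.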
I'm using the DeepLab V3 architecture for an image task, with one slight change: I add a channel to the input, so the first CNN layer becomes [7,7,4,64] instead of [7,7,3,64].
I plan to do transfer learning, so I want to restore all parameters except those for the fourth channel of this first CNN layer. However, all four channels are held in a single tf.Variable, so I don't know how to restore them with tf.train.Saver. (As far as I know, tf.train.Saver can control which tf.Variable objects are restored, but not which values within a tf.Variable.)
Any idea?
Some related code is shown below:
Load function
def load(saver, sess, ckpt_path):
    saver.restore(sess, ckpt_path)
Part of main function
# All variables need to be restored
restore = [v for v in tf.global_variables()]
# Set up tf session and initialize variables
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config = config)
init = tf.global_variables_initializer()
sess.run(init)
# Load Variables
loader = tf.train.Saver(var_list = restore)
load(loader, sess, args.restore_from)
In the main function, the variables to be restored are controlled by 'restore'. In this case, the first entry of 'restore' is:
<tf.Variable shape=(7,7,4,64) dtype=float32_ref>
But what I actually want is to restore only the first three channels, taken from another model whose kernel has shape (7,7,3,64), and to initialize the last channel with a zero initializer.
Is there any function that can help with this?
A possible quick hack: instead of creating a variable with the new shape and trying to copy parts of the old one over, create a variable only for the part that's missing (so shape=[7,7,1,64]), concatenate it with the pretrained variable along the input-channel axis, and use the result as the convolution kernel.
For transfer learning to work properly, the new part should be zero-initialized rather than randomly initialized (this is fine because the other values already break the symmetry), or initialized with values that are very small compared to the pretrained ones (assuming the new channel has the same range of values). Otherwise the later layers won't see the distributions they expect.
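To see why zero-initializing the extra slice is safe, here is a small worked example in plain Python. It is only a sketch: the weights are made up, and a 1x1 kernel at a single pixel stands in for the real 7x7 one, since the spatial extent does not change the argument.

```python
def conv_at_pixel(pixel, kernel):
    """Output of a 1x1 convolution at one pixel.

    pixel:  list of input channel values, length C_in
    kernel: C_in x C_out nested list of weights
    """
    n_out = len(kernel[0])
    return [sum(pixel[c] * kernel[c][o] for c in range(len(pixel)))
            for o in range(n_out)]


# Made-up "pretrained" weights: 3 input channels, 2 output channels.
pretrained = [[0.5, -1.0],
              [0.25, 2.0],
              [-0.75, 0.5]]
# New trainable slice for the extra input channel, zero-initialized.
extra = [[0.0, 0.0]]
# Concatenate along the input-channel axis: 3 + 1 = 4 input channels.
kernel = pretrained + extra

rgb = [1.0, 2.0, 3.0]
rgb_plus_extra = rgb + [42.0]

print(conv_at_pixel(rgb, pretrained))         # [-1.25, 4.5]
print(conv_at_pixel(rgb_plus_extra, kernel))  # [-1.25, 4.5]
```

Right after restoring, the 4-channel layer produces exactly what the pretrained 3-channel layer produced, whatever the fourth channel contains, so the later layers start from the activations they were trained on.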
I use this code to restore my model, but I don't know how to predict after restoring it. Which function can I use? I'm a beginner in TensorFlow and have no idea which parameters or functions are saved.
In the meta model:
sess = tf.Session()
saver = tf.train.import_meta_graph("/home/MachineLearning/model.ckpt.meta")
saver.restore(sess,tf.train.latest_checkpoint('./'))
print("Model restored with success ")
x_predict,y_predict= load_svmlight_file('/MachineLearning/to_predict.csv')
x_predict = x_valid.toarray()
sess.run([] ,feed_dict ) #i don't know how to use predict function
These are the results:
$python predict.py
Model restored with success
Traceback (most recent call last):
File "predict.py", line 23, in <module>
sess.run([] ,feed_dict )
NameError: name 'feed_dict' is not defined
You're almost there. TensorFlow is simply a math library. Your graph is a collection of math operations with the associated dependencies (i.e. a graph, specifically a DAG).
When you loaded the graph and associated variables (weights) you loaded all the definitions. Now you need to ask tensorflow to compute some value in the graph. There are lots of values it could compute, the one you want is often named logits (a typical name for the output layer of a neural network). But note that it could be named anything (especially if this isn't a neural network model), you need to understand the model. You might also want to compute an operation named accuracy which is defined to compute the accuracy of a particular batch of inputs (again depends on your model).
Note that you will need to provide tensorflow with whatever it needs to perform these computations. There is generally a placeholder where you pass in your data (and during training a placeholder for your labels which you don't need for prediction because none of the operations you will ask tensorflow to compute depend on it).
But you will need to get references to these various operations (logits, and accuracy) and placeholders (x is a typical name). Since you loaded your graph from disk you don't have the references (note that an alternative way of loading the model is to re-run the code that builds the model, which gives you easy access to the references you need).
In order to get the right references you can look them up by name. For how to get a list of all the operations, see the question "List of tensor names in graph in Tensorflow".
Then, to get a specific op (operation) by name, see the question "How to get a tensorflow op by name?".
So you'll have something like this:
logits = tf.get_default_graph().get_tensor_by_name("logits:0")
x = tf.get_default_graph().get_tensor_by_name("x:0")
accuracy = tf.get_default_graph().get_tensor_by_name("accuracy:0")
Note that the :0 suffix is an output index: "logits:0" means the first output of the operation named logits. That is why these are looked up with get_tensor_by_name rather than get_operation_by_name; fetching an operation itself returns None, and an operation cannot be used as a key in feed_dict. Now you have all the references you need and you can use sess.run to perform a specific computation, providing the input data and the tensors you'd like to have computed:
sess.run([logits, accuracy], feed_dict={x:your_input_data_in_numpy_format})
The names of these elements will vary in your implementation, I've used the most common names. If they weren't given pretty names it'll be hard to identify them and you'll need to look through the original code that produced the graph. In fact if they weren't named properly looking them up by name is so painful that it's probably better to just re-run the code that produced the original graph rather than import the meta graph. Notice that saver.restore only restores the actual data, import_meta_graph is the optional piece which can be replaced by simply re-building the graph programmatically.
I trained a model with batch norm in TensorFlow. I would like to save the model and restore it for further use. The batch norm is done by
def batch_norm(input, phase):
    return tf.layers.batch_normalization(input, training=phase)
where the phase is True during training and False during testing.
It seems like simply calling
saver = tf.train.Saver()
saver.save(sess, savedir + "ckpt")
does not work well: when I restore the model it first says it was restored successfully, but then it says "Attempting to use uninitialized value batch_normalization_585/beta" if I run just one node in the graph. Is this related to not saving the model properly or something else that I've missed?
I also had the "Attempting to use uninitialized value batch_normalization_585/beta" error. It can happen when the saver does not cover the moving mean and moving variance of the batch normalization, which are not part of tf.trainable_variables(). Note that a saver declared with empty brackets, like this:
saver = tf.train.Saver()
covers all saveable variables that exist at the moment it is created, so it has to be created after the whole graph has been built. To make sure these variables are included in the saved ckpt, you can pass the list explicitly:
saver = tf.train.Saver(tf.global_variables())
This saves ALL the variables, so the checkpoint is larger. Alternatively, if you build the saver from the trainable variables, you must identify the moving mean and variance variables and add them, like:
saver = tf.train.Saver(tf.trainable_variables() + list_of_extra_variables)
Not sure if this needs to be explained, but just in case (and for other potential viewers).
Whenever you create an operation in TensorFlow, a new node is added to the graph. No two nodes in a graph can have the same name. You can define the name of any node you create, but if you don't give a name, TensorFlow will pick one for you in a deterministic way (that is, not randomly, but instead always with the same sequence). If you add two numbers, it will probably be Add, but if you do another addition, since no two nodes can have the same name, it may be something like Add_1. Once a node is created in a graph its name cannot be changed. Many functions create several subnodes in turn; for example, tf.layers.batch_normalization creates some internal variables beta and gamma.
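As a rough sketch of the "deterministic naming" idea, here is a toy model in plain Python. The NameScope class is made up for illustration and is much simpler than what TensorFlow actually does internally; it only shows that the same sequence of creations always yields the same sequence of names.

```python
class NameScope:
    """Toy model of per-graph unique node names (not TensorFlow's real code)."""

    def __init__(self):
        self._used = {}  # base name -> number of times it has been requested

    def unique_name(self, base):
        count = self._used.get(base, 0)
        self._used[base] = count + 1
        # The first use keeps the base name; later uses get a numeric suffix.
        return base if count == 0 else "%s_%d" % (base, count)


graph_a = NameScope()
print([graph_a.unique_name("Add") for _ in range(3)])
# ['Add', 'Add_1', 'Add_2']

# A fresh graph built by the same code ends up with the same names.
graph_b = NameScope()
print([graph_b.unique_name("Add") for _ in range(3)])
# ['Add', 'Add_1', 'Add_2']
```

The flip side is that building the same model twice in one graph shifts every automatically chosen name, which matters for restoring checkpoints.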
Saving and restoring works in the following way:
1. You create a graph representing the model that you want. This graph contains the variables that will be saved by the saver.
2. You initialize, train or do whatever you want with that graph, and the variables in the model get assigned some values.
3. You call save on the saver to, well, save the values of the variables to a file.
4. Now you recreate the model in a different graph (it can be a different Python session altogether or just another graph coexisting with the first one). The model must be created in exactly the same way the first one was.
5. You call restore on the saver to retrieve the values of the variables.
In order for this to work, the names of the variables in the first and the second graph must be exactly the same.
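The name-matching requirement can be pictured with a toy checkpoint in plain Python. This is only an analogy under simplifying assumptions: the save and restore functions below are made up, a real tf.train.Saver writes tensors to files, and its error messages differ.

```python
def save(variables):
    """Toy checkpoint: a plain dict mapping variable name -> value."""
    return dict(variables)


def restore(checkpoint, variable_names):
    """Look up each variable of the new graph in the checkpoint by name."""
    missing = [name for name in variable_names if name not in checkpoint]
    if missing:
        raise KeyError("not found in checkpoint: %s" % ", ".join(missing))
    return {name: checkpoint[name] for name in variable_names}


ckpt = save({"batch_normalization/beta": 0.0,
             "batch_normalization/gamma": 1.0})

# Same construction code, same names: restore succeeds.
print(restore(ckpt, ["batch_normalization/beta", "batch_normalization/gamma"]))

# The model was built a second time in the same graph, so the names got a suffix.
try:
    restore(ckpt, ["batch_normalization_1/beta", "batch_normalization_1/gamma"])
except KeyError as err:
    print("restore failed:", err)
```

Values are matched purely by name, so a variable whose name has drifted has nothing to be restored from.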
In your example, TensorFlow is complaining about the variable batch_normalization_585/beta. It seems that you have called tf.layers.batch_normalization nearly 600 times in the same graph, so you have that many beta variables hanging around. I doubt that you actually need that many, so I guess you are just experimenting with the API and ended up with that many copies.
Here's a draft of something that should work:
import tensorflow as tf

def make_model():
    input = tf.placeholder(...)
    phase = tf.placeholder(...)
    input_norm = tf.layers.batch_normalization(input, training=phase)
    # Do some operations with input_norm
    output = ...
    saver = tf.train.Saver()
    return input, output, phase, saver

# We work with one graph first
g1 = tf.Graph()
with g1.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        # Do your training or whatever...
        saver.save(sess, savedir + "ckpt")

# We work with a second different graph now
g2 = tf.Graph()
with g2.as_default():
    input, output, phase, saver = make_model()
    with tf.Session() as sess:
        saver.restore(sess, savedir + "ckpt")
        # Continue using your model...
Again, the typical case is not to have two graphs side by side, but rather have one graph and then recreate it in another Python session later, but in the end both things are the same. The important part is that the model is created in the same way (and therefore with the same node names) in both cases.
I have a model that I trained. I wish to extract from the model the output of an intermediate maxpool layer.
I tried the following
saver = tf.train.import_meta_graph(BASE_DIR + LOG_DIR + '/model.ckpt.meta')
saver.restore(sess,tf.train.latest_checkpoint(BASE_DIR + LOG_DIR))
sess.run("maxpool/maxpool",feed_dict=feed_dict)
Here, feed_dict is a dictionary containing the placeholders and their contents for this run.
I keep getting the following error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1_1' with dtype float and shape...
What can be the cause of this? I generated all of the placeholders and put them in the feed dictionary.
I ran into a similar issue and it was frustrating. What got me around it was filling out the name field for every variable and operation that I wanted to call later. You may also need to add your maxpool/maxpool op to a collection with tf.add_to_collection('name_for_maxpool_op', maxpool_op_handle). You can then restore the ops and named tensors with:
# Restore from metagraph.
saver = tf.train.import_meta_graph(...)
sess = tf.Session()
saver.restore(sess, ...)
graph = sess.graph
# Restore your ops and tensors.
maxpool_op = tf.get_collection('name_for_maxpool_op')[0] # returns a list, you want the first element
a_tensor = graph.get_tensor_by_name('tensor_name:0') # need the :0 added to your name
Then you would build your feed_dict using your restored tensors. More information can be found here. Also, as you mentioned in your comment, you need to pass the op itself to sess.run, not its name:
sess.run(maxpool_op, feed_dict=feed_dict)
You can access your tensors and ops from a restored metagraph even if you did not name them (to avoid retraining the model with new fancy tensor names, for instance), but it can be a bit of a pain. The names given to the tensors automatically are not always the most transparent. You can list the names of all variables in your graph with:
print([v.name for v in tf.global_variables()])  # tf.all_variables() is the older, deprecated name
You can hopefully find the name that you are looking for there and then restore that tensor using graph.get_tensor_by_name as described above.
I am trying to print a tensor attns in tensorflow seq2seq code. Seq2Seq.py
I tried:
tf.Print(attns, [attns])
but it prints nothing.
I tried
sess = tf.Session()
sess.run(attns) or attns.eval()
In this case it throws: InvalidArgumentError: You must feed a value for placeholder tensor
I have also tried using sess.run()
sess = tf.get_default_session()
aa = sess.run(attns)
In this case sess object is None.
tf.Print is not a "classic" imperative print statement, since those are not executed in symbolic, graph-based code. What is needed instead is a specific node in the computation graph that is triggered whenever your computation "passes" that node.
This is exactly what tf.Print does. It creates a "wrapper" node around any other node by creating an identity operation that, when triggered, prints the value of a list of tensors.
The first argument of this print function, input_ (or attns in your case) is the wrapped node, and data (or [attns] in your case) is the list of tensors to be printed.
What you therefore want to do is to add this line:
attns = tf.Print(attns, [attns])
Here, attns is assigned a print wrapper identity operation on attns - so the tensor attns has exactly the same behavior, except that when it is computed, it will also print [attns].
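The difference between the two lines can be mimicked with a toy lazily evaluated graph in plain Python. This is only an analogy: Node and print_node are made-up names, not TensorFlow APIs, and the real tf.Print writes to standard error rather than standard output.

```python
class Node:
    """Toy lazily evaluated graph node (not TensorFlow's implementation)."""

    def __init__(self, compute):
        self._compute = compute

    def eval(self):
        return self._compute()


def print_node(wrapped, data):
    """Return an identity node that prints `data` when it is evaluated."""
    def compute():
        value = wrapped.eval()
        print([node.eval() for node in data])
        return value
    return Node(compute)


attns = Node(lambda: [0.1, 0.9])

print_node(attns, [attns])          # result discarded: nothing ever prints
attns.eval()                        # still the original node, no printing

attns = print_node(attns, [attns])  # rebind the name to the wrapper node
result = attns.eval()               # prints [[0.1, 0.9]]
```

Creating the wrapper does nothing by itself; the printing only happens when something downstream evaluates the wrapper, which is why the result has to be assigned back and used in place of the original tensor.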