TensorFlow: delete graph and free up resources

I create a TensorFlow graph, define some tensors, and run some operations. When I'm done, I'd like to delete the graph I made and free all of its resources. How can I do that?
import numpy as np
import tensorflow as tf

temporary_graph = tf.Graph()
with temporary_graph.as_default(), tf.Session() as sess:
    foo = tf.placeholder(tf.float32, (2, 2))
    bar = foo @ foo
    res = sess.run(bar, feed_dict={foo: np.ones((2, 2))})
    print(res)
# Hypothetical function: this is the call I am looking for
delete_graph_and_free_up_resources(temporary_graph)
This answer claims that the context manager cleans up the graph, but this isn't the case, and the docs don't claim such a thing:
>>> temporary_graph.get_operations()
[<tf.Operation 'Placeholder' type=Placeholder>, <tf.Operation 'matmul' type=MatMul>]
What is the best way to dispose of a graph?

It is not so simple. To free the resources that a graph is using, you need to drop every reference to that graph, so that Python can request that it be deleted from memory. That means deleting direct references to the graph, but also objects that reference the graph (directly or transitively). That includes operations, tensors and sessions, among other things. In your example, you would need to do:
del temporary_graph, sess, foo, bar, res
And that should make it possible to have the memory freed (not sure if you might need to call the garbage collector in some cases).
As you may note, you cannot do this in a generic function, because it depends on the live references in your program. However, if you keep all references related to the graph within a function or object, you should be able to do it fine.
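A minimal sketch of that last suggestion, assuming TensorFlow 1.x-style graph mode (reached here through tensorflow.compat.v1 so that it also imports under TensorFlow 2.x). The function name run_in_temporary_graph is illustrative, and bar is built with tf.matmul to match the MatMul operation shown in the question. Only the NumPy result leaves the function, so no reference to the graph, session or tensors survives the call. Whether the memory is then actually released is not something this sketch verifies.

```python
import gc

import numpy as np
import tensorflow.compat.v1 as tf


def run_in_temporary_graph():
    """Build, run and drop a graph; only a NumPy array is returned."""
    graph = tf.Graph()
    with graph.as_default(), tf.Session() as sess:
        foo = tf.placeholder(tf.float32, (2, 2))
        bar = tf.matmul(foo, foo)
        res = sess.run(bar, feed_dict={foo: np.ones((2, 2))})
    # graph, sess, foo and bar are locals, so their references end here.
    return res


result = run_in_temporary_graph()
gc.collect()  # optional: asks Python to collect any unreachable reference cycles
print(result)
```

The explicit gc.collect() call is optional; it matters only if the dropped objects were part of reference cycles.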

I'm using tensorflow keras, but my approach is to simply clear the session:
tensorflow.keras.backend.clear_session()

Related

Problem when predicting via multiprocess with Tensorflow

I have 4 (or more) models (same structure but different training data). Now I want to ensemble them to make a prediction. I want to pre-load the models and then predict one input message (one message at a time) in parallel via multiprocessing. However, the program always stops at the "session.run" step. I could not figure out why.
I tried passing all arguments to the function in each process, as shown in the code below. I also tried using a Queue object and putting all the data (except the model object) in the queue. I also tried setting the number of processes to 1. It made no difference.
from multiprocessing import Manager, Process

with Manager() as manager:
    first_level_test_features = manager.list()
    procs = []
    for id in range(4):
        p = Process(target=predict, args=(id, (message, models, configs, vocabs, emoji_dict, first_level_test_features)))
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
I did not get any error message, since it is just stuck there. I would expect the program to start multiple processes, with each process using the model passed to it to make the prediction.
I am unsure how session sharing across different processes would work, and this is probably where your issue comes from. Given the way TensorFlow works, I would advise implementing the ensemble call as a graph operation, so that it can be run through a single session.run call, with TF handling the parallelization of computations wherever possible.
In practice, if you have symbolic tensors representing the models' predictions, you could use a TF operation to aggregate them (tf.concat, tf.reduce_mean, tf.add_n... whichever suits your design) and end up with a single symbolic tensor representing the ensemble prediction.
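The aggregation idea can be sketched as follows, assuming TensorFlow 1.x-style graph mode (accessed through tensorflow.compat.v1). The toy_model helper, its constant weights and the placeholder shape are hypothetical stand-ins for the real models, which the question does not show. The point is only that all predictions live in one graph and are averaged by a single session.run call.

```python
import numpy as np
import tensorflow.compat.v1 as tf


def toy_model(x, scale, name):
    # Hypothetical stand-in for a real model: one constant-initialized weight matrix.
    with tf.variable_scope(name):
        w = tf.get_variable(
            "w", shape=(3, 2), initializer=tf.constant_initializer(scale))
    return tf.matmul(x, w)


graph = tf.Graph()
with graph.as_default():
    message = tf.placeholder(tf.float32, (None, 3), name="message")
    # Four "models" built in the same graph, each under its own variable scope.
    predictions = [
        toy_model(message, scale, "model_%d" % i)
        for i, scale in enumerate([1.0, 2.0, 3.0, 4.0])
    ]
    # Single symbolic tensor for the ensemble: the mean of the four predictions.
    ensemble = tf.reduce_mean(tf.stack(predictions, axis=0), axis=0)
    init = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init)
    ensemble_value = sess.run(
        ensemble, feed_dict={message: np.ones((1, 3), dtype=np.float32)})

print(ensemble_value)
```

Averaging is only one choice here; tf.add_n or tf.concat could replace tf.reduce_mean depending on how the ensemble is meant to combine its members.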
I hope this helps; if not, please provide some more details as to what your setting is, notably which form your models have.

Integrating tfe.EagerVariableStore with tfe.Checkpoint?

tfe.Checkpoint seems to require things to be checkpointed to implement CheckpointableBase which EagerVariableStore doesn't.
What is the right way then to use EagerVariableStore to "eagerify" the functional parts of Tensorflow with ability to checkpoint?
Providing some working code would be appreciated.
For eagerifying functional code, I'd suggest tf.make_template rather than EagerVariableStore directly. When executing eagerly, this will create a variable store automatically (allowing variable reuse with tf.get_variable), and the object tf.make_template returns is checkpointable.
import tensorflow as tf
tf.enable_eager_execution()
def uses_functional_layers(x):
    return tf.layers.dense(inputs=x, units=1)
save_template = tf.make_template("save_template", uses_functional_layers)
save_checkpoint = tf.train.Checkpoint(model=save_template)
save_template(tf.ones([1, 1]))
save_template.variables[0].assign([42.])
save_output = save_template(tf.ones([1, 1]))
save_path = save_checkpoint.save('/tmp/tf_template_ckpt')
So we make a function which wraps our functional layers / tf.get_variable usage, then make a template object out of that with tf.make_template, and finally can checkpoint that template object after it has been called once to create its variables.
An advantage of doing it this way is that we get restore-on-create for variables in the template, meaning the template is evaluated with the restored values the first time it is called:
import numpy
# Create a second template to load the checkpoint into
restore_template = tf.make_template("save_template", uses_functional_layers)
tf.train.Checkpoint(model=restore_template).restore(save_path)
numpy.testing.assert_allclose(
    save_output,
    restore_template(tf.ones([1, 1])))  # Variables are restored on creation
numpy.testing.assert_equal([42.], restore_template.variables[0].numpy())
Nested templates work too. Note that the template object strips its own variable_scope from variables created within it, but otherwise uses the full variable names (which may be more fragile than usual object-based checkpointing).
Looking up variables repeatedly with tf.get_variable (done each time the template is evaluated) is also quite slow, which is one reason TensorFlow is moving toward object-oriented Keras-style layers instead of functional layers.
I have found a "hackish" way, but it works!
The main problem is:
tfe.EagerVariableStore doesn't inherit CheckpointableBase, hence it can't be saved with tfe.Checkpoint
The big idea is:
We are going to create a CheckpointableBase object that "points" to every variable stored in the tfe.EagerVariableStore
How do we know what is stored in EagerVariableStore?
Reference: https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/variable_scope.py
It says that EagerVariableStore uses _VariableStore to store all the variables, via _store.
Now, the _VariableStore stores the variables in self._vars as a dictionary.
If we have a container = tfe.EagerVariableStore(), we can get all the variables via container._store._vars as a dictionary.
How to create a CheckpointableBase that points to every variable then?
We will use tfe.Checkpointable since it has __setattr__.
checkpointable = tfe.Checkpointable()
for k, v in container._store._vars.items():
    setattr(checkpointable, k, v)
How to combine the two?
As we have a tfe.Checkpoint for saving, all we need to do is this:
saver = tfe.Checkpoint(checkpointable=checkpointable)
saver.save(...)
And saver.restore(...) to restore.
Your tfe.EagerVariableStore does not need to be changed: once the checkpointable has been restored via tfe.Checkpoint, the values in tfe.EagerVariableStore are "replaced" automagically!

Copy variables from one TensorFlow graph to another

I have two tensorflow graphs. One for training and the other for evaluation. They share a lot of variable names. When I evaluate a model I want to copy all variable values from the train graph to the test graph. Obviously, I can do it via tf.train.Saver, but this solution seems not very appropriate to me, especially the fact that we have to use the disk for this.
When you speak about multiple graphs, I assume you mean something like:
g1 = tf.Graph()
with g1.as_default():
    pass  # add your stuff
g2 = tf.Graph()
with g2.as_default():
    pass  # add other stuff
If this is correct, then are you sure you really need two graphs? Can't you have one graph consisting of two connected components?
Using multiple graphs is discouraged (p 47) because:
Multiple graphs require multiple sessions, each will try to use all available resources by default
You can't pass data between them without passing it through Python/NumPy, which doesn't work in a distributed setting
It’s better to have disconnected subgraphs within one graph
This also gives you a way to share variables in a non-distributed setting.
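As a rough sketch of the single-graph suggestion, assuming TensorFlow 1.x-style graph mode (accessed through tensorflow.compat.v1): the training and evaluation paths below are two subgraphs of one graph that share a variable by name, so nothing has to be copied or written to disk. The linear helper, the scope name "model" and the placeholder names are illustrative and not taken from the question.

```python
import numpy as np
import tensorflow.compat.v1 as tf


def linear(x):
    # AUTO_REUSE creates "model/w" on the first call and reuses it afterwards.
    with tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        w = tf.get_variable(
            "w", shape=(2, 1), initializer=tf.ones_initializer())
    return tf.matmul(x, w)


graph = tf.Graph()
with graph.as_default():
    train_x = tf.placeholder(tf.float32, (None, 2), name="train_x")
    eval_x = tf.placeholder(tf.float32, (None, 2), name="eval_x")
    train_out = linear(train_x)  # training subgraph
    eval_out = linear(eval_x)    # evaluation subgraph, same variable
    init = tf.global_variables_initializer()
    n_vars = len(tf.global_variables())

with tf.Session(graph=graph) as sess:
    sess.run(init)
    eval_result = sess.run(
        eval_out, feed_dict={eval_x: np.ones((1, 2), dtype=np.float32)})

print(n_vars, eval_result)
```

Because both subgraphs read the same variable, any update made through the training path is immediately visible to the evaluation path within the same session.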

Tensorboard scalars and graphs duplicated

I'm using TensorBoard to visualize network metrics and graph.
I create a session sess = tf.InteractiveSession() and build the graph in Jupyter notebook.
In the graph, I include two summary scalars:
with tf.variable_scope('summary') as scope:
    loss_summary = tf.summary.scalar('Loss', cross_entropy)
    train_accuracy_summary = tf.summary.scalar('Train_accuracy', accuracy)
I then create a summary_writer = tf.summary.FileWriter(logdir, sess.graph) and run:
_, loss_sum, train_accuracy_sum = sess.run([...], feed_dict=feed_dict)
I write the metrics:
summary_writer.add_summary(loss_sum, i)
summary_writer.add_summary(train_accuracy_sum, i)
I run the code three times.
Each time I run, I re-import TF and create a new interactive session.
But, in Tensorboard, a separate scalar window is created for each run:
Also, the graph appears to be duplicated if I check data for the last run:
How do I prevent duplication of the graph and scalar window each time I run?
I want all data to appear in the same scalar plots (with multiple series / plot).
I want each run to reference a single graph visualization.
I suspect the problem arises because you are running the code three times in the same process (same script, Jupyter notebook, or whatever), and those invocations share the same "default graph" in TensorFlow. TensorFlow needs to give each node in the graph a unique name, so it appends "_1" and "_2" to the names of the summary nodes in the second and third invocations.
How do you avoid this? The easiest way is to create a new graph each time you run the code. There are (at least) three ways to do this:
Wrap the code in a with tf.Graph().as_default(): block, which constructs a new tf.Graph object and sets it as the default graph for the extent of the with block.
If you construct your session before creating the graph, you can construct your session as sess = tf.InteractiveSession(graph=tf.Graph()). The newly constructed tf.Graph object remains as the default graph until you call sess.close().
Call tf.reset_default_graph() between invocations of the code.
The with-block approach is the "most structured" way to do things, and might be best if you are writing a standalone script. However, since you are using tf.InteractiveSession, I assume you are using an interactive REPL of some kind, and the other two approaches are probably more useful (e.g. for splitting the execution across multiple cells).
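The renaming behaviour and the with-block fix can be illustrated with a small sketch, assuming TensorFlow 1.x-style graph mode (accessed through tensorflow.compat.v1). To keep it self-contained, a named constant stands in for the summary node; that substitution is an assumption of this example, made because the graph uniquifies op names in general. The build_node helper is hypothetical.

```python
import tensorflow.compat.v1 as tf


def build_node():
    # Stand-in for the summary node: a constant op named "Loss".
    return tf.constant(0.0, name="Loss")


# Three "runs" that share one graph: later names get numeric suffixes.
shared_graph = tf.Graph()
with shared_graph.as_default():
    names_in_shared_graph = [build_node().op.name for _ in range(3)]

# Three "runs" that each get a fresh graph: the name stays the same.
names_in_fresh_graphs = []
for _ in range(3):
    with tf.Graph().as_default():
        names_in_fresh_graphs.append(build_node().op.name)

print(names_in_shared_graph)  # ['Loss', 'Loss_1', 'Loss_2']
print(names_in_fresh_graphs)  # ['Loss', 'Loss', 'Loss']
```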
This problem occurs because the same process ends up holding multiple copies of the graph. If you want to solve it, use:
tf.reset_default_graph()

Is there a no-op (pass-through) operation in tensorflow?

As per the title. I'd like to use such an operation to rename nodes and better organize a graph. Or is there another recommended practice for renaming an existing node in the graph? Thanks!
There is tf.no_op which allows you to add an operation which does nothing.
As far as I know, you cannot rename a Tensor once created.
However, you can use additional "no-op" operations (like you said):
for a tf.Tensor: tf.identity(input_tensor, name='your_new_name')
for an operation: tf.group(input_operation, name='your_new_name')
After that, you can retrieve the renamed tensor with:
graph = tf.get_default_graph()
graph.get_tensor_by_name('your_new_name:0')
Or the input_operation with:
graph = tf.get_default_graph()
graph.get_operation_by_name('your_new_name')
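Putting these pieces together, here is a small end-to-end sketch, assuming TensorFlow 1.x-style graph mode (accessed through tensorflow.compat.v1). The tensor x, the doubling step and the name 'your_new_op' are illustrative. An explicit tf.Graph is used instead of the default graph so that the example does not depend on global state.

```python
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.constant([1.0, 2.0], name="x")
    doubled = x * 2.0  # gets an automatically generated name
    # Pass-through op that exposes the tensor under a chosen name.
    renamed = tf.identity(doubled, name="your_new_name")
    # Grouping op that exposes an operation under a chosen name.
    grouped = tf.group(renamed.op, name="your_new_op")

# Look both up again by name.
fetched = graph.get_tensor_by_name("your_new_name:0")
fetched_op = graph.get_operation_by_name("your_new_op")

with tf.Session(graph=graph) as sess:
    value = sess.run(fetched)

print(fetched.name, fetched_op.name, value)
```

Note that the original node keeps its old name; the identity and group ops add new, named nodes that depend on it.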