Copy variables from one TensorFlow graph to another - tensorflow

I have two tensorflow graphs. One for training and the other for evaluation. They share a lot of variable names. When I evaluate a model I want to copy all variable values from the train graph to the test graph. Obviously, I can do it via tf.train.Saver, but this solution seems not very appropriate to me, especially the fact that we have to use the disk for this.

When you speak about multiple graphs, I assume you mean something like:
g1 = tf.Graph()
with g1.as_default():
# add your stuff
g2 = tf.Graph()
with g2.as_default():
# add other stuff
If this is correct, then are you sure you really need two graphs? Can't you have one graph consisting of two connected components?
Using multiple graphs is discouraged (p 47) because:
Multiple graphs require multiple sessions, each will try to use all available resources by default
Can't pass data between them without passing them through python/numpy, which doesn't work in distributed
It’s better to have disconnected subgraphs within one graph
This also gives you a solution how to pass variables in a non-distributed setting.

Related

Problem when predicting via multiprocess with Tensorflow

I have 4 (or more) models (same structure but different training data). Now I want to ensemble them to make a prediction. I want to pre-load the models and then predict one input message (one message at a time) in parallel via multiprocess. However, the program always stops at "session.run" step. I could not figure it out why.
I tried passing all arguments to the function in each process, as shown in the code below. I also tried using a Queue object and put all the data (except the model object) in the queue. I also tried to set the number of process to 1. It made no difference.
with Manager() as manager:
first_level_test_features=manager.list()
procs =[]
for id in range(4):
p = Process(target=predict, args=(id, (message, models, configs, vocabs, emoji_dict,first_level_test_features)))
procs.append(p)
p.start()
for p in procs:
p.join()
I did not get any error message since it is just stuck there. I would expect the program can start multiple processes and each process uses the model pass to it to make the prediction.
I am unsure how session sharing along different Processes would work, and this is probably where your issue comes from. Given the way TensorFlow works, I would advise implementing the ensemble call as a graph operation, so that it can be run through a single session.run call, with TF handling the parallelization of computations wherever possible.
In practice, if you have symbolic tensors representing the models' predictions, you could use a TF operation to aggregate them (tf.concat, tf.reduce_mean, tf.add_n... whichever suits your design) and end up with a single symbolic tensor representing the ensemble prediction.
I hope this helps; if not, please provide some more details as to what your setting is, notably which form your models have.

keras with tf backend: how to identify variables (tensors) which are in a graph

I've built (in jupyter notebook with Python 3.6) a long ML proof of concept, which, in essence, has 3 parts: load & prepare data; train network; use network.
I would like to be able to re-run it from "train network" without the "cost" of preparing the data again & again (even loading the prepared data from a save file takes a noticeable amount of time).
When I run all cells from the start of the network training (the first cell of which includes a K.clear_session to wipe out any previous network - needed if the architecture changes) it fails as, part way through, there are still variables stored (with the same names) which are part of the old graph.
I can see two simple solutions (but you may be able to advise a better method to tidy up):
loop through all the defined variables (Tensors) in global() and del any which are Tensors (implicitly all part of the old session and graph),
or (better)
loop through all the tensors defined in the (old) graph del'ing them before del'ing the (old) graph.
I can see K.get_uid but can't see how I can use this info to accomplish what I need.
In the meantime I have to reset and rerun the whole workbook everytime I make adjustments to the network.
Is there a better way?

Usage of tf.GraphKeys

In tensorflow, there's a class GraphKeys. I came across many codes, where it's been used. But it's not explained very well what's the usage of this class both in tensorflow documentation as well as in the codes, where it has been used.
Can someone please explain what's the usage of tf.GraphKey?
Thank you!
As far as I know, tf.GraphKeys is a collection of collections of keys for variables and ops in the graph. The usage (just as common python dictionaries) is to retrieve variables and ops.
Given that said, here are some subsets of tf.GraphKeys I came across:
GLOBAL_VARIABLES and LOCAL_VARIABLES contain all variables of the graph, which need to be initialized before training. tf.global_variables() returns the global variables in a list and can be used with tf.variables_initializer for initialization.
Variables created with option trainable=True will be added to TRAINABLE_VARIABLES and will be fetched and updated by any optimizer under tf.train during training.
SUMMARIES contains keys for all summaries added by tf.summary (scalar, image, histogram, text, etc). tf.summary.merge_all gathers all such keys and returns an op to be run and written to file so that you can visualize them on tensorboard.
Custom functions to update some variables can be added to UPDATE_OPS and separately run at each iteration using sess.run(tf.get_collection(tf.GraphKeys.UPDATE_OPS)). In this case, these variables are set trainable=False to avoid being updated by gradient descent.
You may create your own collections using tf.add_to_collection(some_name, var_or_op) and retrieve the variable or op later. You may retrieve specific variables or ops using tf.get_collection() and tweak the scope.

How to assign value to one graph with the other graph who has the same structure in tensorflow?

I'm trying to implement DQN in tensorflow. Here I have one target network and one training network who have the same structure with each other. In the beginning of every 10000 training steps, I want to load the value from checkpoint to target network and training network, then stop_gradient target network. However, I tried those ways, and none of them worked:
1, Put the two networks in one graph. However, every time I load them, I don't know how to assign the value of training network part to target network part.(They are saved in different values, since one is stop gradient.)
2, Define two graphs using tf.graph() and run two session respectively. However, I can't load the checkpoint of one graph to another, even they have the same structure. After all, they are two different graphs.
So, any one who can give me some advice? Very appreciated!
The typical approach would be to put everything in one graph, put your two networks in two name scopes, and then create tf.assign ops for each variable in one scope to the another and use the tf.group to construct a final "copying" operation. Lets assume that function create_net() builds a single network
with tf.name_scope('main_network'):
main_net = create_net()
with tf.name_scope('target_network):
target_network = create_net()
main_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope='main_network')
target_variables = tf.get_collection(tf.GraphKeys.VARIABLES, scope='target_network')
# I am assuming get_collection returns variables in the same order, please double
# check this is actually happening
assign_ops = []
for main_var, target_var in zip(main_variables, target_variables):
assign_ops.append(tf.assign(target_var, tf.identity(main_var)))
copy_operation = tf.group(*assign_ops)
Now executing copy_operation in session.run should copy your main network parameters to the target network. The above code should be considered a pseudo code, rather than something you can copy&paste.

Tensorboard scalars and graphs duplicated

I'm using TensorBoard to visualize network metrics and graph.
I create a session sess = tf.InteractiveSession() and build the graph in Jupyter notebook.
In the graph, I include two summary scalars:
with tf.variable_scope('summary') as scope:
loss_summary = tf.summary.scalar('Loss', cross_entropy)
train_accuracy_summary = tf.summary.scalar('Train_accuracy', accuracy)
I then create a summary_writer = tf.summary.FileWriter(logdir, sess.graph) and run:
_,loss_sum,train_accuracy_sum=sess.run([...],feed_dict=feed_dict)
I write the metrics:
summary_writer.add_summary(loss_sum, i)
summary_writer.add_summary(train_accuracy_sum, i)
I run the code three times.
Each time I run, I re-import TF and create a new interactive session.
But, in Tensorboard, a separate scalar window is created for each run:
Also, the graph appears to be duplicated if I check data for the last run:
How do I prevent duplication of the graph and scalar window each time I run?
I want all data to appear in the same scalar plots (with multiple series / plot).
I want each run to reference a single graph visualization.
I suspect the problem arises because you are running the code three times in the process (same script, Jupyter notebook, or whatever), and those invocations share the same "default graph" in TensorFlow. TensorFlow needs to give each node in the graph a unique name, so it appends "_1" and "_2" to the names of the summary nodes in the second and third invocations.
How do you avoid this? The easiest way is to create a new graph each time you run the code. There are (at least) three ways to do this:
Wrap the code in a with tf.Graph().as_default(): block, which constructs a new tf.Graph object and sets it is the default graph for the extent of the with block.
If you construct your session before creating the graph, you can construct your session as sess = tf.InteractiveSession(graph=tf.Graph()). The newly constructed tf.Graph object remains as the default graph until you call sess.close().
Call tf.reset_default_graph() between invocations of the code.
The with-block approach is the "most structured" way to do things, and might be best if you are writing a standalone script. However, since you are using tf.InteractiveSession, I assume you are using an interactive REPL of some kind, and the other two approaches are probably more useful (e.g. for splitting the execution across multiple cells).
This problem occurs to hold multiple graphs its not a problem if you want to solve this use:
tf.reset_default_graph()