In Tensorflow's tutorial it says that there are two ways to use the EMA'ed weights for evaluation
Build a model that uses the shadow variables instead of the
variables. For this, use the average() method which returns the
shadow variable for a given variable.
Build a model normally but load the checkpoint files to evaluate by
using the shadow variable names. For this use the average_name()
method. See the Saver class for more information on restoring saved
variables.
I understand how to use the second method to use the EMA'ed weights for evaluation as an example is given. I was wondering if someone could give me a simple example of how to build a model that uses the shadow variables.
Related
I've been using TensorFlow's Saver class to save model parameters, but that class is going away in TensorFlow 2, so I need to replace it with Checkpoint. I can't figure out how to do that. All the examples in the documentation for Checkpoint assume you're saving a tf.keras.Model. I'm not using Keras, so that doesn't apply.
Saver just takes a list of variables to save, so that's what I'm starting from. How do I pass that to Checkpoint? It expects you to pass every checkpointable object as a named argument. I was hoping I could just say variables=[var1, var2, ...], but it doesn't accept lists. I could pass every variable as a separate argument, but what do I use as the names? The variable names? That defeats the whole purpose of checkpoint, which is to be more robust by not depending on variable names. What is the intended way of writing checkpoints in code that doesn't use Keras?
I want to train a model. Every 1000 steps, I want to evaluate it on the test set and write it to the tensorboard log. However, there's a problem. I have a code like this:
image_b_train, label_b_train = tf.train.shuffle_batch(...)
out_train = model.inference(image_b_train)
accuracy_train = tf.reduce_mean(...)
image_b_test, label_b_test = tf.train.shuffle_batch(...)
out_test = model.inference(image_b_test)
accuracy_test = tf.reduce_mean(...)
where model inference declares the variables in the model. However, there's a problem. For the test set I have a separate queue, and I can't swap one queue for another with tensorflow.
Currently I solved the problem by creating 2 graphs, one for training and the other for testing. I copy from one graph to the other with tf.train.Saver. Another solution might be to use tf.get_variable, but this is a global variable, and I don't like it because my code becomes less reusable.
Yes, you need two graphs. These graphs can share variables. This can be done by:
Using Keras layers (from tf.contrib.keras) which let you define the model once and use it to compute two inference graphs
Using slim-style layers (from tf.layers) with tf.get_variable and reuse
Using tf.make_template to make your own model-like object which can be called once to build the training graph and once to build the inference graph
Using tf.estimator.Estimator which lets you define a model function once and runs it automatically for training and evaluation for you
There are other options, but any of these is well-supported and should unblock you.
From https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim,
TF-Slim further differentiates variables by defining model variables,
which are variables that represent parameters of a model. Model
variables are trained or fine-tuned during learning and are loaded
from a checkpoint during evaluation or inference. Examples include the
variables created by a slim.fully_connected or slim.conv2d layer.
Non-model variables are all other variables that are used during
learning or evaluation but are not required for actually performing
inference. For example, the global_step is a variable using during
learning and evaluation but it is not actually part of the model.
Similarly, moving average variables might mirror model variables, but
the moving averages are not themselves model variables.
From https://www.tensorflow.org/versions/master/api_docs/python/state_ops/variable_helper_functions,
tf.model_variables()
Returns all variables in the MODEL_VARIABLES collection.
Yet slim creates "moving mean" variables as part of its batch norm layers that are included in the MODEL_VARIABLES collection.
I can see at least possible definitions of "model variable":
Used in inference.
Fine tuned during training (whether by an optimizer or some other means such as moving averaging).
Stored in checkpoints
Is it the case that Tensorflow's "model variables" are defined by condition 2, while slim's "model variables" are defined by condition 1?
Simply put, slim uses contrib layers. contrib layers use layer_variable_getter, which is actually _model_variable_getter, to generate "model_variable" that are variables added into [tf.GraphKeys.GLOBAL_VARIABLES, tf.GraphKeys.MODEL_VARIABLES] collections.
Therefore model variables are common variables plus being added to MODEL_VARIABLES collection. contrib layers' variables are model variables plus being renamed from 'bias' to 'biases' and from 'kernel' to 'weights' if needed.
I read the question here
TensorFlow - get current value of a Variable
and the answer has left me confused.
On one hand, dga says "And to be very clear: Running the variable will
produce only the current value of the variable; it will not run any
assign operations associated with it. It's cheap."
On the other hand, Salvador Dali says "#dga yes, if the variable depends
on n other variables, they also need to be evaluated."
So, which is it? Does evaluating the variable only return its current
value, or does it recompute its value from scratch from the variables it
depends on?
What happens if I evaluate the same variable twice in a row? Does
Tensorflow have any notion of "stale" variables, i.e. variables that
need to be recomputed because their dependencies actually changed (i.e. like in
build system)?
I ask because I work with multiple nets where the partial output of one
net becomes the partial input of another net. I want to fetch the
gradients computed at the input layer of one net and merge+apply them to
the output layer of another net. I was hoping to do this by manually
retrieving/storing gradients in the variables of a graph, and then
running graph operations to backpropagate the gradients. Thus I need to
understand how it all works under the hood.
What I do is similar to this
How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?, but I can't conclude whether it's possible based on the last answer (experimental support now in?)
Thanks!
#dga is correct. If you pass a tf.Variable object to tf.Session.run() TensorFlow will return the current value of the variable, and it will not perform any computation. It is cheap (the cost of a memory copy, or possibly a network transfer in the case of a distributed TensorFlow setup). TensorFlow does not retain any history* about how the value of a tf.Variable was updated, so it cannot in general recompute its value from scratch.
(* Technically TensorFlow remembers the tf.Tensor that was used to initialize each variable, so it is possible to recompute the inital value of the variable.)
I have built and trained a model. On second phase I want to replace the last two layers and retrain them using different data.
I constantly get the errors for not initializing variables even though I did run initialization on the new vars:
var_init_op = tf.initialize_variables(var_list=[fc1_weights, fc1_biases, fc2_weights, fc2_biases])
sess.run(var_init_op)
I understand I have to initialize the new optimizer (ADAMSolever) as well, but
not sure how to do that.
Assuming I want to replace the optimizer (and other variables) in the middle how do I initialize it without trashing already trained variables?
You can get all the trainable variables using tf.trainable_variables(), and exclude the variables which should be restored from the pretrained model. Then you can initialize the other variables.