What are "model variables" in Tensorflow and slim?

From https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim,
TF-Slim further differentiates variables by defining model variables,
which are variables that represent parameters of a model. Model
variables are trained or fine-tuned during learning and are loaded
from a checkpoint during evaluation or inference. Examples include the
variables created by a slim.fully_connected or slim.conv2d layer.
Non-model variables are all other variables that are used during
learning or evaluation but are not required for actually performing
inference. For example, the global_step is a variable used during
learning and evaluation but it is not actually part of the model.
Similarly, moving average variables might mirror model variables, but
the moving averages are not themselves model variables.
From https://www.tensorflow.org/versions/master/api_docs/python/state_ops/variable_helper_functions,
tf.model_variables()
Returns all variables in the MODEL_VARIABLES collection.
Yet slim creates "moving mean" variables as part of its batch norm layers that are included in the MODEL_VARIABLES collection.
I can see at least three possible definitions of "model variable":
1. Used in inference.
2. Fine-tuned during training (whether by an optimizer or some other means such as moving averaging).
3. Stored in checkpoints.
Is it the case that Tensorflow's "model variables" are defined by condition 2, while slim's "model variables" are defined by condition 1?

Simply put, slim uses contrib layers. Contrib layers use layer_variable_getter, which is actually _model_variable_getter, to create each "model variable": a variable that is added to both the tf.GraphKeys.GLOBAL_VARIABLES and tf.GraphKeys.MODEL_VARIABLES collections.
Therefore, a model variable is an ordinary variable that has additionally been added to the MODEL_VARIABLES collection. A contrib layer's variables are model variables that are also renamed from 'bias' to 'biases' and from 'kernel' to 'weights' if needed.
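To make the bookkeeping concrete, here is a minimal pure-Python sketch of the idea. It is not TensorFlow code: ToyGraph, its methods, and the variable names are hypothetical stand-ins, and the collection keys are plain strings chosen for illustration. It only shows that a "model variable" is an ordinary variable that is also registered in a second collection.

```python
from collections import defaultdict

# Illustrative collection keys; these strings are assumptions, not TensorFlow's values.
GLOBAL_VARIABLES = "GLOBAL_VARIABLES"
MODEL_VARIABLES = "MODEL_VARIABLES"


class ToyGraph:
    """Hypothetical miniature of a graph that keeps named collections."""

    def __init__(self):
        self.collections = defaultdict(list)

    def variable(self, name, collections=(GLOBAL_VARIABLES,)):
        # An ordinary variable is registered in GLOBAL_VARIABLES only.
        for key in collections:
            self.collections[key].append(name)
        return name

    def model_variable(self, name):
        # A model variable is the same thing, plus MODEL_VARIABLES membership.
        return self.variable(name, (GLOBAL_VARIABLES, MODEL_VARIABLES))


graph = ToyGraph()
graph.variable("global_step")
graph.model_variable("fc1/weights")
graph.model_variable("bn1/moving_mean")

global_vars = graph.collections[GLOBAL_VARIABLES]
model_vars = graph.collections[MODEL_VARIABLES]
print(model_vars)
```

In this toy, membership in the collection is decided purely by which getter created the variable, not by whether an optimizer trains it, which is consistent with batch norm's moving mean showing up in MODEL_VARIABLES.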

Related

Replacing TensorFlow Saver with Checkpoint

I've been using TensorFlow's Saver class to save model parameters, but that class is going away in TensorFlow 2, so I need to replace it with Checkpoint. I can't figure out how to do that. All the examples in the documentation for Checkpoint assume you're saving a tf.keras.Model. I'm not using Keras, so that doesn't apply.
Saver just takes a list of variables to save, so that's what I'm starting from. How do I pass that to Checkpoint? It expects you to pass every checkpointable object as a named argument. I was hoping I could just say variables=[var1, var2, ...], but it doesn't accept lists. I could pass every variable as a separate argument, but what do I use as the names? The variable names? That defeats the whole purpose of checkpoint, which is to be more robust by not depending on variable names. What is the intended way of writing checkpoints in code that doesn't use Keras?

What's the difference between tf.GraphKeys.TRAINABLE_VARIABLES and tf.GraphKeys.UPDATE_OPS in tensorflow?

Here is the doc for tf.GraphKeys in tensorflow, which lists, for example, TRAINABLE_VARIABLES: the subset of Variable objects that will be trained by an optimizer.
I also know about tf.get_collection(), which retrieves the tensors stored in a given collection.
When using tensorflow.contrib.layers.batch_norm(), the default value of the updates_collections parameter is GraphKeys.UPDATE_OPS.
How should we understand these collections, and what is the difference between them?
Besides, more can be found in ops.py.
These are two different things.
TRAINABLE_VARIABLES
TRAINABLE_VARIABLES is the collection of variables or training parameters which should be modified when minimizing the loss. For example, these can be the weights determining the function performed by each node in the network.
How do variables get added to this collection? This happens automatically when you define a new variable with tf.get_variable, unless you specify
tf.get_variable(..., trainable=False)
When would you want a variable to be untrainable? This happens from time to time. For example, occasionally you will want to use a two-step approach in which you first train the entire network on a large, generic dataset, then fine-tune the network on a smaller dataset which is specifically related to your problem. In such cases, you might want to fine-tune only part of the network, e.g., the last layer. Specifying some variables as untrainable is one of the ways to do this.
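As a rough illustration of the fine-tuning case, the sketch below mimics the effect of the trainable flag in plain Python. Nothing here is TensorFlow API: the params dictionary, the layer names, and sgd_step are hypothetical, and the gradients are made-up numbers. It only shows that an update step touches the variables in the trainable collection and leaves the others alone.

```python
# Toy parameters: name -> value and trainable flag. Not TensorFlow code.
params = {
    "conv1/weights": {"value": 0.5, "trainable": False},   # frozen pretrained layer
    "fc_last/weights": {"value": 0.5, "trainable": True},  # layer being fine-tuned
}


def trainable_variables(params):
    # Plays the role of reading the TRAINABLE_VARIABLES collection.
    return [name for name, p in params.items() if p["trainable"]]


def sgd_step(params, grads, learning_rate=0.1):
    # Only variables in the trainable collection are modified.
    for name in trainable_variables(params):
        params[name]["value"] -= learning_rate * grads[name]


# Made-up gradients for both variables; the frozen one is simply ignored.
grads = {"conv1/weights": 1.0, "fc_last/weights": 1.0}
sgd_step(params, grads)
print(params)
```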
UPDATE_OPS
UPDATE_OPS is a collection of ops (operations performed when the graph runs, like multiplication, ReLU, etc.), not variables. Specifically, this collection maintains a list of ops which need to run before each training step.
How do ops get added to this collection?
By definition, update_ops occur outside the regular flow of training by loss minimization, so generally you will be adding ops to this collection only under special circumstances. For example, when performing batch normalization, you want to recompute the batch mean and variance before each training step, and this is how it's done. The mechanics of batch normalization using tf.contrib.layers.batch_norm are described in more detail in this article.
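The ordering described above can be sketched without TensorFlow. In this toy, update_ops is a plain Python list standing in for the UPDATE_OPS collection, and state, update_moving_mean, train_step, and the decay of 0.9 are all hypothetical choices made for illustration. The point is only that the registered updates run alongside each training step but are not driven by the loss gradient.

```python
DECAY = 0.9  # assumed moving-average decay, chosen for illustration
state = {"moving_mean": 0.0, "weight": 1.0}
update_ops = []  # toy stand-in for the UPDATE_OPS collection


def update_moving_mean(batch):
    # Not driven by the loss gradient: a plain running average of batch means.
    batch_mean = sum(batch) / len(batch)
    state["moving_mean"] = DECAY * state["moving_mean"] + (1 - DECAY) * batch_mean


update_ops.append(update_moving_mean)


def train_step(batch, grad, learning_rate=0.1):
    # Run every registered update op before the optimizer step.
    for op in update_ops:
        op(batch)
    # The "optimizer" only touches the trainable weight.
    state["weight"] -= learning_rate * grad


train_step([1.0, 3.0], grad=0.5)
train_step([2.0, 4.0], grad=0.5)
print(state)
```

If the loop over update_ops were skipped, the weight would still train, but moving_mean would stay at its initial value, which is the typical symptom of forgetting to run the update ops with batch normalization.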
I disagree with the previous answer.
Actually, everything in tensorflow is an op. The variables in the TRAINABLE_VARIABLES collection are also ops, created by tf.get_variable or tf.Variable.
As for the UPDATE_OPS collection, it usually includes the moving mean and moving variance created in the tf.layers.batch_normalization function. These ops can also be regarded as variables, as their values are updated at each training step, just like the weights and biases.
The main difference is that the trainable variables participate in back propagation, while the variables in UPDATE_OPS do not. They only participate in the inference process in test mode, so no gradients are computed for the variables in UPDATE_OPS.

Using the EMA'ed weights for evaluation in Tensorflow

In Tensorflow's tutorial it says that there are two ways to use the EMA'ed weights for evaluation:
Build a model that uses the shadow variables instead of the
variables. For this, use the average() method which returns the
shadow variable for a given variable.
Build a model normally but load the checkpoint files to evaluate by
using the shadow variable names. For this use the average_name()
method. See the Saver class for more information on restoring saved
variables.
I understand how to use the second method to use the EMA'ed weights for evaluation as an example is given. I was wondering if someone could give me a simple example of how to build a model that uses the shadow variables.
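For what the first approach means conceptually, here is a TensorFlow-free sketch. ToyEMA, model, and the parameter names are hypothetical; only the update rule, shadow -= (1 - decay) * (shadow - variable), is meant to match a standard exponential moving average. One simplification is assumed: the first apply() call just creates the shadows at the variables' current values. The same model function is built twice, once reading the raw variables and once reading the shadows through average().

```python
class ToyEMA:
    """Hypothetical stand-in for an exponential-moving-average helper."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def apply(self, variables):
        for name, value in variables.items():
            if name not in self.shadow:
                # Assumption: the shadow starts at the variable's current value.
                self.shadow[name] = value
            else:
                self.shadow[name] -= (1 - self.decay) * (self.shadow[name] - value)

    def average(self, name):
        # Returns the shadow value for a given variable name.
        return self.shadow[name]


def model(x, get_param):
    # The model only sees parameters through get_param, so the source can be swapped.
    return get_param("w") * x + get_param("b")


variables = {"w": 1.0, "b": 0.0}
ema = ToyEMA(decay=0.5)
ema.apply(variables)      # shadows created: w=1.0, b=0.0
variables["w"] = 3.0      # pretend a training step changed the variables
variables["b"] = 2.0
ema.apply(variables)      # shadows move halfway: w=2.0, b=1.0

train_output = model(2.0, variables.__getitem__)  # uses the raw variables
eval_output = model(2.0, ema.average)             # uses the shadow variables
print(train_output, eval_output)
```

The design choice worth noting is that the model takes its parameters through a lookup function, so nothing else about the model needs to change for evaluation.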

Tensorflow RNN input size

I am trying to use tensorflow to create a recurrent neural network. My code is something like this:
import tensorflow as tf
rnn_cell = tf.nn.rnn_cell.GRUCell(3)
inputs = [tf.constant([[0, 1]], dtype=tf.float32), tf.constant([[2, 3]], dtype=tf.float32)]
outputs, end = tf.nn.rnn(rnn_cell, inputs, dtype=tf.float32)
Now, everything runs just fine. However, I am rather confused by what is actually going on. The output dimensions are always the batch size x the size of the rnn cell's hidden state - how can they be completely independent of the input size?
If my understanding is correct, the inputs are concatenated to the rnn's hidden state at each step, and then multiplied by a weight matrix (among other operations). This means that the dimensions of the weight matrix need to depend on the input size, which is impossible, because the rnn_cell is created before the inputs are even declared!
After seeing the answer to a question about tensorflow's GRU implementation, I've realized what's going on. Counter to my intuition, the GRUCell constructor doesn't create any weight or bias variables at all. Instead, it creates its own variable scope, and then instantiates the variables on demand when actually called. Tensorflow's variable scoping mechanism ensures that the variables are only created once, and shared across subsequent calls to the GRU.
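The create-on-first-call behaviour can be imitated in a few lines of plain Python. LazyCell is a hypothetical class, not the GRUCell implementation: it uses a single weight matrix with constant 0.1 entries and no gates or nonlinearity. It only shows why the constructor does not need to know the input size, and why the output size depends on the number of units alone.

```python
class LazyCell:
    """Hypothetical cell that allocates its weights on the first call."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.weights = None      # nothing is allocated at construction time
        self.times_created = 0

    def __call__(self, inputs, state):
        if self.weights is None:
            # The input size is only known now, so the matrix is built here.
            rows = len(inputs) + self.num_units
            self.weights = [[0.1] * self.num_units for _ in range(rows)]
            self.times_created += 1
        # Concatenate input and state, then multiply by the weight matrix.
        concat = list(inputs) + list(state)
        return [
            sum(x * row[j] for x, row in zip(concat, self.weights))
            for j in range(self.num_units)
        ]


cell = LazyCell(3)
state = [0.0, 0.0, 0.0]
for step_input in ([0.0, 1.0], [2.0, 3.0]):
    state = cell(step_input, state)  # weights are created once, then reused

weight_shape = (len(cell.weights), len(cell.weights[0]))
print(weight_shape, len(state))
```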
I'm not sure why they decided to go with this rather confusing implementation, which, as far as I can tell, is undocumented. To me it seems more appropriate to use Python's object-level variable scoping to encapsulate the tensorflow variables within the GRUCell itself, rather than relying on an additional implicit scoping mechanism.

Initializing new variables in tensorflow

I have built and trained a model. In a second phase I want to replace the last two layers and retrain them using different data.
I constantly get errors about uninitialized variables, even though I did run initialization on the new vars:
var_init_op = tf.initialize_variables(var_list=[fc1_weights, fc1_biases, fc2_weights, fc2_biases])
sess.run(var_init_op)
I understand I have to initialize the new optimizer (Adam) as well, but I am not sure how to do that.
Assuming I want to replace the optimizer (and other variables) midway, how do I initialize it without trashing the already trained variables?
You can get all the trainable variables using tf.trainable_variables(), and exclude the variables which should be restored from the pretrained model. Then you can initialize the other variables.
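The selection logic in that suggestion can be sketched in plain Python. This is not TensorFlow code: the checkpoint dictionary, the variable names, and the zero "initializer" are hypothetical placeholders. It only shows the set difference between all trainable variables and those restored from the pretrained model.

```python
# Hypothetical pretrained values, standing in for a restored checkpoint.
checkpoint = {"conv1/weights": 0.7, "conv2/weights": -0.3}

# Hypothetical list of every trainable variable in the new graph.
trainable = ["conv1/weights", "conv2/weights", "fc1/weights", "fc2/weights"]

# Restore the pretrained variables first.
values = dict(checkpoint)

# Everything trainable that was not restored still needs initializing.
to_initialize = [name for name in trainable if name not in checkpoint]
for name in to_initialize:
    values[name] = 0.0  # stand-in for running an initializer on the new variables only

print(to_initialize)
```

Because the initializer only runs over to_initialize, the restored values are left untouched.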