Tensorflow: assigning the moving_variance and moving_mean of tf.contrib.layers.batch_norm

I have a CNN model with some batch normalization layers in it. The batch norm layers are constructed with tf.contrib.layers.batch_norm. The model works well in basic circumstances, but the problem is that I don't know how to assign its moving_variance and moving_mean.
In detail, as the official documentation describes, a batch norm layer has four parameters: variance, mean, scale and offset. The last two (scale and offset) are tensorflow variables which I can handle well. For the mean and variance, even though I can get their update ops with tf.get_collection(tf.GraphKeys.UPDATE_OPS), those are tensors and I don't know how to assign them. In most cases these two parameters are set during the training phase.
I have also tried tf.get_collection(tf.GraphKeys.VARIABLES); that gives me two tensorflow variables named 'BatchNorm/moving_mean' and 'BatchNorm/moving_variance'. Although I can change these two variables' values with tf.assign, the weird thing is that the output of the batch norm layer doesn't change accordingly.
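For concreteness, this is roughly the kind of assignment attempt described above; a minimal sketch, assuming a single layer under the default 'BatchNorm' scope and a channel depth of 20 (both assumptions, not code from the question):
import numpy as np
import tensorflow as tf

# Hypothetical setup: an input followed by batch norm (names and shapes assumed).
x = tf.placeholder(tf.float32, [None, 32, 32, 20])
y = tf.contrib.layers.batch_norm(x, is_training=False)

# Locate the moving statistics among the graph variables.
moving_mean = [v for v in tf.global_variables() if 'BatchNorm/moving_mean' in v.name][0]
moving_var = [v for v in tf.global_variables() if 'BatchNorm/moving_variance' in v.name][0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Overwrite the moving statistics with externally computed values.
    sess.run([tf.assign(moving_mean, np.full(20, 0.5, np.float32)),
              tf.assign(moving_var, np.full(20, 2.0, np.float32))])
    # With is_training=False the layer normalizes with the moving statistics;
    # with is_training=True it uses the batch statistics instead, which may be
    # why assignments appear to have no effect on the output.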
Thanks for any suggestions!

From Tensorflow official site:
https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm
Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

Related

the batch normalization layer does not work (tensorflow)

I implemented a network using tensorflow, and the loss does not converge. When I inspect some values in the network, I find that the BN layer does not seem to work. Please look at the following picture:
We can see that s2 is the result of batch normalization of s1, but the values in s2 are still very large. I don't know what the problem is. Why are the values in s2 so large?
I have uploaded my code to github. Anyone who is interested can test it.
As per the official tensorflow documentation here,
when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be executed alongside the train_op. Also, be sure to add any batch_normalization ops before getting the update_ops collection. Otherwise, update_ops will be empty, and training/inference will not work properly.
For example:
training = tf.placeholder(tf.bool, name="is_training")
# ...
x_norm = tf.layers.batch_normalization(x, training=training)
# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
train_op = optimizer.minimize(loss)
train_op = tf.group([train_op, update_ops])
# or, you can also do something like this:
# with tf.control_dependencies(update_ops):
#     train_op = optimizer.minimize(loss)
So, it is really important to collect and run the update ops as stated in the tensorflow documentation, because at training time the moving variance and the moving mean of the layer have to be updated. If you don't do this, batch normalization will not work and the network will not train as expected. It is also useful to declare a placeholder to tell the network whether it is in training or inference mode, because during test (or inference) time the mean and the variance are fixed: they are estimated from the previously computed means and variances of the training batches.
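Continuing the snippet above, a rough sketch of how such a placeholder is typically fed at train and test time (sess, x, labels and the batch arrays are hypothetical names, not taken from the original answer):
# During training: feed training=True so batch statistics are used
# and the moving averages are updated via the grouped update ops.
sess.run(train_op, feed_dict={x: batch_x, labels: batch_y, training: True})

# During inference: feed training=False so the accumulated
# moving mean/variance are used instead of batch statistics.
predictions = sess.run(x_norm, feed_dict={x: test_x, training: False})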

BatchNormalization in Keras

How do I update moving mean and moving variance in keras BatchNormalization?
I found this in the tensorflow documentation, but I don't know where to put train_op or how to make it work with keras models:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
None of the posts I found say what to do with train_op, or whether you can use it with model.compile.
You do not need to manually update the moving mean and variance if you are using the BatchNormalization layer. Keras takes care of updating these parameters during training and of keeping them fixed during testing (when using the model.predict and model.evaluate functions, and likewise with model.fit_generator and friends).
Keras also keeps track of the learning phase so different codepaths run during training and validation/testing.
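As an illustration of that division of labour, here is a minimal sketch of a Keras model with a BatchNormalization layer; the architecture and the random data are made up for the example:
from tensorflow import keras
import numpy as np

model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(10,)),
    keras.layers.BatchNormalization(),   # moving mean/variance live inside this layer
    keras.layers.Activation('relu'),
    keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

x = np.random.rand(64, 10).astype('float32')
y = np.random.rand(64, 1).astype('float32')

model.fit(x, y, epochs=1)   # moving statistics are updated here
model.predict(x)            # here they are frozen and used for normalization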
If you just need to update the weights of an existing model with some new values, you can do the following:
w = model.get_layer('batchnorm_layer_name').get_weights()
# Order: [gamma, beta, moving_mean, moving_variance]
for j in range(len(w[0])):
    gamma = w[0][j]
    beta = w[1][j]
    run_mean = w[2][j]
    run_var = w[3][j]
    w[2][j] = new_run_mean_value1
    w[3][j] = new_run_var_value2
model.get_layer('batchnorm_layer_name').set_weights(w)
There are two interpretations of the question: the first assumes that the goal is to use the high-level training API, and that case was answered by Matias Valdenegro.
The second - as discussed in the comments - is whether it is possible to use batch normalization with a standard tensorflow optimizer, as discussed in "Keras as a simplified tensorflow interface", in the section "Collecting trainable weights and state updates". As mentioned there, the update ops are accessible via layer.updates and not in tf.GraphKeys.UPDATE_OPS. In fact, if you have a keras model in tensorflow you can optimize it with a standard tensorflow optimizer and batch normalization like this:
update_ops = model.updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
and then use a tensorflow session to fetch the train_op. To distinguish the training and evaluation modes of the batch normalization layer you need to feed the learning phase state of the keras engine (see "Different behaviors during training and testing" on the same tutorial page as given above). This would work, for example, like this:
...
# train
lo, _ = tf_sess.run(fetches=[loss, train_step],
                    feed_dict={tf_batch_data: bd,
                               tf_batch_labels: bl,
                               tensorflow.keras.backend.learning_phase(): 1})
...
# eval
lo = tf_sess.run(fetches=[loss],
                 feed_dict={tf_batch_data: bd,
                            tf_batch_labels: bl,
                            tensorflow.keras.backend.learning_phase(): 0})
I tried this in tensorflow 1.12 and it works with models containing batch normalization. Given my existing tensorflow code, and in light of the approaching tensorflow version 2.0, I was tempted to use this approach myself. However, since this approach is not mentioned in the tensorflow documentation, I am not sure it will be supported in the long term, so I finally decided not to use it and to invest a little more effort in changing the code to use the high-level API.

Tensorflow batch normalization: difference between momentum and renorm_momentum

I want to replicate a network built with the lasagne library in tensorflow. I'm having some trouble with the batch normalization.
This is the lasagne documentation about the used batch normalization:
http://lasagne.readthedocs.io/en/latest/modules/layers/normalization.html?highlight=batchNorm
In tensorflow I found two functions to normalize:
https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
The first one is simpler, but it does not let me choose the alpha parameter from lasagne (the coefficient for the exponential moving average of batch-wise means and standard deviations computed during training). I tried using the second function, which has a lot more options, but there are two things I do not understand about it:
I am not clear about the difference between momentum and renorm_momentum. If I have an alpha of 0.9 in the lasagne network, can I just set both tensorflow momentums to 0.9 and expect the same behaviour?
The tf documentation notes:
when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
I do not really understand what is happening here and where I need to put something similar in my code. Can I just put this somewhere before I run the session? What parts of this code piece should I not copy literally but change depending on my code?
There is a big difference between tf.nn.batch_normalization and tf.layers.batch_normalization. See my answer here. So you have made the right choice by using the layers version. Now, on your questions:
renorm_momentum only has an effect if you use batch renormalization by setting the renorm argument to True. You can ignore it if you use default batch normalization.
Short answer: You can literally copy that code snippet. Put it exactly where you would normally call optimizer.minimize.
Long answer on 2.: Batch normalization has two "modes": training and inference. During training, the mean and variance of the current minibatch are used. During inference, this is not desirable (e.g. you might not even use batches as input, so there would be no minibatch statistics). For this reason, moving averages over minibatch means/variances are kept during training. These moving averages are then used for inference.
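To make this concrete, here is a rough numeric sketch of the exponential-moving-average update that tf.layers.batch_normalization applies to its moving statistics (the momentum value and array sizes are just illustrative assumptions):
import numpy as np

momentum = 0.99                                # example value (the layer's default)
moving_mean = np.zeros(4, dtype=np.float32)    # running estimate, updated only during training
batch_mean = np.array([0.5, -1.0, 2.0, 0.1], dtype=np.float32)  # current minibatch statistics

# One training step: blend the minibatch statistics into the running estimate.
moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean

# At inference time, moving_mean is used as-is and is not updated.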
By default, Tensorflow only executes what it needs to. Those moving averages are not needed for training, so they normally would never be executed/updated. The tf.control_dependencies context manager forces Tensorflow to do the updates every time it computes whatever is in the code block (in this case the cost). Since the cost certainly needs to be computed exactly once per training step, this is a good way of making sure the moving averages are updated.
The code example seems a bit arcane, but in context it would really just be (as an example):
loss = ...
train_step = SomeOptimizer().minimize(loss)
with tf.Session() as sess:
    ....
becomes
loss = ...
with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    train_step = SomeOptimizer().minimize(loss)
with tf.Session() as sess:
    ....
Finally, remember to pass the correct training argument to batch normalization so that either minibatch statistics or moving averages are used as intended.
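For instance, a minimal sketch of wiring the training flag and the momentum parameter together (the placeholder shapes, the tiny model and the momentum value are assumptions for illustration, not code from the question):
x = tf.placeholder(tf.float32, [None, 10])        # hypothetical input
labels = tf.placeholder(tf.float32, [None, 1])    # hypothetical targets
is_training = tf.placeholder(tf.bool, name="is_training")

# momentum controls how quickly the moving statistics track the batch statistics;
# renorm_momentum only matters when renorm=True (batch renormalization).
h = tf.layers.batch_normalization(x, momentum=0.9, training=is_training)
out = tf.layers.dense(h, 1)
loss = tf.losses.mean_squared_error(labels, out)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Feed is_training=True while training, is_training=False for inference.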

How to find the names of the trainable layers in a tensorflow model?

I am trying to fine-tune the last few layers of the tensorflow/slim resnet-v2-50 model for a dataset that I have.
I am struggling to find the names of the layers that I can train. In a tensorflow model, is there a way to find the names of the layers which are trainable? Is there a way to get these names in an ordered way so that I can select the last few layers to train? Is there a way to get this information from tensorboard?
Just type
print(tf.trainable_variables())
This will print all the trainable variables.
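If you want the names and shapes in a more readable, ordered listing, a small variation of the same idea (purely illustrative):
# List each trainable variable with its name and shape, in creation order.
for v in tf.trainable_variables():
    print(v.name, v.shape)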
When you want to train or optimize only certain layers of a pre-trained network, this is what you need to know.
TensorFlow's minimize method takes an optional argument var_list, a list of variables to be adjusted through back-propagation.
If you don't specify var_list, any TF variable in the graph could be adjusted by the optimizer. When you specify some variables in var_list, TF holds all other variables constant.
Here's an example of a script which jonbruner and his collaborator have used.
tvars = tf.trainable_variables()
g_vars = [var for var in tvars if 'g_' in var.name]
g_trainer = tf.train.AdamOptimizer(0.0001).minimize(g_loss, var_list=g_vars)
This finds all the variables they defined earlier that have "g_" in the variable name, puts them into a list, and runs the ADAM optimizer on them.
You can find the related answers here on Quora
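Applied to the fine-tuning scenario from the question, a hypothetical sketch would filter by scope name instead; the scope names below ('resnet_v2_50/block4', 'resnet_v2_50/logits') and the loss are assumptions about your setup, so verify them against your own print-out first:
# Select only variables from the last block and the classification head.
scopes_to_train = ['resnet_v2_50/block4', 'resnet_v2_50/logits']
train_vars = [v for v in tf.trainable_variables()
              if any(v.name.startswith(s) for s in scopes_to_train)]

# Only these variables will be adjusted; everything else stays frozen.
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=train_vars)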

model variables in Tensorflow's batch_norm

The documentation online says moving_average and moving_variance are both model_variables, and tf.model_variables() returns tensors of the type local_variables. Does that mean model_variables are not saved when I save my state?
I'm trying to apply batch normalization to a couple of 3D convolution and fully connected layers. I trained my network with batch_norm and saved a checkpoint file, but when I went to restore my saved state, it said moving_mean could not be located. The exact error was: when TF went to assign the restored value to moving_mean, the shape of the lhs tensor, [], could not be reconciled with that of the rhs, [20].
The graph restores fine when I don't add batch_norm around my layers.
I'm planning to add a global variable at the end of training that saves my moving_mean and moving_variance values. Is this the way TF intended for me to use batch_norm?
Thanks!
The variables moving_mean and moving_variance were not in my saved state because I had left updates_collections at its default. Since I never included a control dependency when I ran the layers, those variables were never updated.
The code to include is:
from tensorflow.python import control_flow_ops

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.tuple(update_ops)
    total_loss = control_flow_ops.with_dependencies(updates, total_loss)
Or set updates_collections=None to force in-place updates.
See the API description and current github discussion for more information.
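A rough sketch of that in-place variant; the layer arguments other than updates_collections, and the names net and is_training, are assumed for illustration:
# updates_collections=None makes the layer update moving_mean/moving_variance
# in place as part of the forward pass, so no separate control dependency is needed.
net = tf.contrib.layers.batch_norm(net,
                                   is_training=is_training,
                                   updates_collections=None)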