Tensorflow forward pass with dropout

I am trying to use dropout to get error estimates for a neural network. This involves running several forward passes of my network during not only training but also testing, with dropout activated. Dropout layers seem only activated while training but not testing. Can this be done in Tensorflow by just calling some functions or modifying some parameters?

Yes, the easiest way is to use tf.layers.dropout, which has a training argument. That argument can be a tensor, so you can switch it to true or false in any particular session run:
mode = tf.placeholder(tf.string, name='mode')
training = tf.equal(mode, 'train')
...
layer = tf.layers.dropout(layer, rate=0.5, training=training)
...
with tf.Session() as sess:
    sess.run(..., feed_dict={mode: 'train'})  # This turns on the dropout
    sess.run(..., feed_dict={mode: 'test'})   # This turns off the dropout
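To get the Monte Carlo error estimates you are after, you can then run the forward pass several times at test time with the dropout left on and look at the spread of the predictions. A minimal sketch, where predictions, inputs and x_batch are hypothetical names for your output tensor, input placeholder and test batch:

import numpy as np

n_passes = 50
# keep mode='train' so the dropout stays active during these test-time passes
samples = np.stack([
    sess.run(predictions, feed_dict={inputs: x_batch, mode: 'train'})
    for _ in range(n_passes)
])
mean_pred = samples.mean(axis=0)   # point estimate
uncertainty = samples.std(axis=0)  # per-output spread as the error estimate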

Related

Better understanding of training parameter for Keras-Model call method needed

I'd like to get a better understanding of the parameter training when calling a Keras model.
In all tutorials (like here) it is explained that when you are doing a custom train step, you should call the model like this (because some layers may behave differently depending on whether you are doing training or inference):
pred = model(x, training=True)
and when you want to do inference, you should set training to false:
pred = model(x, training=False)
What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, base_model and head_model, and I want to create a new model out of those two, where base_model is always called with training=False (because I plan on freezing it like in this tutorial here):
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False set for base_model be overruled? Or will training now always be True for base_model, regardless of what I pass to new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(inputs, training=True), that part of the new model would always run in training mode? And how does it work out if I don't give any specific value for training and just call new_model(x_new)?
Thanks in advance!
training is a boolean argument that determines whether the call runs in training mode or inference mode. For example, the Dropout layer is primarily used as a regularizer during training, randomly dropping units, but at inference or prediction time we don't want that to happen.
y = Dropout(0.5)(x, training=True)
By this, we're setting training=True for the Dropout layer at training time. When we call .fit(), it sets the flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. The same goes for a custom training loop: when we pass the input tensor to the model within the GradientTape scope, we can set this parameter manually, although Keras will otherwise figure it out itself, and likewise at inference time. So, the training argument is set to True or False depending on whether we want the layers to operate in training mode or inference mode, respectively.
# training mode
with tf.GradientTape() as tape:
    logits = model(x, training=True)  # forward pass

# inference mode
al_logits = model(x, training=False)
Now, coming to your question. After defining the model:
# Freeze the base_model
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
Now, whether you run this new model with .fit() or with a custom training loop, base_model will always run in inference mode, because training=False was set when it was called.
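To see that the call-time argument is not overruled, here is a minimal sketch (not from the original thread) with a toy base model whose only layer is Dropout, so its mode is directly observable in the output:

import tensorflow as tf
from tensorflow import keras

# toy "base" whose only effect is dropout, so its mode is easy to observe
base_model = keras.Sequential([keras.layers.Dropout(0.5)])

inputs = keras.Input(shape=(4,))
x = base_model(inputs, training=False)  # pinned to inference mode
new_model = keras.Model(inputs, x)

data = tf.ones((1, 4))
print(new_model(data, training=True))  # still all ones: dropout stays off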

What is the expected behavior and purpose of model.trainable=False in tensorflow keras

It seems that setting model.trainable=False in tensorflow keras does nothing except print a wrong model.summary(). Here is the code to reproduce the issue:
import tensorflow as tf
import numpy as np

IMG_SHAPE = (160, 160, 3)
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False
# for layer in base_model.layers:
#     layer.trainable = False
bc = []  # before compile
ac = []  # after compile
for layer in base_model.layers:
    bc.append(layer.trainable)
print(np.all(bc))  # True
print(base_model.summary())  # this changes to show no trainable parameters, but that is wrong given the output of the previous np.all(bc)
base_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.001),
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
for layer in base_model.layers:
    ac.append(layer.trainable)
print(np.all(ac))  # True
print(base_model.summary())  # this changes to show no trainable parameters, but that is wrong given the output of the previous np.all(ac)
In light of this - What is the expected behavior and purpose of model.trainable=False in tensorflow keras?
https://github.com/tensorflow/tensorflow/issues/29535
I think this issue could help.
If you are looking for a way not to update some weights in your model, I would suggest using the var_list parameter of your optimizer's minimize function.
For some reason, when creating a model from Keras, TensorFlow switches all tf.Variables to trainable=True, and since they are all tensors we are not able to flip the value back to False.
What I do in my code is create scope names for all pretrained models, then loop over the trainable variables and collect every one that is not from my pretrained model.
trainable_variables = []
variables_collection = tf.get_collection('learnable_variables')
for layer in tf.trainable_variables():
    if 'vgg_model' not in layer.name:
        trainable_variables.append(layer)
        tf.add_to_collection('learnable_variables', layer)

grad = tf.train.GradientDescentOptimizer(lr)
train_step = grad.minimize(tf.reduce_sum([loss]), var_list=trainable_variables)
Watch out for tf.global_variables_initializer as well, since it will overwrite your pretrained weights too. You can solve that by using tf.variables_initializer and passing only the list of variables you actually want to initialize.
sess.run(tf.variables_initializer(variables_collection))
Sources I used when trying to solve this problem:
Is it possible to make a trainable variable not trainable?
TensorFlow: Using tf.global_variables_initializer() after partially loading pre-trained weights
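For reference, a more direct way to check whether the freeze took effect than looping over layer.trainable is to inspect model.trainable_weights, which is the list the optimizer actually updates. A minimal sketch, assuming a recent tf.keras where model.trainable = False propagates recursively:

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights='imagenet')
print(len(base_model.trainable_weights))  # > 0: these would receive gradients

base_model.trainable = False
print(len(base_model.trainable_weights))  # 0: nothing left for fit() to update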

How can I tell Keras the learning phase when I use train_on_batch to train a model?

I have dropout layers in my model, so I want Keras to figure out the training and test phases so it can run or ignore the dropout layers accordingly. I found that K.set_learning_phase can do me this favor, but how can I add it to the training and test processes? My code is like this:
def discriminator(self):
    x_A = Input(shape=self.shape)
    x_B = Input(shape=self.shape)
    x = concatenate([x_A, x_B], axis=-1)
    self.model = Sequential()
    self.model.add(Dropout(0.5, input_shape=self.shape_double))
    self.model.add(LSTM(200, return_sequences=True, kernel_constraint=unit_norm()))
    self.model.add(Dropout(0.5))
    self.model.add(LSTM(200, return_sequences=True, kernel_constraint=unit_norm()))
    self.model.add(Dropout(0.5))
    self.model.add(Flatten())
    self.model.add(Dense(8, activation="softmax", kernel_constraint=unit_norm()))
    label = self.model(x)
    return Model([x_A, x_B], label)
...
def train(self, epochs, batch_size):
    for epoch in range(epochs):
        for batch, (train_A, train_B, train_label) in enumerate(Load_train(batch_size)):
            Dloss = self.discriminator.train_on_batch([train_A, train_B], train_label)
            ...
def test(self, test_A, test_B, test_label):
    predicted_label_dist = self.discriminator.predict([test_A, test_B])
    ...
Any suggestions will be appreciated. Thanks.
Keras figures out the appropriate learning phase on its own by default when you call fit or predict. Hence, your dropout will only be applied during training and not during testing. However, if you still wish to configure the learning phase on your own, i.e. override the default behaviour, you can do it like this (from the Keras docs):
keras.backend.set_learning_phase(value)
Where:
value: Learning phase value, either 0 or 1 (integers).
Simply add this call in your training and testing functions (value 1 for training, 0 for testing).
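Applied to your code, a minimal sketch might look like this (assuming the Keras backend is imported as K):

from keras import backend as K

def train(self, epochs, batch_size):
    K.set_learning_phase(1)  # 1 = training: dropout layers are active
    for epoch in range(epochs):
        for batch, (train_A, train_B, train_label) in enumerate(Load_train(batch_size)):
            Dloss = self.discriminator.train_on_batch([train_A, train_B], train_label)

def test(self, test_A, test_B, test_label):
    K.set_learning_phase(0)  # 0 = testing: dropout layers are skipped
    predicted_label_dist = self.discriminator.predict([test_A, test_B])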

significance of "trainable" and "training" flag in tf.layers.batch_normalization

What is the significance of "trainable" and "training" flag in tf.layers.batch_normalization? How are these two different during training and prediction?
The batch norm has two phases:
1. Training (`training` should be `True`):
   - Normalize the layer activations using the current batch's mean and variance, then scale and shift with `gamma` and `beta`.
   - Update the `moving_avg` and `moving_var` statistics (for the update to happen in this setup, `trainable` must be `True`; see the cases below).
2. Inference (`training` should be `False`):
   - Normalize the layer activations using the stored `moving_avg` and `moving_var`, then scale and shift with `gamma` and `beta`.
Example code to illustrate a few cases:
import numpy as np
import tensorflow as tf

# random image
img = np.random.randint(0, 10, (2, 2, 4)).astype(np.float32)

# batch norm params initialized
beta = np.ones((4)).astype(np.float32) * 1      # all ones
gamma = np.ones((4)).astype(np.float32) * 2     # all twos
moving_mean = np.zeros((4)).astype(np.float32)  # all zeros
moving_var = np.ones((4)).astype(np.float32)    # all ones

# placeholder for the input image
_input = tf.placeholder(tf.float32, shape=(1, 2, 2, 4), name='input')

# batch norm
out = tf.layers.batch_normalization(
    _input,
    beta_initializer=tf.constant_initializer(beta),
    gamma_initializer=tf.constant_initializer(gamma),
    moving_mean_initializer=tf.constant_initializer(moving_mean),
    moving_variance_initializer=tf.constant_initializer(moving_var),
    training=False, trainable=False)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
init_op = tf.global_variables_initializer()

# run the graph in a session
with tf.Session() as sess:
    # init the variables
    sess.run(init_op)
    for i in range(2):
        ops, o = sess.run([update_ops, out], feed_dict={_input: np.expand_dims(img, 0)})
        print('beta', sess.run('batch_normalization/beta:0'))
        print('gamma', sess.run('batch_normalization/gamma:0'))
        print('moving_avg', sess.run('batch_normalization/moving_mean:0'))
        print('moving_variance', sess.run('batch_normalization/moving_variance:0'))
        print('out', np.round(o))
        print('')
When training=False and trainable=False:
img = [[[4., 5., 9., 0.]...
out = [[ 9. 11. 19. 1.]...
The activations are normalized with the stored `moving_avg` (all zeros) and `moving_var` (all ones), which here reduces to scaling by `gamma` and shifting by `beta`.
When training=True and trainable=False:
out = [[ 2. 2. 3. -1.] ...
The activations are normalized using the current batch's mean and variance, then scaled by `gamma` and shifted by `beta`. The moving averages are not updated.
When training=True and trainable=True:
The out is the same as above, but `moving_avg` and `moving_var` get updated to new values:
moving_avg [0.03249997 0.03499997 0.06499994 0.02749997]
moving_variance [1.0791667 1.1266665 1.0999999 1.0925]
This is quite complicated.
And in TF 2.0 the behavior has changed; see this:
https://github.com/tensorflow/tensorflow/blob/095272a4dd259e8acd3bc18e9eb5225e7a4d7476/tensorflow/python/keras/layers/normalization_v2.py#L26
About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the
layer, i.e. its internal state will not change during training:
its trainable weights will not be updated during fit() or
train_on_batch(), and its state updates will not be run. Usually,
this does not necessarily mean that the layer is run in inference
mode (which is normally controlled by the training argument that can
be passed when calling a layer). "Frozen state" and "inference mode"
are two separate concepts.
However, in the case of the BatchNormalization layer, setting
trainable = False on the layer means that the layer will be
subsequently run in inference mode (meaning that it will use the
moving mean and the moving variance to normalize the current batch,
rather than using the mean and variance of the current batch). This
behavior has been introduced in TensorFlow 2.0, in order to enable
layer.trainable = False to produce the most commonly expected
behavior in the convnet fine-tuning use case. Note that:
- This behavior only occurs as of TensorFlow 2.0. In 1.*, setting layer.trainable = False would freeze the layer but would not switch it to inference mode.
- Setting trainable on a model containing other layers will recursively set the trainable value of all inner layers.
- If the value of the trainable attribute is changed after calling compile() on a model, the new value doesn't take effect for this model until compile() is called again.
training controls whether to use the training-mode batchnorm (which uses statistics from this minibatch) or inference-mode batchnorm (which uses averaged statistics across the training data). trainable controls whether the variables created inside the batchnorm process are themselves trainable.
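As a concrete TF 2.x illustration of the two flags, here is a minimal sketch (the layer and shapes are arbitrary):

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((8, 4))

y_train = bn(x, training=True)    # normalizes with this batch's statistics
                                  # and updates moving_mean / moving_variance
y_infer = bn(x, training=False)   # normalizes with the stored moving averages

bn.trainable = False              # freezes beta/gamma; in TF 2.x this also
                                  # switches the layer to inference mode
print(len(bn.trainable_weights))  # 0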

Tensorflow (tf-slim) Model with is_training True and False

I would like to run a given model both on the train set (is_training=True) and on the validation set (is_training=False), specifically with respect to how dropout is applied. Right now the prebuilt models expose a parameter is_training that is passed to the dropout layer when building the network. The issue is that if I call the method twice with different values of is_training, I will get two different networks that do not share weights (I think?). How do I go about getting the two networks to share the same weights, such that I can run the network that I have trained on the validation set?
Based on your comment, I wrote a solution that uses Overfeat in train and test mode. (I couldn't test it, so could you check whether it works?)
First some imports and parameters:
import tensorflow as tf
slim = tf.contrib.slim
overfeat = tf.contrib.slim.nets.overfeat
batch_size = 32
inputs = tf.placeholder(tf.float32, [batch_size, 231, 231, 3])
dropout_keep_prob = 0.5
num_classes = 1000
In train mode, we pass a normal scope to the function overfeat:
scope = 'overfeat'
is_training = True
output = overfeat.overfeat(inputs, num_classes, is_training,
                           dropout_keep_prob, scope=scope)
Then in test mode, we create the same scope but with reuse=True.
scope = tf.VariableScope(reuse=True, name='overfeat')
is_training = False
output = overfeat.overfeat(inputs, num_classes, is_training,
                           dropout_keep_prob, scope=scope)
You can just use a placeholder for is_training:
isTraining = tf.placeholder(tf.bool)

# create nn
net = ...
net = slim.dropout(net,
                   keep_prob=0.5,
                   is_training=isTraining)
net = ...

# training
sess.run([net], feed_dict={isTraining: True})

# testing
sess.run([net], feed_dict={isTraining: False})
It depends on the case; the solutions are different.
My first option would be to use a different process to do the evaluation. You only need to check that there is a new checkpoint and load those weights into the evaluation network (with is_training=False):
checkpoint = tf.train.latest_checkpoint(self.checkpoints_path)
# wait until a new checkpoint is available
while self.lastest_checkpoint == checkpoint:
    time.sleep(30)  # sleep 30 seconds waiting for a new checkpoint
    checkpoint = tf.train.latest_checkpoint(self.checkpoints_path)
logging.info('Restoring model from {}'.format(checkpoint))
self.saver.restore(session, checkpoint)
self.lastest_checkpoint = checkpoint
The second option is to unload the graph after every epoch and create a new evaluation graph. This solution wastes a lot of time loading and unloading graphs.
The third option is to share the weights. But feeding these networks with queues or datasets can lead to issues, so you have to be very careful. I only use this for Siamese networks.
with tf.variable_scope('the_scope') as scope:
    your_model(is_training=True)
    scope.reuse_variables()
    your_model(is_training=False)
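To make the weight-sharing pattern concrete, here is a minimal runnable sketch, with your_model replaced by a hypothetical two-layer slim network:

import tensorflow as tf
slim = tf.contrib.slim

def my_model(inputs, is_training):
    net = slim.fully_connected(inputs, 64)
    net = slim.dropout(net, keep_prob=0.5, is_training=is_training)
    return slim.fully_connected(net, 10, activation_fn=None)

inputs = tf.placeholder(tf.float32, [None, 32])

with tf.variable_scope('the_scope') as scope:
    train_out = my_model(inputs, is_training=True)   # creates the variables
    scope.reuse_variables()                          # second call reuses them
    eval_out = my_model(inputs, is_training=False)   # same weights, dropout off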