How to disable e.g. Dropout in tf.keras.Model to generate activation maximation images using transfer learning - tensorflow2.0

I am using transfer learning and keras.applications.InceptionV3. I manage to train the model successfully.
However, when I want to generate "activation maximisation" images (e.g. the input image that maximizes the activation of one of the custom classes, ref eg https://arxiv.org/pdf/1512.02017v3.pdf ) I struggle to use the pre-trained model since I do manage to use it in "fit" mode and disable all dropouts etc.
What I do is that I combine the pre-trained model in a tf.keras.Sequential to do gradient descent on the weights of the first layer (the input image).
Despite setting base_model.trainable = False however it seems as if the pre-trained model is put into training mode (although weights are not updated) when using model.fit(data) on the outer sequential model.
Is there any way to force the base_model (a child of a Sequential) to be in "predict" mode when calling fit on the outer?

I just came across the same question. After reading some documentation and having a look on the source code of TensorFlows implementations of tf.keras.layers.Layer, tf.keras.layers.Dense, and tf.keras.layers.BatchNormalization I got the following understanding.
If training = False is passed on calling the layer, it will run in inference mode. This has nothing to do with the attribute trainable, which means something different. It would probably lead to less misunderstanding, if they would have called it training_mode instead.
When doing Transfer Learning or Fine Tuning training = False should be passed on calling the base model itself. As far as I saw until now this will only affect layers like tf.keras.layers.Dropout and tf.keras.layers.BatchNormalization and will have not effect on the other layers.
Running in inference mode via training = False will result in tf.layers.Dropout not to apply the dropout rate at all.
As tf.layers.Dropout has no trainable weights, setting the attribute trainable = False will have no effect at all,

Related

How to freeze batch-norm layers during Transfer-learning

I am following the Transfer learning and fine-tuning guide on the official TensorFlow website. It points out that during fine-tuning, batch normalization layers should be in inference mode:
Important notes about BatchNormalization layer
Many image models contain BatchNormalization layers. That layer is a
special case on every imaginable count. Here are a few things to keep
in mind.
BatchNormalization contains 2 non-trainable weights that get updated during training. These are the variables tracking the mean and variance of the inputs.
When you set bn_layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean & variance statistics. This is not the case for other layers in general, as weight trainability & inference/training modes are two orthogonal concepts. But the two are tied in the case of the BatchNormalization layer.
When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training=False when calling the base model. Otherwise the updates applied to the non-trainable weights will suddenly destroy what the model has learned.
You'll see this pattern in action in the end-to-end example at the end
of this guide.
Even tho, some other sources, for example this article (titled Transfer Learning with ResNet), says something completely different:
for layer in resnet_model.layers:
if isinstance(layer, BatchNormalization):
layer.trainable = True
else:
layer.trainable = False
ANYWAY, I know that there is a difference between training and trainable parameters in TensorFlow.
I am loading my model from file, as so:
model = tf.keras.models.load_model(path)
And I am unfreezing (or actually freezing the rest) some of the top layers in this way:
model.trainable = True
for layer in model.layers:
if layer not in model.layers[idx:]:
layer.trainable = False
NOW about batch normalization layers: I can either do:
for layer in model.layers:
if isinstance(layer, keras.layers.BatchNormalization):
layer.trainable = False
or
for layer in model.layers:
if layer.name.startswith('bn'):
layer.call(layer.input, training=False)
Which one should I do? And whether finally it is better to freeze batch norm layer or not?
Not sure about the training vs trainable difference, but personally I've gotten good results settings trainable = False.
Now as to whether to freeze them in the first place: I've had good results with not freezing them. The reasoning is simple, the batch norm layer learns the moving average of the initial training data. This may be cats, dogs, humans, cars e.t.c. But when you're transfer learning, you could be moving to a completely different domain. The moving averages of this new domain of images are far different from the prior dataset.
By unfreezing those layers and freezing the CNN layers, my model saw a 6-7% increase in accuracy (82 -> 89% ish). My dataset was far different from the inital Imagenet dataset that efficientnet was trained on.
P.S. Depending on how you plan on running the mode post training, I would advise you to freeze the batch norm layers once the model is trained. For some reason, if you ran the model online (1 image at a time), the batch norm would get all funky and give irregular results. Freezing them post training fixed the issue for me.
Use the code below to see whether the batch norm layer are being freezed or not. It will not only print the layer names but whether they are trainable or not.
def print_layer_trainable(conv_model):
for layer in conv_model.layers:
print("{0}:\t{1}".format(layer.trainable, layer.name))
In this case i have tested your method but did not freezed my model's batch norm layers.
for layer in model.layers:
if isinstance(layer, keras.layers.BatchNormalization):
layer.trainable = False
The code below worked nice for me. In my case the model is a ResNetV2 and the batch norm layers are named with the suffix "preact_bn". By using the code above for printing layers you can see how the batch norm layers are named and configure as you want.
for layer in new_model.layers[:]:
if ('preact_bn' in layer.name):
trainable = False
else:
trainable = True
layer.trainable = trainable
Just to add to #luciano-dourado answer;
In my case, I started by following the Transfer Learning guide as is, that is, freezing BN layers throughout the entire training (classifier + fine-tuning).
What I saw is that training the classifier worked without problems but as soon as I started fine-tuning, the loss went to NaN after a few batches.
After running the usual checks: input data without NaNs, loss functions yielding correct values, etc. I checked if BN layers were running in inference mode (trainable = False).
But in my case, the dataset was so different to ImageNet that I needed to do the contrary, set all trainable BN attributes to True. I found this empirically just as #zwang commented. Just remember to freeze them after training, before you deploy the model for inference.
By the way, just as an informative note, ResNet50V2, for example, has a total 49 BN layers of which only 16 are pre-activations BNs. This means that the remaining 33 layers were updating their mean and variance values.
Yet another case where one has to run several empirical tests to find out why the "standard" approach does not work in his/her case. I guess this further reinforces the importance of data in Deep Learning :)

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the keras api in tensorflow (I am using the keras api provided by TF 1.x), and am having issue writing a custom loss function, to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it sucesfully with tensorflow estimators.
My current implementations builds a tf Dataset from a large database of tf Records, where the features is a dictionary containing the images and arrays of matching points. Before I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use that is to calculate the loss within the model, by means of Lambda layers. (When the loss is independent from the true data, for instance, and the model doesn't really have an output to be compared)
In a functional API model:
def loss_calc(x):
loss_input_1, loss_input_2 = x #arbirtray inputs, you choose
#according to what you gave to the Lambda layer
#here you use some external data that doesn't relate to the samples
externalData = K.constant(external_numpy_data)
#calculate the loss
return the loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
return y_pred #where y_pred is the loss itself, the output of the model above
model.compile(loss = dummy_loss, ....)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it, is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exaclty the same way I did in the first answer. And pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
# Example code.
# Arguments
# model: the training model
regression = model.outputs[0]
classification = model.outputs[1]
predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
outputs = # TODO: do some processing on avg_predictions
return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout is different for each sample in a batch. Since, backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch-size of one. You could update a global counter inside your custom loss function and return non-zero loss only when you've averaged them the way you want it. I don't know if this would work, it's just an idea.

Does model.compile() initialize all the weights and biases in Keras (tensorflow backend)?

When I start training a model, there is no model saved previously. I can use model.compile() safely. I have now saved the model in a h5 file for further training using checkpoint.
Say, I want to train the model further. I am confused at this point: can I use model.compile() here? And should it be placed before or after the model = load_model() statement? If model.compile() reinitializes all the weights and biases, I should place it before model = load_model() statement.
After discovering some discussions, it seems to me that model.compile() is only needed when I have no model saved previously. Once I have saved the model, there is no need to use model.compile(). Is it true or false? And when I want to predict using the trained model, should I use model.compile() before predicting?
When to use?
If you're using compile, surely it must be after load_model(). After all, you need a model to compile. (PS: load_model automatically compiles the model with the optimizer that was saved along with the model)
What does compile do?
Compile defines the loss function, the optimizer and the metrics. That's all.
It has nothing to do with the weights and you can compile a model as many times as you want without causing any problem to pretrained weights.
You need a compiled model to train (because training uses the loss function and the optimizer). But it's not necessary to compile a model for predicting.
Do you need to use compile more than once?
Only if:
You want to change one of these:
Loss function
Optimizer / Learning rate
Metrics
The trainable property of some layer
You loaded (or created) a model that is not compiled yet. Or your load/save method didn't consider the previous compilation.
Consequences of compiling again:
If you compile a model again, you will lose the optimizer states.
This means that your training will suffer a little at the beginning until it adjusts the learning rate, the momentums, etc. But there is absolutely no damage to the weights (unless, of course, your initial learning rate is so big that the first training step wildly changes the fine tuned weights).
Don't forget that you also need to compile the model after changing the trainable flag of a layer, e.g. when you want to fine-tune a model like this:
load VGG model without top classifier
freeze all the layers (i.e. trainable = False)
add some layers to the top
compile and train the model on some data
un-freeze some of the layers of VGG by setting trainable = True
compile the model again (DON'T FORGET THIS STEP!)
train the model on some data

How to use pre-trained model as non trainable sub network in tensorflow?

I'd like to train a network that contains a sub network that I need to stay fix during the training. The basic idea is to prepend and append some layers the the pre-trained network (inceptionV3)
new_layers -> pre-trained and fixed sub-net (inceptionv3) -> new_layers
and run the training process for the task I have without changing the pre-trained one.
I also need to branch directly on some layer of the pre-trained network. For example, with the inceptionV3 I like to uses it from the conv 299x299 to the last pool layer or from the conv 79x79 to the last pool layer.
Whether or not a "layer" is trained is determined by whether the variables used in that layer get updated with gradients. If you are using the Optimizer interface to optimize your network, then you can simply not pass the variables used in the layers that you want to keep fixed to the minimize function, i.e.,
opt.minimize(loss, <subset of variables you want to train>)
If you are using tf.gradients function directly, then remove the variables that you want to keep fixed from the second argument to tf.gradients.
Now, how you "branch directly" to a layer of a pre-trained network depends on how that network is implemented. I would simply locate the tf.Conv2D call to the 299x299 layer you are talking about, and pass as its input, the output of your new layer, and on the output side, locate the 79x79 layer, use its output as the input to your new layer.