Keras remove activation function of last layer - tensorflow

I want to use ResNet50 with Imagenet weights.
The last layer of ResNet50 is (from here)
x = layers.Dense(1000, activation='softmax', name='fc1000')(x)
I need to keep the weights of this layer but remove the softmax function.
I want to manually change it so my last layer looks like this
x = layers.Dense(1000, name='fc1000')(x)
but the weights stay the same.
Currently I call my net like this
resnet = Sequential([
Input(shape(224,224,3)),
ResNet50(weights='imagenet', input_shape(224,224,3))
])
I need the Input layer because otherwise the model.compile says that placeholders aren't filled.

Generally there are two ways of achievieng this:
Quick way - supported functions:
To change the final layer's activation function, you can pass an argument classifier_activation.
So in order to get rid of activation all together, your module can be called like:
import tensorflow as tf
resnet = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
weights='imagenet',
input_shape=(224,224,3),
pooling="avg",
classifier_activation=None
)
])
This however, is not going to work if the you want a different function, that is not supported by Keras classifer_activation parameter (e. g. custom activation function).
To achieve this you can use the workaround solution:
Long way - copy the model's weights
This solution proposes copying the original model's weights onto your custom one. This approach works because apart from the activation function you are not chaning the model's architecture.
You need to:
1. Download original model.
2. Save it's weights.
3. Declare your modified version of the model (in your case, without the activation function).
4. Set the weights of the new model.
Below snippet explains this concept in more detail:
import tensorflow as tf
# 1. Download original resnet
resnet = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
weights='imagenet',
input_shape=(224,224,3),
pooling="avg"
)
])
# 2. Hold weights in memory:
imagenet_weights = resnet.get_weights()
# 3. Declare the model, but without softmax
resnet_no_softmax = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
include_top=False,
weights='imagenet',
input_shape=(224,224,3),
pooling="avg"
),
tf.keras.layers.Dense(1000, name='fc1000')
])
# 4. Pass the imagenet weights onto the second resnet
resnet_no_softmax.set_weights(imagenet_weights)
Hope this helps!

Related

How to randomly initialize layers in pretrained model?

I am using Xception model with pre initialized weights trained on ImageNet as so:
model = keras.applications.Xception(
weights='imagenet',
input_shape=(150,150,3)
)
Now I Would like to take specific layer (by its name, using model.get_layer(layerName)) and then reinitialize its weights to completely random one.
What is the simplest way to do so, and if it is even possible?
You could use a reinitialize function like this:
def reinitialize_layer(model, initializer, layer_name):
layer = model.get_layer(layer_name)
layer.set_weights([initializer(shape=w.shape) for w in layer.get_weights()])
Instead of layer_name you could also work with the layer index. You could also extend the function such that it takes a list of layer names, if you like to reinitialize more than one layer.
Usage example:
import keras
model = keras.applications.Xception(
weights='imagenet',
input_shape=(299,299,3)
)
# zeros as illustrative example, change to something else
initializer = keras.initializers.Zeros()
# check pretrained weights
print(model.get_layer("predictions").get_weights())
# change "predictions" to whatever layer name you like to use instead
reinitialize_layer(model, initializer, "predictions")
# check weights after reinitialization
print(model.get_layer("predictions").get_weights())
model.compile(...)
model.fit(...)

Extracting activations from a specific layer of neural network

I was working on an image recognition problem. After training the model, I saved the architecture as well as weights. Now I want to use the model for extracting features from other images and perform SVM on that. For this, I want to remove the last two layers of my model and get the values calculated by the CNN and fully connected layers till then. How can I do that in Keras?
# a simple model
model = keras.models.Sequential([
keras.layers.Input((32,32,3)),
keras.layers.Conv2D(16, 3, activation='relu'),
keras.layers.Flatten(),
keras.layers.Dense(10, activation='softmax')
])
# after training
feature_only_model = keras.models.Model(model.inputs, model.layers[-2].output)
feature_only_model take a (32,32,3) for input and the output is the feature vector
If your model is subclassed - just change call() method.
If not:
if your model is complicated - wrap your model by subclassed model and change forward pass in call() method, or
if your model is simple - create model without the last layers, load weights to every layer separately

What is the expected behavior and purpose of model.trainable=False in tensorflow keras

It seems setting model.trainable=False in tensorflow keras does nothing except for to print a wrong model.summary(). Here is the code to reproduce the issue:
import tensorflow as tf
import numpy as np
IMG_SHAPE = (160, 160, 3)
# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
# for layer in base_model.layers:
# layer.trainable=False
bc=[] #before compile
ac=[] #after compile
for layer in base_model.layers:
bc.append(layer.trainable)
print(np.all(bc)) #True
print(base_model.summary()) ##this changes to show no trainable parameters but that is wrong given the output to previous np.all(bc)
base_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.001),
loss='categorical_crossentropy',
metrics=['accuracy'])
for layer in base_model.layers:
ac.append(layer.trainable)
print(np.all(ac)) #True
print(base_model.summary()) #this changes to show no trainable parameters but that is wrong given the output to previous np.all(ac)
In light of this - What is the expected behavior and purpose of model.trainable=False in tensorflow keras?
https://github.com/tensorflow/tensorflow/issues/29535
I think this issue could help.
If you are looking for a way to not update some weights in your model I would suggest using the parameter var_list in the minimize function from your Optimizer.
For some reason when creating a model from keras Tensorflow switch all tf.Variables to True, and since all are Tensors we are not able to update the value to False.
What I do in my code is create scope names for all pretrained models and loop over it adding all layers that are not from my pretrained model.
trainable_variables = []
variables_collection = tf.get_collection('learnable_variables')
for layer in tf.trainable_variables():
if 'vgg_model' not in layer.name:
trainable_variables.append(layer)
tf.add_to_collection('learnable_variables', layer)
grad = tf.train.GradientDescentOptimizer(lr)
train_step = grad.minimize(tf.reduce_sum([loss]), var_list=trainable_variables)
Watch out for global_initializer as well, since it will overwrite your pretrained Weights as well. You can solve that by using tf.variables_initializer and passing a list of variables you want to add weights.
sess.run(tf.variables_initializer(variables_collection))
Source I used when trying to solve this problem
Is it possible to make a trainable variable not trainable?
TensorFlow: Using tf.global_variables_initializer() after partially loading pre-trained weights

How can I use TensorFlow's sampled softmax loss function in a Keras model?

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:
import keras.backend as K
def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:
model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequence=True))
model.add(Dense(output_dim=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or do this need to be written as a layer that comes after the final fully-connected layer. Any guidance here would be greatly appreciated. Thanks.
The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.
First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax function to the original dense layer which has the default activation function applied.
There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.
from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K
class SampledSoftmax(Layer):
def __init__(self, **kwargs):
super(SampledSoftmax, self).__init__(**kwargs)
def call(self, inputs):
"""
The first input should be the model as it were, and the second the
target (i.e., a repeat of the training data) to compute the labels
argument
"""
# the labels input to this function is batch size by 1, where the
# value at position (i, 1) is the index that is true (not zero)
# e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
return K.tf.nn.sampled_softmax_loss(weights=inputs[0]._keras_history[0].weights[0],
biases=inputs[0]._keras_history[0].bias,
inputs=inputs[0],
labels=K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1]),
num_sampled=1000,
num_classes=200000)
def custom_loss(y_true, y_pred):
return K.tf.reduce_mean(y_pred)
num_classes = 200000
input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))
dense = Dense(num_classes)
outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])
model = Model([input, target_input], outputs)
model.compile(optimizer=u'adam', loss=custom_loss)
# train as desired

Keras models in tensorflow

I'm building image processing network in tensorflow and I want to make use of texture loss. Texture loss seems simple to implement if you have pretrained model loaded.
I'm using TF to build the computational graph for my model and I want to incorporate Keras.application.VGG19 model to get output from layer 'block4_conv4'.
The problem is: I have two TF tensors target and result from my main model, how to feed them into keras VGG19 in the same session to compute their diff and use it in main loss for my model?
It seems following code does the trick
with tf.variable_scope("") as scope:
phi_func = VGG19(include_top=False, weights=None, input_shape=(128, 128, 3))
text_1 = phi_func(predicted)
scope.reuse_variables()
text_2 = phi_func(x)
text_loss = tf.reduce_mean((text_1 - text_2)**2)
right after session created I call phi_func.load_weights(path) to initiate weights