How to compute saliency map using keras backend - tensorflow

I am trying to construct a basic "vanilla gradient" saliency heatmap (gradient-based feature attribution) for MNIST using Keras. I know there are libraries such as this one for computing saliency heatmaps, but I would like to build this from scratch, since the vanilla gradient approach seems conceptually straightforward to implement. I have trained the following digit classifier in Keras using the functional API:
input = layers.Input(shape=(28,28,1), name='input')
conv2d_1 = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(input)
maxpooling2d_1 = layers.MaxPooling2D(pool_size=(2, 2), name='maxpooling2d_1')(conv2d_1)
conv2d_2 = layers.Conv2D(64, kernel_size=(3, 3), activation='relu')(maxpooling2d_1)
maxpooling2d_2 = layers.MaxPooling2D(pool_size=(2, 2))(conv2d_2)
flatten = layers.Flatten(name='flatten')(maxpooling2d_2)
dropout = layers.Dropout(0.5, name='dropout')(flatten)
dense = layers.Dense(num_classes, activation='softmax', name='dense')(dropout)
model = keras.models.Model(inputs=input, outputs=dense)
Now, I want to compute the saliency map for a single MNIST image. Since the final layer has a softmax activation and the denominator is a normalization term (so that the output nodes add up to 1), I believe I need to either take the pre-softmax output or change the activation of the trained model to linear before computing saliency maps. I will do the latter.
model.layers[-1].activation = tf.keras.activations.linear  # swap the activation to linear
input = model.layers[0].input
output = model.layers[-1].output
input_image = x_test[0]  # shape is (28, 28, 1)
pred = np.argmax(model.predict(np.expand_dims(input_image, axis=0)))  # predicted class
However, I am not sure what to do beyond this. I know I can use K.gradients(output, input) to compute gradients. That being said, I believe I should compute the gradient of the predicted class with respect to the input image, rather than the gradient of the entire output. How would I do this? Also, I'm not sure how to evaluate the saliency heatmap for a specific image/prediction. I imagine I will have to use sess = tf.keras.backend.get_session() and sess.run(), but I'm not sure exactly how. I would greatly appreciate any help with completing the saliency heatmap code. Thanks!

If you define the last dense layer without an activation and add the softmax as its own layer afterwards, e.g.:
keras.layers.Activation('softmax')
you can then build a second model that stops at the pre-softmax output:
linear_model = keras.Model(inputs=model.input, outputs=model.layers[-2].output)
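For reference, a minimal sketch of what the question's model head would look like with the activation split out (same layer names as above, purely illustrative):
dense = layers.Dense(num_classes, activation=None, name='dense')(dropout)
softmax = layers.Activation('softmax', name='softmax')(dense)
model = keras.models.Model(inputs=input, outputs=softmax)
# model.layers[-2].output is now the pre-softmax (linear) score tensor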
You can then compute the gradients like this:
def get_saliency_map(model, image, class_idx):
    with tf.GradientTape() as tape:
        tape.watch(image)
        predictions = model(image)
        loss = predictions[:, class_idx]
    # Get the gradients of the loss w.r.t. the input image.
    gradient = tape.gradient(loss, image)
    # Take the maximum across the channel axis.
    gradient = tf.reduce_max(gradient, axis=-1)
    # Convert to numpy.
    gradient = gradient.numpy()
    # Normalize between 0 and 1.
    min_val, max_val = np.min(gradient), np.max(gradient)
    smap = (gradient - min_val) / (max_val - min_val + keras.backend.epsilon())
    return smap
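A minimal usage sketch for the MNIST example above (assuming the linear_model built earlier, x_test preprocessed as in the question, and matplotlib for display):
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

img = tf.convert_to_tensor(np.expand_dims(x_test[0], axis=0), dtype=tf.float32)  # shape (1, 28, 28, 1)
class_idx = np.argmax(linear_model.predict(img))  # predicted class from the pre-softmax model
smap = get_saliency_map(linear_model, img, class_idx)  # shape (1, 28, 28)
plt.imshow(smap[0], cmap='hot')
plt.show()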

Related

Extracting gradient of Keras Embedding layer

I want to extract the gradient of an RNN model that starts with an embedding layer, using TensorFlow's GradientTape (TensorFlow 1.14 with eager execution). The model is a simple LSTM binary classifier, trained with a binary crossentropy loss:
inputs = Input(name='inputs', shape=[150])
layer = Embedding(2000, 50, input_length=150)(inputs)
layer = LSTM(64)(layer)
layer = Dense(256, name='FC1')(layer)
layer = Activation('relu')(layer)
layer = Dropout(0.5)(layer)
layer = Dense(1, name='out_layer')(layer)
layer = Activation('sigmoid')(layer)
model = Model(inputs=inputs, outputs=layer)
GradientTape should return "... a list or nested structure of Tensors (or IndexedSlices, or None, or CompositeTensor), one for each element in sources". What is the correct way to use it to recover (and apply) the gradient?
I tried the following code:
with tf.GradientTape() as tape:
    y_ = model(inputs)
    loss_value = BinaryCrossentropy()(y_true=targets, y_pred=y_)

grads = tape.gradient(loss_value, model.trainable_variables)

# some custom processing
optimizer = RMSprop(learning_rate=0.001, name="context")
optimizer.apply_gradients(list(zip(grads, model.trainable_variables)), name="context")
I would expect the returned gradient to be of size (2000, 50), i.e., the shape of the embedding layer's weights. Instead, its size depends on the batch size, and it cannot be used (at least with the code above) with apply_gradients. Changing the number of inputs consistently changes the first dimension of the gradient to batch_size * 150, while the shapes of the trainable variables stay correct. If using 8 inputs, for example, I get the following result:
input shape: (8, 150), output shape: (8, 1)
model.trainable_variables shapes: (2000, 50),(50, 256),(64, 256),(256,),(64, 256),(256,),(256, 1),(1,)
tape.gradient shapes: (1200, 50),(50, 256),(64, 256),(256,),(64, 256),(256,),(256, 1),(1,)
With a batch size of 32, the first component would be (4800, 50), and so on. This doesn't match my understanding of GradientTape.gradient, since the returned gradient doesn't have the same size as the sources parameter. What did I miss?
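One thing worth checking here (a sketch based on TensorFlow's usual behaviour for embedding lookups, not a verified answer): the gradient w.r.t. an embedding matrix is typically returned as a tf.IndexedSlices rather than a dense tensor. Its values field has shape (number of lookups, embedding_dim), i.e. (batch_size * 150, 50), which matches the (1200, 50) above, and its indices field maps each row back into the (2000, 50) embedding matrix. It can be densified if needed, and optimizers generally accept IndexedSlices directly:
grad_emb = grads[0]  # gradient w.r.t. the embedding matrix
if isinstance(grad_emb, tf.IndexedSlices):
    print(grad_emb.values.shape)   # e.g. (1200, 50) for batch_size=8, seq_len=150
    print(grad_emb.indices.shape)  # (1200,) row indices into the embedding matrix
    dense_grad = tf.convert_to_tensor(grad_emb)  # dense (2000, 50) tensor
    print(dense_grad.shape)
# apply_gradients should still work with the IndexedSlices as-is:
optimizer.apply_gradients(zip(grads, model.trainable_variables))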

Deep Learning model (LSTM) predicts same class label

I am trying to solve the Spoken Digit Recognition task using an LSTM model, where the audio files are converted into spectrograms and fed into an LSTM model with Global Average Pooling. Here is the architecture:
tf.keras.backend.clear_session()
# input layer
input_ = Input(shape=(64, 35))
lstm = LSTM(100, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.000001),
            recurrent_initializer='glorot_uniform')(input_)
lstm = GlobalAveragePooling1D(data_format='channels_first')(lstm)
dense = Dense(20, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer='glorot_uniform')(lstm)
drop = Dropout(0.8)(dense)
dense1 = Dense(25, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer= 'he_uniform')(drop)
drop = Dropout(0.95)(dense1)
output = Dense(10,activation = 'softmax', kernel_regularizer = l2(0.000001), kernel_initializer= 'glorot_uniform')(drop)
model_2 = Model(inputs = [input_], outputs = output)
model_2.summary()
(Model summary screenshot omitted.)
I need to calculate the F1 score to check the performance of the model. I have implemented a custom callback and have also used the TensorFlow Addons F1 score, but I don't get a correct result: for every epoch I get the same constant F1 score value.
On further digging, I found that my model predicts the same class label for the entire epoch, whereas it should spread its predictions across the 10 class labels present in the data.
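For instance, a minimal check of the prediction distribution (a sketch, using the X_test_spectrogram and one-hot y_test_converted from the fit call below):
import numpy as np
probs = model_2.predict(X_test_spectrogram)  # shape (num_samples, 10)
pred_labels = np.argmax(probs, axis=1)
# How many distinct classes does the model actually predict?
print(np.unique(pred_labels, return_counts=True))
# Compare against the true label distribution
true_labels = np.argmax(y_test_converted, axis=1)
print(np.unique(true_labels, return_counts=True))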
Here are my model.compile and model.fit calls. I have used the TensorFlow Addons metric here:
from tensorflow import keras

opt = keras.optimizers.Adam(0.001, clipnorm=0.8)
model_2.compile(loss='categorical_crossentropy', optimizer=opt, metrics=metric)
hist = model_2.fit([X_train_spectrogram],
                   [y_train_converted],
                   validation_data=([X_test_spectrogram], [y_test_converted]),
                   epochs=10,
                   verbose=1,
                   callbacks=[tensorBoard_callbk2, ClearMemory()],
                   # steps_per_epoch=3,
                   batch_size=32)
Here is what I mean by getting the same prediction: the entire predictions array is filled with the same predicted value (output screenshot omitted).
Why is the model predicting the same class label, and how can I rectify it?
I have tried increasing the number of trainable parameters and increasing/decreasing the batch size, but it doesn't help. If anyone knows what's wrong, can you please help me out?

How to apply Triplet Loss for a ResNet50 based Siamese Network in Keras or Tf 2

I have a ResNet-based siamese network which uses the idea that you try to minimize the L2 distance between two images and then apply a sigmoid so that it gives you a {0: 'same', 1: 'different'} output, and based on how far off the prediction is, you just flow the gradients back through the network. The problem is that the gradient updates are too small, because the distance is squashed into [0, 1], so I thought of using the same architecture but based on Triplet Loss.
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
# x = Flatten()(x)  # or use this one if not using any pooling layer
distance = Lambda(lambda tensors: K.abs(tensors[0] - tensors[1]))([x1, x2])
final_output = Dense(1, activation='sigmoid')(distance)
siamese_model = Model(inputs=[I1, I2], outputs=final_output)
siamese_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['acc'])
siamese_model.fit_generator(train_gen, steps_per_epoch=1000, epochs=10, validation_data=validation_data)
So how can I change it to use the Triplet Loss function? What adjustments should be done here in order to get this done? One change will be that I'll have to calculate
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x3 = res_m_3.output
One thing I found in the TensorFlow Addons docs is TripletSemiHardLoss, given as:
tfa.losses.TripletSemiHardLoss()
As shown in the paper, the best results are from triplets known as "Semi-Hard". These are defined as triplets where the negative is farther from the anchor than the positive, but still produces a positive loss. To efficiently find these triplets we utilize online learning and only train from the Semi-Hard examples in each batch.
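In other words, for an anchor a, positive p and negative n with margin m, a semi-hard triplet satisfies d(a, p) < d(a, n) < d(a, p) + m. A rough numeric illustration (not the library's internal implementation):
import tensorflow as tf
pos_dist = tf.constant(0.30)  # d(a, p): anchor-positive distance
neg_dist = tf.constant(0.45)  # d(a, n): anchor-negative distance
margin = 0.2
# Semi-hard: the negative is farther than the positive but still within the margin,
# so the hinge loss max(pos_dist - neg_dist + margin, 0) is still positive.
semi_hard = tf.logical_and(neg_dist > pos_dist, neg_dist < pos_dist + margin)
loss = tf.maximum(pos_dist - neg_dist + margin, 0.0)
print(semi_hard.numpy(), loss.numpy())  # True, ~0.05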
Another implementation of Triplet Loss which I found on Kaggle is: Triplet Loss Keras
Which one should I use and most importantly, HOW?
P.S: People also use something like: x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x) after model.output. Why is that? What is this doing?
Following this answer of mine, and with the role of TripletSemiHardLoss in mind, we could do the following:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import models, layers

BATCH_SIZE = 32
LATENT_DIM = 128

def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)

train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)

# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(BATCH_SIZE)
train_dataset = train_dataset.map(_normalize_img)
test_dataset = test_dataset.batch(BATCH_SIZE)
test_dataset = test_dataset.map(_normalize_img)

inputs = layers.Input(shape=(28, 28, 1))
resNet50 = tf.keras.applications.ResNet50(include_top=False, weights=None, input_tensor=inputs, pooling='avg')
outputs = layers.Dense(LATENT_DIM, activation=None)(resNet50.output)  # No activation on final dense layer
outputs = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs)  # L2 normalize embeddings
siamese_model = models.Model(inputs=inputs, outputs=outputs)

# Compile the model
siamese_model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss())

# Train the network
history = siamese_model.fit(
    train_dataset,
    epochs=3)
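After training, the model outputs L2-normalized embeddings, so you can embed images and compare them by Euclidean (or cosine) distance; smaller distances should mean "same class". A rough usage sketch (illustrative only):
import numpy as np
# Embed a batch of test images (labels are only used here to eyeball the result)
images, labels = next(iter(test_dataset))
embeddings = siamese_model.predict(images)  # shape (batch, LATENT_DIM), unit-norm rows
# Euclidean distances between the first image and the rest of the batch
dists = np.linalg.norm(embeddings[1:] - embeddings[0], axis=1)
print(labels.numpy()[0], labels.numpy()[1:][np.argmin(dists)])  # nearest neighbour's label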

Neural Network isn't learning anything meaningful using Triplet Loss

I am working on a triplet loss based model for this Kaggle competition.
Short description: in this competition, we are challenged to build an algorithm to identify individual whales in images by analyzing a database containing more than 25,000 images gathered from research institutions and public contributors.
https://www.kaggle.com/c/humpback-whale-identification?rvi=1
I have decided to use a Siamese network architecture and train it to give me encodings which I can then use to calculate the distance between two pictures of whales. If this distance is below a particular threshold the two pictures belong to the same whale, and if it is greater, then they aren't the same whale.
This is the triplet loss function I used (learnt from Andrew Ng's deep learning specialization), but I also normalized the encodings to make the loss more interpretable (easier to choose the margin and split point) across different models, if that makes sense. (I first tried it without the normalization, and when that didn't work I tried normalizing.) I have also tried changing alpha (the margin), varying it from 0.2 to 0.6.
from tensorflow.nn import l2_normalize as norm_l2

def triplet_loss(y_true, y_pred, alpha=0.3):
    """
    Arguments:
    y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 128)
        positive -- the encodings for the positive images, of shape (None, 128)
        negative -- the encodings for the negative images, of shape (None, 128)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    anchor, positive, negative = norm_l2(anchor), norm_l2(positive), norm_l2(negative)
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: Subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss
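As a quick sanity check (a sketch only, with random tensors standing in for real encodings), the function should return a non-negative scalar:
import tensorflow as tf
anchor = tf.random.normal((4, 128))
positive = anchor + 0.1 * tf.random.normal((4, 128))  # close to the anchor
negative = tf.random.normal((4, 128))  # unrelated
print(triplet_loss(None, [anchor, positive, negative]).numpy())  # non-negative scalar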
This is an example of one of the model architectures I tried out. I have tried using pretrained FaceNet, ResNet, DenseNet and Xception so far, and I have tried freezing different numbers of layers in each.
R = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

lr = 0.0001
optimizer = Adam(learning_rate=lr)
R.compile(optimizer=optimizer, loss=triplet_loss)

for layer in R.layers[0:30]:
    layer.trainable = False

em_Rmodel = Sequential([
    R,
    GlobalAveragePooling2D(),
    # tf.keras.layers.GlobalMaxPooling2D(),
    Dense(512, activation='relu'),
    bn(),
    Dense(256, activation='sigmoid'),
    Dense(128, activation='sigmoid')
])
def make_tripletModel(model):
    # I was manually changing the input shape to fit the default shape of pretrained networks
    A = Input(shape=(224, 224, 3), name='anchor')
    P = Input(shape=(224, 224, 3), name='anchorPositive')
    N = Input(shape=(224, 224, 3), name='anchorNegative')
    enc_A = model(A)
    enc_P = model(P)
    enc_N = model(N)
    tripletModel = Model(inputs=[A, P, N], outputs=[enc_A, enc_P, enc_N])
    return tripletModel
tripletModel = make_tripletModel(em_Rmodel)
I have been training using semi-hard triplets and have also been augmenting the data to generate more training images.
This is the batch generator that I used for training. crop_batch is a function that crops images to show only the whale's tail, which is what identifies a whale. It uses a DenseNet trained on more than 1000 images annotated with whale tails and their bounding boxes, and it does the job sufficiently well.
def batch_generator_RN(batch_size=batch_size, ishape=(256, 256, 3), model_input_shape=(224, 224, 3)):
    triplet_generator = get_triplets()
    y_val = np.zeros((batch_size, 2, 1))
    anchors = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    positives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    negatives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))

    while True:
        for i in range(batch_size):
            anchors[i], positives[i], negatives[i] = next(triplet_generator)

        anc = crop_batch(anchors, batch_size=batch_size, img_shape=model_input_shape)
        pos = crop_batch(positives, batch_size=batch_size, img_shape=model_input_shape)
        neg = crop_batch(negatives, batch_size=batch_size, img_shape=model_input_shape)

        x_data = {'anchor': anc,
                  'anchorPositive': pos,
                  'anchorNegative': neg}

        yield (x_data, [y_val, y_val, y_val])
And finally, this, in general, is how I have been trying to train these models. I have tried reducing and increasing the learning rate, with batch_size = 16.
lr = 0.0001
optimizer = Adam(learning_rate=lr)
tripletModel.compile(optimizer = optimizer, loss = triplet_loss)
es = EarlyStopping(monitor='loss', patience=20, min_delta=0.05, restore_best_weights=True)
#mc = ModelCheckpoint('Rmodel.h5', monitor='loss', save_best_only=True, save_weights_only=True)
rlr = ReduceLROnPlateau(monitor='loss',min_delta=0.05,factor = 0.1,patience = 5, verbose = 1, min_lr = 0)
gen = batch_generator(batch_size)
tripletModel.fit(gen, steps_per_epoch=64, epochs = 40, callbacks=[es, rlr])
So after training all these models, in some of them the triplet loss does go down for a while but then plateaus and the model learns nothing meaningful (which means that just by looking at the distance between two embeddings I can't figure out whether they are the same whale or not). In other models, immediately after the first or second epoch the weights converge, don't change at all, and the model doesn't learn anything.
I have tried a very wide range of learning rates and I am pretty sure that isn't the problem.
Please tell me if I should add all the code files for you to understand the problem better. The reason I haven't done so yet is that I haven't cleaned the code up, but I will gladly do so if required. Thanks.
When you say that it doesn't learn anything, is it that the loss reaches a plateau and thus stops decreasing, or does it decrease significantly but, when you predict, the embeddings of same and different whales are similar in value?
The triplet_loss() and batch_generator_RN() functions are correct; the problem is not related to the data generation.
However, I suspect that your learning rate is too high while you also freeze a lot of layers, i.e. many parameters are not trainable, which may prevent your network from converging.
My suggestion is to unfreeze all the layers, decrease the learning rate to 0.00001, and start training again, regardless of the architecture that you use (Xception/ResNet etc.).
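Concretely, for the ResNet50 variant shown in the question, that suggestion would look roughly like this (a sketch; adapt it to whichever backbone you are using):
# Unfreeze every layer of the backbone
for layer in R.layers:
    layer.trainable = True
# Recompile with a much lower learning rate before resuming training
optimizer = Adam(learning_rate=1e-5)
tripletModel.compile(optimizer=optimizer, loss=triplet_loss)
gen = batch_generator(batch_size)
tripletModel.fit(gen, steps_per_epoch=64, epochs=40, callbacks=[es, rlr])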

'Channels first' training accuracy very low compared to 'channels last'

My issue:
I am trying to train a semantic segmentation model in tf.keras. It works very well when I use channels_last (HWC) mode (it reaches 96%+ validation accuracy). I wanted to train it in channels_first (CHW) mode so the weights are compatible with TensorRT. When I do this, the ~80% training accuracy of the first few epochs drops to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I used the guidelines provided and changed only a few things:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
I changed the channel order in the input tensor from input_tensor=Input(shape=(376, 672, 3)) to input_tensor=Input(shape=(3, 376, 672)).
In my image preprocessing (using tf.data.Dataset), I use tf.transpose(img, perm=[2, 0, 1]) on both my input image and the one-hot encoded mask to change the channel order. I checked this with an equality assertion to make sure it's correct, and it seems to be fine (see the sketch below).
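For instance, a minimal version of that transpose plus check (a sketch with random data; the real pipeline uses tf.data):
import numpy as np
import tensorflow as tf
img = tf.random.uniform((376, 672, 3))  # HWC input image
mask = tf.one_hot(tf.random.uniform((376, 672), maxval=3, dtype=tf.int32), depth=3)  # HWC one-hot mask
img_chw = tf.transpose(img, perm=[2, 0, 1])  # (3, 376, 672)
mask_chw = tf.transpose(mask, perm=[2, 0, 1])  # (3, 376, 672)
# The transpose only permutes axes, so the per-pixel class index must be unchanged
np.testing.assert_array_equal(tf.argmax(mask, axis=-1).numpy(),
                              tf.argmax(mask_chw, axis=0).numpy())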
When I change these, the training starts fine, but as I said the training accuracy goes down to almost zero. When I revert, everything is fine again.
Possible leads:
What am I doing wrong or what could be the problematic part here? My suspicions are around these questions:
Are the pre-trained ImageNet weights also converted to the 'channels_first' order when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background)
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
EDIT:
08/17: This is still an ongoing issue. I have tried several things:
I checked with a numpy assertion whether the image and the mask are correct after the transpose; they seem correct.
I suspected that the loss function was computing along the wrong axis, so I customized the loss function for the first axis (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels_first' backend setting does nothing to transpose the pretrained 'imagenet' weights for the MobileNet part. Is there an updated way in TF2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (skipNet() is the head network and MobileNet is the base; they are connected in the create_model() function):
def skipNet(encoder_output, feed1, feed2, classes):
    # random initializer and regularizer
    stddev = 0.01
    init = RandomNormal(stddev=stddev)
    weight_decay = 1e-3
    reg = l2(weight_decay)

    score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed2)
    score_feed2_bn = BatchNormalization()(score_feed2)
    score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed1)
    score_feed1_bn = BatchNormalization()(score_feed1)

    upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(encoder_output)
    height_pad1 = ZeroPadding2D(padding=((1, 0), (0, 0)))(upscore2)
    upscore2_bn = BatchNormalization()(height_pad1)
    fuse_feed1 = add([score_feed1_bn, upscore2_bn])

    upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(fuse_feed1)
    height_pad2 = ZeroPadding2D(padding=((0, 1), (0, 0)))(upscore4)
    upscore4_bn = BatchNormalization()(height_pad2)
    fuse_feed2 = add([score_feed2_bn, upscore4_bn])

    upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg, activation="softmax")(fuse_feed2)

    return upscore8
def create_model(classes):
    base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
                                                 include_top=False,
                                                 weights='imagenet')
    conv4_2_output = base_model.get_layer(index=43).output
    conv3_2_output = base_model.get_layer(index=30).output
    conv_score_output = base_model.output

    head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)

    for layer in base_model.layers:
        layer.trainable = False

    model = Model(inputs=base_model.input, outputs=head_model)
    return model