Regularization function using weights from multiple layers? - tensorflow

I don't know if it is feasible but I'm asking just in case. Here is the (simplified) architecture of my model.
Layer (type)            Output Shape            Param #   Connected to
=======================================================================
input_1 (InputLayer)    [(None, 7, 7, 1024)]    0
conv (Conv2D)           (None, 7, 7, 10)        10240     input_1[0][0]
where each of the 10 filters in "conv" is a 1x1x1024 convolutional filter (with no bias but it's irrelevant for this particular issue).
I am currently using a custom regularization function on "conv" to make sure that the (1x1)x1024x10 matrix of filter weights has a nice property (basically that all vectors are pairwise orthogonal) and so far, everything is working as expected.
Now I also want the ability to disable training on some of these 10 filters. The only way I know how to do that would be to implement 10 filters independently as follows
Layer (type)            Output Shape            Param #   Connected to
=======================================================================
input_1 (InputLayer)    [(None, 7, 7, 1024)]    0
conv_1 (Conv2D)         (None, 7, 7, 1)         1024      input_1[0][0]
conv_2 (Conv2D)         (None, 7, 7, 1)         1024      input_1[0][0]
conv_3 (Conv2D)         (None, 7, 7, 1)         1024      input_1[0][0]
...
conv_10 (Conv2D)        (None, 7, 7, 1)         1024      input_1[0][0]
followed by a Concatenate layer, and then set the "trainable" parameter to True/False on each conv_i layer as I see fit. However, now I don't know how to implement my regularization function, which must be computed on the weights of all conv_i layers simultaneously rather than independently.
Is there a trick that I can use to implement such a function? Or conversely, is there a way to freeze only part of the weights of a convolutional layer?
Thanks!
Solution
For those interested, here is the working code for my problem, following the advice provided by @LaplaceRicky.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

class SpecialRegularization(tf.keras.Model):
    """ In order to avoid a warning message when saving the model,
    I use the solution indicated here
    https://github.com/tensorflow/tensorflow/issues/44541
    and now inherit from tf.keras.Model instead of Layer
    """
    def __init__(self,nfilters,**kwargs):
        super().__init__(**kwargs)
        # one 1x1 convolution per filter, so each can be frozen separately
        self.inner_layers=[Conv2D(1,(1,1)) for _ in range(nfilters)]
    def call(self, inputs):
        outputs=[l(inputs) for l in self.inner_layers]
        self.add_loss(self.define_your_regularization_here())
        return tf.concat(outputs,-1)
    def set_trainable_parts(self, trainables):
        """ Set the trainable attribute independently on each filter """
        for l,t in zip(self.inner_layers,trainables):
            l.trainable = t
    def define_your_regularization_here(self):
        #reconstruct the original kernel
        large_kernel=tf.concat([l.kernel for l in self.inner_layers],-1)
        return tf.reduce_sum(large_kernel*large_kernel[:,:,:,::-1])
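
The regularization body above is still the placeholder from the answer. The question asks for pairwise orthogonal filters, so one possible penalty is the sum of squared dot products over all distinct pairs of filters. The following is a minimal framework-free sketch of that quantity on plain Python lists; the function name `orthogonality_penalty` and the sample vectors are hypothetical, and the exact penalty used by the author is not given in the post. In TensorFlow the same quantity could presumably be computed from large_kernel after reshaping it to a (1024, nfilters) matrix.

```python
def orthogonality_penalty(filters):
    """Sum of squared dot products over all distinct pairs of filters.

    `filters` is a list of equal-length weight vectors (one per 1x1 filter).
    The value is 0 exactly when every pair of vectors is orthogonal.
    """
    penalty = 0.0
    for i in range(len(filters)):
        for j in range(i + 1, len(filters)):
            dot = sum(a * b for a, b in zip(filters[i], filters[j]))
            penalty += dot * dot
    return penalty

# Illustrative data only (assumption, not from the post)
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, -3.0]]
correlated = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

print(orthogonality_penalty(orthogonal))  # 0.0
print(orthogonality_penalty(correlated))  # 1.0
```

Because the penalty only looks at pairs, frozen and trainable filters can be mixed freely: the gradient simply does not update the frozen ones.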

One way to achieve this is to have a custom keras layer that wraps all of the small conv layers and is responsible for computing the regularization loss.
Example code:
import tensorflow as tf

def _get_losses(model,x):
    model(x)
    return model.losses

def _get_grads(model,x):
    with tf.GradientTape() as t:
        model(x)
        reg_loss=tf.math.add_n(model.losses)
    return t.gradient(reg_loss,model.trainable_weights)

class SpecialRegularization(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        # call super().__init__() first so that Keras can track the inner layers
        super().__init__(**kwargs)
        self.inner_layers=[tf.keras.layers.Conv2D(1,(1,1)) for i in range(10)]
    def call(self, inputs,training=None):
        outputs=[l(inputs,training=training) for l in self.inner_layers]
        self.add_loss(self.define_your_regularization_here())
        return tf.concat(outputs,-1)
    def define_your_regularization_here(self):
        #reconstruct the original kernel
        large_kernel=tf.concat([l.kernel for l in self.inner_layers],-1)
        #just giving an example here
        #you should define your own regularization using the entire kernel
        return tf.reduce_sum(large_kernel*large_kernel[:,:,:,::-1])

tf.random.set_seed(123)
inputs = tf.keras.Input(shape=(7,7,1024))
outputs = SpecialRegularization()(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
#get_losses, get_grads are for demonstration purpose
get_losses=tf.function(_get_losses)
get_grads=tf.function(_get_grads)
data=tf.random.normal((64,7,7,1024))
print(get_losses(model,data))
#gradient of the regularization loss w.r.t. the first filter's kernel
print(get_grads(model,data)[0])
#twice the last filter's kernel, for comparison with the gradient above
print(model.layers[1].inner_layers[-1].kernel*2)
model.summary()
'''
[<tf.Tensor: shape=(), dtype=float32, numpy=-0.20446025>]
tf.Tensor(
[[[[ 0.02072023]
[ 0.12973154]
[ 0.11631528]
...
[ 0.00804012]
[-0.07299817]
[ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
tf.Tensor(
[[[[ 0.02072023]
[ 0.12973154]
[ 0.11631528]
...
[ 0.00804012]
[-0.07299817]
[ 0.06031524]]]], shape=(1, 1, 1024, 1), dtype=float32)
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 7, 7, 1024)] 0
_________________________________________________________________
special_regularization (Spec (None, 7, 7, 10) 10250
=================================================================
Total params: 10,250
Trainable params: 10,250
Non-trainable params: 0
_________________________________________________________________
'''
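
The second and third printed tensors are identical for a reason: the example loss pairs filter j with filter n-1-j, so the first filter appears twice in a product with the last one, and the gradient with respect to the first kernel is twice the last kernel. That shows the loss really couples weights across the separate conv layers. Here is a small framework-free check of that claim using central finite differences; the helper names and the sample filters are hypothetical.

```python
def example_reg(filters):
    """sum_j <k_j, k_(n-1-j)>: the example loss from the answer, on plain lists."""
    n = len(filters)
    return sum(
        a * b
        for j in range(n)
        for a, b in zip(filters[j], filters[n - 1 - j])
    )

def grad_first_filter(filters, eps=1e-6):
    """Central finite-difference gradient of example_reg w.r.t. filters[0]."""
    grad = []
    for c in range(len(filters[0])):
        plus = [list(f) for f in filters]
        minus = [list(f) for f in filters]
        plus[0][c] += eps
        minus[0][c] -= eps
        grad.append((example_reg(plus) - example_reg(minus)) / (2 * eps))
    return grad

# Four filters with two input channels each (illustrative values only)
filters = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5], [1.0, 3.0]]
grad = grad_first_filter(filters)
expected = [2 * w for w in filters[-1]]  # twice the last filter
print(grad, expected)
```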

Related

Mobilenet: Transfer learning with Gradcam

I am a newbie to all this so please be kind to this question :)
What I am trying to do is train a Mobilenet classifier using the transfer learning technique and then implement the Gradcam technique to understand what my model is looking into.
I created a model
input_layer = tf.keras.layers.Input(shape=IMG_SHAPE)
x = preprocess_input(input_layer)
y = base_model(x)
y = tf.keras.layers.GlobalAveragePooling2D()(y)
y = tf.keras.layers.Dropout(0.2)(y)
outputs = tf.keras.layers.Dense(5)(y)
model = tf.keras.Model(inputs=input_layer, outputs=outputs)
model.summary()
model summary:
Model: "functional_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
tf_op_layer_RealDiv_1 (Tenso [(None, 224, 224, 3)] 0
_________________________________________________________________
tf_op_layer_Sub_1 (TensorFlo [(None, 224, 224, 3)] 0
_________________________________________________________________
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280) 2257984
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1280) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 1280) 0
_________________________________________________________________
dense_1 (Dense) (None, 5) 6405
=================================================================
Total params: 2,264,389
Trainable params: 6,405
Non-trainable params: 2,257,984
_________________________________________________________________
I passed it to the Grad-CAM algorithm, but the algorithm is not able to find the last convolutional layer.
Plausible solution:
If, instead of the encapsulated 'mobilenetv2_1.00_224' layer, I could have the unwrapped MobileNet layers added to the model, the Grad-CAM algorithm would be able to find that last layer.
Problem
I am not able to create a model in which the data augmentation and preprocessing layers are added to the unwrapped MobileNet layers.
Thanks in advance
Regards
Ankit
@skruff see if this helps
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )
    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)
    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    # For visualization purpose, we will also normalize the heatmap between 0 & 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()
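
To make the channel-weighting step of make_gradcam_heatmap concrete (the product of the last conv feature map with pooled_grads, followed by the normalization), here is a tiny framework-free sketch on nested lists. The helper names `weighted_channel_sum` and `normalize` and the 2x2x2 feature map are hypothetical, chosen only so the arithmetic can be followed by hand.

```python
def weighted_channel_sum(feature_map, channel_weights):
    """feature_map: H x W x C nested lists; channel_weights: length-C list.

    Each output pixel is sum_c feature_map[h][w][c] * channel_weights[c].
    """
    return [
        [sum(a * w for a, w in zip(pixel, channel_weights)) for pixel in row]
        for row in feature_map
    ]

def normalize(heatmap):
    """Clamp negatives to 0 and divide by the maximum (assumes a positive maximum)."""
    peak = max(max(row) for row in heatmap)
    return [[max(v, 0.0) / peak for v in row] for row in heatmap]

# Illustrative 2 x 2 feature map with 2 channels
feature_map = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[3.0, 0.0], [1.0, 1.0]],
]
pooled_grads = [0.5, -1.0]

heatmap = weighted_channel_sum(feature_map, pooled_grads)
print(heatmap)             # [[-1.5, -1.0], [1.5, -0.5]]
print(normalize(heatmap))  # [[0.0, 0.0], [1.0, 0.0]]
```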

How to show all layers in a Tensorflow model with nested model?

How to show all layers in a tensorflow model with the model base?
base_model = keras.applications.MobileNetV3Small(
    input_shape=model_input_shape,
    include_top=False,
    weights="imagenet",
)
# =================== build model
model = keras.Sequential(
    [
        keras.Input(shape=image_shape),
        preprocessing.Resizing(*model_input_shape[:2]),
        preprocessing.Rescaling(1.0 / 255),
        base_model,
        layers.GlobalAveragePooling2D(),
        # missing dropout
        layers.Dense(1, activation="sigmoid"),
    ]
)
model.summary()
The output is this:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
resizing (Resizing) (None, 224, 224, 3) 0
_________________________________________________________________
rescaling_1 (Rescaling) (None, 224, 224, 3) 0
_________________________________________________________________
MobilenetV3small (Functional (None, 7, 7, 1024) 1529968 <---------- why can't I see all layers here?
_________________________________________________________________
global_average_pooling2d (Gl (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 1) 1025
How do I show all layers?
for layer in model.layers:
    print(layer)
The above has the same problem. What am I doing wrong?
In such a setup, base_model acts as a single layer, i.e. it becomes nested. To inspect it, you can call summary() on the nested model directly
model.layers[2].summary()
or loop over its inner layers
for i, layer in enumerate(model.layers):
    if i == 2:
        for nested_layer in layer.layers:
            print(nested_layer)
or, more intuitively, you can use this solution.
def summary_plus(layer, i=0):
    if hasattr(layer, 'layers'):
        if i != 0:
            layer.summary()
        for l in layer.layers:
            i += 1
            summary_plus(l, i=i)
summary_plus(model)
or you can use the plot_model function:
keras.utils.plot_model(
    model,
    expand_nested=True  # set this to True
)
Update 1: Raised an issue about this: Keras #15239. Hopefully, it will be solved soon.
Update 2: model.summary now has an expand_nested parameter. #15251
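
The recursive idea behind the approaches above can also be written as a generator that flattens the layer tree. The sketch below uses a hypothetical `StubLayer` stand-in rather than real Keras objects so that it runs anywhere; it only relies on the fact, used throughout this answer, that a nested model exposes a `.layers` attribute. With a real model the same generator should apply, but that is an assumption I have not verified here.

```python
class StubLayer:
    """Hypothetical stand-in for a Keras layer; nested models expose `.layers`."""
    def __init__(self, name, layers=None):
        self.name = name
        if layers is not None:
            self.layers = layers

def iter_layers(layer, depth=0):
    """Yield (depth, name) for a layer and, recursively, everything nested in it."""
    yield depth, layer.name
    for child in getattr(layer, "layers", []):
        yield from iter_layers(child, depth + 1)

# Mimics the structure from the question: a base model nested in a Sequential
base = StubLayer("MobilenetV3small", [StubLayer("Conv"), StubLayer("expanded_conv")])
model = StubLayer("sequential", [StubLayer("resizing"), base, StubLayer("dense")])

for depth, name in iter_layers(model):
    print("  " * depth + name)
```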

How to read Keras's model structure?

For example:
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(train_dataset))
test_dataset = test_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(test_dataset))
def pad_to_size(vec, size):
    zeros = [0] * (size - len(vec))
    vec.extend(zeros)
    return vec
...
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=False)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
print(model.summary())
The print reads as:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 64) 523840
_________________________________________________________________
bidirectional (Bidirectional (None, 128) 66048
_________________________________________________________________
dense (Dense) (None, 64) 8256
_________________________________________________________________
dense_1 (Dense) (None, 1) 65
=================================================================
Total params: 598,209
Trainable params: 598,209
Non-trainable params: 0
I have the following question:
1) For the embedding layer, why is the output shape (None, None, 64)? I understand '64' is the vector length. Why are the other two None?
2) Why is the output shape of the bidirectional layer (None, 128)? Why is it 128?
For the embedding layer, why is the output shape (None, None, 64)? I understand '64' is the vector length. Why are the other two None?
If you don't define the input_shape for the first layer of the Sequential model, the model input defaults to (None, None), including the batch dimension (in other words, input_shape=(None,) is the default).
If you pass an input tensor of size (None, None) to an embedding layer, it produces a (None, None, 64) tensor, assuming the embedding dimension is 64. The first None is the batch dimension and the second is the time dimension (which corresponds to the input_length parameter). That is why you get a (None, None, 64) sized output.
Why is the output shape of the bidirectional layer (None, 128)? Why is it 128?
Here, you have a Bidirectional LSTM. Your LSTM layer produces a (None, 64) sized output (when return_sequences=False). A Bidirectional layer is like having two LSTM layers (one going forward, the other going backward). The default merge_mode is concat, meaning that the two output states from the forward and backward layers are concatenated. This gives you a (None, 128) sized output.
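
The numbers in the summary can be reproduced by hand, which is a handy way to confirm the "two LSTMs, concatenated" reading. The sketch below uses the standard parameter counts for LSTM and Dense layers with biases; the helper names are hypothetical, and the vocabulary size is only inferred from the Embedding row of the summary.

```python
def lstm_params(input_dim, units):
    """4 gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

embedding_dim = 64
lstm_units = 64

bidirectional = 2 * lstm_params(embedding_dim, lstm_units)  # forward + backward LSTM
bidirectional_out = 2 * lstm_units                           # merge_mode="concat"
dense = dense_params(bidirectional_out, 64)
dense_1 = dense_params(64, 1)
vocab_size = 523840 // embedding_dim  # implied by the Embedding row of the summary

print(bidirectional, bidirectional_out, dense, dense_1, vocab_size)
# 66048 128 8256 65 8185
```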

Tensorflow keras Sequential .add is different than inline definition?

Keras is giving different results when I define my model via the declarative method instead of the functional method. The two models appear to be equivalent, but using the ".add()" syntax works while using the declarative syntax gives errors. It's a different error each time, but usually something like:
A target array with shape (10, 1) was passed for an output of shape (None, 16) while using as loss `mean_squared_error`. This loss expects targets to have the same shape as the output.
There seems to be something going on with auto-conversion of input shapes, but I can't tell what. Does anyone know what I'm doing wrong? Why aren't these two models exactly equivalent?
import tensorflow as tf
import tensorflow.keras
import numpy as np
x = np.arange(10).reshape((-1,1,1))
y = np.arange(10)
#This model works fine
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences = True))
model.add(tf.keras.layers.LSTM(16))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation('linear'))
#This model fails. But shouldn't this be equivalent to the above?
model2 = tf.keras.Sequential(
    {
        tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences = True),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
        tf.keras.layers.Activation('linear')
    })
#This works
model.compile(loss='mean_squared_error', optimizer='adagrad')
model.fit(x, y, epochs=1, batch_size=1, verbose=2)
#But this doesn't! Why not? The error is different each time, but usually
#something about the input size being wrong
model2.compile(loss='mean_squared_error', optimizer='adagrad')
model2.fit(x, y, epochs=1, batch_size=1, verbose=2)
Why aren't those two models equivalent? Why does one handle the input size correctly but the other doesn't? The second model fails with a different error each time (once in a while it even works), so I thought maybe there's some interaction with the first model. But I've tried commenting out the first model and that doesn't help. So why doesn't the second one work?
UPDATE: Here is the model.summary() output for the first and second model. They do seem different, but I don't understand why.
For model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 1, 32) 4352
_________________________________________________________________
lstm_1 (LSTM) (None, 16) 3136
_________________________________________________________________
dense (Dense) (None, 1) 17
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
For model2.summary():
model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 1, 32) 4352
_________________________________________________________________
activation_1 (Activation) (None, 1, 32) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 16) 3136
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
When you are creating the model with the inline declarations, you put the layers in curly braces {}, which makes it a set, which is inherently unordered. Change the curly braces to square brackets [] to put them in an ordered list. This will make sure that the layers are in the correct order in your model.
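
A small sketch of why the failure changes from run to run: a list keeps the order in which the layers were written, while a set iterates in an order derived from the objects' hashes, which for plain Python objects depend on their memory addresses. The `Layer` class below is a hypothetical stand-in, not a Keras class.

```python
class Layer:
    """Hypothetical stand-in for a Keras layer; hashes by identity like any plain object."""
    def __init__(self, name):
        self.name = name

layers = [Layer("lstm_32"), Layer("lstm_16"), Layer("dense"), Layer("activation")]

as_list = [layer.name for layer in list(layers)]
as_set = [layer.name for layer in set(layers)]

print(as_list)  # always the order the layers were written in
print(as_set)   # same names, but the order depends on object hashes and may differ per run
```

This matches the second summary in the question, where the Activation layer ended up between the two LSTM layers.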

Understanding Keras model architecture (tensor index)

This script, which defines a dummy model using the functional API,
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. This is different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, an LSTM layer will return 3 tensors if given return_state=True:
1) Hidden state of the last time step, or all hidden states if return_sequences=True
2) Hidden state of the last time step
3) Memory cell of the last time step
As another example, feature transformation can be implemented as a Lambda layer:
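from keras import backend as K
from keras.layers import Input, Lambda, Concatenate, Dense
from keras.models import Model

def generate_powers(x):
    # one layer call that returns three output tensors
    return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)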
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________
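
To keep the two indices apart, here is a toy bookkeeping model of the naming scheme layer_name[node_index][tensor_index]. It is not how Keras is implemented; `ToyLayer` is a hypothetical class in which every call creates a new node and every output of that call gets its own tensor index. It reproduces the two cases discussed above: a shared layer called twice, and a layer with three outputs called once.

```python
class ToyLayer:
    """Toy bookkeeping only (not Keras internals): one node per call, one slot per output."""
    def __init__(self, name, n_outputs=1):
        self.name = name
        self.n_outputs = n_outputs
        self.n_calls = 0

    def __call__(self):
        node_index = self.n_calls  # grows each time the layer is reused
        self.n_calls += 1
        return [
            f"{self.name}[{node_index}][{tensor_index}]"
            for tensor_index in range(self.n_outputs)
        ]

shared = ToyLayer("C_dense_shared")        # one output, called twice
multi = ToyLayer("lambda_7", n_outputs=3)  # three outputs, called once

shared_refs = shared() + shared()
multi_refs = multi()
print(shared_refs)  # ['C_dense_shared[0][0]', 'C_dense_shared[1][0]']
print(multi_refs)   # ['lambda_7[0][0]', 'lambda_7[0][1]', 'lambda_7[0][2]']
```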