Summary of models constructed for transfer learning in tensorflow keras - tensorflow

I'm using TensorFlow 2.6 Keras for transfer learning, currently with MobileNetV2. I take the input, apply some preprocessing using Lambda layers, feed the preprocessed input to MobileNetV2, then add a Dense layer and train the whole thing. Training, inference, etc. work as expected.
However, the summary of the model looks as follows:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) [(None, 201, 189, 1)] 0
_________________________________________________________________
lambda (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 201, 189, None) 0
_________________________________________________________________
mobilenetv2_1.00_224 (Functi (None, 7, 6, 1280) 2257984
_________________________________________________________________
flatten (Flatten) (None, 53760) 0
_________________________________________________________________
output (Dense) (None, 2) 107522
=================================================================
Total params: 2,365,506
Trainable params: 2,331,394
Non-trainable params: 34,112
So the MobileNetV2 structure is hidden and shown as one layer of type tensorflow.python.keras.engine.functional.Functional. If I print the summary of this layer, I get all the internal layers of the model. I have a script for automatic GradCam visualizations which looks for the last Conv layer of the model. If the model is constructed by hand using Lambda, Conv2D, and Dense layers, everything works fine. If I use a pretrained model, it currently fails, because the Conv layer is hidden inside this Functional layer.
How do I construct my modified MobileNetV2 model with my additional layers so that the full structure of the model is shown?
This is approximately how I construct my final model:
input = Input(shape=params.image_shape, name="input")
flow = input
flow = input_correction(flow, params)  # some Lambda layers
keras_model = MobileNetV2(
    input_shape=image_shape,
    weights='imagenet',
    include_top=False)
keras_model_output = keras_model(flow)
keras_model_input = input
keras_model_output = Flatten()(keras_model_output)
output = Dense(units=len(params.classes),
               activation=tf.nn.softmax,
               name="output")(keras_model_output)
model = Model(inputs=keras_model_input, outputs=output)
model.compile(...)

By default, summary() doesn't show the layers of nested models. If your TensorFlow version supports it, pass the expand_nested argument to summary():
model.summary(expand_nested=True)
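For the GradCam script mentioned in the question, expanding the summary does not by itself make the nested Conv layer appear in model.layers. One possible approach is to walk the model recursively. The sketch below is duck-typed: it assumes only that nested models expose a `layers` list (as Keras models do), and it recognises convolutions by a class name containing "Conv", which is a heuristic. `iter_leaf_layers` and `find_last_conv` are hypothetical helper names, and the stand-in classes exist only so the example runs without TensorFlow.

```python
def iter_leaf_layers(model):
    """Yield layers in order, descending into anything that has a `layers` list."""
    for layer in model.layers:
        if hasattr(layer, "layers"):      # nested model, e.g. a Functional base network
            yield from iter_leaf_layers(layer)
        else:
            yield layer


def find_last_conv(model):
    """Return the last layer whose class name contains 'Conv', or None."""
    last = None
    for layer in iter_leaf_layers(model):
        if "Conv" in type(layer).__name__:
            last = layer
    return last


# Minimal stand-ins (assumptions for illustration) so the sketch runs without TensorFlow.
class Conv2D:
    def __init__(self, name):
        self.name = name


class Dense:
    def __init__(self, name):
        self.name = name


class Functional:
    def __init__(self, name, layers):
        self.name = name
        self.layers = layers


base = Functional("mobilenetv2", [Conv2D("Conv1"), Conv2D("Conv_1")])
model = Functional("model", [base, Dense("output")])
print(find_last_conv(model).name)  # prints "Conv_1"
```

With a real Keras model the same helpers should return the layer object itself. Its output tensor belongs to the nested model's graph, though, so a GradCam model built from it would probably need to start from the nested model's own input.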

Related

Access to intermediate layers in Keras Functional Model

I am using a transfer learning model in a way very similar to that explained in Chollet's Keras transfer learning guide. To avoid problems with the batch normalization layers, as stated in the guide and many other places, I have to insert the original pretrained base model as a functional model with the training=False option, like this:
inputs = layers.Input(shape=(224,224, 3))
x = img_augmentation(inputs)
baseModel = VGG19(weights="imagenet", include_top=False,input_tensor=x)
x=baseModel(x,training=False)
# construct the head of the model that will be placed on top of
# the base model
x=Conv2D(32,2)(x)
headModel = AveragePooling2D(pool_size=(4, 4))(x)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(3, activation="softmax")(headModel)
model = Model(inputs, outputs=headModel)
My problem is that I need to use GradCAM as in Chollet's GradCAM example page. To do this I need access to the base model's last convolutional layer, but when I summarize my model I get:
Model: "model_163"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
img_augmentation (Sequential (None, 224, 224, 3) 0
_________________________________________________________________
vgg19 (Functional) (None, 7, 7, 512) 20024384
_________________________________________________________________
conv2d_2 (Conv2D) (None, 6, 6, 32) 65568
_________________________________________________________________
average_pooling2d_2 (Average (None, 1, 1, 32) 0
_________________________________________________________________
flatten (Flatten) (None, 32) 0
_________________________________________________________________
dense_4 (Dense) (None, 64) 2112
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_5 (Dense) (None, 3) 195
=================================================================
Total params: 20,092,259
Trainable params: 67,875
Non-trainable params: 20,024,384
__________________________________________
Thus, the outputs I need are inside one of the vgg19 functional model's layers. How can I access this layer without having to remove the training=False option?
I generally don't like nesting models in models. Although it encourages modularity and introduces a nice structure to complex models, TensorFlow gives trouble when you want to do unconventional things (like computing GradCAM or accessing gradients). I've found it easier to un-nest the model so that you can easily access the layer you want.
I recently wrote a tutorial on implementing GradCAM in TensorFlow 2 for InceptionNet. It should give you enough context to access the required layer.
As you can see, the VGG model in your case has type Functional. When you iterate through your compound model's layers, you can check the type of each layer, find the nested Functional model, and work with its layers:
for layer in model.layers:
    if "Functional" == layer.__class__.__name__:
        # here you can iterate and choose the layers of your nested model
        for _layer in layer.layers:
            # your logic with nested model layers goes here;
            # printing the name is only a placeholder
            print(_layer.name)

Getting intermediate layer output from a nested network - Keras

I have a U-net network with VGG16 encoder architecture with pre-trained imagenet weights. Since my input images are grayscale, I added in a convolutional layer with depth 3 prior to sending the input to the U-net model.
Now, I'm trying to get the output of an intermediate layer within the U-net network. I create an intermediate model whose output is the output of the layer that I'm interested in. Here is my code:
base_model = sm.Unet('vgg16', encoder_weights='imagenet', classes=1, activation='sigmoid')
inp = Input(shape=(448, 224, 1))
l1 = Conv2D(3, (1,1))(inp)
out = base_model(l1)
model = Model(inp, out)
model.summary()
intermediate_layer_model = Model(inputs=model.layers[0].input,
                                 outputs=model.get_layer('model_1').get_layer('center_block2_relu').output)
Here is the output:
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 448, 224, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 448, 224, 3) 6
_________________________________________________________________
model_1 (Model) multiple 23752273
=================================================================
Total params: 23,752,279
Trainable params: 23,748,247
Non-trainable params: 4,032
_________________________________________________________________
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, ?, ?, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
It seems to me that the issue is that the U-net model has its own input layer (input_1), and I'm not supplying this information when constructing intermediate_layer_model. However, I expect the intermediate model to take only the grayscale images as input and not require an additional 3-channel input.
Any help would be appreciated.

Purpose of additional parameters in Quantization Nodes of TensorFlow Quantization Aware Training

Currently, I am trying to understand quantization aware training in TensorFlow. I understand that fake quantization nodes are required to gather dynamic range information as a calibration for the quantization operation. When I compare the same model once as a "plain" Keras model and once as a quantization aware model, the latter has more parameters, which makes sense since we need to store the minimum and maximum values for activations during quantization aware training.
Consider the following example:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model
def get_model(in_shape):
    inpt = layers.Input(shape=in_shape)
    dense1 = layers.Dense(256, activation="relu")(inpt)
    dense2 = layers.Dense(128, activation="relu")(dense1)
    out = layers.Dense(10, activation="softmax")(dense2)
    model = Model(inpt, out)
    return model
The model has the following summary:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 784)] 0
_________________________________________________________________
dense_3 (Dense) (None, 256) 200960
_________________________________________________________________
dense_4 (Dense) (None, 128) 32896
_________________________________________________________________
dense_5 (Dense) (None, 10) 1290
=================================================================
Total params: 235,146
Trainable params: 235,146
Non-trainable params: 0
_________________________________________________________________
However, if I make my model quantization aware, it prints the following summary:
import tensorflow_model_optimization as tfmot
quantize_model = tfmot.quantization.keras.quantize_model
# q_aware stands for quantization aware.
q_aware_model = quantize_model(standard)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 784)] 0
_________________________________________________________________
quantize_layer (QuantizeLaye (None, 784) 3
_________________________________________________________________
quant_dense_3 (QuantizeWrapp (None, 256) 200965
_________________________________________________________________
quant_dense_4 (QuantizeWrapp (None, 128) 32901
_________________________________________________________________
quant_dense_5 (QuantizeWrapp (None, 10) 1295
=================================================================
Total params: 235,164
Trainable params: 235,146
Non-trainable params: 18
_________________________________________________________________
I have two questions in particular:
What is the purpose of the quantize_layer with 3 parameters after the Input layer?
Why do we have 5 additional non-trainable parameters per layer and what are they used for exactly?
I appreciate any hint or further material that helps me (and others that stumble upon this question) understand quantization aware training.
The quantize layer is used to convert the float inputs to int8. The quantization parameters are used for output min/max and zero point calculations.
Quantized Dense Layers need a few additional parameters: min/max for kernel and min/max/zero-point for output activations.
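The numbers in the two summaries can be checked by hand. The sketch below recomputes the Dense parameter counts (inputs × units + units) and then adds the extras reported in the quantization aware summary: 3 for quantize_layer and 5 per wrapped Dense layer. It only reproduces the arithmetic and makes no claim about which variables those extras are; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


sizes = [784, 256, 128, 10]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
trainable = sum(per_layer)

extra_input = 3        # reported for quantize_layer in the summary
extra_per_dense = 5    # reported difference per QuantizeWrapper in the summary
non_trainable = extra_input + extra_per_dense * len(per_layer)

print(per_layer)                                   # [200960, 32896, 1290]
print([p + extra_per_dense for p in per_layer])    # [200965, 32901, 1295]
print(trainable, non_trainable, trainable + non_trainable)  # 235146 18 235164
```

The totals agree with both summaries: the trainable count is unchanged, and the 18 non-trainable parameters account for the whole difference.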

Tensorflow keras Sequential .add is different than inline definition?

Keras is giving different results when I define my model via the declarative method instead of the functional method. The two models appear to be equivalent, but using the ".add()" syntax works while using the declarative syntax gives errors; it's a different error each time, but usually something like:
A target array with shape (10, 1) was passed for an output of shape (None, 16) while using as loss `mean_squared_error`. This loss expects targets to have the same shape as the output.
There seems to be something going on with auto-conversion of input shapes, but I can't tell what. Does anyone know what I'm doing wrong? Why aren't these two models exactly equivalent?
import tensorflow as tf
import tensorflow.keras
import numpy as np
x = np.arange(10).reshape((-1,1,1))
y = np.arange(10)
#This model works fine
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences = True))
model.add(tf.keras.layers.LSTM(16))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation('linear'))
#This model fails. But shouldn't this be equivalent to the above?
model2 = tf.keras.Sequential(
    {
        tf.keras.layers.LSTM(32, input_shape=(1, 1), return_sequences = True),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
        tf.keras.layers.Activation('linear')
    })
#This works
model.compile(loss='mean_squared_error', optimizer='adagrad')
model.fit(x, y, epochs=1, batch_size=1, verbose=2)
#But this doesn't! Why not? The error is different each time, but usually
#something about the input size being wrong
model2.compile(loss='mean_squared_error', optimizer='adagrad')
model2.fit(x, y, epochs=1, batch_size=1, verbose=2)
Why aren't those two models equivalent? Why does one handle the input size correctly but the other doesn't? The second model fails with a different error each time (once in a while it even works), so I thought maybe there's some interaction with the first model. But I've tried commenting out the first model and that doesn't help. So why doesn't the second one work?
UPDATE: Here is the model.summary() output for the first and second model. They do seem different, but I don't understand why.
For model.summary():
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 1, 32) 4352
_________________________________________________________________
lstm_1 (LSTM) (None, 16) 3136
_________________________________________________________________
dense (Dense) (None, 1) 17
_________________________________________________________________
activation (Activation) (None, 1) 0
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
For model2.summary():
model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 1, 32) 4352
_________________________________________________________________
activation_1 (Activation) (None, 1, 32) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 16) 3136
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 7,505
Trainable params: 7,505
Non-trainable params: 0
When you are creating the model with the inline declarations, you put the layers in curly braces {}, which makes it a set, which is inherently unordered. Change the curly braces to square brackets [] to put them in an ordered list. This will make sure that the layers are in the correct order in your model.
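The difference can be seen without Keras. In the sketch below, `FakeLayer` is a hypothetical stand-in class: a list keeps the layers in the order written, while a set of such objects iterates in an order derived from their hashes (by default based on object identity), which can change between runs.

```python
class FakeLayer:
    """Stand-in for a Keras layer; uses Python's default identity-based hash."""
    def __init__(self, name):
        self.name = name


names = ["lstm", "lstm_1", "dense", "activation"]

as_list = [FakeLayer(n) for n in names]   # square brackets: ordered list
as_set = {FakeLayer(n) for n in names}    # curly braces: unordered set

list_order = [layer.name for layer in as_list]
set_order = [layer.name for layer in as_set]

print(list_order)   # always ['lstm', 'lstm_1', 'dense', 'activation']
print(set_order)    # the same four names, but in no guaranteed order
```

This is consistent with the model2.summary() output above, where the Activation layer ended up between the two LSTM layers.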

How to save a custom trained model without the fully connected layers, like MobileNetV2 with include_top=False

I want to save my trained model to .h5 without the last two layers, so that I can use my custom model for transfer learning in the future, just like MobileNetV2 with include_top=False. Can someone help me? Thanks!
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    alpha=1.0,
    input_shape=IMG_SHAPE,
    include_top=False,
    weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    # 205 units, matching the summary below (1280 * 205 + 205 = 262605)
    tf.keras.layers.Dense(205, activation=tf.nn.softmax)
])
The trained model looks like this:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenetv2_1.00_224 (Model) (None, 2, 2, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 205) 262605
=================================================================
Total params: 2,520,589
Trainable params: 2,486,477
Non-trainable params: 34,112
_________________________________________________________________
When I try to use it for transfer learning:
keras_model = loadModel(keras_model_path)
keras_model.summary()
input = keras_model.input
hidden = tf.keras.layers.GlobalMaxPooling2D()(keras_model.layers[-3].output)
out = tf.keras.layers.Dense(128, activation=tf.nn.softmax)(hidden)
model2 = tf.keras.Model(input, out)
model2.summary()
an error occurs:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 64, 64, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
i want to save my trained model to .h5 without last two layers,
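Why don't you save the full model with model.save() and, when you reload it for transfer learning, remove the layers using:
model.layers.pop()
You can also remove the layers before saving the model, but I wouldn't do that. Note that in TF 2.x tf.keras, model.layers may return a copy of the layer list, so popping from it might leave the model unchanged; Sequential models provide model.pop() for this purpose.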