I have a pretrained network. I want to load that model and change the shape of its input layer. I tried the following code:
import os
import tensorflow as tf
from tensorflow import keras
from google.colab import drive
drive.mount("/content/drive", force_remount=True )
new_model = tf.keras.models.load_model("/content/drive/My Drive/NonQuantRelu.h5")
Model: "functional_1"
Layer (type) Output Shape Param #
Input (InputLayer) [(None, 108, 1)] 0
ConvL1_Filters (Conv1D) (None, 98, 24) 264
I really don't want the None in the InputLayer, so I've tried to:
new_input_layer = keras.Input(batch_size=1, shape=(108,1),name="Input",dtype="float32",ragged=False,sparse=False)
TensorShape([1, 108, 1])
new_model.layers[0] = new_input_layer
Model: "functional_1"
Layer (type) Output Shape Param #
Input (InputLayer) [(None, 108, 1)] 0
ConvL1_Filters (Conv1D) (None, 98, 24) 264
Why is the Input layer not changed?
Thanks to everyone.
I was able to replicate your issue using vgg16 network.
import tensorflow as tf
from google.colab import drive
model = tf.keras.models.load_model('/content/drive/MyDrive/vgg16.h5')
Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
Model: "vgg16"
Layer (type) Output Shape Param #
input_1 (InputLayer) [(None, 224, 224, 3)] 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
To remove the first layer of the network, use pop as shown below
To add a new input layer, you can run the code shown below
new_input_layer = tf.keras.Input(batch_size= 32, shape=(224,224,3))
new_output_layer = model(new_input_layer)
new_model = tf.keras.Model(new_input_layer, new_output_layer)
Model: "model"
Layer (type) Output Shape Param #
input_1 (InputLayer) [(32, 224, 224, 3)] 0
vgg16 (Functional) (None, 7, 7, 512) 14714688
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
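A minimal sketch of why the assignment in the question seems to do nothing, and of the wrapping approach above. It uses a small stand-in model instead of the .h5 file; the stand-in's layer sizes and the name Input_fixed are assumptions chosen to mirror the question's summary. As far as I can tell, model.layers builds a fresh list every time it is accessed, so assigning to one of its elements changes only that temporary list and not the model's graph.

```python
import tensorflow as tf

# Stand-in for the loaded .h5 model (hypothetical sizes mirroring the summary).
inputs = tf.keras.Input(shape=(108, 1), name="Input")
outputs = tf.keras.layers.Conv1D(24, 11, use_bias=False, name="ConvL1_Filters")(inputs)
model = tf.keras.Model(inputs, outputs)

# This replaces an element of the list returned by `model.layers` only;
# the model itself still has its original input.
model.layers[0] = tf.keras.Input(batch_size=1, shape=(108, 1))
print(model.input_shape)

# Wrap the old model in a new one whose input has a fixed batch size.
fixed_input = tf.keras.Input(batch_size=1, shape=(108, 1), name="Input_fixed")
fixed_model = tf.keras.Model(fixed_input, model(fixed_input))
print(fixed_model.input_shape)
```

The wrapped model reuses the original layers and weights, so nothing has to be retrained or copied.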
You can use get_layer to retrieve a layer. For example, to get the details of the vgg16 (Functional) layer (i.e. the layer at index 1 in new_model), you can run the code shown below
Model: "vgg16"
Layer (type) Output Shape Param #
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
I'm currently working on a Visual Question Answering subject.
I've made a model as follows:
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) [(None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 input_3[0][0]
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 block1_conv1[0][0]
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 block1_conv2[0][0]
block2_conv1 (Conv2D) (None, 112, 112, 128 73856 block1_pool[0][0]
block2_conv2 (Conv2D) (None, 112, 112, 128 147584 block2_conv1[0][0]
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 block2_conv2[0][0]
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 block2_pool[0][0]
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 block3_conv1[0][0]
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 block3_conv2[0][0]
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080 block3_conv3[0][0]
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 block3_conv4[0][0]
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 block3_pool[0][0]
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 block4_conv1[0][0]
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 block4_conv2[0][0]
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808 block4_conv3[0][0]
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 block4_conv4[0][0]
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 block4_pool[0][0]
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 block5_conv1[0][0]
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 block5_conv2[0][0]
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808 block5_conv3[0][0]
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 block5_conv4[0][0]
flatten_1 (Flatten) (None, 25088) 0 block5_pool[0][0]
input_4 (InputLayer) [(None, 20)] 0
repeat_vector_1 (RepeatVector) (None, 20, 25088) 0 flatten_1[0][0]
embedding_1 (Embedding) (None, 20, 50) 901900 input_4[0][0]
concatenate_1 (Concatenate) (None, 20, 25138) 0 repeat_vector_1[0][0]
bidirectional_1 (Bidirectional) (None, 20, 50) 5032800 concatenate_1[0][0]
global_max_pooling1d_1 (GlobalM (None, 50) 0 bidirectional_1[0][0]
dense_1 (Dense) (None, 18037) 919887 global_max_pooling1d_1[0][0]
You can find the original paper on VQA, here :
To sum up, I have a model with 2 inputs:
a pre-trained VGG19 that takes images,
and an embedding layer that takes the tokenized question.
As output we have a Bidirectional LSTM with a final Dense layer that gives the answer to the question.
The training data is as follows:
img_path question answer
103 train2014/COCO_train2014_000000262171.jpg How many people are on the boat? 5
104 train2014/COCO_train2014_000000262171.jpg What color are the leaves? green
105 train2014/COCO_train2014_000000262171.jpg What type of watercraft is that? raft
131 train2014/COCO_train2014_000000262180.jpg What is the fruit? banana
132 train2014/COCO_train2014_000000262180.jpg Is this a good dessert?
My question is this: I can't load all the images into memory, so I would like to know whether it is possible to fit the model using a generator that produces the images on the fly, together with the tokenized questions.
I would like to do something like:
h =[X_train_img_generator, X_train_question], y_train_answer, epochs = 15, batch_size = 32)
where X_train_question holds the tokenized questions and X_train_img_generator is the image generator.
--> But it doesn't work. Is there a way to handle this properly?
---------- Edit June 02 2021
OK, I've now updated my attempt at solving the problem and also corrected an issue with the input image size, which is now 480x640x3.
Layer (type) Output Shape Param # Connected to
input_8 (InputLayer) [(None, 480, 640, 3) 0
block1_conv1 (Conv2D) (None, 480, 640, 64) 1792 input_8[0][0]
block1_conv2 (Conv2D) (None, 480, 640, 64) 36928 block1_conv1[0][0]
block1_pool (MaxPooling2D) (None, 240, 320, 64) 0 block1_conv2[0][0]
block2_conv1 (Conv2D) (None, 240, 320, 128 73856 block1_pool[0][0]
block2_conv2 (Conv2D) (None, 240, 320, 128 147584 block2_conv1[0][0]
block2_pool (MaxPooling2D) (None, 120, 160, 128 0 block2_conv2[0][0]
block3_conv1 (Conv2D) (None, 120, 160, 256 295168 block2_pool[0][0]
block3_conv2 (Conv2D) (None, 120, 160, 256 590080 block3_conv1[0][0]
block3_conv3 (Conv2D) (None, 120, 160, 256 590080 block3_conv2[0][0]
block3_conv4 (Conv2D) (None, 120, 160, 256 590080 block3_conv3[0][0]
block3_pool (MaxPooling2D) (None, 60, 80, 256) 0 block3_conv4[0][0]
block4_conv1 (Conv2D) (None, 60, 80, 512) 1180160 block3_pool[0][0]
block4_conv2 (Conv2D) (None, 60, 80, 512) 2359808 block4_conv1[0][0]
block4_conv3 (Conv2D) (None, 60, 80, 512) 2359808 block4_conv2[0][0]
block4_conv4 (Conv2D) (None, 60, 80, 512) 2359808 block4_conv3[0][0]
block4_pool (MaxPooling2D) (None, 30, 40, 512) 0 block4_conv4[0][0]
block5_conv1 (Conv2D) (None, 30, 40, 512) 2359808 block4_pool[0][0]
block5_conv2 (Conv2D) (None, 30, 40, 512) 2359808 block5_conv1[0][0]
block5_conv3 (Conv2D) (None, 30, 40, 512) 2359808 block5_conv2[0][0]
block5_conv4 (Conv2D) (None, 30, 40, 512) 2359808 block5_conv3[0][0]
block5_pool (MaxPooling2D) (None, 15, 20, 512) 0 block5_conv4[0][0]
flatten_7 (Flatten) (None, 153600) 0 block5_pool[0][0]
input_quest (InputLayer) [(None, None)] 0
repeat_vector_7 (RepeatVector) (None, 20, 153600) 0 flatten_7[0][0]
embedding_7 (Embedding) (None, 20, 50) 540700 input_quest[0][0]
concatenate_7 (Concatenate) (None, 20, 153650) 0 repeat_vector_7[0][0]
bidirectional_7 (Bidirectional) (None, 20, 22) 13522256 concatenate_7[0][0]
global_max_pooling1d_7 (GlobalM (None, 22) 0 bidirectional_7[0][0]
dense_7 (Dense) (None, 7465) 171695 global_max_pooling1d_7[0][0]
And the dataset is built as follows:
def load(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = preprocess_input(img)
    #img = tf.image.resize(img, size=(224, 224))
    img /= 255.
    img = tf.expand_dims(img, axis = 0)
    return img
x1 = xx: load(xx))
x2 =
y =
dataset =, x2), y))
h = = dataset, batch_size = 32, shuffle = True, epochs = 15)
but I get the following error :
ValueError: Dimension 0 in both shapes must be equal, but are 1 and 20. Shapes are [1,20] and [20,1]. for '{{node model_8/concatenate_7/concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](model_8/repeat_vector_7/Tile, model_8/embedding_7/embedding_lookup/Identity_1, model_8/concatenate_7/concat/axis)' with input shapes: [1,20,153600], [20,1,50], [] and with computed input tensors: input[2] = <2>.
I guess it has to do with the input shape for the Embedding part, but I don't know what I'm missing.
My input data shapes are:
X_train_rnn_pad = (53607,20),
answer_tr = (53607, 7465)
Yes, you should use something like the tf.data API.
If you have a 2-input model, you will do something like this:
x =, questions_array))
y =
dataset =, y)).shuffle(50)
After that, you can use the .map method to load your images and also apply data augmentation if you want.
Then for fitting you simply do:
h =
Using this API avoids loading images in memory.
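Here is a hedged sketch of such a two-input pipeline, using tiny generated PNG files so that it runs on its own. The names image_paths, questions, answers and load_example are hypothetical stand-ins, not taken from the question. One plausible reading of the error in the edit is that the dataset was never batched and load adds its own leading axis with expand_dims. Each element would then reach the model as an image of shape (1, 480, 640, 3) and a question of shape (20,), which Keras treats as 20 samples of length 1. In this sketch the loader returns unbatched tensors and .batch() adds the batch axis for both inputs.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 6 tiny PNG files, 6 padded questions, 6 one-hot answers.
tmp_dir = tempfile.mkdtemp()
image_paths = []
for i in range(6):
    path = os.path.join(tmp_dir, f"img_{i}.png")
    pixels = tf.cast(tf.fill((48, 64, 3), i), tf.uint8)
    tf.io.write_file(path, tf.io.encode_png(pixels))
    image_paths.append(path)
questions = np.random.randint(1, 100, size=(6, 20)).astype("int32")
answers = np.eye(10, dtype="float32")[np.random.randint(0, 10, size=6)]


def load_example(inputs, answer):
    """Read and decode one image; pass the question and the answer through."""
    path, question = inputs
    img = tf.io.read_file(path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # scales to [0, 1]
    return (img, question), answer  # no expand_dims: batch() adds the batch axis


dataset = (
    tf.data.Dataset.from_tensor_slices(((image_paths, questions), answers))
    .shuffle(6)
    .map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    .prefetch(tf.data.AUTOTUNE)
)

(img_batch, question_batch), answer_batch = next(iter(dataset))
print(img_batch.shape, question_batch.shape, answer_batch.shape)
```

A dataset built this way can be passed to fit directly. Since the dataset already yields batches, the batch size belongs in .batch() rather than in the fit call.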
Is it possible to get the pre activation outputs of conv4_1 layer of VGG19, before the activation function?
The VGG19 network from Keras Applications has the following layers:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
flatten (Flatten) (None, 25088) 0
fc1 (Dense) (None, 4096) 102764544
fc2 (Dense) (None, 4096) 16781312
predictions (Dense) (None, 1000) 4097000
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
I am assuming that the outputs of these are post activation (relu).
Yes, you are right, the activation functions are built into the layers here, so you cannot access the output before activation, unless you are willing to do some work:
First, make an exact copy of the layer whose output you want to observe, but leave out the activation function. If I understand you correctly, you want block4_conv1, which is at index 12. Inspect its config like this:
>>> vgg.layers[12].name
'block4_conv1'
>>> vgg.layers[12].filters
512
>>> vgg.layers[12].kernel_size
(3, 3)
>>> vgg.layers[12].padding
'same'
Create a copy of this layer without activation:
block4_conv1_copy = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
Now, create a new model consisting of all the vgg-layers 0-11 plus the copy of layer 12:
injection_model = Sequential(vgg.layers[:12] + [block4_conv1_copy])
This should yield
Layer (type) Output Shape Param #
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
conv2d_2 (Conv2D) (None, 28, 28, 512) 1180160
Total params: 3,505,728
Trainable params: 3,505,728
Non-trainable params: 0
Now block4_conv1_copy knows its input shape, which is required to set the weights:
That should be it! Just call injection_model.predict(some_input) and you can observe the output of layer 12 before activation.
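The same idea can be checked end to end on a small stand-in network, which avoids downloading VGG19. This is a sketch under assumptions: the two-layer model and the names conv_a, conv_b, conv_b_linear and probe are made up for illustration. It copies the last convolution without its activation, transfers the weights with get_weights/set_weights, and then verifies that applying ReLU to the pre-activation output reproduces the original layer's output.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Small stand-in for the pretrained network (hypothetical layer names and sizes).
inputs = tf.keras.Input(shape=(8, 8, 3))
hidden = layers.Conv2D(4, 3, padding="same", activation="relu", name="conv_a")(inputs)
outputs = layers.Conv2D(6, 3, padding="same", activation="relu", name="conv_b")(hidden)
base = tf.keras.Model(inputs, outputs)

# Copy of the target layer with the same config but no activation.
target = base.get_layer("conv_b")
linear_copy = layers.Conv2D(
    target.filters,
    target.kernel_size,
    padding=target.padding,
    activation=None,
    name="conv_b_linear",
)

# Calling the copy builds it, so its weights exist and can be overwritten.
pre_activation = linear_copy(hidden)
linear_copy.set_weights(target.get_weights())
probe = tf.keras.Model(inputs, pre_activation)

batch = np.random.rand(2, 8, 8, 3).astype("float32")
pre = probe.predict(batch, verbose=0)
post = base.predict(batch, verbose=0)

# ReLU applied to the pre-activation output should match the original output.
print(np.allclose(np.maximum(pre, 0.0), post, atol=1e-5))
```

The final check is a cheap way to confirm that the weights were copied to the right layer before trusting the pre-activation values.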
My RCNN model is too big, nearly 1 GB, when I call save_weights(). I want to reduce its size.
I use a loop to imitate a simple RNN, but the inputs differ at each step, and I need the output of every step stacked so that I can calculate the total loss over all steps. I tried to rewrite it with TimeDistributed layers, but I didn't succeed. Do you have any suggestions?
x_input = tf.keras.layers.Input((shape[1],shape[2], const.num_channels),name='x_input')
y_init = tf.keras.layers.Input((const.num_patches,2),name='y_init')
dxs = []
for i in range(const.num_iters_rnn):
    if i == 0:
        patches = tf.keras.layers.Lambda(extract_patches)([x_input,y_init])
    else:
        patches = tf.keras.layers.Lambda(extract_patches)([x_input,dxs[i-1]])
    conv2d1 = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(patches)
    maxpool1 = tf.keras.layers.MaxPooling2D()(conv2d1)
    conv2d2 = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(maxpool1)
    maxpool2 = tf.keras.layers.MaxPooling2D()(conv2d2)
    crop = tf.keras.layers.Cropping2D(cropping=(const.crop_size, const.crop_size))(conv2d2)
    cnn = tf.keras.layers.concatenate([crop,maxpool2])
    cnn = tf.keras.layers.Lambda(reshape)(cnn)
    if i == 0:
        hidden_state = tf.keras.layers.Dense(const.numNeurons,activation='tanh')(cnn)
    else:
        concat = tf.keras.layers.concatenate([cnn,hidden_state],axis=1)
        hidden_state = tf.keras.layers.Dense(const.numNeurons,activation='tanh')(concat)
    hidden_state = tf.keras.layers.BatchNormalization()(hidden_state)
    prediction = tf.keras.layers.Dense(const.num_patches*2,activation=None)(hidden_state)
    prediction = tf.keras.layers.Dropout(0.5)(prediction)
    prediction_reshape = tf.keras.layers.Reshape((const.num_patches, 2))(prediction)
    if i == 0:
        prediction = tf.keras.layers.Add()([prediction_reshape, y_init])
    else:
        prediction = tf.keras.layers.Add()([prediction_reshape,dxs[i-1]])
    # keep every step's prediction so that all steps can be stacked for the loss
    dxs.append(prediction)
output = tf.keras.layers.Lambda(stack)(dxs)
model = tf.keras.models.Model(inputs=[x_input, y_init], outputs=[output])
def extract_patches(inputs):
    list_patches = []
    for j in range(const.num_patches):
        patch_one = tf.image.extract_glimpse(inputs[0], [const.size_patch[0], const.size_patch[1]], inputs[1][:, j, :], centered=False, normalized=False, noise='zero')
        list_patches.append(patch_one)
    patches = tf.keras.backend.stack(list_patches,1)
    return tf.keras.backend.reshape(patches,(-1,patches.shape[2],patches.shape[3],patches.shape[4]))
def reshape(inputs):
    return tf.keras.backend.reshape(inputs,(-1,const.num_patches*inputs.shape[1]*inputs.shape[2]*inputs.shape[3]))
def stack(inputs):
    return tf.keras.backend.stack(inputs)
Model: "model"
Layer (type) Output Shape Param # Connected to
x_input (InputLayer) [(None, 255, 235, 1) 0
y_init (InputLayer) [(None, 52, 2)] 0
lambda (Lambda) (None, 26, 26, 1) 0 x_input[0][0]
conv2d (Conv2D) (None, 26, 26, 32) 320 lambda[0][0]
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0 conv2d[0][0]
conv2d_1 (Conv2D) (None, 13, 13, 32) 9248 max_pooling2d[0][0]
cropping2d (Cropping2D) (None, 6, 6, 32) 0 conv2d_1[0][0]
max_pooling2d_1 (MaxPooling2D) (None, 6, 6, 32) 0 conv2d_1[0][0]
concatenate (Concatenate) (None, 6, 6, 64) 0 cropping2d[0][0]
lambda_1 (Lambda) (None, 119808) 0 concatenate[0][0]
dense (Dense) (None, 512) 61342208 lambda_1[0][0]
batch_normalization (BatchNorma (None, 512) 2048 dense[0][0]
dense_1 (Dense) (None, 104) 53352 batch_normalization[0][0]
dropout (Dropout) (None, 104) 0 dense_1[0][0]
reshape (Reshape) (None, 52, 2) 0 dropout[0][0]
add (Add) (None, 52, 2) 0 reshape[0][0]
lambda_2 (Lambda) (None, 26, 26, 1) 0 x_input[0][0]
conv2d_2 (Conv2D) (None, 26, 26, 32) 320 lambda_2[0][0]
max_pooling2d_2 (MaxPooling2D) (None, 13, 13, 32) 0 conv2d_2[0][0]
conv2d_3 (Conv2D) (None, 13, 13, 32) 9248 max_pooling2d_2[0][0]
cropping2d_1 (Cropping2D) (None, 6, 6, 32) 0 conv2d_3[0][0]
max_pooling2d_3 (MaxPooling2D) (None, 6, 6, 32) 0 conv2d_3[0][0]
concatenate_1 (Concatenate) (None, 6, 6, 64) 0 cropping2d_1[0][0]
lambda_3 (Lambda) (None, 119808) 0 concatenate_1[0][0]
concatenate_2 (Concatenate) (None, 120320) 0 lambda_3[0][0]
dense_2 (Dense) (None, 512) 61604352 concatenate_2[0][0]
batch_normalization_1 (BatchNor (None, 512) 2048 dense_2[0][0]
dense_3 (Dense) (None, 104) 53352 batch_normalization_1[0][0]
dropout_1 (Dropout) (None, 104) 0 dense_3[0][0]
reshape_1 (Reshape) (None, 52, 2) 0 dropout_1[0][0]
add_1 (Add) (None, 52, 2) 0 reshape_1[0][0]
lambda_4 (Lambda) (None, 26, 26, 1) 0 x_input[0][0]
conv2d_4 (Conv2D) (None, 26, 26, 32) 320 lambda_4[0][0]
max_pooling2d_4 (MaxPooling2D) (None, 13, 13, 32) 0 conv2d_4[0][0]
conv2d_5 (Conv2D) (None, 13, 13, 32) 9248 max_pooling2d_4[0][0]
cropping2d_2 (Cropping2D) (None, 6, 6, 32) 0 conv2d_5[0][0]
max_pooling2d_5 (MaxPooling2D) (None, 6, 6, 32) 0 conv2d_5[0][0]
concatenate_3 (Concatenate) (None, 6, 6, 64) 0 cropping2d_2[0][0]
lambda_5 (Lambda) (None, 119808) 0 concatenate_3[0][0]
concatenate_4 (Concatenate) (None, 120320) 0 lambda_5[0][0]
dense_4 (Dense) (None, 512) 61604352 concatenate_4[0][0]
batch_normalization_2 (BatchNor (None, 512) 2048 dense_4[0][0]
dense_5 (Dense) (None, 104) 53352 batch_normalization_2[0][0]
dropout_2 (Dropout) (None, 104) 0 dense_5[0][0]
reshape_2 (Reshape) (None, 52, 2) 0 dropout_2[0][0]
add_2 (Add) (None, 52, 2) 0 reshape_2[0][0]
lambda_6 (Lambda) (None, 26, 26, 1) 0 x_input[0][0]
conv2d_6 (Conv2D) (None, 26, 26, 32) 320 lambda_6[0][0]
max_pooling2d_6 (MaxPooling2D) (None, 13, 13, 32) 0 conv2d_6[0][0]
conv2d_7 (Conv2D) (None, 13, 13, 32) 9248 max_pooling2d_6[0][0]
cropping2d_3 (Cropping2D) (None, 6, 6, 32) 0 conv2d_7[0][0]
max_pooling2d_7 (MaxPooling2D) (None, 6, 6, 32) 0 conv2d_7[0][0]
concatenate_5 (Concatenate) (None, 6, 6, 64) 0 cropping2d_3[0][0]
lambda_7 (Lambda) (None, 119808) 0 concatenate_5[0][0]
concatenate_6 (Concatenate) (None, 120320) 0 lambda_7[0][0]
dense_6 (Dense) (None, 512) 61604352 concatenate_6[0][0]
batch_normalization_3 (BatchNor (None, 512) 2048 dense_6[0][0]
dense_7 (Dense) (None, 104) 53352 batch_normalization_3[0][0]
dropout_3 (Dropout) (None, 104) 0 dense_7[0][0]
reshape_3 (Reshape) (None, 52, 2) 0 dropout_3[0][0]
add_3 (Add) (None, 52, 2) 0 reshape_3[0][0]
lambda_8 (Lambda) (4, None, 52, 2) 0 add[0][0]
Total params: 246,415,136
Trainable params: 246,411,040
Non-trainable params: 4,096
For this model, 1 GB is a reasonable size, so to get a smaller file you have to reduce the model itself. There are techniques that do this without decreasing the final accuracy, and in some cases they even increase it. You can search for improving neural networks with pruning.
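To see where the size comes from, the parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes 4 iterations (read off the stacked output shape), float32 weights at 4 bytes each, and the usual parameter formulas for Dense, Conv2D and BatchNormalization. The helper names are made up for illustration. The last figure is a hypothetical estimate of what a single set of layers reused across all iterations would cost, which is not what the posted code does.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(kernel_h, kernel_w, c_in, c_out):
    """Weights plus biases of a 2D convolution."""
    return kernel_h * kernel_w * c_in * c_out + c_out


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance."""
    return 4 * channels


# Layers that do not depend on the iteration index.
shared_part = (
    conv2d_params(3, 3, 1, 32)
    + conv2d_params(3, 3, 32, 32)
    + batchnorm_params(512)
    + dense_params(512, 104)
)

# First iteration: the hidden Dense layer sees only the CNN features.
first_iter = shared_part + dense_params(119808, 512)
# Later iterations: CNN features concatenated with the 512-unit hidden state.
later_iter = shared_part + dense_params(119808 + 512, 512)

total = first_iter + 3 * later_iter
hidden_dense = dense_params(119808, 512) + 3 * dense_params(120320, 512)

print(f"total parameters:       {total:,}")
print(f"size as float32:        {total * 4 / 1e9:.2f} GB")
print(f"share in hidden Dense:  {hidden_dense / total:.2%}")
print(f"one reused set (est.):  {later_iter * 4 / 1e9:.2f} GB")
```

Almost all of the weights sit in the four hidden Dense layers that take the 119808-wide flattened patch features, so that is the part worth shrinking first, whether by pruning or by narrowing the features before they reach the Dense layer.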
I am trying to build a network for detecting 68 landmarks (x, y) on a face. The training and validation images are 320x320x3, normalized between -0.5 and 0.5. My labels are 136 logits, each between 0 and 1.0, corresponding to X->(0, 320); Y->(0, 320). The loss function is keras "root_mean_square". The training dataset has about 5k images. While training, my training and validation losses start at about 6.0 and decrease to about 0.0022 in 100 iterations, but then they seem to saturate at that level and do not go any lower. I have tried up to 2000 iterations. Looking at the output, it seems like the network learns to output 68 points in the shape of a face at the center of the frame, irrespective of where the face really is.
I am using generator for getting the data and using sklearn.utils.shuffle() to make sure my data is shuffled properly.
Some posts suggested that the network could be overfitting because it is so complex for such a simple problem, so I have tried both a very simple network with about 10 layers and complex network of about 20 layers and my result is still the same. My current network is shown below, I have used 2 skip connections, 3 dropouts, and l2 regularizer to make sure that my network does not overfit. Underfitting should not be an issue because I have tried training my network for up to 2000 iterations.
Any suggestions on how to resolve this issue are greatly appreciated. Thanks!
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 320, 320, 3) 0
conv2d_1 (Conv2D) (None, 320, 320, 3) 228 input_1[0][0]
batch_normalization_1 (BatchNor (None, 320, 320, 3) 12 conv2d_1[0][0]
activation_1 (Activation) (None, 320, 320, 3) 0 batch_normalization_1[0][0]
max_pooling2d_1 (MaxPooling2D) (None, 160, 160, 3) 0 activation_1[0][0]
conv2d_2 (Conv2D) (None, 160, 160, 8) 608 max_pooling2d_1[0][0]
batch_normalization_2 (BatchNor (None, 160, 160, 8) 32 conv2d_2[0][0]
activation_2 (Activation) (None, 160, 160, 8) 0 batch_normalization_2[0][0]
max_pooling2d_2 (MaxPooling2D) (None, 80, 80, 8) 0 activation_2[0][0]
conv2d_3 (Conv2D) (None, 80, 80, 16) 1168 max_pooling2d_2[0][0]
conv2d_4 (Conv2D) (None, 80, 80, 16) 2320 conv2d_3[0][0]
batch_normalization_3 (BatchNor (None, 80, 80, 16) 64 conv2d_4[0][0]
activation_3 (Activation) (None, 80, 80, 16) 0 batch_normalization_3[0][0]
max_pooling2d_4 (MaxPooling2D) (None, 40, 40, 16) 0 activation_3[0][0]
conv2d_5 (Conv2D) (None, 40, 40, 32) 4640 max_pooling2d_4[0][0]
conv2d_6 (Conv2D) (None, 40, 40, 32) 9248 conv2d_5[0][0]
conv2d_7 (Conv2D) (None, 40, 40, 32) 9248 conv2d_6[0][0]
batch_normalization_4 (BatchNor (None, 40, 40, 32) 128 conv2d_7[0][0]
activation_4 (Activation) (None, 40, 40, 32) 0 batch_normalization_4[0][0]
max_pooling2d_6 (MaxPooling2D) (None, 20, 20, 32) 0 activation_4[0][0]
dropout_1 (Dropout) (None, 20, 20, 32) 0 max_pooling2d_6[0][0]
conv2d_8 (Conv2D) (None, 20, 20, 64) 18496 dropout_1[0][0]
conv2d_9 (Conv2D) (None, 20, 20, 64) 36928 conv2d_8[0][0]
max_pooling2d_3 (MaxPooling2D) (None, 20, 20, 16) 0 conv2d_4[0][0]
conv2d_10 (Conv2D) (None, 20, 20, 64) 36928 conv2d_9[0][0]
concatenate_1 (Concatenate) (None, 20, 20, 80) 0 max_pooling2d_3[0][0]
batch_normalization_5 (BatchNor (None, 20, 20, 80) 320 concatenate_1[0][0]
activation_5 (Activation) (None, 20, 20, 80) 0 batch_normalization_5[0][0]
max_pooling2d_7 (MaxPooling2D) (None, 10, 10, 80) 0 activation_5[0][0]
dropout_2 (Dropout) (None, 10, 10, 80) 0 max_pooling2d_7[0][0]
conv2d_11 (Conv2D) (None, 10, 10, 128) 92288 dropout_2[0][0]
conv2d_12 (Conv2D) (None, 10, 10, 128) 147584 conv2d_11[0][0]
max_pooling2d_5 (MaxPooling2D) (None, 10, 10, 32) 0 conv2d_7[0][0]
conv2d_13 (Conv2D) (None, 10, 10, 128) 147584 conv2d_12[0][0]
concatenate_2 (Concatenate) (None, 10, 10, 160) 0 max_pooling2d_5[0][0]
batch_normalization_6 (BatchNor (None, 10, 10, 160) 640 concatenate_2[0][0]
activation_6 (Activation) (None, 10, 10, 160) 0 batch_normalization_6[0][0]
max_pooling2d_8 (MaxPooling2D) (None, 5, 5, 160) 0 activation_6[0][0]
dropout_3 (Dropout) (None, 5, 5, 160) 0 max_pooling2d_8[0][0]
flatten_2 (Flatten) (None, 4000) 0 dropout_3[0][0]
dense_3 (Dense) (None, 1024) 4097024 flatten_2[0][0]
dense_4 (Dense) (None, 136) 139400 dense_3[0][0]
Total params: 4,744,888
Trainable params: 4,744,290
Non-trainable params: 598
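For reference, the label scaling described in the question can be sketched as below. This also makes it easy to translate a loss value back into pixels. The interleaved ordering (x0, y0, x1, y1, ...) and the helper names are assumptions, since the question does not state how the 136 values are laid out. The two conversions at the end are only interpretations: the first holds if the reported 0.0022 is a plain RMSE on normalized coordinates, the second if it is actually a mean squared error. Any regularization term included in the reported loss would change both.

```python
import math

IMAGE_SIZE = 320  # images are 320x320, as stated in the question


def normalize_landmarks(points_px, size=IMAGE_SIZE):
    """Flatten [(x, y), ...] pixel coordinates into [x0, y0, x1, y1, ...] in [0, 1]."""
    flat = []
    for x, y in points_px:
        flat.extend([x / size, y / size])
    return flat


def denormalize_landmarks(flat, size=IMAGE_SIZE):
    """Inverse of normalize_landmarks: back to a list of (x, y) pixel pairs."""
    return [(x * size, y * size) for x, y in zip(flat[0::2], flat[1::2])]


def rmse_to_pixels(rmse_normalized, size=IMAGE_SIZE):
    """Per-coordinate error in pixels if the loss is an RMSE on [0, 1] labels."""
    return rmse_normalized * size


def mse_to_pixels(mse_normalized, size=IMAGE_SIZE):
    """Per-coordinate error in pixels if the loss is an MSE on [0, 1] labels."""
    return math.sqrt(mse_normalized) * size


labels = normalize_landmarks([(160.0, 80.0), (240.0, 40.0)])
print(labels)
print(denormalize_landmarks(labels))
print(f"as RMSE: {rmse_to_pixels(0.0022):.2f} px")
print(f"as MSE:  {mse_to_pixels(0.0022):.2f} px")
```

Comparing the model's error in pixels with the error of always predicting the mean landmark positions of the training set would show whether the network has learned anything beyond an average face.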