I'm having a really weird problem.
I'm building the same model in two different ways.
I checked the summaries (number of parameters) and plotted both models, and I see no difference.
Yet the models give different predictions after being trained on the same dataset.
What is the difference between the models? I can't figure it out.
How can I change the second model so that it is the same as the first one?
First model (the "source" model):
input_img = Input(shape=(dim_x, dim_y, dim_z))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoder = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoder)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoder = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoder)
autoencoder.compile(optimizer='adam', loss=loss_func)
Layer (type) Output Shape Param #
input_3 (InputLayer) [(None, 224, 224, 3)] 0
conv2d_28 (Conv2D) (None, 224, 224, 16) 448
max_pooling2d_12 (MaxPooling (None, 112, 112, 16) 0
conv2d_29 (Conv2D) (None, 112, 112, 8) 1160
max_pooling2d_13 (MaxPooling (None, 56, 56, 8) 0
conv2d_30 (Conv2D) (None, 56, 56, 8) 584
max_pooling2d_14 (MaxPooling (None, 28, 28, 8) 0
conv2d_31 (Conv2D) (None, 28, 28, 8) 584
up_sampling2d_12 (UpSampling (None, 56, 56, 8) 0
conv2d_32 (Conv2D) (None, 56, 56, 8) 584
up_sampling2d_13 (UpSampling (None, 112, 112, 8) 0
conv2d_33 (Conv2D) (None, 112, 112, 16) 1168
up_sampling2d_14 (UpSampling (None, 224, 224, 16) 0
conv2d_34 (Conv2D) (None, 224, 224, 3) 435
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
Layer (type) Output Shape Param #
conv2d_21 (Conv2D) (None, 224, 224, 16) 448
max_pooling2d_9 (MaxPooling2 (None, 112, 112, 16) 0
conv2d_22 (Conv2D) (None, 112, 112, 8) 1160
max_pooling2d_10 (MaxPooling (None, 56, 56, 8) 0
conv2d_23 (Conv2D) (None, 56, 56, 8) 584
max_pooling2d_11 (MaxPooling (None, 28, 28, 8) 0
conv2d_24 (Conv2D) (None, 28, 28, 8) 584
up_sampling2d_9 (UpSampling2 (None, 56, 56, 8) 0
conv2d_25 (Conv2D) (None, 56, 56, 8) 584
up_sampling2d_10 (UpSampling (None, 112, 112, 8) 0
conv2d_26 (Conv2D) (None, 112, 112, 16) 1168
up_sampling2d_11 (UpSampling (None, 224, 224, 16) 0
conv2d_27 (Conv2D) (None, 224, 224, 3) 435
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
Second model (the same model as the first, built in a different way):
autoencoder = Sequential()
autoencoder.compile(optimizer='adam', loss=loss_func)
Layer (type) Output Shape Param #
input_3 (InputLayer) [(None, 224, 224, 3)] 0
conv2d_28 (Conv2D) (None, 224, 224, 16) 448
max_pooling2d_12 (MaxPooling (None, 112, 112, 16) 0
conv2d_29 (Conv2D) (None, 112, 112, 8) 1160
max_pooling2d_13 (MaxPooling (None, 56, 56, 8) 0
conv2d_30 (Conv2D) (None, 56, 56, 8) 584
max_pooling2d_14 (MaxPooling (None, 28, 28, 8) 0
conv2d_31 (Conv2D) (None, 28, 28, 8) 584
up_sampling2d_12 (UpSampling (None, 56, 56, 8) 0
conv2d_32 (Conv2D) (None, 56, 56, 8) 584
up_sampling2d_13 (UpSampling (None, 112, 112, 8) 0
conv2d_33 (Conv2D) (None, 112, 112, 16) 1168
up_sampling2d_14 (UpSampling (None, 224, 224, 16) 0
conv2d_34 (Conv2D) (None, 224, 224, 3) 435
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
You should set the random seeds using tensorflow.set_random_seed(0) (tensorflow.random.set_seed(0) in TensorFlow 2.x) and numpy.random.seed(0). The NumPy seed can be any int or 1D array_like. Set the seeds once in your code, before the model is built.
Also make sure that shuffling is disabled (shuffle=False in fit()).
After that, the random weight initialization and the data ordering will be reproducible across consecutive experiments and models.
Some randomness may still remain and lead to different results between runs. It can come from other libraries that use their own randomness modules. (eg.: does not give reproducible results)
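To illustrate the seeding advice, here is a minimal NumPy-only sketch (it does not use the Keras models from the question). The helpers set_seeds and init_weights are hypothetical names introduced for illustration. With TensorFlow 2.x installed, tf.random.set_seed(seed) would be called in the same place. The sketch shows that re-seeding before each model is built gives identical initial weights, while building a second set of weights without re-seeding does not.

```python
import random

import numpy as np


def set_seeds(seed=0):
    """Seed Python's and NumPy's global generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    # Assumption: with TensorFlow 2.x, tf.random.set_seed(seed) would go here too.


def init_weights(shape):
    """Glorot-style uniform initialisation drawn from NumPy's global generator."""
    fan_in, fan_out = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=shape)


set_seeds(0)
w_first = init_weights((4, 3))     # "first model"
w_unseeded = init_weights((4, 3))  # generator state has advanced: different values

set_seeds(0)
w_second = init_weights((4, 3))    # same seed, same draw order: identical values

print(np.array_equal(w_first, w_second))
print(np.array_equal(w_first, w_unseeded))
```

The same reasoning applies to the two autoencoders: if both are built in one script, the second one starts from different random weights unless the seed is reset right before it is built.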
I have the following CNN:
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 256, 256, 32) 320
conv2d_1 (Conv2D) (None, 128, 128, 32) 9248
conv2d_2 (Conv2D) (None, 128, 128, 64) 18496
conv2d_3 (Conv2D) (None, 64, 64, 64) 36928
conv2d_4 (Conv2D) (None, 64, 64, 128) 73856
conv2d_5 (Conv2D) (None, 32, 32, 128) 147584
conv2d_6 (Conv2D) (None, 32, 32, 256) 295168
conv2d_7 (Conv2D) (None, 16, 16, 256) 590080
conv2d_8 (Conv2D) (None, 16, 16, 512) 1180160
conv2d_9 (Conv2D) (None, 16, 16, 256) 1179904
conv2d_transpose (Conv2DTran (None, 32, 32, 256) 590080
conv2d_10 (Conv2D) (None, 32, 32, 128) 295040
conv2d_transpose_1 (Conv2DTr (None, 64, 64, 128) 147584
conv2d_11 (Conv2D) (None, 64, 64, 64) 73792
conv2d_transpose_2 (Conv2DTr (None, 128, 128, 64) 36928
conv2d_12 (Conv2D) (None, 128, 128, 32) 18464
conv2d_transpose_3 (Conv2DTr (None, 256, 256, 32) 9248
conv2d_13 (Conv2D) (None, 256, 256, 16) 4624
conv2d_14 (Conv2D) (None, 256, 256, 2) 290
Total params: 4,707,794
Trainable params: 4,707,794
Non-trainable params: 0
I am training it with these parameters:
Dataset size: 150 000
Optimizer: Adam
Batch size: 128
Loss function: MSE
And I have these graphs:
As we can see, accuracy and loss improve the whole time. The problem is with the output: when I evaluate the model from each epoch on my test dataset, the best epoch is 15.
Here we can see a comparison between epoch 15 and epoch 60.
Is this overfitting, and how can I prevent it?
Correct - this is an example of overfitting. You can see this starting to occur around epoch 10, where the accuracy of the training set starts to rise higher than that of the validation set.
This is caused by the model beginning to memorise the training set patterns too much, and so it can't generalise well on unseen data (your validation set).
You don't appear to have any regularisation layers in your model, so I'd definitely recommend adding some dropout layers. Dropout works by randomly 'turning-off' nodes, so the model is forced to learn other routes through the network and so helps prevent overfitting. This blog does a good job of explaining.
Start with a dropout of 0.1 and see if the point at which the training accuracy and validation accuracy start to differ begins at a later epoch than epoch 10. So if for example training acc > validation acc now starts at epoch 20, then you know dropout is having a positive effect, and you can decide what to do from there.
As always, make changes in small steps so you can see what's happening.
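To make the "turning off nodes" idea concrete, here is a minimal NumPy sketch of inverted dropout. It is an illustration under stated assumptions, not the Keras Dropout layer itself: the function name dropout and its arguments are hypothetical. During training a random fraction `rate` of activations is zeroed and the survivors are rescaled so the expected value is unchanged. At inference the input passes through untouched.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training only."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the kept units so the expected activation stays the same.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
x = np.ones((1000, 64))

train_out = dropout(x, rate=0.1, training=True, rng=rng)
eval_out = dropout(x, rate=0.1, training=False, rng=rng)

dropped_fraction = float(np.mean(train_out == 0.0))
print("fraction of units dropped during training:", dropped_fraction)
print("inference output unchanged:", np.array_equal(eval_out, x))
```

With a rate of 0.1, roughly one unit in ten is silenced on every training step, which is why it is a gentle starting point.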
I am trying to print a classification report. My problem has only 2 classes (0 and 1), but when I call classification_report, the output is this:
[screenshot of the classification report output]
My model is an LSTM with GloVe embeddings for sentiment classification. This is the architecture:
Model: "sequential_6"
Layer (type) Output Shape Param #
embedding_6 (Embedding) (None, 55, 300) 68299200
spatial_dropout1d_12 (Spatia (None, 55, 300) 0
lstm_12 (LSTM) (None, 55, 128) 219648
lstm_13 (LSTM) (None, 55, 64) 49408
spatial_dropout1d_13 (Spatia (None, 55, 64) 0
dense_18 (Dense) (None, 55, 512) 33280
dropout_6 (Dropout) (None, 55, 512) 0
dense_19 (Dense) (None, 55, 64) 32832
dense_20 (Dense) (None, 55, 1) 65
Total params: 68,634,433
Trainable params: 335,233
Non-trainable params: 68,299,200
You can define your output from the classification_report to be a dict(), so that you can then read it as a pandas DataFrame via pandas.DataFrame.from_dict() like this:
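import pandas as pd
from sklearn.metrics import classification_report
# display() is available in Jupyter/IPython; use print() elsewhere
display(pd.DataFrame.from_dict(classification_report(y_true, y_pred, output_dict=True)).T)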
I'm building a custom model (SegNet) in Tensorflow 2.1.0.
The first problem I'm facing is the reuse of the max pooling indices, which is required as described in the paper.
Basically, since it is an encoder-decoder architecture, the pooling indices from the encoding section of the network are needed in the decoding section to upsample the feature maps and keep the values at the positions given by the corresponding indices.
Now, in TF these indices are not exported by default by the layer tf.keras.layers.MaxPool2D (as they are, for example, in PyTorch).
To get the indices of the max pooling operation, it is necessary to use tf.nn.max_pool_with_argmax.
However, this operation returns the indices (argmax) in a flattened format, which requires further operations before they are useful in other parts of the network.
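As a rough illustration of what "flattened" means here, the following NumPy sketch mimics per-image flattened indices of the form (y * width + x) * channels + c, which is the layout tf.nn.max_pool_with_argmax documents when the batch is not included in the index. The functions below are hypothetical NumPy stand-ins written for this example, not TensorFlow calls. Because the indices are per image, the scatter can be done image by image and works for any batch size.

```python
import numpy as np


def max_pool_with_argmax(x, pool=2):
    """Non-overlapping max pooling on an NHWC array, returning flattened argmax."""
    n, h, w, c = x.shape
    oh, ow = h // pool, w // pool
    pooled = np.empty((n, oh, ow, c), dtype=x.dtype)
    argmax = np.empty((n, oh, ow, c), dtype=np.int64)
    for b in range(n):
        for i in range(oh):
            for j in range(ow):
                for k in range(c):
                    window = x[b, i * pool:(i + 1) * pool, j * pool:(j + 1) * pool, k]
                    dy, dx = np.unravel_index(np.argmax(window), window.shape)
                    yy, xx = i * pool + dy, j * pool + dx
                    pooled[b, i, j, k] = x[b, yy, xx, k]
                    # Per-image flattened index: (y * width + x) * channels + c
                    argmax[b, i, j, k] = (yy * w + xx) * c + k
    return pooled, argmax


def unravel(argmax, width, channels):
    """Recover (y, x, c) coordinates from the flattened per-image index."""
    y = argmax // (width * channels)
    x = (argmax // channels) % width
    c = argmax % channels
    return y, x, c


def max_unpool(pooled, argmax, out_hw):
    """Scatter pooled values back to their original positions, zeros elsewhere."""
    n, _, _, c = pooled.shape
    h, w = out_hw
    out = np.zeros((n, h * w * c), dtype=pooled.dtype)
    # Each image is flattened separately, so no fixed batch size is needed.
    np.put_along_axis(out, argmax.reshape(n, -1), pooled.reshape(n, -1), axis=1)
    return out.reshape(n, h, w, c)


rng = np.random.default_rng(0)
x = rng.random((3, 4, 4, 2))
pooled, argmax = max_pool_with_argmax(x)
restored = max_unpool(pooled, argmax, out_hw=(4, 4))
ys, xs, cs = unravel(argmax, width=4, channels=2)
print(restored.shape)
```

The unravel step uses the same arithmetic as the y and x lines in the MaxUnpooling2D layer further down in this question.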
To implement a layer that performs a MaxPooling2D and exports these indices (flattened) I defined a custom layer in keras.
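class MaxPoolingWithArgmax2D(Layer):
    def __init__(
            self,
            pool_size=(2, 2),
            strides=2,        # assumed default; the model below passes strides=2
            padding='same',   # assumed default; the model below passes padding='same'
            **kwargs):
        super(MaxPoolingWithArgmax2D, self).__init__(**kwargs)
        self.padding = padding
        self.pool_size = pool_size
        self.strides = strides

    def call(self, inputs, **kwargs):
        padding = self.padding
        pool_size = self.pool_size
        strides = self.strides
        # assumed arguments: the original call was cut off after the opening parenthesis
        output, argmax = tf.nn.max_pool_with_argmax(
            inputs, ksize=pool_size, strides=strides, padding=padding.upper())
        return output, argmax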
Obviously, this layer is used in the encoding section of the network, hence a decoding respective layer is needed to perform the inverse operation (UpSampling2D), with the utilization of the indices (further details of this operation in the paper).
After some research, I found legacy code (TF<2.1.0) and adapted it to perform the operation.
Anyway I'm not 100% convinced this code works well, in fact there are some things I don't like.
class MaxUnpooling2D(Layer):
    def __init__(self, size=(2, 2), **kwargs):
        super(MaxUnpooling2D, self).__init__(**kwargs)
        self.size = size

    def call(self, inputs, output_shape=None):
        updates, mask = inputs[0], inputs[1]
        with tf.name_scope(self.name):  # assumed scope name; the original argument was cut off
            mask = tf.cast(mask, 'int32')
            #input_shape = tf.shape(updates, out_type='int32')
            input_shape = updates.get_shape()

            # This statement is required if I don't want to specify a batch size
            if input_shape[0] == None:
                batches = 1
            else:
                batches = input_shape[0]

            # calculation new shape
            if output_shape is None:
                # assumed tuple contents; the original lines were cut off
                output_shape = (
                    batches,
                    input_shape[1] * self.size[0],
                    input_shape[2] * self.size[1],
                    input_shape[3])

            # calculation indices for batch, height, width and feature maps
            one_like_mask = tf.ones_like(mask, dtype='int32')
            batch_shape = tf.concat(
                [[batches], [1], [1], [1]],
                axis=0)  # assumed; the closing argument was cut off
            batch_range = tf.reshape(
                tf.range(output_shape[0], dtype='int32'),
                shape=batch_shape)  # assumed; the closing argument was cut off
            b = one_like_mask * batch_range
            y = mask // (output_shape[2] * output_shape[3])
            x = (mask // output_shape[3]) % output_shape[2]
            feature_range = tf.range(output_shape[3], dtype='int32')
            f = one_like_mask * feature_range

            # transpose indices & reshape update values to one dimension
            updates_size = tf.size(updates)
            indices = tf.transpose(tf.reshape(
                tf.stack([b, y, x, f]),
                [4, updates_size]))
            values = tf.reshape(updates, [updates_size])
            ret = tf.scatter_nd(indices, values, output_shape)
            return ret
The things that bother me are:
Performing the operation to unflatten the indices (MaxUnpooling2D) is strictly related to knowing a specific batch size, which for model validation I would like to be None or unspecified.
I am not sure this code is actually 100% compatible with the rest of the library. In fact during fit if I use tf.keras.metrics.MeanIoU the value converges to 0.341 and keeps constant for every other epoch than the first. Instead the standard accuracy metric works just fine.
Network architecture in Depth
Following, the complete definition of the model.
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
from tensorflow.keras.layers import Layer

class SegNet:
    def __init__(self, data_shape, classes = 3, batch_size = None):
        self.MODEL_NAME = 'SegNet'
        self.MODEL_VERSION = '0.2'
        self.classes = classes
        self.batch_size = batch_size

    def build_model(self, data_shape):
        input_shape = (data_shape, data_shape, 3)
        inputs = keras.Input(shape=input_shape, batch_size=self.batch_size, name='Input')

        # Build sequential model
        # Encoding
        encoders = 5
        feature_maps = [64, 128, 256, 512, 512]
        n_convolutions = [2, 2, 3, 3, 3]
        eb_input = inputs
        eb_argmax_indices = []
        for encoder_index in range(encoders):
            encoder_block, argmax_indices = self.encoder_block(
                eb_input, encoder_index, feature_maps[encoder_index], n_convolutions[encoder_index])
            eb_input = encoder_block

        # Decoding
        decoders = encoders
        db_input = encoder_block
        d_feature_maps = [512, 512, 256, 128, 64]
        d_n_convolutions = n_convolutions
        for decoder_index in range(decoders):
            decoder_block = self.decoder_block(
                db_input, eb_argmax_indices[decoder_index], decoder_index, d_feature_maps[decoder_index], d_n_convolutions[decoder_index])
            db_input = decoder_block

        output = layers.Softmax()(decoder_block)
        self.model = keras.Model(inputs=inputs, outputs=output, name="SegNet")

    def encoder_block(self, x, encoder_index, feature_maps, n_convolutions):
        bank_input = x
        for conv_index in range(n_convolutions):
            bank = self.eb_layers_bank(
                bank_input, conv_index, feature_maps, encoder_index)
            bank_input = bank
        max_pool, indices = MaxPoolingWithArgmax2D(pool_size=(
            2, 2), strides=2, padding='same', name='EB_{}_MPOOL'.format(encoder_index + 1))(bank)
        return max_pool, indices

    def eb_layers_bank(self, x, bank_index, feature_maps, encoder_index):
        bank_input = x
        conv_l = layers.Conv2D(feature_maps, (3, 3), padding='same', name='EB_{}_BANK_{}_CONV'.format(
            encoder_index + 1, bank_index + 1))(bank_input)
        batch_norm = layers.BatchNormalization(
            name='EB_{}_BANK_{}_BN'.format(encoder_index + 1, bank_index + 1))(conv_l)
        relu = layers.ReLU(name='EB_{}_BANK_{}_RL'.format(
            encoder_index + 1, bank_index + 1))(batch_norm)
        return relu

    def decoder_block(self, x, max_pooling_idices, decoder_index, feature_maps, n_convolutions):
        #bank_input = self.unpool_with_argmax(x, max_pooling_idices)
        bank_input = MaxUnpooling2D(name='DB_{}_UPSAMP'.format(decoder_index + 1))([x, max_pooling_idices])
        #bank_input = layers.UpSampling2D()(x)
        for conv_index in range(n_convolutions):
            if conv_index == n_convolutions - 1:
                last_l_banck = True
            else:
                last_l_banck = False
            bank = self.db_layers_bank(
                bank_input, conv_index, feature_maps, decoder_index, last_l_banck)
            bank_input = bank
        return bank

    def db_layers_bank(self, x, bank_index, feature_maps, decoder_index, last_l_bank):
        bank_input = x
        if (last_l_bank) & (decoder_index == 4):
            conv_l = layers.Conv2D(self.classes, (1, 1), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                decoder_index + 1, bank_index + 1))(bank_input)
            #batch_norm = layers.BatchNormalization(
            #    name='DB_{}_BANK_{}_BN'.format(decoder_index + 1, bank_index + 1))(conv_l)
            return conv_l
        if (last_l_bank) & (decoder_index > 0):
            conv_l = layers.Conv2D(int(feature_maps / 2), (3, 3), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                decoder_index + 1, bank_index + 1))(bank_input)
        else:
            conv_l = layers.Conv2D(feature_maps, (3, 3), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                decoder_index + 1, bank_index + 1))(bank_input)
        batch_norm = layers.BatchNormalization(
            name='DB_{}_BANK_{}_BN'.format(decoder_index + 1, bank_index + 1))(conv_l)
        relu = layers.ReLU(name='DB_{}_BANK_{}_RL'.format(
            decoder_index + 1, bank_index + 1))(batch_norm)
        return relu

    def get_model(self):
        return self.model
Here is the output of model.summary().
Model: "SegNet"
Layer (type) Output Shape Param # Connected to
Input (InputLayer) [(None, 416, 416, 3) 0
EB_1_BANK_1_CONV (Conv2D) (None, 416, 416, 64) 1792 Input[0][0]
EB_1_BANK_1_BN (BatchNormalizat (None, 416, 416, 64) 256 EB_1_BANK_1_CONV[0][0]
EB_1_BANK_1_RL (ReLU) (None, 416, 416, 64) 0 EB_1_BANK_1_BN[0][0]
EB_1_BANK_2_CONV (Conv2D) (None, 416, 416, 64) 36928 EB_1_BANK_1_RL[0][0]
EB_1_BANK_2_BN (BatchNormalizat (None, 416, 416, 64) 256 EB_1_BANK_2_CONV[0][0]
EB_1_BANK_2_RL (ReLU) (None, 416, 416, 64) 0 EB_1_BANK_2_BN[0][0]
EB_1_MPOOL (MaxPoolingWithArgma ((None, 208, 208, 64 0 EB_1_BANK_2_RL[0][0]
EB_2_BANK_1_CONV (Conv2D) (None, 208, 208, 128 73856 EB_1_MPOOL[0][0]
EB_2_BANK_1_BN (BatchNormalizat (None, 208, 208, 128 512 EB_2_BANK_1_CONV[0][0]
EB_2_BANK_1_RL (ReLU) (None, 208, 208, 128 0 EB_2_BANK_1_BN[0][0]
EB_2_BANK_2_CONV (Conv2D) (None, 208, 208, 128 147584 EB_2_BANK_1_RL[0][0]
EB_2_BANK_2_BN (BatchNormalizat (None, 208, 208, 128 512 EB_2_BANK_2_CONV[0][0]
EB_2_BANK_2_RL (ReLU) (None, 208, 208, 128 0 EB_2_BANK_2_BN[0][0]
EB_2_MPOOL (MaxPoolingWithArgma ((None, 104, 104, 12 0 EB_2_BANK_2_RL[0][0]
EB_3_BANK_1_CONV (Conv2D) (None, 104, 104, 256 295168 EB_2_MPOOL[0][0]
EB_3_BANK_1_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_1_CONV[0][0]
EB_3_BANK_1_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_1_BN[0][0]
EB_3_BANK_2_CONV (Conv2D) (None, 104, 104, 256 590080 EB_3_BANK_1_RL[0][0]
EB_3_BANK_2_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_2_CONV[0][0]
EB_3_BANK_2_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_2_BN[0][0]
EB_3_BANK_3_CONV (Conv2D) (None, 104, 104, 256 590080 EB_3_BANK_2_RL[0][0]
EB_3_BANK_3_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_3_CONV[0][0]
EB_3_BANK_3_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_3_BN[0][0]
EB_3_MPOOL (MaxPoolingWithArgma ((None, 52, 52, 256) 0 EB_3_BANK_3_RL[0][0]
EB_4_BANK_1_CONV (Conv2D) (None, 52, 52, 512) 1180160 EB_3_MPOOL[0][0]
EB_4_BANK_1_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_1_CONV[0][0]
EB_4_BANK_1_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_1_BN[0][0]
EB_4_BANK_2_CONV (Conv2D) (None, 52, 52, 512) 2359808 EB_4_BANK_1_RL[0][0]
EB_4_BANK_2_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_2_CONV[0][0]
EB_4_BANK_2_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_2_BN[0][0]
EB_4_BANK_3_CONV (Conv2D) (None, 52, 52, 512) 2359808 EB_4_BANK_2_RL[0][0]
EB_4_BANK_3_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_3_CONV[0][0]
EB_4_BANK_3_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_3_BN[0][0]
EB_4_MPOOL (MaxPoolingWithArgma ((None, 26, 26, 512) 0 EB_4_BANK_3_RL[0][0]
EB_5_BANK_1_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_4_MPOOL[0][0]
EB_5_BANK_1_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_1_CONV[0][0]
EB_5_BANK_1_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_1_BN[0][0]
EB_5_BANK_2_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_5_BANK_1_RL[0][0]
EB_5_BANK_2_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_2_CONV[0][0]
EB_5_BANK_2_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_2_BN[0][0]
EB_5_BANK_3_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_5_BANK_2_RL[0][0]
EB_5_BANK_3_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_3_CONV[0][0]
EB_5_BANK_3_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_3_BN[0][0]
EB_5_MPOOL (MaxPoolingWithArgma ((None, 13, 13, 512) 0 EB_5_BANK_3_RL[0][0]
DB_1_UPSAMP (MaxUnpooling2D) (1, 26, 26, 512) 0 EB_5_MPOOL[0][0]
DB_1_BANK_1_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_UPSAMP[0][0]
DB_1_BANK_1_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_1_CONV[0][0]
DB_1_BANK_1_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_1_BN[0][0]
DB_1_BANK_2_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_BANK_1_RL[0][0]
DB_1_BANK_2_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_2_CONV[0][0]
DB_1_BANK_2_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_2_BN[0][0]
DB_1_BANK_3_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_BANK_2_RL[0][0]
DB_1_BANK_3_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_3_CONV[0][0]
DB_1_BANK_3_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_3_BN[0][0]
DB_2_UPSAMP (MaxUnpooling2D) (1, 52, 52, 512) 0 DB_1_BANK_3_RL[0][0]
DB_2_BANK_1_CONV (Conv2D) (1, 52, 52, 512) 2359808 DB_2_UPSAMP[0][0]
DB_2_BANK_1_BN (BatchNormalizat (1, 52, 52, 512) 2048 DB_2_BANK_1_CONV[0][0]
DB_2_BANK_1_RL (ReLU) (1, 52, 52, 512) 0 DB_2_BANK_1_BN[0][0]
DB_2_BANK_2_CONV (Conv2D) (1, 52, 52, 512) 2359808 DB_2_BANK_1_RL[0][0]
DB_2_BANK_2_BN (BatchNormalizat (1, 52, 52, 512) 2048 DB_2_BANK_2_CONV[0][0]
DB_2_BANK_2_RL (ReLU) (1, 52, 52, 512) 0 DB_2_BANK_2_BN[0][0]
DB_2_BANK_3_CONV (Conv2D) (1, 52, 52, 256) 1179904 DB_2_BANK_2_RL[0][0]
DB_2_BANK_3_BN (BatchNormalizat (1, 52, 52, 256) 1024 DB_2_BANK_3_CONV[0][0]
DB_2_BANK_3_RL (ReLU) (1, 52, 52, 256) 0 DB_2_BANK_3_BN[0][0]
DB_3_UPSAMP (MaxUnpooling2D) (1, 104, 104, 256) 0 DB_2_BANK_3_RL[0][0]
DB_3_BANK_1_CONV (Conv2D) (1, 104, 104, 256) 590080 DB_3_UPSAMP[0][0]
DB_3_BANK_1_BN (BatchNormalizat (1, 104, 104, 256) 1024 DB_3_BANK_1_CONV[0][0]
DB_3_BANK_1_RL (ReLU) (1, 104, 104, 256) 0 DB_3_BANK_1_BN[0][0]
DB_3_BANK_2_CONV (Conv2D) (1, 104, 104, 256) 590080 DB_3_BANK_1_RL[0][0]
DB_3_BANK_2_BN (BatchNormalizat (1, 104, 104, 256) 1024 DB_3_BANK_2_CONV[0][0]
DB_3_BANK_2_RL (ReLU) (1, 104, 104, 256) 0 DB_3_BANK_2_BN[0][0]
DB_3_BANK_3_CONV (Conv2D) (1, 104, 104, 128) 295040 DB_3_BANK_2_RL[0][0]
DB_3_BANK_3_BN (BatchNormalizat (1, 104, 104, 128) 512 DB_3_BANK_3_CONV[0][0]
DB_3_BANK_3_RL (ReLU) (1, 104, 104, 128) 0 DB_3_BANK_3_BN[0][0]
DB_4_UPSAMP (MaxUnpooling2D) (1, 208, 208, 128) 0 DB_3_BANK_3_RL[0][0]
DB_4_BANK_1_CONV (Conv2D) (1, 208, 208, 128) 147584 DB_4_UPSAMP[0][0]
DB_4_BANK_1_BN (BatchNormalizat (1, 208, 208, 128) 512 DB_4_BANK_1_CONV[0][0]
DB_4_BANK_1_RL (ReLU) (1, 208, 208, 128) 0 DB_4_BANK_1_BN[0][0]
DB_4_BANK_2_CONV (Conv2D) (1, 208, 208, 64) 73792 DB_4_BANK_1_RL[0][0]
DB_4_BANK_2_BN (BatchNormalizat (1, 208, 208, 64) 256 DB_4_BANK_2_CONV[0][0]
DB_4_BANK_2_RL (ReLU) (1, 208, 208, 64) 0 DB_4_BANK_2_BN[0][0]
DB_5_UPSAMP (MaxUnpooling2D) (1, 416, 416, 64) 0 DB_4_BANK_2_RL[0][0]
DB_5_BANK_1_CONV (Conv2D) (1, 416, 416, 64) 36928 DB_5_UPSAMP[0][0]
DB_5_BANK_1_BN (BatchNormalizat (1, 416, 416, 64) 256 DB_5_BANK_1_CONV[0][0]
DB_5_BANK_1_RL (ReLU) (1, 416, 416, 64) 0 DB_5_BANK_1_BN[0][0]
DB_5_BANK_2_CONV (Conv2D) (1, 416, 416, 3) 195 DB_5_BANK_1_RL[0][0]
softmax (Softmax) (1, 416, 416, 3) 0 DB_5_BANK_2_CONV[0][0]
Total params: 29,459,075
Trainable params: 29,443,203
Non-trainable params: 15,872
As you can see, I'm forced to specify a batch size in the MaxUnpooling2D otherwise I get errors that the operation can not be performed since there are None values and shapes can not be correctly transformed.
When I try to predict an image, I'm forced to specify the correct batch dimension, otherwise I get errors like:
InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [4,208,208,64] != values[1].shape = [1,208,208,64]
[[{{node SegNet/DB_5_UPSAMP/PartitionedCall/PartitionedCall/DB_5_UPSAMP/stack}}]] [Op:__inference_predict_function_70839]
Which is caused by the implementation required to unravel the indices from the max pooling operation.
Training graphs
Here is a reference with a training on 20 epochs.
As you can see the MeanIoU metric is linear, no progress, no updates other than in epoch 1.
The other metric works fine, and loss decrease correctly.
Is there a better way, more compatible with recent versions of TF, to implement the unraveling and upsampling with the indices from the max pooling operation?
If the implementation is correct, why do I get a metric stuck at a specific value? Am I doing something wrong in the model?
Thank you!
You can have reshapes with unknown batch size in custom layers in two ways.
If you know the rest of the shape, reshape using -1 as the batch size:
Suppose you know the size of your expected array:
import tensorflow.keras.backend as K
reshaped = K.reshape(original, (-1, x, y, channels))
Suppose you don't know the size, then use K.shape to get the shape as a tensor:
inputs_shape = K.shape(inputs)
batch_size = inputs_shape[:1]
x = inputs_shape[1:2]
y = inputs_shape[2:3]
ch = inputs_shape[3:]
#you can then concatenate these and operate them (notice I kept them as 1D vector, not as scalar)
newShape = K.concatenate([batch_size, x, y, ch]) #of course you will make your operations
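The first option can be tried out without TensorFlow, because NumPy's reshape treats -1 the same way: the size of that axis is inferred from the data. The sketch below is only a NumPy analog of the K.reshape call above, with hypothetical helper names, showing that the same code handles a batch of 1 and a batch of 4.

```python
import numpy as np


def flatten_per_image(batch):
    """Flatten every image separately: (batch, h, w, c) -> (batch, h * w * c)."""
    return batch.reshape(batch.shape[0], -1)


def restore_images(flat, height, width, channels):
    """Reshape back to NHWC; -1 lets NumPy infer the batch size."""
    return flat.reshape(-1, height, width, channels)


shapes = {}
for batch_size in (1, 4):
    images = np.arange(batch_size * 2 * 2 * 3, dtype=np.float32).reshape(batch_size, 2, 2, 3)
    restored = restore_images(flatten_per_image(images), 2, 2, 3)
    shapes[batch_size] = restored.shape
    print(batch_size, restored.shape, np.array_equal(restored, images))
```

Inside a Keras layer the known dimensions would come from the static shape, and only the batch axis would be left as -1.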
Once I did my own version of a Segnet, I didn't use indices, but kept a one hot version. It's true that it takes extra operations, but it might work well:
def get_indices(inputs):
    # Lambda passes the list of input tensors as a single argument
    original, unpooled = inputs
    is_equal = K.equal(original, unpooled)
    return K.cast(is_equal, K.floatx())

previous_output = ...
pooled = MaxPooling2D()(previous_output)
unpooled = UpSampling2D()(pooled)
one_hot_indices = Lambda(get_indices)([previous_output, unpooled])
Then after an upsampling, I concatenate these indices and pass a new conv:
some_output = ...
upsampled = UpSampling2D()(some_output)
# Concatenate is a layer: instantiate it first, then call it on the list of tensors
with_indices = Concatenate()([upsampled, one_hot_indices])
upsampled = Conv2D(...)(with_indices)
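To see what the one-hot mask contains, here is a small NumPy sketch of the same idea: pool, upsample with nearest-neighbour repetition, and compare with the original. The helper names are hypothetical and this is an analog of the Keras layers above, not a replacement for them. It assumes 2x2 pooling and even height and width.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling on an NHWC array with even height and width."""
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


def upsample_2x2(x):
    """Nearest-neighbour upsampling: repeat every row and column twice."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def argmax_mask(original, unpooled):
    """1.0 where the original value was the maximum of its window, else 0.0."""
    return (original == unpooled).astype(original.dtype)


rng = np.random.default_rng(0)
x = rng.random((2, 4, 4, 3))

pooled = max_pool_2x2(x)
unpooled = upsample_2x2(pooled)
mask = argmax_mask(x, unpooled)

print(mask.shape)
print(int(mask.sum()), pooled.size)
```

With continuous inputs there is one maximum per window, so the mask has as many ones as there are pooled values. If several positions in a window tie for the maximum, all of them are marked, which is one way this approach differs from true argmax indices.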
I am trying to train a pretrained keras model on new data. I came across tensorflow's dataset api and I am trying to use it with my old keras model. I understand that tf data api returns tensors, so the data api as well as model should be part of the same graph and the output of the data api should be connected as input to the model. Here is the code
import tensorflow as tf
from data_pipeline import ImageDataGenerator
import os
import keras
from keras.engine import InputLayer
###################### to check visible devices ###############
from tensorflow.python.client import device_lib

_EPOCHS = 10

def training_pipeline():
    # #############
    # Load Dataset
    # #############
    training_set = ImageDataGenerator(directory="\\\\in-pdc-sem2\\training",
                                      horizontal_flip=True, vertical_flip=True, rescale=True, normalize=True,
                                      color_jitter=True, batch_size=_BATCH_SIZE,
                                      num_cpus=8, epochs=60, output_patch_size=389, validation=False).dataset_pipeline()
    testing_set = ImageDataGenerator(directory="\\\\in-pdc-sem2\\training",
                                     horizontal_flip=False, vertical_flip=False, rescale=False, normalize=True,
                                     color_jitter=False, batch_size=_BATCH_SIZE,
                                     num_cpus=8, epochs=60, output_patch_size=389, validation=True).dataset_pipeline()
    print(training_set.output_types, training_set.output_shapes)
    iterator =, training_set.output_shapes)#((None, 389, 389, 3), (None)))
    train_initializer = iterator.make_initializer(training_set)
    validation_initializer = iterator.make_initializer(testing_set)
    img, labels = iterator.get_next()
    img = img.set_shape((None, 389, 389, 3))
    model = baseline_model(img, labels) # keras model defined here
    for epoch in range(_EPOCHS):
        # #############
        # Train Model
        # #############
        steps_per_epoch=1000000 // _BATCH_SIZE,
        # validation_steps=11970 // _BATCH_SIZE,
        verbose = 1)
        loss, acc, cross_entropy = model.evaluate(verbose=1, steps=11970 // 32)
        filepath = "./weights/ResNet_16_Best/weights-improvement-Run1-" + str(epoch) + "-" + str(loss) + ".hdf5"
        model.save_weights(filepath, overwrite=True)

def baseline_model(input_tensor, labels):
    jsonFile = '\\\\in-pdc-sem2\\resnetV4_2Best.json'
    weightsFile = '\\\\in-pdc-sem1\\resnetV4_2BestWeightsOnly.hdf5'
    with open(jsonFile, "r") as file:
        jsonDef =
    from keras.models import model_from_json
    model_single = model_from_json(jsonDef)
    model_single.layers[0] = InputLayer(input_tensor=input_tensor, input_shape=(389, 389, 3))
    model_single.compile(target_tensors=[labels], loss='categorical_crossentropy', optimizer='Adam', metrics=[keras.metrics.categorical_accuracy])
    return model_single

def callbacks():
    tensorboard = keras.callbacks.TensorBoard(log_dir='./tensorboard', write_grads=False, write_images=False, histogram_freq=0)
    callbacks_list = [tensorboard]
    return callbacks_list

if __name__ == '__main__':
The "training set" returns an (image, label) tuple. The image is a tensor of shape (32, 389, 389, 3), i.e. a batch of 32 images. I verified the shape in a separate script, and it is correct. I am defining the input layer of the model using the tensor, and the target tensors in the model.compile part.
This is what the model.summary output looks like:
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 389, 389, 3) 0
conv1 (Conv2D) (None, 383, 383, 13) 1924 input_1[0][0]
bn_conv1 (BatchNormalization) (None, 383, 383, 13) 52 conv1[0][0]
activation_1 (Activation) (None, 383, 383, 13) 0 bn_conv1[0][0]
max_pooling2d_1 (MaxPooling2D) (None, 191, 191, 13) 0 activation_1[0][0]
res2a_branch2a (Conv2D) (None, 191, 191, 4) 56 max_pooling2d_1[0][0]
bn2a_branch2a (BatchNormalizati (None, 191, 191, 4) 16 res2a_branch2a[0][0]
activation_2 (Activation) (None, 191, 191, 4) 0 bn2a_branch2a[0][0]
res2a_branch2b (Conv2D) (None, 191, 191, 4) 148 activation_2[0][0]
bn2a_branch2b (BatchNormalizati (None, 191, 191, 4) 16 res2a_branch2b[0][0]
activation_3 (Activation) (None, 191, 191, 4) 0 bn2a_branch2b[0][0]
res2a_branch2c (Conv2D) (None, 191, 191, 8) 40 activation_3[0][0]
res2a_branch1 (Conv2D) (None, 191, 191, 8) 112 max_pooling2d_1[0][0]
bn2a_branch2c (BatchNormalizati (None, 191, 191, 8) 32 res2a_branch2c[0][0]
bn2a_branch1 (BatchNormalizatio (None, 191, 191, 8) 32 res2a_branch1[0][0]
add_1 (Add) (None, 191, 191, 8) 0 bn2a_branch2c[0][0]
activation_4 (Activation) (None, 191, 191, 8) 0 add_1[0][0]
bn2b_branch2a (BatchNormalizati (None, 191, 191, 8) 32 activation_4[0][0]
activation_5 (Activation) (None, 191, 191, 8) 0 bn2b_branch2a[0][0]
res2b_branch2b (Conv2D) (None, 191, 191, 4) 292 activation_5[0][0]
bn2b_branch2b (BatchNormalizati (None, 191, 191, 4) 16 res2b_branch2b[0][0]
activation_6 (Activation) (None, 191, 191, 4) 0 bn2b_branch2b[0][0]
res2b_branch2c (Conv2D) (None, 191, 191, 8) 40 activation_6[0][0]
add_2 (Add) (None, 191, 191, 8) 0 res2b_branch2c[0][0]
bn2c_branch2a (BatchNormalizati (None, 191, 191, 8) 32 add_2[0][0]
activation_7 (Activation) (None, 191, 191, 8) 0 bn2c_branch2a[0][0]
res2c_branch2b (Conv2D) (None, 191, 191, 4) 292 activation_7[0][0]
bn2c_branch2b (BatchNormalizati (None, 191, 191, 4) 16 res2c_branch2b[0][0]
activation_8 (Activation) (None, 191, 191, 4) 0 bn2c_branch2b[0][0]
res2c_branch2c (Conv2D) (None, 191, 191, 8) 40 activation_8[0][0]
add_3 (Add) (None, 191, 191, 8) 0 res2c_branch2c[0][0]
res3a_branch2a (Conv2D) (None, 96, 96, 8) 72 add_3[0][0]
bn3a_branch2a (BatchNormalizati (None, 96, 96, 8) 32 res3a_branch2a[0][0]
activation_9 (Activation) (None, 96, 96, 8) 0 bn3a_branch2a[0][0]
res3a_branch2b (Conv2D) (None, 96, 96, 8) 584 activation_9[0][0]
bn3a_branch2b (BatchNormalizati (None, 96, 96, 8) 32 res3a_branch2b[0][0]
activation_10 (Activation) (None, 96, 96, 8) 0 bn3a_branch2b[0][0]
res3a_branch2c (Conv2D) (None, 96, 96, 16) 144 activation_10[0][0]
res3a_branch1 (Conv2D) (None, 96, 96, 16) 144 add_3[0][0]
bn3a_branch2c (BatchNormalizati (None, 96, 96, 16) 64 res3a_branch2c[0][0]
bn3a_branch1 (BatchNormalizatio (None, 96, 96, 16) 64 res3a_branch1[0][0]
add_4 (Add) (None, 96, 96, 16) 0 bn3a_branch2c[0][0]
activation_11 (Activation) (None, 96, 96, 16) 0 add_4[0][0]
bn3b_branch2a (BatchNormalizati (None, 96, 96, 16) 64 activation_11[0][0]
activation_12 (Activation) (None, 96, 96, 16) 0 bn3b_branch2a[0][0]
res3b_branch2b (Conv2D) (None, 96, 96, 8) 1160 activation_12[0][0]
bn3b_branch2b (BatchNormalizati (None, 96, 96, 8) 32 res3b_branch2b[0][0]
activation_13 (Activation) (None, 96, 96, 8) 0 bn3b_branch2b[0][0]
res3b_branch2c (Conv2D) (None, 96, 96, 16) 144 activation_13[0][0]
add_5 (Add) (None, 96, 96, 16) 0 res3b_branch2c[0][0]
res4a_branch2a (Conv2D) (None, 48, 48, 16) 272 add_5[0][0]
bn4a_branch2a (BatchNormalizati (None, 48, 48, 16) 64 res4a_branch2a[0][0]
activation_14 (Activation) (None, 48, 48, 16) 0 bn4a_branch2a[0][0]
res4a_branch2b (Conv2D) (None, 48, 48, 16) 2320 activation_14[0][0]
bn4a_branch2b (BatchNormalizati (None, 48, 48, 16) 64 res4a_branch2b[0][0]
activation_15 (Activation) (None, 48, 48, 16) 0 bn4a_branch2b[0][0]
res4a_branch2c (Conv2D) (None, 48, 48, 64) 1088 activation_15[0][0]
res4a_branch1 (Conv2D) (None, 48, 48, 64) 1088 add_5[0][0]
bn4a_branch2c (BatchNormalizati (None, 48, 48, 64) 256 res4a_branch2c[0][0]
bn4a_branch1 (BatchNormalizatio (None, 48, 48, 64) 256 res4a_branch1[0][0]
add_6 (Add) (None, 48, 48, 64) 0 bn4a_branch2c[0][0]
activation_16 (Activation) (None, 48, 48, 64) 0 add_6[0][0]
bn4b_branch2a (BatchNormalizati (None, 48, 48, 64) 256 activation_16[0][0]
activation_17 (Activation) (None, 48, 48, 64) 0 bn4b_branch2a[0][0]
res4b_branch2b (Conv2D) (None, 48, 48, 16) 9232 activation_17[0][0]
bn4b_branch2b (BatchNormalizati (None, 48, 48, 16) 64 res4b_branch2b[0][0]
activation_18 (Activation) (None, 48, 48, 16) 0 bn4b_branch2b[0][0]
res4b_branch2c (Conv2D) (None, 48, 48, 64) 1088 activation_18[0][0]
add_7 (Add) (None, 48, 48, 64) 0 res4b_branch2c[0][0]
res5a_branch2a (Conv2D) (None, 24, 24, 32) 2080 add_7[0][0]
bn5a_branch2a (BatchNormalizati (None, 24, 24, 32) 128 res5a_branch2a[0][0]
activation_19 (Activation) (None, 24, 24, 32) 0 bn5a_branch2a[0][0]
res5a_branch2b (Conv2D) (None, 24, 24, 32) 9248 activation_19[0][0]
bn5a_branch2b (BatchNormalizati (None, 24, 24, 32) 128 res5a_branch2b[0][0]
activation_20 (Activation) (None, 24, 24, 32) 0 bn5a_branch2b[0][0]
res5a_branch2c (Conv2D) (None, 24, 24, 128) 4224 activation_20[0][0]
res5a_branch1 (Conv2D) (None, 24, 24, 128) 8320 add_7[0][0]
bn5a_branch2c (BatchNormalizati (None, 24, 24, 128) 512 res5a_branch2c[0][0]
bn5a_branch1 (BatchNormalizatio (None, 24, 24, 128) 512 res5a_branch1[0][0]
add_8 (Add) (None, 24, 24, 128) 0 bn5a_branch2c[0][0]
activation_21 (Activation) (None, 24, 24, 128) 0 add_8[0][0]
res6a_branch2a (Conv2D) (None, 12, 12, 64) 8256 activation_21[0][0]
bn6a_branch2a (BatchNormalizati (None, 12, 12, 64) 256 res6a_branch2a[0][0]
activation_22 (Activation) (None, 12, 12, 64) 0 bn6a_branch2a[0][0]
res6a_branch2b (Conv2D) (None, 12, 12, 64) 36928 activation_22[0][0]
bn6a_branch2b (BatchNormalizati (None, 12, 12, 64) 256 res6a_branch2b[0][0]
activation_23 (Activation) (None, 12, 12, 64) 0 bn6a_branch2b[0][0]
res6a_branch2c (Conv2D) (None, 12, 12, 512) 33280 activation_23[0][0]
res6a_branch1 (Conv2D) (None, 12, 12, 512) 66048 activation_21[0][0]
bn6a_branch2c (BatchNormalizati (None, 12, 12, 512) 2048 res6a_branch2c[0][0]
bn6a_branch1 (BatchNormalizatio (None, 12, 12, 512) 2048 res6a_branch1[0][0]
add_9 (Add) (None, 12, 12, 512) 0 bn6a_branch2c[0][0]
activation_24 (Activation) (None, 12, 12, 512) 0 add_9[0][0]
avg_pool (GlobalAveragePooling2 (None, 512) 0 activation_24[0][0]
dropout_1 (Dropout) (None, 512) 0 avg_pool[0][0]
FC1 (Dense) (None, 1) 513 dropout_1[0][0]
activation_25 (Activation) (None, 1) 0 FC1[0][0]
Total params: 196,557
Trainable params: 192,867
Non-trainable params: 3,690
Everything looks correct. However, when I run the code, I get the following error:
Epoch 1/1
Traceback (most recent call last):
File "C:/Users/ASista162282/Desktop/code/camleyon_17/", line 114, in <module>
File "C:/Users/ASista162282/Desktop/code/camleyon_17/", line 71, in training_pipeline
verbose = 1)
File "C:\ProgramData\Miniconda3\lib\site-packages\keras\engine\", line 1705, in fit
File "C:\ProgramData\Miniconda3\lib\site-packages\keras\engine\", line 1188, in _fit_loop
outs = f(ins)
File "C:\ProgramData\Miniconda3\lib\site-packages\keras\backend\", line 2478, in __call__
File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\client\", line 900, in run
File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\client\", line 1111, in _run
ValueError: Cannot feed value of shape () for Tensor 'input_1:0', which has shape '(?, 389, 389, 3)'
It doesn't make any sense. I even added the set_shape function before defining the model, and it still shows empty shape. Any help will be really appreciated. Thank you.
The way you are replacing the input layer doesn't seem to connect the new layer correctly. Try replacing this:
model_single.layers[0] = InputLayer(input_tensor=input_tensor, input_shape=(389, 389, 3))
with this:
from keras.layers import Input
from keras.models import Model
# Input() returns a tensor; InputLayer(...) returns a layer object, which a model cannot be called on
new_input = Input(tensor=input_tensor)
new_output = model_single(new_input)
model_single = Model(new_input, new_output)