How do you use tensorflow ctc_batch_cost function with keras? - tensorflow

I have been trying to implement a CTC loss function in keras for several days now.
Unfortunately, I have yet to find a simple way to do this that fits well with keras. I found tensorflow's tf.keras.backend.ctc_batch_cost function but there is not much documentation on it. I am confused about a few things. First, what are the input_length and label_length parameters? I am trying to make a handwriting recognition model and my images are 32x128, my RNN has 32 time steps, and my character list has a length of 80. I have tried to use 32 for both parameters and this gives me the error below.
Shouldn't the function already know the input_length and label_length from the shape of the first two parameters (y_true and y_pred)?
Secondly, do I need to encode my training data? Is this all done automatically?
I know tensorflow also has a function called tf.keras.backend.ctc_decode. Is this only used when making predictions?
import tensorflow as tf
from tensorflow.keras import layers
def ctc_cost(y_true, y_pred):
    return tf.keras.backend.ctc_batch_cost(
        y_true, y_pred, 32, 32)
model = tf.keras.Sequential([
    layers.Conv2D(32, 5, padding="SAME", input_shape=(32, 128, 1)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPool2D(2, 2),
    layers.Conv2D(64, 5, padding="SAME"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPool2D(2, 2),
    layers.Conv2D(128, 3, padding="SAME"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPool2D((1, 2), (1, 2)),
    layers.Conv2D(128, 3, padding="SAME"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPool2D((1, 2), (1, 2)),
    layers.Conv2D(256, 3, padding="SAME"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPool2D((1, 2), (1, 2)),
    layers.Reshape((32, 256)),
    layers.Bidirectional(layers.LSTM(256, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(256, return_sequences=True)),
    layers.Reshape((-1, 32, 512)),
    layers.Conv2D(80, 1, padding="SAME"),
    layers.Softmax(-1)
])
print(model.summary())
model.compile(tf.optimizers.RMSprop(0.001), ctc_cost)
Error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: squeeze_dims[0] not in [0,0). for 'loss/softmax_loss/Squeeze' (op: 'Squeeze') with input shapes: []
Model:
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 32, 128, 32)       832
batch_normalization (BatchNo (None, 32, 128, 32)       128
activation (Activation)      (None, 32, 128, 32)       0
max_pooling2d (MaxPooling2D) (None, 16, 64, 32)        0
conv2d_1 (Conv2D)            (None, 16, 64, 64)        51264
batch_normalization_1 (Batch (None, 16, 64, 64)        256
activation_1 (Activation)    (None, 16, 64, 64)        0
max_pooling2d_1 (MaxPooling2 (None, 8, 32, 64)         0
conv2d_2 (Conv2D)            (None, 8, 32, 128)        73856
batch_normalization_2 (Batch (None, 8, 32, 128)        512
activation_2 (Activation)    (None, 8, 32, 128)        0
max_pooling2d_2 (MaxPooling2 (None, 8, 16, 128)        0
conv2d_3 (Conv2D)            (None, 8, 16, 128)        147584
batch_normalization_3 (Batch (None, 8, 16, 128)        512
activation_3 (Activation)    (None, 8, 16, 128)        0
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 128)         0
conv2d_4 (Conv2D)            (None, 8, 8, 256)         295168
batch_normalization_4 (Batch (None, 8, 8, 256)         1024
activation_4 (Activation)    (None, 8, 8, 256)         0
max_pooling2d_4 (MaxPooling2 (None, 8, 4, 256)         0
reshape (Reshape)            (None, 32, 256)           0
bidirectional (Bidirectional (None, 32, 512)           1050624
bidirectional_1 (Bidirection (None, 32, 512)           1574912
reshape_1 (Reshape)          (None, None, 32, 512)     0
conv2d_5 (Conv2D)            (None, None, 32, 80)      41040
softmax (Softmax)            (None, None, 32, 80)      0
Here is the tensorflow documentation I was referencing:
https://www.tensorflow.org/api_docs/python/tf/keras/backend/ctc_batch_cost

First, what are the input_length and label_length parameters?
input_length is the length of the input sequence in time steps. label_length is the length of the text label.
For example, if you are trying to recognize an image of the handwritten text "John Hancock" and you are doing it in 32 time steps, then your input_length is 32 and your label_length is 12 (len("John Hancock")).
Shouldn't the function already know the input_length and label_length from the shape of the first two parameters (y_true and y_pred)?
You usually process input data in batches, which have to be padded to the largest element in the batch, so this information is lost. In your case the input_length is always the same, but the label_length varies.
When dealing with speech recognition, for example, input_length can vary as well.
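A minimal sketch of how the per-sample lengths might be assembled for a batch, written in plain Python so it runs without TensorFlow. The helper name pad_label_batch and the padding value 0 are assumptions made for illustration. The documentation for ctc_batch_cost describes input_length and label_length as tensors of shape (samples, 1), so each length is wrapped in its own row here.

```python
def pad_label_batch(encoded_labels, time_steps, pad_value=0):
    """Pad integer label sequences to a common length and record the true lengths.

    encoded_labels: list of lists of ints, one list per sample.
    time_steps: number of RNN time steps per sample (32 in the question).
    pad_value: filler for the unused tail of each row (hypothetical choice).
    """
    max_len = max(len(label) for label in encoded_labels)
    padded = [label + [pad_value] * (max_len - len(label))
              for label in encoded_labels]
    # One row per sample, one column per row: the (samples, 1) layout.
    label_length = [[len(label)] for label in encoded_labels]
    input_length = [[time_steps] for _ in encoded_labels]
    return padded, input_length, label_length


batch = [[3, 1, 4], [1, 5], [9, 2, 6, 5, 3]]
padded, input_length, label_length = pad_label_batch(batch, time_steps=32)
print(padded)        # [[3, 1, 4, 0, 0], [1, 5, 0, 0, 0], [9, 2, 6, 5, 3]]
print(label_length)  # [[3], [2], [5]]
print(input_length)  # [[32], [32], [32]]
```

After padding, every row has the same width, so the padded array alone no longer says where each label ends. That is the information the two length arguments carry.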
Secondly, do I need to encode my training data? Is this all done automatically?
Not sure I understand what you are asking, but here is a good example written in Keras:
https://keras.io/examples/image_ocr/
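If the question is whether the text labels have to be turned into numbers: the y_true argument holds integer class indices, so the strings are usually mapped through a character table first. Below is a minimal sketch in plain Python. The alphabet and the helper names are hypothetical, and reserving the last index for the CTC blank is an assumption based on the convention used by TensorFlow's CTC ops.

```python
def build_char_table(alphabet):
    """Map each character to an integer index, starting at 0."""
    return {ch: i for i, ch in enumerate(alphabet)}


def encode_text(text, char_to_index):
    """Turn a string into a list of integer class indices."""
    return [char_to_index[ch] for ch in text]


def decode_indices(indices, alphabet):
    """Inverse of encode_text: turn class indices back into a string."""
    return "".join(alphabet[i] for i in indices)


# Hypothetical alphabet built from the example label only; a real model
# would use the full character list (80 characters in the question).
alphabet = "".join(sorted(set("John Hancock")))   # " HJachkno"
char_to_index = build_char_table(alphabet)

# Assumption: one extra class after the alphabet is reserved for the blank.
num_classes = len(alphabet) + 1
blank_index = num_classes - 1

encoded = encode_text("John Hancock", char_to_index)
print(encoded)                            # [2, 8, 5, 7, 0, 1, 3, 7, 4, 8, 4, 6]
print(len(encoded))                       # 12, the label_length for this sample
print(decode_indices(encoded, alphabet))  # John Hancock
```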
I know tensorflow also has a function called tf.keras.backend.ctc_decode. Is this only used when making predictions?
In general, yes. You can also try to use it to make you breakfast in the morning, but it's not very good at it ;)
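For intuition about what decoding does at prediction time, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python: take the most likely class at each time step, merge consecutive repeats, then drop the blanks. It only illustrates the idea and is not the implementation behind ctc_decode. The function name is hypothetical, and treating the last class index as the blank is an assumption.

```python
def greedy_ctc_decode(probs, blank_index=None):
    """Best-path CTC decoding for a single sample.

    probs: list of time steps, each a list of class probabilities.
    blank_index: index of the CTC blank; defaults to the last class (assumption).
    """
    if blank_index is None:
        blank_index = len(probs[0]) - 1
    # Most likely class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
    decoded = []
    previous = None
    for index in best_path:
        # Keep a class only when it starts a new run and is not the blank.
        if index != previous and index != blank_index:
            decoded.append(index)
        previous = index
    return decoded


# Three classes: 0, 1 and the blank (2). The best path is [0, 0, 2, 0, 1, 1].
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
print(greedy_ctc_decode(probs))  # [0, 0, 1]
```

The blank between the two runs of class 0 is what lets the decoder output the same class twice in a row.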

Related

Why does building the same model in 2 different ways give different outputs?

I'm having a really weird problem.
I'm building the same model in 2 different ways.
I checked the summary (number of parameters) and plotted the 2 models, and I see no difference.
The models give different predictions (after training them on the same dataset).
What is the difference between the models? (I can't figure it out)
How can I update the second model to be the same as the first model?
First model (the "source" model):
input_img = Input(shape=(dim_x, dim_y, dim_z))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoder = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoder)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoder = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoder)
autoencoder.compile(optimizer='adam', loss=loss_func)
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_29 (Conv2D) (None, 112, 112, 8) 1160
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_30 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 28, 28, 8) 0
_________________________________________________________________
conv2d_31 (Conv2D) (None, 28, 28, 8) 584
_________________________________________________________________
up_sampling2d_12 (UpSampling (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_32 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
up_sampling2d_13 (UpSampling (None, 112, 112, 8) 0
_________________________________________________________________
conv2d_33 (Conv2D) (None, 112, 112, 16) 1168
_________________________________________________________________
up_sampling2d_14 (UpSampling (None, 224, 224, 16) 0
_________________________________________________________________
conv2d_34 (Conv2D) (None, 224, 224, 3) 435
=================================================================
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
summary:
Layer (type) Output Shape Param #
=================================================================
conv2d_21 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_22 (Conv2D) (None, 112, 112, 8) 1160
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_23 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 28, 28, 8) 0
_________________________________________________________________
conv2d_24 (Conv2D) (None, 28, 28, 8) 584
_________________________________________________________________
up_sampling2d_9 (UpSampling2 (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_25 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
up_sampling2d_10 (UpSampling (None, 112, 112, 8) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 112, 112, 16) 1168
_________________________________________________________________
up_sampling2d_11 (UpSampling (None, 224, 224, 16) 0
_________________________________________________________________
conv2d_27 (Conv2D) (None, 224, 224, 3) 435
=================================================================
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
Second model (the model I want to build to match the first model, in a different way):
autoencoder = Sequential()
autoencoder.add(el1)
autoencoder.add(el2)
autoencoder.add(el3)
autoencoder.add(el4)
autoencoder.add(el5)
autoencoder.add(el6)
autoencoder.add(dl1)
autoencoder.add(dl2)
autoencoder.add(dl3)
autoencoder.add(dl4)
autoencoder.add(dl5)
autoencoder.add(dl6)
autoencoder.add(output_layer)
autoencoder.compile(optimizer='adam', loss=loss_func)
summary:
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_12 (MaxPooling (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_29 (Conv2D) (None, 112, 112, 8) 1160
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_30 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 28, 28, 8) 0
_________________________________________________________________
conv2d_31 (Conv2D) (None, 28, 28, 8) 584
_________________________________________________________________
up_sampling2d_12 (UpSampling (None, 56, 56, 8) 0
_________________________________________________________________
conv2d_32 (Conv2D) (None, 56, 56, 8) 584
_________________________________________________________________
up_sampling2d_13 (UpSampling (None, 112, 112, 8) 0
_________________________________________________________________
conv2d_33 (Conv2D) (None, 112, 112, 16) 1168
_________________________________________________________________
up_sampling2d_14 (UpSampling (None, 224, 224, 16) 0
_________________________________________________________________
conv2d_34 (Conv2D) (None, 224, 224, 3) 435
=================================================================
Total params: 4,963
Trainable params: 4,963
Non-trainable params: 0
You should set a random seed using tensorflow.set_random_seed(0) (tf.random.set_seed(0) in TensorFlow 2.x) and numpy.random.seed(0). For NumPy the seed can be any int or 1D array_like, and it should be set in your code once.
Also make sure that you have shuffling disabled: model.fit(data, shuffle=False).
After that, random weight/parameter initialization and data ordering will be reproducible across consecutive experiments and models.
There may still be some randomness that leads to different results after running the model. It can come from other libraries that use other randomness modules (e.g., mnist_cnn.py does not give reproducible results).
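A minimal sketch of a single helper that seeds the generators mentioned above in one place. The name set_global_seeds is hypothetical. NumPy and TensorFlow are imported only if they are installed, so the sketch also runs in an environment that has neither. It does not remove other sources of nondeterminism such as GPU kernels or data loading threads.

```python
import random


def set_global_seeds(seed=0):
    """Seed Python, NumPy and TensorFlow RNGs; return the names that were seeded."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import tensorflow as tf
        if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
            tf.random.set_seed(seed)   # TensorFlow 2.x
        else:
            tf.set_random_seed(seed)   # TensorFlow 1.x
        seeded.append("tensorflow")
    except ImportError:
        pass
    return seeded


# Seeding twice with the same value reproduces the same random draws.
set_global_seeds(0)
first = [random.random() for _ in range(3)]
set_global_seeds(0)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Call it once at the top of the script, before any model is built, so that both models start from the same random state.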

TF2.1: SegNet model architecture problem. Bug with metric calculation: it stays constant and converges to a fixed value

I'm building a custom model (SegNet) in Tensorflow 2.1.0.
The first problem I'm facing is reusing the indices of the max pooling operation, which is needed as described in the paper.
Basically, since it is an encoder-decoder architecture, the pooling indices from the encoding section of the network are needed in the decoding section to upsample the feature maps and put each value back at the position given by its index.
Now, in TF these indices are not exported by default by the layer tf.keras.layers.MaxPool2D (as they are, for example, in PyTorch).
To get the indices of the max pooling operation, tf.nn.max_pool_with_argmax has to be used.
This operation, however, returns the indices (argmax) in a flattened format, which requires further operations to be useful in other parts of the network.
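To make the flattened format concrete: as I read the TensorFlow documentation for tf.nn.max_pool_with_argmax, when the batch is not included in the index a maximum at position [b, y, x, c] gets the flattened index (y * width + x) * channels + c, where width and channels belong to the input of the pooling op. A minimal plain-Python sketch of that mapping and its inverse follows; the function names are hypothetical.

```python
def flatten_position(y, x, c, width, channels):
    """Flattened index of position (y, x, c) inside one sample."""
    return (y * width + x) * channels + c


def unflatten_argmax(flat_index, width, channels):
    """Recover (y, x, c) from an index flattened as (y * width + x) * channels + c."""
    c = flat_index % channels
    x = (flat_index // channels) % width
    y = flat_index // (width * channels)
    return y, x, c


width, channels = 4, 3
flat = flatten_position(2, 1, 2, width, channels)
print(flat)                                     # 29
print(unflatten_argmax(flat, width, channels))  # (2, 1, 2)
```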
To implement a layer that performs a MaxPooling2D and exports these indices (flattened) I defined a custom layer in keras.
class MaxPoolingWithArgmax2D(Layer):
    def __init__(
            self,
            pool_size=(2, 2),
            strides=2,
            padding='same',
            **kwargs):
        super(MaxPoolingWithArgmax2D, self).__init__(**kwargs)
        self.padding = padding
        self.pool_size = pool_size
        self.strides = strides
    def call(self, inputs, **kwargs):
        padding = self.padding
        pool_size = self.pool_size
        strides = self.strides
        output, argmax = tf.nn.max_pool_with_argmax(
            inputs,
            ksize=pool_size,
            strides=strides,
            padding=padding.upper(),
            output_dtype=tf.int64)
        return output, argmax
Obviously, this layer is used in the encoding section of the network, so a corresponding decoding layer is needed to perform the inverse operation (UpSampling2D) using the indices (further details of this operation are in the paper).
After some research, I found legacy code (TF<2.1.0) and adapted it to perform the operation.
However, I'm not 100% convinced this code works well; in fact, there are some things I don't like.
class MaxUnpooling2D(Layer):
    def __init__(self, size=(2, 2), **kwargs):
        super(MaxUnpooling2D, self).__init__(**kwargs)
        self.size = size
    def call(self, inputs, output_shape=None):
        updates, mask = inputs[0], inputs[1]
        with tf.name_scope(self.name):
            mask = tf.cast(mask, 'int32')
            #input_shape = tf.shape(updates, out_type='int32')
            input_shape = updates.get_shape()
            # This statement is required if I don't want to specify a batch size
            if input_shape[0] == None:
                batches = 1
            else:
                batches = input_shape[0]
            # calculation new shape
            if output_shape is None:
                output_shape = (
                    batches,
                    input_shape[1]*self.size[0],
                    input_shape[2]*self.size[1],
                    input_shape[3])
            # calculation indices for batch, height, width and feature maps
            one_like_mask = tf.ones_like(mask, dtype='int32')
            batch_shape = tf.concat(
                [[batches], [1], [1], [1]],
                axis=0)
            batch_range = tf.reshape(
                tf.range(output_shape[0], dtype='int32'),
                shape=batch_shape)
            b = one_like_mask * batch_range
            y = mask // (output_shape[2] * output_shape[3])
            x = (mask // output_shape[3]) % output_shape[2]
            feature_range = tf.range(output_shape[3], dtype='int32')
            f = one_like_mask * feature_range
            # transpose indices & reshape update values to one dimension
            updates_size = tf.size(updates)
            indices = tf.transpose(tf.reshape(
                tf.stack([b, y, x, f]),
                [4, updates_size]))
            values = tf.reshape(updates, [updates_size])
            ret = tf.scatter_nd(indices, values, output_shape)
            return ret
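The hard-coded batches = 1 above comes from reading the static shape. The unpooling idea itself does not need the batch size: assuming the argmax indices are flattened per sample (the batch is not included in the index), each sample can be scattered on its own. Below is a minimal pure-Python sketch of that idea, not a drop-in TensorFlow layer; the function names are hypothetical. In TensorFlow the same idea would presumably rely on the dynamic tf.shape(updates) rather than updates.get_shape().

```python
def max_unpool_sample(values, flat_indices, out_height, out_width, channels):
    """Scatter the pooled values of ONE sample back to their argmax positions.

    values, flat_indices: flat lists of equal length for a single sample.
    Returns a flat list of length out_height * out_width * channels.
    """
    output = [0.0] * (out_height * out_width * channels)
    for value, index in zip(values, flat_indices):
        output[index] = value
    return output


def max_unpool_batch(batch_values, batch_indices, out_height, out_width, channels):
    """Apply max_unpool_sample to every sample; works for any batch size."""
    return [max_unpool_sample(v, i, out_height, out_width, channels)
            for v, i in zip(batch_values, batch_indices)]


# One sample, one channel: a 2x2 pooled map unpooled to 4x4.
values = [5.0, 7.0, 9.0, 3.0]
indices = [5, 2, 12, 15]   # argmax positions inside the flattened 4x4 input
flat = max_unpool_sample(values, indices, out_height=4, out_width=4, channels=1)
rows = [flat[r * 4:(r + 1) * 4] for r in range(4)]
for row in rows:
    print(row)
# [0.0, 0.0, 7.0, 0.0]
# [0.0, 5.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0]
# [9.0, 0.0, 0.0, 3.0]
```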
The things that bother me are:
Unflattening the indices (MaxUnpooling2D) depends on knowing a specific batch size, which for model validation I would like to be None or unspecified.
I am not sure this code is actually 100% compatible with the rest of the library. In fact, during fit, if I use tf.keras.metrics.MeanIoU the value converges to 0.341 and stays constant for every epoch after the first. The standard accuracy metric, by contrast, works just fine.
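One possible explanation for the stuck MeanIoU value (an assumption, not something the code in this post confirms): a confusion-matrix based IoU compares integer class labels, so softmax probabilities generally need an argmax over the class axis before they are compared with the ground truth. Below is a pure-Python sketch of that computation on flat lists; the function names are hypothetical.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def mean_iou(true_labels, predicted_labels, num_classes):
    """Mean intersection-over-union over the classes that occur in either list."""
    pairs = list(zip(true_labels, predicted_labels))
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for t, p in pairs if t == cls and p == cls)
        union = sum(1 for t, p in pairs if t == cls or p == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Per-pixel softmax outputs for four pixels and three classes.
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
true_labels = [0, 1, 2, 1]

# Convert probabilities to class indices before computing the metric.
predicted = [argmax(p) for p in probabilities]
print(predicted)  # [0, 1, 2, 0]
score = mean_iou(true_labels, predicted, num_classes=3)
print(round(score, 4))  # 0.6667
```

Classes 0 and 1 each have an IoU of 0.5 and class 2 has an IoU of 1.0, which gives the mean of 2/3.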
Network architecture in Depth
The complete definition of the model follows.
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers as layers
from tensorflow.keras.layers import Layer
class SegNet:
    def __init__(self, data_shape, classes = 3, batch_size = None):
        self.MODEL_NAME = 'SegNet'
        self.MODEL_VERSION = '0.2'
        self.classes = classes
        self.batch_size = batch_size
        self.build_model(data_shape)
    def build_model(self, data_shape):
        input_shape = (data_shape, data_shape, 3)
        inputs = keras.Input(shape=input_shape, batch_size=self.batch_size, name='Input')
        # Build sequential model
        # Encoding
        encoders = 5
        feature_maps = [64, 128, 256, 512, 512]
        n_convolutions = [2, 2, 3, 3, 3]
        eb_input = inputs
        eb_argmax_indices = []
        for encoder_index in range(encoders):
            encoder_block, argmax_indices = self.encoder_block(
                eb_input, encoder_index, feature_maps[encoder_index], n_convolutions[encoder_index])
            eb_argmax_indices.append(argmax_indices)
            eb_input = encoder_block
        # Decoding
        decoders = encoders
        db_input = encoder_block
        eb_argmax_indices.reverse()
        feature_maps.reverse()
        n_convolutions.reverse()
        d_feature_maps = [512, 512, 256, 128, 64]
        d_n_convolutions = n_convolutions
        for decoder_index in range(decoders):
            decoder_block = self.decoder_block(
                db_input, eb_argmax_indices[decoder_index], decoder_index, d_feature_maps[decoder_index], d_n_convolutions[decoder_index])
            db_input = decoder_block
        output = layers.Softmax()(decoder_block)
        self.model = keras.Model(inputs=inputs, outputs=output, name="SegNet")
    def encoder_block(self, x, encoder_index, feature_maps, n_convolutions):
        bank_input = x
        for conv_index in range(n_convolutions):
            bank = self.eb_layers_bank(
                bank_input, conv_index, feature_maps, encoder_index)
            bank_input = bank
        max_pool, indices = MaxPoolingWithArgmax2D(pool_size=(
            2, 2), strides=2, padding='same', name='EB_{}_MPOOL'.format(encoder_index + 1))(bank)
        return max_pool, indices
    def eb_layers_bank(self, x, bank_index, feature_maps, encoder_index):
        bank_input = x
        conv_l = layers.Conv2D(feature_maps, (3, 3), padding='same', name='EB_{}_BANK_{}_CONV'.format(
            encoder_index + 1, bank_index + 1))(bank_input)
        batch_norm = layers.BatchNormalization(
            name='EB_{}_BANK_{}_BN'.format(encoder_index + 1, bank_index + 1))(conv_l)
        relu = layers.ReLU(name='EB_{}_BANK_{}_RL'.format(
            encoder_index + 1, bank_index + 1))(batch_norm)
        return relu
    def decoder_block(self, x, max_pooling_idices, decoder_index, feature_maps, n_convolutions):
        #bank_input = self.unpool_with_argmax(x, max_pooling_idices)
        bank_input = MaxUnpooling2D(name='DB_{}_UPSAMP'.format(decoder_index + 1))([x, max_pooling_idices])
        #bank_input = layers.UpSampling2D()(x)
        for conv_index in range(n_convolutions):
            if conv_index == n_convolutions - 1:
                last_l_banck = True
            else:
                last_l_banck = False
            bank = self.db_layers_bank(
                bank_input, conv_index, feature_maps, decoder_index, last_l_banck)
            bank_input = bank
        return bank
    def db_layers_bank(self, x, bank_index, feature_maps, decoder_index, last_l_bank):
        bank_input = x
        if (last_l_bank) & (decoder_index == 4):
            conv_l = layers.Conv2D(self.classes, (1, 1), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                decoder_index + 1, bank_index + 1))(bank_input)
            #batch_norm = layers.BatchNormalization(
            #    name='DB_{}_BANK_{}_BN'.format(decoder_index + 1, bank_index + 1))(conv_l)
            return conv_l
        else:
            if (last_l_bank) & (decoder_index > 0):
                conv_l = layers.Conv2D(int(feature_maps / 2), (3, 3), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                    decoder_index + 1, bank_index + 1))(bank_input)
            else:
                conv_l = layers.Conv2D(feature_maps, (3, 3), padding='same', name='DB_{}_BANK_{}_CONV'.format(
                    decoder_index + 1, bank_index + 1))(bank_input)
            batch_norm = layers.BatchNormalization(
                name='DB_{}_BANK_{}_BN'.format(decoder_index + 1, bank_index + 1))(conv_l)
            relu = layers.ReLU(name='DB_{}_BANK_{}_RL'.format(
                decoder_index + 1, bank_index + 1))(batch_norm)
            return relu
    def get_model(self):
        return self.model
Here is the output of model.summary().
Model: "SegNet"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input (InputLayer) [(None, 416, 416, 3) 0
__________________________________________________________________________________________________
EB_1_BANK_1_CONV (Conv2D) (None, 416, 416, 64) 1792 Input[0][0]
__________________________________________________________________________________________________
EB_1_BANK_1_BN (BatchNormalizat (None, 416, 416, 64) 256 EB_1_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
EB_1_BANK_1_RL (ReLU) (None, 416, 416, 64) 0 EB_1_BANK_1_BN[0][0]
__________________________________________________________________________________________________
EB_1_BANK_2_CONV (Conv2D) (None, 416, 416, 64) 36928 EB_1_BANK_1_RL[0][0]
__________________________________________________________________________________________________
EB_1_BANK_2_BN (BatchNormalizat (None, 416, 416, 64) 256 EB_1_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
EB_1_BANK_2_RL (ReLU) (None, 416, 416, 64) 0 EB_1_BANK_2_BN[0][0]
__________________________________________________________________________________________________
EB_1_MPOOL (MaxPoolingWithArgma ((None, 208, 208, 64 0 EB_1_BANK_2_RL[0][0]
__________________________________________________________________________________________________
EB_2_BANK_1_CONV (Conv2D) (None, 208, 208, 128 73856 EB_1_MPOOL[0][0]
__________________________________________________________________________________________________
EB_2_BANK_1_BN (BatchNormalizat (None, 208, 208, 128 512 EB_2_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
EB_2_BANK_1_RL (ReLU) (None, 208, 208, 128 0 EB_2_BANK_1_BN[0][0]
__________________________________________________________________________________________________
EB_2_BANK_2_CONV (Conv2D) (None, 208, 208, 128 147584 EB_2_BANK_1_RL[0][0]
__________________________________________________________________________________________________
EB_2_BANK_2_BN (BatchNormalizat (None, 208, 208, 128 512 EB_2_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
EB_2_BANK_2_RL (ReLU) (None, 208, 208, 128 0 EB_2_BANK_2_BN[0][0]
__________________________________________________________________________________________________
EB_2_MPOOL (MaxPoolingWithArgma ((None, 104, 104, 12 0 EB_2_BANK_2_RL[0][0]
__________________________________________________________________________________________________
EB_3_BANK_1_CONV (Conv2D) (None, 104, 104, 256 295168 EB_2_MPOOL[0][0]
__________________________________________________________________________________________________
EB_3_BANK_1_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
EB_3_BANK_1_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_1_BN[0][0]
__________________________________________________________________________________________________
EB_3_BANK_2_CONV (Conv2D) (None, 104, 104, 256 590080 EB_3_BANK_1_RL[0][0]
__________________________________________________________________________________________________
EB_3_BANK_2_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
EB_3_BANK_2_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_2_BN[0][0]
__________________________________________________________________________________________________
EB_3_BANK_3_CONV (Conv2D) (None, 104, 104, 256 590080 EB_3_BANK_2_RL[0][0]
__________________________________________________________________________________________________
EB_3_BANK_3_BN (BatchNormalizat (None, 104, 104, 256 1024 EB_3_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
EB_3_BANK_3_RL (ReLU) (None, 104, 104, 256 0 EB_3_BANK_3_BN[0][0]
__________________________________________________________________________________________________
EB_3_MPOOL (MaxPoolingWithArgma ((None, 52, 52, 256) 0 EB_3_BANK_3_RL[0][0]
__________________________________________________________________________________________________
EB_4_BANK_1_CONV (Conv2D) (None, 52, 52, 512) 1180160 EB_3_MPOOL[0][0]
__________________________________________________________________________________________________
EB_4_BANK_1_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
EB_4_BANK_1_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_1_BN[0][0]
__________________________________________________________________________________________________
EB_4_BANK_2_CONV (Conv2D) (None, 52, 52, 512) 2359808 EB_4_BANK_1_RL[0][0]
__________________________________________________________________________________________________
EB_4_BANK_2_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
EB_4_BANK_2_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_2_BN[0][0]
__________________________________________________________________________________________________
EB_4_BANK_3_CONV (Conv2D) (None, 52, 52, 512) 2359808 EB_4_BANK_2_RL[0][0]
__________________________________________________________________________________________________
EB_4_BANK_3_BN (BatchNormalizat (None, 52, 52, 512) 2048 EB_4_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
EB_4_BANK_3_RL (ReLU) (None, 52, 52, 512) 0 EB_4_BANK_3_BN[0][0]
__________________________________________________________________________________________________
EB_4_MPOOL (MaxPoolingWithArgma ((None, 26, 26, 512) 0 EB_4_BANK_3_RL[0][0]
__________________________________________________________________________________________________
EB_5_BANK_1_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_4_MPOOL[0][0]
__________________________________________________________________________________________________
EB_5_BANK_1_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
EB_5_BANK_1_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_1_BN[0][0]
__________________________________________________________________________________________________
EB_5_BANK_2_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_5_BANK_1_RL[0][0]
__________________________________________________________________________________________________
EB_5_BANK_2_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
EB_5_BANK_2_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_2_BN[0][0]
__________________________________________________________________________________________________
EB_5_BANK_3_CONV (Conv2D) (None, 26, 26, 512) 2359808 EB_5_BANK_2_RL[0][0]
__________________________________________________________________________________________________
EB_5_BANK_3_BN (BatchNormalizat (None, 26, 26, 512) 2048 EB_5_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
EB_5_BANK_3_RL (ReLU) (None, 26, 26, 512) 0 EB_5_BANK_3_BN[0][0]
__________________________________________________________________________________________________
EB_5_MPOOL (MaxPoolingWithArgma ((None, 13, 13, 512) 0 EB_5_BANK_3_RL[0][0]
__________________________________________________________________________________________________
DB_1_UPSAMP (MaxUnpooling2D) (1, 26, 26, 512) 0 EB_5_MPOOL[0][0]
EB_5_MPOOL[0][1]
__________________________________________________________________________________________________
DB_1_BANK_1_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_UPSAMP[0][0]
__________________________________________________________________________________________________
DB_1_BANK_1_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
DB_1_BANK_1_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_1_BN[0][0]
__________________________________________________________________________________________________
DB_1_BANK_2_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_BANK_1_RL[0][0]
__________________________________________________________________________________________________
DB_1_BANK_2_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
DB_1_BANK_2_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_2_BN[0][0]
__________________________________________________________________________________________________
DB_1_BANK_3_CONV (Conv2D) (1, 26, 26, 512) 2359808 DB_1_BANK_2_RL[0][0]
__________________________________________________________________________________________________
DB_1_BANK_3_BN (BatchNormalizat (1, 26, 26, 512) 2048 DB_1_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
DB_1_BANK_3_RL (ReLU) (1, 26, 26, 512) 0 DB_1_BANK_3_BN[0][0]
__________________________________________________________________________________________________
DB_2_UPSAMP (MaxUnpooling2D) (1, 52, 52, 512) 0 DB_1_BANK_3_RL[0][0]
EB_4_MPOOL[0][1]
__________________________________________________________________________________________________
DB_2_BANK_1_CONV (Conv2D) (1, 52, 52, 512) 2359808 DB_2_UPSAMP[0][0]
__________________________________________________________________________________________________
DB_2_BANK_1_BN (BatchNormalizat (1, 52, 52, 512) 2048 DB_2_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
DB_2_BANK_1_RL (ReLU) (1, 52, 52, 512) 0 DB_2_BANK_1_BN[0][0]
__________________________________________________________________________________________________
DB_2_BANK_2_CONV (Conv2D) (1, 52, 52, 512) 2359808 DB_2_BANK_1_RL[0][0]
__________________________________________________________________________________________________
DB_2_BANK_2_BN (BatchNormalizat (1, 52, 52, 512) 2048 DB_2_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
DB_2_BANK_2_RL (ReLU) (1, 52, 52, 512) 0 DB_2_BANK_2_BN[0][0]
__________________________________________________________________________________________________
DB_2_BANK_3_CONV (Conv2D) (1, 52, 52, 256) 1179904 DB_2_BANK_2_RL[0][0]
__________________________________________________________________________________________________
DB_2_BANK_3_BN (BatchNormalizat (1, 52, 52, 256) 1024 DB_2_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
DB_2_BANK_3_RL (ReLU) (1, 52, 52, 256) 0 DB_2_BANK_3_BN[0][0]
__________________________________________________________________________________________________
DB_3_UPSAMP (MaxUnpooling2D) (1, 104, 104, 256) 0 DB_2_BANK_3_RL[0][0]
EB_3_MPOOL[0][1]
__________________________________________________________________________________________________
DB_3_BANK_1_CONV (Conv2D) (1, 104, 104, 256) 590080 DB_3_UPSAMP[0][0]
__________________________________________________________________________________________________
DB_3_BANK_1_BN (BatchNormalizat (1, 104, 104, 256) 1024 DB_3_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
DB_3_BANK_1_RL (ReLU) (1, 104, 104, 256) 0 DB_3_BANK_1_BN[0][0]
__________________________________________________________________________________________________
DB_3_BANK_2_CONV (Conv2D) (1, 104, 104, 256) 590080 DB_3_BANK_1_RL[0][0]
__________________________________________________________________________________________________
DB_3_BANK_2_BN (BatchNormalizat (1, 104, 104, 256) 1024 DB_3_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
DB_3_BANK_2_RL (ReLU) (1, 104, 104, 256) 0 DB_3_BANK_2_BN[0][0]
__________________________________________________________________________________________________
DB_3_BANK_3_CONV (Conv2D) (1, 104, 104, 128) 295040 DB_3_BANK_2_RL[0][0]
__________________________________________________________________________________________________
DB_3_BANK_3_BN (BatchNormalizat (1, 104, 104, 128) 512 DB_3_BANK_3_CONV[0][0]
__________________________________________________________________________________________________
DB_3_BANK_3_RL (ReLU) (1, 104, 104, 128) 0 DB_3_BANK_3_BN[0][0]
__________________________________________________________________________________________________
DB_4_UPSAMP (MaxUnpooling2D) (1, 208, 208, 128) 0 DB_3_BANK_3_RL[0][0]
EB_2_MPOOL[0][1]
__________________________________________________________________________________________________
DB_4_BANK_1_CONV (Conv2D) (1, 208, 208, 128) 147584 DB_4_UPSAMP[0][0]
__________________________________________________________________________________________________
DB_4_BANK_1_BN (BatchNormalizat (1, 208, 208, 128) 512 DB_4_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
DB_4_BANK_1_RL (ReLU) (1, 208, 208, 128) 0 DB_4_BANK_1_BN[0][0]
__________________________________________________________________________________________________
DB_4_BANK_2_CONV (Conv2D) (1, 208, 208, 64) 73792 DB_4_BANK_1_RL[0][0]
__________________________________________________________________________________________________
DB_4_BANK_2_BN (BatchNormalizat (1, 208, 208, 64) 256 DB_4_BANK_2_CONV[0][0]
__________________________________________________________________________________________________
DB_4_BANK_2_RL (ReLU) (1, 208, 208, 64) 0 DB_4_BANK_2_BN[0][0]
__________________________________________________________________________________________________
DB_5_UPSAMP (MaxUnpooling2D) (1, 416, 416, 64) 0 DB_4_BANK_2_RL[0][0]
EB_1_MPOOL[0][1]
__________________________________________________________________________________________________
DB_5_BANK_1_CONV (Conv2D) (1, 416, 416, 64) 36928 DB_5_UPSAMP[0][0]
__________________________________________________________________________________________________
DB_5_BANK_1_BN (BatchNormalizat (1, 416, 416, 64) 256 DB_5_BANK_1_CONV[0][0]
__________________________________________________________________________________________________
DB_5_BANK_1_RL (ReLU) (1, 416, 416, 64) 0 DB_5_BANK_1_BN[0][0]
__________________________________________________________________________________________________
DB_5_BANK_2_CONV (Conv2D) (1, 416, 416, 3) 195 DB_5_BANK_1_RL[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (1, 416, 416, 3) 0 DB_5_BANK_2_CONV[0][0]
==================================================================================================
Total params: 29,459,075
Trainable params: 29,443,203
Non-trainable params: 15,872
__________________________________________________________________________________________________
As you can see, I'm forced to specify a batch size in MaxUnpooling2D; otherwise I get errors saying that the operation cannot be performed because the shape contains None values and cannot be transformed correctly.
When I try to predict on an image, I'm also forced to use the matching batch dimension; otherwise I get errors like:
InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [4,208,208,64] != values[1].shape = [1,208,208,64]
[[{{node SegNet/DB_5_UPSAMP/PartitionedCall/PartitionedCall/DB_5_UPSAMP/stack}}]] [Op:__inference_predict_function_70839]
This is caused by the implementation required to unravel the indices from the max pooling operation.
Training graphs
Here is a reference training run over 20 epochs.
As you can see, the MeanIoU metric is flat: it shows no progress and no updates after epoch 1.
The other metric works fine, and the loss decreases correctly.
––––––––––
Conclusions
Is there a better way, more compatible with recent versions of TF, to implement the unraveling and upsampling with indices from the max pooling operation?
If the implementation is correct, why do I get a metric stuck at a specific value? Am I doing something wrong in the model?
Thank you!
You can have reshapes with unknown batch size in custom layers in two ways.
If you know the rest of the shape, reshape using -1 as the batch size:
Suppose you know the size of your expected array:
import tensorflow.keras.backend as K
reshaped = K.reshape(original, (-1, x, y, channels))
Suppose you don't know the size, then use K.shape to get the shape as a tensor:
inputs_shape = K.shape(inputs)
batch_size = inputs_shape[:1]
x = inputs_shape[1:2]
y = inputs_shape[2:3]
ch = inputs_shape[3:]
# You can then concatenate these and operate on them (note that I kept them as 1D vectors, not scalars)
newShape = K.concatenate([batch_size, x, y, ch])  # of course, you will apply your own operations here
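To make the shape arithmetic above concrete, here is a minimal sketch that uses NumPy as a stand-in for the Keras backend (an assumption for illustration: in a real layer, K.shape(inputs) returns a tensor, not a NumPy array). The helper name doubled_spatial_shape is hypothetical. It shows why keeping each piece as a 1D slice is convenient: the pieces can be scaled and concatenated back into a full shape vector.

```python
import numpy as np

def doubled_spatial_shape(shape_vec):
    """Build the target shape of a 2x upsampling from a 1D shape vector."""
    batch_size = shape_vec[:1]   # 1D slice of length 1, not a scalar
    x = shape_vec[1:2] * 2       # double the height
    y = shape_vec[2:3] * 2       # double the width
    ch = shape_vec[3:]           # channels stay unchanged
    return np.concatenate([batch_size, x, y, ch])

inputs = np.zeros((4, 52, 52, 256), dtype=np.float32)
inputs_shape = np.array(inputs.shape)   # stand-in for K.shape(inputs)
new_shape = doubled_spatial_shape(inputs_shape)

# When everything except the batch size is known, -1 lets it be inferred.
flat = inputs.reshape(-1, 52 * 52, 256)
```

With the (4, 52, 52, 256) input assumed here, new_shape holds 4, 104, 104, 256 and flat keeps a batch dimension of 4 without it ever being written in the code.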
When I wrote my own version of SegNet, I didn't use indices but kept a one-hot version instead. It's true that it takes extra operations, but it might work well:
def get_indices(tensors):
    # Lambda passes the list of inputs as a single argument
    original, unpooled = tensors
    is_equal = K.equal(original, unpooled)
    return K.cast(is_equal, K.floatx())
previous_output = ...
pooled = MaxPooling2D()(previous_output)
unpooled = UpSampling2D()(pooled)
one_hot_indices = Lambda(get_indices)([previous_output, unpooled])
Then after an upsampling, I concatenate these indices and pass a new conv:
some_output = ...
upsampled = UpSampling2D()(some_output)
with_indices = Concatenate()([upsampled, one_hot_indices])
upsampled = Conv2D(...)(with_indices)
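To see what the mask from get_indices looks like, here is a small NumPy sketch of the same idea on a single 4x4 feature map. The helpers max_pool_2x2, upsample_2x2 and max_location_mask are hypothetical stand-ins for MaxPooling2D, UpSampling2D and the Lambda layer, not Keras code. Note that if a 2x2 window contains tied maxima, this approach marks every tied position.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(x):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def max_location_mask(original):
    """1.0 where the original value equals the max of its pooling window."""
    pooled = max_pool_2x2(original)
    unpooled = upsample_2x2(pooled)
    return (original == unpooled).astype(np.float32)

feature_map = np.array([[1., 5., 2., 0.],
                        [3., 4., 7., 6.],
                        [9., 8., 1., 2.],
                        [0., 1., 3., 4.]])
mask = max_location_mask(feature_map)
```

Here the mask has exactly one 1.0 per 2x2 window (at the positions of 5, 7, 9 and 4), which is the location information that pooling indices would otherwise carry.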

conv-autoencoder that val_loss doesn't decrease

I built an anomaly detection model using a convolutional autoencoder on the UCSD_ped2 dataset. What puzzles me is that after very few epochs, the val_loss stops decreasing. It seems that the model can't learn any further. I have done some research to improve my model, but it isn't getting better. What should I do to fix it?
Here's my model's structure:
x=144;y=224
input_img = Input(shape = (x, y, inChannel))
bn1= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(input_img)
conv1 = Conv2D(256, (11, 11), strides=(4,4),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
bn2= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(pool1)
conv2 = Conv2D(128, (5, 5),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
bn3= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(pool2)
conv3 = Conv2D(64, (3, 3), activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn3)
ubn3=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(conv3)
uconv3=Conv2DTranspose(128, (3,3),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(ubn3)
upool3=UpSampling2D(size=(2, 2))(uconv3)
ubn2=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(upool3)
uconv2=Conv2DTranspose(256, (3, 3),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(ubn2)
upool2=UpSampling2D(size=(2, 2))(uconv2)
ubn1=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(upool2)
decoded = Conv2DTranspose(1, (11, 11), strides=(4, 4),
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
activation='sigmoid', padding='same')(ubn1)
autoencoder = Model(input_img, decoded)
autoencoder.compile(loss = 'mean_squared_error', optimizer ='Adadelta',metrics=['accuracy'])
history=autoencoder.fit(X_train, Y_train,validation_split=0.3,
batch_size = batch_size, epochs = epochs, verbose = 0,
shuffle=True,
callbacks=[earlystopping,checkpointer,reduce_lr])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 144, 224, 1) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 144, 224, 1) 4
_________________________________________________________________
conv2d_1 (Conv2D) (None, 36, 56, 256) 31232
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 18, 28, 256) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 18, 28, 256) 1024
_________________________________________________________________
conv2d_2 (Conv2D) (None, 18, 28, 128) 819328
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 9, 14, 128) 0
_________________________________________________________________
batch_normalization_3 (Batch (None, 9, 14, 128) 512
_________________________________________________________________
conv2d_3 (Conv2D) (None, 9, 14, 64) 73792
_________________________________________________________________
batch_normalization_4 (Batch (None, 9, 14, 64) 256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 9, 14, 128) 73856
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 18, 28, 128) 0
_________________________________________________________________
batch_normalization_5 (Batch (None, 18, 28, 128) 512
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 18, 28, 256) 295168
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 36, 56, 256) 0
_________________________________________________________________
batch_normalization_6 (Batch (None, 36, 56, 256) 1024
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 144, 224, 1) 30977
=================================================================
Total params: 1,327,685
Trainable params: 1,326,019
Non-trainable params: 1,666
The batch size is 30 and the number of epochs is 100; the training data has 1785 pictures and the validation data has 765 pictures.
I have tried:
removing kernel_regularizer;
adding ReduceLROnPlateau;
but it only improved a little.
Epoch 00043: ReduceLROnPlateau reducing learning rate to 9.99999874573554e-12.
Epoch 00044: val_loss did not improve from 0.00240
Epoch 00045: val_loss did not improve from 0.00240
Once the val_loss reached 0.00240, it stopped decreasing.
The following figure shows the loss per epoch.
The following figure shows the model's reconstruction results, which are really poor. How can I make my model work better?
Based on your screenshot, it seems that this is not an issue of overfitting or underfitting.
In my understanding:
Underfitting – Validation and training error high
Overfitting – Validation error is high, training error low
Good fit – Validation error low, slightly higher than the training error
Generally speaking, the dataset should be split properly into training and validation sets.
Typically the training set should be 4 times the size of the validation set (an 80/20 split).
My suggestion is to try increasing the size of your dataset through data augmentation and then continue training.
Kindly refer to the documentation for data augmentation.
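As a quick illustration of the 80/20 guideline, here is a minimal NumPy sketch that splits the 2550 pictures mentioned in the question (1785 + 765) into shuffled index sets. The function name split_indices, the seed and the random shuffling are assumptions made for the example, not something taken from the question's code.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Return shuffled (train, validation) index arrays for a given split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

n_total = 1785 + 765          # picture counts reported in the question
train_idx, val_idx = split_indices(n_total)
```

With these numbers an 80/20 split gives 2040 training pictures and 510 validation pictures, compared with the 70/30 split used in the question.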

Sci-kit Learn Confusion Matrix: Found input variables with inconsistent numbers of samples

I'm trying to plot a confusion matrix between the predicted test labels and the actual ones, but I'm getting this error
ValueError: Found input variables with inconsistent numbers of samples: [1263, 12630]
Dataset: GTSRB
Code used
Image augmentation
train_datagen = ImageDataGenerator(rescale=1./255,
rotation_range=20,
horizontal_flip=True,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.01,
zoom_range=[0.9, 1.25],
brightness_range=[0.5, 1.5])
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator and test_generator
batch_size = 10
train_generator = train_datagen.flow_from_directory(
directory=train_path,
target_size=(224, 224),
color_mode="rgb",
batch_size=batch_size,
class_mode="categorical",
shuffle=True,
seed=42
)
test_generator = test_datagen.flow_from_directory(
directory=test_path,
target_size=(224, 224),
color_mode="rgb",
batch_size=batch_size,
class_mode="categorical",
shuffle=False,
seed=42
)
Output of that code
Found 39209 images belonging to 43 classes.
Found 12630 images belonging to 43 classes.
Then, I used a VGG-16 model and replaced the latest Dense layer with a Dense(43, activation='softmax')
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
_________________________________________________________________
dense_1 (Dense) (None, 43) 43043
=================================================================
Total params: 138,400,587
Trainable params: 43,043
Non-trainable params: 138,357,544
_________________________________________________________________
Compile the model
my_sgd = SGD(lr=0.01)
model.compile(
optimizer=my_sgd,
loss='categorical_crossentropy',
metrics=['accuracy']
)
Train the model
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
epochs=10
model.fit_generator(generator=train_generator,
steps_per_epoch=STEP_SIZE_TRAIN,
epochs=epochs,
verbose=1
)
Predictions
STEP_SIZE_TEST=test_generator.n//test_generator.batch_size
test_generator.reset()
predictions = model.predict_generator(test_generator, steps=STEP_SIZE_TEST, verbose=1)
Output
1263/1263 [==============================] - 229s 181ms/step
Predictions shape
print(predictions.shape)
(12630, 43)
Getting the test_data and test_labels
test_data = []
test_labels = []
batch_index = 0
while batch_index <= test_generator.batch_index:
    data = next(test_generator)
    test_data.append(data[0])
    test_labels.append(data[1])
    batch_index = batch_index + 1
test_data_array = np.asarray(test_data)
test_labels_array = np.asarray(test_labels)
Shape of test_data_array and test_labels_array
test_data_array.shape
(1263, 10, 224, 224, 3)
test_labels_array.shape
(1263, 10, 43)
Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(test_labels_array, predictions)
I get the output
ValueError: Found input variables with inconsistent numbers of samples: [1263, 12630]
I understand that this error is because the test_labels_array size isn't equal to the predictions; 1263 and 12630 respectively, but I don't really know what I'm doing wrong.
Any help would be much appreciated.
PS: If anyone has any tips on how to increase the training accuracy while we're at it, that would be brilliant.
Thanks!
You should reshape test_data_array and test_labels_array as follows:
data_count, batch_count, w, h, c = test_data_array.shape
test_data_array=np.reshape(test_data_array, (data_count*batch_count, w, h, c))
test_labels_array = np.reshape(test_labels_array , (data_count*batch_count, -1))
The way you are appending the results of test_generator is the reason. The first call to test_generator yields 10 samples, each of shape (224, 224, 3). The next call yields another 10 samples of the same shape. At that point you should have 20 samples of shape (224, 224, 3), but appending each batch as a whole leaves you with 2 items of shape (10, 224, 224, 3), which is not what you expect.
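As a rough sketch of the reshaping above on toy data: batches are joined along the sample axis, then both the one-hot labels and the predicted probabilities are reduced to class ids with argmax, since a confusion matrix is normally built from class ids and sklearn's confusion_matrix will likely reject one-hot or probability arrays. The helper names flatten_batches, to_class_ids and confusion_counts are made up for illustration, the arrays are synthetic, and the small NumPy counter only stands in for sklearn's confusion_matrix so the snippet runs without extra dependencies.

```python
import numpy as np

def flatten_batches(batches):
    """Join a list of (batch, ...) arrays along the sample axis."""
    return np.concatenate(batches, axis=0)

def to_class_ids(one_hot_or_probs):
    """Turn one-hot rows or probability rows into integer class ids."""
    return np.argmax(one_hot_or_probs, axis=-1)

def confusion_counts(true_ids, pred_ids, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (true_ids, pred_ids), 1)
    return cm

# Toy stand-in for the generator output: 3 batches of 2 one-hot labels each.
label_batches = [np.eye(3)[[0, 1]], np.eye(3)[[2, 2]], np.eye(3)[[1, 0]]]
labels = flatten_batches(label_batches)      # shape (6, 3), not (3, 2, 3)

# Toy stand-in for model predictions, one probability row per sample.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])

true_ids = to_class_ids(labels)
pred_ids = to_class_ids(probs)
cm = confusion_counts(true_ids, pred_ids, n_classes=3)
```

In the question's setting, the same two steps would give two arrays of length 12630, which can then be passed to confusion_matrix.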

Improving accuracy of my CNN for pixel wise segmentation

I am trying to design a CNN that can do pixel wise segmentation of cell images. Such as these:
With segmentation masks such as this (except more than one segmentation mask for each raw image, eg: interior of cell, border of cell, background):
I have mostly copied the U-net design from here: https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
However, even with 10 annotated images (over 300 cells), I still get quite bad Dice coefficient scores and poor predictions. According to the U-Net paper, this number of annotated cells should be sufficient for a good prediction.
This is the code for the model I am using.
def get_unet():
    inputs = Input((img_rows, img_cols, 1))
    conv1 = Conv2D(16, window_size, activation='relu', padding='same')(inputs)
    conv1 = Conv2D(16, window_size, activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, window_size, activation='relu', padding='same')(pool1)
    conv2 = Conv2D(64, window_size, activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(128, window_size, activation='relu', padding='same')(pool2)
    conv3 = Conv2D(128, window_size, activation='relu', padding='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(128, window_size, activation='relu', padding='same')(pool3)
    conv4 = Conv2D(128, window_size, activation='relu', padding='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    conv5 = Conv2D(512, window_size, activation='relu', padding='same')(pool4)
    conv5 = Conv2D(512, window_size, activation='relu', padding='same')(conv5)
    up6 = concatenate([Conv2DTranspose(512, (2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)
    conv6 = Conv2D(128, window_size, activation='relu', padding='same')(up6)
    conv6 = Conv2D(128, window_size, activation='relu', padding='same')(conv6)
    up7 = concatenate([Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv6), conv3], axis=3)
    conv7 = Conv2D(128, window_size, activation='relu', padding='same')(up7)
    conv7 = Conv2D(128, window_size, activation='relu', padding='same')(conv7)
    up8 = concatenate([Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(conv7), conv2], axis=3)
    conv8 = Conv2D(64, window_size, activation='relu', padding='same')(up8)
    conv8 = Conv2D(64, window_size, activation='relu', padding='same')(conv8)
    up9 = concatenate([Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(conv8), conv1], axis=3)
    conv9 = Conv2D(16, window_size, activation='relu', padding='same')(up9)
    conv9 = Conv2D(16, window_size, activation='relu', padding='same')(conv9)
    conv10 = Conv2D(f_num, (1, 1), activation='softmax')(conv9) # change to N,(1,1) for more classes and softmax
    model = Model(inputs=[inputs], outputs=[conv10])
    model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
    return model
I have tried many different hyper-parameters for the model all with no success. Dice scores hover around 0.25 and my loss barely decreases between epochs.
I feel I am doing something fundamentally wrong here. Any suggestions?
EDIT: Sigmoid activation improves the dice score from 0.25 to 0.33 (again, however, this score is reached after 1 epoch and subsequent epochs improve it only very slightly, from 0.33 to 0.331, etc.)
dice_coef_loss is defined as below
smooth = 1.
def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)
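For sanity-checking the numbers, the same formula can be mirrored in NumPy and evaluated on small arrays. This is only a sketch: dice_coef_np is a hypothetical helper, and the 64x64 one-hot target with 4 classes is synthetic data chosen to match the output shape in the summary. One thing it shows is that, with this flattened definition, a completely uninformative prediction of 0.25 for every class already scores about 0.25, which may be worth comparing against the scores reported above.

```python
import numpy as np

SMOOTH = 1.0

def dice_coef_np(y_true, y_pred, smooth=SMOOTH):
    """NumPy mirror of dice_coef: flatten everything, then 2|A.B| / (|A| + |B|)."""
    y_true_f = np.ravel(y_true).astype(np.float64)
    y_pred_f = np.ravel(y_pred).astype(np.float64)
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

# Tiny binary checks.
mask = np.array([[1, 0], [0, 1]])
perfect = dice_coef_np(mask, mask)                  # prediction equals target
empty = dice_coef_np(mask, np.zeros_like(mask))     # prediction is all zeros

# Synthetic 4-class one-hot target and a uniform softmax-like prediction.
rng = np.random.default_rng(0)
class_ids = rng.integers(0, 4, size=(64, 64))
y_true = np.eye(4)[class_ids]                       # shape (64, 64, 4)
y_uniform = np.full((64, 64, 4), 0.25)
uniform_score = dice_coef_np(y_true, y_uniform)
```

Here perfect evaluates to 1.0, empty to 1/3 (only the smoothing term contributes), and uniform_score to 2049/8193, roughly 0.25.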
Also in case it's useful the model.summary output:
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 64, 64, 1) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 64, 64, 16) 32
_________________________________________________________________
conv2d_21 (Conv2D) (None, 64, 64, 16) 272
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 32, 32, 16) 0
_________________________________________________________________
conv2d_22 (Conv2D) (None, 32, 32, 64) 1088
_________________________________________________________________
conv2d_23 (Conv2D) (None, 32, 32, 64) 4160
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_24 (Conv2D) (None, 16, 16, 128) 8320
_________________________________________________________________
conv2d_25 (Conv2D) (None, 16, 16, 128) 16512
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 8, 8, 128) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 8, 8, 128) 16512
_________________________________________________________________
conv2d_27 (Conv2D) (None, 8, 8, 128) 16512
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 4, 4, 128) 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 4, 4, 512) 66048
_________________________________________________________________
conv2d_29 (Conv2D) (None, 4, 4, 512) 262656
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 8, 8, 512) 1049088
_________________________________________________________________
concatenate_5 (Concatenate) (None, 8, 8, 640) 0
_________________________________________________________________
conv2d_30 (Conv2D) (None, 8, 8, 128) 82048
_________________________________________________________________
conv2d_31 (Conv2D) (None, 8, 8, 128) 16512
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 16, 16, 128) 65664
_________________________________________________________________
concatenate_6 (Concatenate) (None, 16, 16, 256) 0
_________________________________________________________________
conv2d_32 (Conv2D) (None, 16, 16, 128) 32896
_________________________________________________________________
conv2d_33 (Conv2D) (None, 16, 16, 128) 16512
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 32, 32, 128) 65664
_________________________________________________________________
concatenate_7 (Concatenate) (None, 32, 32, 192) 0
_________________________________________________________________
conv2d_34 (Conv2D) (None, 32, 32, 64) 12352
_________________________________________________________________
conv2d_35 (Conv2D) (None, 32, 32, 64) 4160
_________________________________________________________________
conv2d_transpose_8 (Conv2DTr (None, 64, 64, 64) 16448
_________________________________________________________________
concatenate_8 (Concatenate) (None, 64, 64, 80) 0
_________________________________________________________________
conv2d_36 (Conv2D) (None, 64, 64, 16) 1296
_________________________________________________________________
conv2d_37 (Conv2D) (None, 64, 64, 16) 272
_________________________________________________________________
conv2d_38 (Conv2D) (None, 64, 64, 4) 68
=================================================================
Total params: 1,755,092.0
Trainable params: 1,755,092.0
Non-trainable params: 0.0