conv-autoencoder that val_loss doesn't decrease - tensorflow

I built an anomaly detection model using a convolutional autoencoder on the UCSD_ped2 dataset. What puzzles me is that after very few epochs, the val_loss stops decreasing. It seems that the model cannot learn any further. I have done some research to improve my model, but it is not getting better. What should I do to fix it?
Here is my model's structure:
x=144;y=224
input_img = Input(shape = (x, y, inChannel))
bn1= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(input_img)
conv1 = Conv2D(256, (11, 11), strides=(4,4),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn1)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
bn2= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(pool1)
conv2 = Conv2D(128, (5, 5),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn2)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
bn3= BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(pool2)
conv3 = Conv2D(64, (3, 3), activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(bn3)
ubn3=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(conv3)
uconv3=Conv2DTranspose(128, (3,3),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(ubn3)
upool3=UpSampling2D(size=(2, 2))(uconv3)
ubn2=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(upool3)
uconv2=Conv2DTranspose(256, (3, 3),activation='relu',
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
padding='same')(ubn2)
upool2=UpSampling2D(size=(2, 2))(uconv2)
ubn1=BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(upool2)
decoded = Conv2DTranspose(1, (11, 11), strides=(4, 4),
kernel_regularizer=regularizers.l2(0.0005),
kernel_initializer=initializers.glorot_normal(seed=None),
activation='sigmoid', padding='same')(ubn1)
autoencoder = Model(input_img, decoded)
autoencoder.compile(loss = 'mean_squared_error', optimizer ='Adadelta',metrics=['accuracy'])
history=autoencoder.fit(X_train, Y_train,validation_split=0.3,
batch_size = batch_size, epochs = epochs, verbose = 0,
shuffle=True,
callbacks=[earlystopping,checkpointer,reduce_lr])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 144, 224, 1) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 144, 224, 1) 4
_________________________________________________________________
conv2d_1 (Conv2D) (None, 36, 56, 256) 31232
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 18, 28, 256) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 18, 28, 256) 1024
_________________________________________________________________
conv2d_2 (Conv2D) (None, 18, 28, 128) 819328
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 9, 14, 128) 0
_________________________________________________________________
batch_normalization_3 (Batch (None, 9, 14, 128) 512
_________________________________________________________________
conv2d_3 (Conv2D) (None, 9, 14, 64) 73792
_________________________________________________________________
batch_normalization_4 (Batch (None, 9, 14, 64) 256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 9, 14, 128) 73856
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 18, 28, 128) 0
_________________________________________________________________
batch_normalization_5 (Batch (None, 18, 28, 128) 512
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 18, 28, 256) 295168
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 36, 56, 256) 0
_________________________________________________________________
batch_normalization_6 (Batch (None, 36, 56, 256) 1024
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 144, 224, 1) 30977
=================================================================
Total params: 1,327,685
Trainable params: 1,326,019
Non-trainable params: 1,666
The batch size is 30 and the number of epochs is 100. The training data has 1785 images; the validation data has 765 images.
I have tried:
removing kernel_regularizer;
adding ReduceLROnPlateau.
However, these gave only a small improvement.
Epoch 00043: ReduceLROnPlateau reducing learning rate to 9.99999874573554e-12.
Epoch 00044: val_loss did not improve from 0.00240
Epoch 00045: val_loss did not improve from 0.00240
Once the val_loss reached 0.00240, it stopped decreasing.
The following figure shows the loss per epoch.
The following figure shows the model's reconstruction results, which are truly poor. How can I make my model work better?

Based on your screenshot, it seems that this is not an issue of overfitting or underfitting.
As I understand it:
Underfitting – Validation and training error high
Overfitting – Validation error is high, training error low
Good fit – Validation error low, slightly higher than the training error
Generally speaking, the dataset should be split properly for training and validation.
Typically the training set should be about four times the size of the validation set (an 80/20 split).
My suggestion is to increase the size of your dataset through data augmentation and continue training.
Kindly refer to the documentation for data augmentation.
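As a rough illustration of the two suggestions above, here is a minimal NumPy sketch that makes an 80/20 split and then doubles the training set with horizontally flipped copies. The function names, the random stand-in frames, and the choice of flipping are assumptions for illustration only; whether flips are a sensible augmentation depends on your data.

```python
import numpy as np


def split_train_val(frames, val_fraction=0.2, seed=0):
    """Shuffle the frames and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(frames))
    n_val = int(round(len(frames) * val_fraction))
    return frames[indices[n_val:]], frames[indices[:n_val]]


def augment_with_flips(frames):
    """Return the original frames followed by horizontally flipped copies."""
    flipped = frames[:, :, ::-1, :]  # reverse the width axis of (N, H, W, C)
    return np.concatenate([frames, flipped], axis=0)


# Random stand-in data with the same frame shape as in the question.
frames = np.random.default_rng(0).random((20, 144, 224, 1)).astype("float32")

train, val = split_train_val(frames, val_fraction=0.2)
train_aug = augment_with_flips(train)  # augment the training set only

print(train.shape, val.shape, train_aug.shape)
```

The split is done before augmenting so that flipped copies of validation frames never end up in the training set.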

Related

Regarding Convolutional Neural Network

Hi, I would like some help regarding neural networks. I am doing a school project in which I am required to build a deepfake detection neural network. I am unsure why, after adding more layers to the network, my accuracy during training goes from 0.7 in the first epoch to 1.0 in the second to fifth epochs, which looks like overfitting, and the loss value goes to a weird number. I would like advice on how I could adjust the neural network to suit deepfake detection.
Thank you all for taking the time to read this.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape = (256,256,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
# Conv2D outputs 3D feature maps, while Dense layers expect 1D vectors
model.add(Flatten())  # converts 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(X, y, batch_size=32, epochs=5)
Model Summary
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 254, 254, 32) 896
_________________________________________________________________
activation (Activation) (None, 254, 254, 32) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 127, 127, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 125, 125, 64) 18496
_________________________________________________________________
activation_1 (Activation) (None, 125, 125, 64) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 62, 62, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 60, 60, 64) 36928
_________________________________________________________________
activation_2 (Activation) (None, 60, 60, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 60, 60, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 58, 58, 64) 36928
_________________________________________________________________
activation_3 (Activation) (None, 58, 58, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 58, 58, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 56, 56, 64) 36928
_________________________________________________________________
activation_4 (Activation) (None, 56, 56, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 56, 56, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 54, 54, 64) 36928
_________________________________________________________________
activation_5 (Activation) (None, 54, 54, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 186624) 0
_________________________________________________________________
dense (Dense) (None, 64) 11944000
_________________________________________________________________
activation_6 (Activation) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 65
_________________________________________________________________
activation_7 (Activation) (None, 1) 0
=================================================================
Total params: 12,111,169
Trainable params: 12,111,169
Non-trainable params: 0
_________________________________________________________________
You should configure more in each layer than just the kernel size and the number of filters; this can help model performance.
For example, you could use Adam from keras.optimizers, which can help improve accuracy during training. Also, l2 from keras.regularizers can help reduce overfitting. In other words, you cannot increase accuracy just by increasing the number of epochs; you must first build a good model before starting the training.

How do you use tensorflow ctc_batch_cost function with keras?

I have been trying to implement a CTC loss function in keras for several days now.
Unfortunately, I have yet to find a simple way to do this that fits well with keras. I found tensorflow's tf.keras.backend.ctc_batch_cost function but there is not much documentation on it. I am confused about a few things. First, what are the input_length and label_length parameters? I am trying to make a handwriting recognition model and my images are 32x128, my RNN has 32 time steps, and my character list has a length of 80. I have tried to use 32 for both parameters and this gives me the error below.
Shouldn't the function already know the input_length and label_length from the shape of the first two parameters (y_true and y_pred)?
Secondly, do I need to encode my training data? Is this all done automatically?
I know tensorflow also has a function called tf.keras.backend.ctc_decode. Is this only used when making predictions?
def ctc_cost(y_true, y_pred):
    return tf.keras.backend.ctc_batch_cost(
        y_true, y_pred, 32, 32)
model = tf.keras.Sequential([
layers.Conv2D(32, 5, padding="SAME", input_shape=(32, 128, 1)),
layers.BatchNormalization(),
layers.Activation("relu"),
layers.MaxPool2D(2, 2),
layers.Conv2D(64, 5, padding="SAME"),
layers.BatchNormalization(),
layers.Activation("relu"),
layers.MaxPool2D(2, 2),
layers.Conv2D(128, 3, padding="SAME"),
layers.BatchNormalization(),
layers.Activation("relu"),
layers.MaxPool2D((1, 2), (1, 2)),
layers.Conv2D(128, 3, padding="SAME"),
layers.BatchNormalization(),
layers.Activation("relu"),
layers.MaxPool2D((1, 2), (1, 2)),
layers.Conv2D(256, 3, padding="SAME"),
layers.BatchNormalization(),
layers.Activation("relu"),
layers.MaxPool2D((1, 2), (1, 2)),
layers.Reshape((32, 256)),
layers.Bidirectional(layers.LSTM(256, return_sequences=True)),
layers.Bidirectional(layers.LSTM(256, return_sequences=True)),
layers.Reshape((-1, 32, 512)),
layers.Conv2D(80, 1, padding="SAME"),
layers.Softmax(-1)
])
print(model.summary())
model.compile(tf.optimizers.RMSprop(0.001), ctc_cost)
Error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: squeeze_dims[0] not in [0,0). for 'loss/softmax_loss/Squeeze' (op: 'Squeeze') with input shapes: []
Model:
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 128, 32) 832
batch_normalization (BatchNo (None, 32, 128, 32) 128
activation (Activation) (None, 32, 128, 32) 0
max_pooling2d (MaxPooling2D) (None, 16, 64, 32) 0
conv2d_1 (Conv2D) (None, 16, 64, 64) 51264
batch_normalization_1 (Batch (None, 16, 64, 64) 256
activation_1 (Activation) (None, 16, 64, 64) 0
max_pooling2d_1 (MaxPooling2 (None, 8, 32, 64) 0
conv2d_2 (Conv2D) (None, 8, 32, 128) 73856
batch_normalization_2 (Batch (None, 8, 32, 128) 512
activation_2 (Activation) (None, 8, 32, 128) 0
max_pooling2d_2 (MaxPooling2 (None, 8, 16, 128) 0
conv2d_3 (Conv2D) (None, 8, 16, 128) 147584
batch_normalization_3 (Batch (None, 8, 16, 128) 512
activation_3 (Activation) (None, 8, 16, 128) 0
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 128) 0
conv2d_4 (Conv2D) (None, 8, 8, 256) 295168
batch_normalization_4 (Batch (None, 8, 8, 256) 1024
activation_4 (Activation) (None, 8, 8, 256) 0
max_pooling2d_4 (MaxPooling2 (None, 8, 4, 256) 0
reshape (Reshape) (None, 32, 256) 0
bidirectional (Bidirectional (None, 32, 512) 1050624
bidirectional_1 (Bidirection (None, 32, 512) 1574912
reshape_1 (Reshape) (None, None, 32, 512) 0
conv2d_5 (Conv2D) (None, None, 32, 80) 41040
softmax (Softmax) (None, None, 32, 80) 0
Here is the tensorflow documentation I was referencing:
https://www.tensorflow.org/api_docs/python/tf/keras/backend/ctc_batch_cost
First, what are the input_length and label_length parameters?
input_length is the length of the input sequence in time steps. label_length is the length of the text label.
For example, if you are trying to recognize an image containing the text "John Hancock" and you are doing it in 32 time steps, then your input_length is 32 and your label_length is 12 (len("John Hancock")).
Shouldn't the function already know the input_length and label_length from the shape of the first two parameters (y_true and y_pred)?
You usually process input data in batches, which have to be padded to the largest element in the batch, so this information is lost. In your case the input_length is always the same, but the label_length varies.
When dealing with speech recognition, for example, input_length can vary as well.
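To make the padding point concrete, here is a small sketch of how a batch of text labels could be encoded and padded while keeping the true lengths. The character set, the helper name encode_batch, and the pad value are assumptions for illustration (the character list in the question has 80 entries). As far as I know, tf.keras.backend.ctc_batch_cost expects input_length and label_length as tensors of shape (samples, 1), not plain integers.

```python
import numpy as np

# Hypothetical character list, used only for this example.
CHARSET = "abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(CHARSET)}
TIME_STEPS = 32  # number of RNN time steps, as in the question


def encode_batch(texts, time_steps=TIME_STEPS, pad_value=0):
    """Encode texts as padded integer labels plus per-sample lengths."""
    max_len = max(len(text) for text in texts)
    labels = np.full((len(texts), max_len), pad_value, dtype="int32")
    for row, text in enumerate(texts):
        labels[row, :len(text)] = [CHAR_TO_INDEX[ch] for ch in text]
    # One length per sample, shaped (samples, 1).
    label_length = np.array([[len(text)] for text in texts], dtype="int32")
    input_length = np.full((len(texts), 1), time_steps, dtype="int32")
    return labels, input_length, label_length


labels, input_length, label_length = encode_batch(["John Hancock", "Keras"])
print(labels.shape)            # padded to the longest label in the batch
print(input_length.ravel())    # same for every sample here
print(label_length.ravel())    # differs per sample
```

After padding, both rows of labels have the same width, so the only record of the true label sizes is label_length.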
Secondly, do I need to encode my training data? Is this all done automatically?
Not sure I understand what you are asking, but here is a good example written in Keras:
https://keras.io/examples/image_ocr/
I know tensorflow also has a function called tf.keras.backend.ctc_decode. Is this only used when making predictions?
In general, yes. You can also try to use it to make you breakfast in the morning, but it's not very good at it ;)
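For intuition about what decoding does at prediction time, here is a simplified pure-Python sketch of greedy CTC decoding: take the most likely class at each time step, merge repeated classes, and drop the blank. This is not the implementation of tf.keras.backend.ctc_decode; the function name, the tiny character set, and the assumption that the blank is the last class index are all illustrative.

```python
def greedy_ctc_collapse(best_path, blank_index):
    """Merge consecutive repeats, then remove blanks from a best path."""
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank_index:
            decoded.append(index)
        previous = index
    return decoded


charset = "helo"      # hypothetical 4-character alphabet
blank = len(charset)  # assumption: the blank is the last class index

# Most likely class per time step, e.g. the argmax over the softmax output.
best_path = [0, 0, blank, 1, 2, 2, blank, 2, 3, 3]

decoded = greedy_ctc_collapse(best_path, blank)
text = "".join(charset[i] for i in decoded)
print(text)
```

The blank between the two runs of "l" is what allows a doubled letter to survive the merge step.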

maxpooling results not displaying in model.summary() output

I am a beginner in Keras. I am trying to build a model using the Sequential API. When I try to reduce the input size from 28 to 14 or less using the max-pooling function, the max-pooling result is not displayed when I call model.summary(). I am trying to achieve an accuracy of 0.99 or above after training, i.e., the accuracy reported by model.evaluate() should be 0.99 or above. The model I have built so far can be seen here:
from keras.layers import Activation, MaxPooling2D
model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(28,28,1)))
model.add(Convolution2D(32, 1, activation='relu'))
MaxPooling2D(pool_size=(2, 2))
model.add(Convolution2D(32, 26))
model.add(Convolution2D(10, 1))
model.add(Flatten())
model.add(Activation('softmax'))
model.summary()
Output -
Layer (type) Output Shape Param #
=================================================================
conv2d_29 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_30 (Conv2D) (None, 26, 26, 32) 1056
_________________________________________________________________
conv2d_31 (Conv2D) (None, 1, 1, 32) 692256
_________________________________________________________________
conv2d_32 (Conv2D) (None, 1, 1, 10) 330
_________________________________________________________________
flatten_7 (Flatten) (None, 10) 0
_________________________________________________________________
activation_7 (Activation) (None, 10) 0
=================================================================
Total params: 693,962
Trainable params: 693,962
Non-trainable params: 0
____________________________
The batch size I am using is 32 and the number of epochs is 10.
model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)
score = model.evaluate(X_test, Y_test, verbose=0)
print(score)
Output after training -
[0.09016687796734459, 0.9814]
You are not adding the MaxPooling2D layer to your model. Wrap it in model.add():
model.add(MaxPooling2D(pool_size=(2, 2)))
Also, the output of your max-pooling layer will have shape (None, 13, 13, 32), so the convolution kernel in the next layer (26 in your case) cannot be larger than the current spatial dimensions (13). Your code should be something like this:
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Flatten, MaxPooling2D
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28,28,1)))
model.add(Conv2D(32, 1, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))  # 26x26 -> 13x13
model.add(Conv2D(32, 8))  # 13x13 -> 6x6
model.add(Conv2D(10, 6))  # 6x6 -> 1x1, one channel per class
model.add(Flatten())
model.add(Activation('softmax'))
model.summary()
Output
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_2 (Conv2D) (None, 26, 26, 32) 1056
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 6, 6, 32) 65568
_________________________________________________________________
conv2d_4 (Conv2D) (None, 1, 1, 10) 11530
_________________________________________________________________
flatten_1 (Flatten) (None, 10) 0
_________________________________________________________________
activation_1 (Activation) (None, 10) 0
=================================================================
Total params: 78,474
Trainable params: 78,474
Non-trainable params: 0
___________________________________
P.S.: I would consider using smaller kernel sizes and a fully connected (FC) layer at the output, as that is more practical in most cases than trying to match convolution output shapes.
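The spatial sizes in the summary above can be checked by hand. Here is a small pure-Python sketch of the arithmetic, assuming 'valid' padding (the Keras default for these layers) and a pooling stride equal to the pool size; the helper names are made up for this example.

```python
def conv_output_size(size, kernel, stride=1):
    """Output size of a 'valid' convolution along one spatial dimension."""
    if kernel > size:
        raise ValueError(f"kernel {kernel} is larger than input size {size}")
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Output size of 'valid' max pooling with stride equal to the pool size."""
    return size // pool


sizes = [28]
sizes.append(conv_output_size(sizes[-1], 3))  # 3x3 convolution
sizes.append(conv_output_size(sizes[-1], 1))  # 1x1 convolution
sizes.append(pool_output_size(sizes[-1], 2))  # 2x2 max pooling
sizes.append(conv_output_size(sizes[-1], 8))  # 8x8 convolution
sizes.append(conv_output_size(sizes[-1], 6))  # 6x6 convolution
print(sizes)

# A kernel of 26 no longer fits once pooling has reduced the size to 13.
try:
    conv_output_size(13, 26)
except ValueError as error:
    print(error)
```

The same two formulas explain why the original kernel of 26 only worked when the pooling layer was missing: without pooling the feature map was still 26x26.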

Sci-kit Learn Confusion Matrix: Found input variables with inconsistent numbers of samples

I'm trying to plot a confusion matrix between the predicted test labels and the actual ones, but I'm getting this error
ValueError: Found input variables with inconsistent numbers of samples: [1263, 12630]
Dataset: GTSRB
Code used
Image augmentation
train_datagen = ImageDataGenerator(rescale=1./255,
rotation_range=20,
horizontal_flip=True,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.01,
zoom_range=[0.9, 1.25],
brightness_range=[0.5, 1.5])
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator and test_generator
batch_size = 10
train_generator = train_datagen.flow_from_directory(
directory=train_path,
target_size=(224, 224),
color_mode="rgb",
batch_size=batch_size,
class_mode="categorical",
shuffle=True,
seed=42
)
test_generator = test_datagen.flow_from_directory(
directory=test_path,
target_size=(224, 224),
color_mode="rgb",
batch_size=batch_size,
class_mode="categorical",
shuffle=False,
seed=42
)
Output of that code
Found 39209 images belonging to 43 classes.
Found 12630 images belonging to 43 classes.
Then, I used a VGG-16 model and replaced the last Dense layer with a Dense(43, activation='softmax').
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
_________________________________________________________________
dense_1 (Dense) (None, 43) 43043
=================================================================
Total params: 138,400,587
Trainable params: 43,043
Non-trainable params: 138,357,544
_________________________________________________________________
Compile the model
my_sgd = SGD(lr=0.01)
model.compile(
optimizer=my_sgd,
loss='categorical_crossentropy',
metrics=['accuracy']
)
Train the model
STEP_SIZE_TRAIN=train_generator.n//train_generator.batch_size
epochs=10
model.fit_generator(generator=train_generator,
steps_per_epoch=STEP_SIZE_TRAIN,
epochs=epochs,
verbose=1
)
Predictions
STEP_SIZE_TEST=test_generator.n//test_generator.batch_size
test_generator.reset()
predictions = model.predict_generator(test_generator, steps=STEP_SIZE_TEST, verbose=1)
Output
1263/1263 [==============================] - 229s 181ms/step
Predictions shape
print(predictions.shape)
(12630, 43)
Getting the test_data and test_labels
test_data = []
test_labels = []
batch_index = 0
while batch_index <= test_generator.batch_index:
    data = next(test_generator)
    test_data.append(data[0])
    test_labels.append(data[1])
    batch_index = batch_index + 1
test_data_array = np.asarray(test_data)
test_labels_array = np.asarray(test_labels)
Shape of test_data_array and test_labels_array
test_data_array.shape
(1263, 10, 224, 224, 3)
test_labels_array.shape
(1263, 10, 43)
Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(test_labels_array, predictions)
I get the output
ValueError: Found input variables with inconsistent numbers of samples: [1263, 12630]
I understand that this error is because the test_labels_array size isn't equal to the predictions; 1263 and 12630 respectively, but I don't really know what I'm doing wrong.
Any help would be much appreciated.
PS: If anyone has any tips on how to increase the training accuracy while we're at it, that would be brilliant.
Thanks!
You should reshape test_data_array and test_labels_array as follows:
batch_count, batch_size, w, h, c = test_data_array.shape
test_data_array = np.reshape(test_data_array, (batch_count * batch_size, w, h, c))
test_labels_array = np.reshape(test_labels_array, (batch_count * batch_size, -1))
The cause is the way you append the results of test_generator. The first call to test_generator yields 10 images of shape (224, 224, 3), and the next call yields another 10. At that point you should have 20 images of shape (224, 224, 3), but appending each batch as a single list element leaves you with 2 elements of shape (10, 224, 224, 3), which is not what you expect.
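An alternative to reshaping afterwards is to join the batches along the sample axis with np.concatenate. The sketch below uses small random stand-in arrays rather than the real generator output, so the sizes and variable names are assumptions. It also converts one-hot labels and softmax outputs to class indices with argmax, since confusion_matrix in scikit-learn works on 1-D arrays of class labels rather than one-hot or probability matrices.

```python
import numpy as np

num_batches, batch_size, num_classes = 4, 10, 43
rng = np.random.default_rng(0)

# Stand-in for the one-hot label batches yielded by the test generator.
true_classes = rng.integers(0, num_classes, size=num_batches * batch_size)
label_batches = [
    np.eye(num_classes)[true_classes[i * batch_size:(i + 1) * batch_size]]
    for i in range(num_batches)
]

# np.asarray stacks the batches and keeps the extra batch axis.
print(np.asarray(label_batches).shape)

# np.concatenate joins them along the sample axis instead.
test_labels_array = np.concatenate(label_batches, axis=0)
print(test_labels_array.shape)

# Stand-in for the softmax output of model.predict_generator(...).
predictions = rng.random((num_batches * batch_size, num_classes))

# Convert both to 1-D arrays of class indices.
y_true = test_labels_array.argmax(axis=1)
y_pred = predictions.argmax(axis=1)
print(y_true.shape, y_pred.shape)
```

The resulting y_true and y_pred have one entry per sample and can be passed to confusion_matrix(y_true, y_pred).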

Keras using too much memory

I have a keras (with tensorflow backend) model which is defined like so:
INPUT_SHAPE = [4740, 3540, 1]
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=INPUT_SHAPE))
model.add(Conv2D(2, (4, 4), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(4, (4, 4), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(8, (4, 4), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(16, (4, 4), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Conv2D(32, (4, 4), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
This model has only 37,506 trainable parameters. Yet somehow it is able to exhaust a K80's 12 GB of VRAM on model.fit() if the batch size is more than 1.
Why does this model need so much memory?
And how do I calculate memory requirements properly?
The function from "How to determine needed memory of Keras model?" gives me 2.15 GB per element in a batch, so I should be able to use a batch size of at least 5.
EDIT: model.summary()
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 4738, 3538, 32) 320
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4735, 3535, 2) 1026
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 1183, 883, 2) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 1180, 880, 4) 132
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 295, 220, 4) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 292, 217, 8) 520
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 73, 54, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 70, 51, 16) 2064
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 17, 12, 16) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 14, 9, 32) 8224
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 3, 2, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 3, 2, 32) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 192) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 24704
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 4) 516
=================================================================
Total params: 37,506
Trainable params: 37,506
Non-trainable params: 0
_________________________________________________________________
The output shape of the first layer is B*4738*3538*32 (B is the batch size). That is about 536 million values per sample, which takes around 2 GB * B of memory in float32. The gradients and other activations will take additional memory. Increasing the stride of the first layer may help.
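The size of each layer's output can be estimated directly from the shapes in model.summary(). Here is a minimal sketch, assuming float32 activations (4 bytes per value) and counting only the forward outputs of the first few layers; the function name and the dictionary are made up for this example.

```python
from math import prod

BYTES_PER_FLOAT32 = 4

# Output shapes (without the batch dimension) copied from model.summary().
LAYER_OUTPUT_SHAPES = {
    "conv2d_1": (4738, 3538, 32),
    "conv2d_2": (4735, 3535, 2),
    "max_pooling2d_1": (1183, 883, 2),
    "conv2d_3": (1180, 880, 4),
}


def activation_bytes(shape, batch_size=1, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to store one layer's output for a whole batch."""
    return batch_size * prod(shape) * bytes_per_value


for name, shape in LAYER_OUTPUT_SHAPES.items():
    gib = activation_bytes(shape) / 2**30
    print(f"{name:16s} {gib:6.3f} GiB per sample")

first_layer_gib = activation_bytes(LAYER_OUTPUT_SHAPES["conv2d_1"]) / 2**30
```

This counts stored forward outputs only. Training also keeps gradients of comparable size and the framework may allocate extra workspace, so actual usage is likely to be a multiple of these figures.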