How to connect Conv3D output to MaxPooling2D in Keras? - tensorflow

I'm using Keras to implement CNN. People often use Conv2D to do classification tasks. However, I want to get relationships between two images, then I decide to try Conv3D. However, I couldn't manage the dimension output from Conv3D and match the following layers.
More specifically, I want to apply (5,5,2) filter on two stacked images which are (480, 640, 2), and output(480, 640, 1) tensor.
Original Conv2D code: (work fine)
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',input_shape=(480, 640, 2)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
...
Conv3D code: (Don't know how to concatenate Conv3D and MaxPooling2D)
model.add(Conv3D(32, 2, input_shape=(480, 640, 2, 1), data_format="channels_last"))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
...

Stack both the images (remember to stack acc. to the backend you are using, theano is channels_first and tensorflow is channels_last) and pass 2 as the number of channels in Conv2D.
Or if you have many channels for each images, then again stack them up and pass the total number of channels to Conv2D.

Related

Passing output of 3DCNN layer to LSTM layer

Whilst trying to learn Recurrent Neural Networks(RNNs) am trying to train an Automatic Lip Reading Model using 3DCNN + LSTM. I tried out a code I found for the same on Kaggle.
model = Sequential()
# 1st layer group
model.add(Conv3D(32, (3, 3, 3), strides = 1, input_shape=(22, 100, 100, 1), activation='relu', padding='valid'))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
model.add(Conv3D(64, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
model.add(Conv3D(128, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
shape = model.get_output_shape_at(0)
model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))
# LSTMS - Recurrent Network Layer
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(.5))
model.add((Flatten()))
# # FC layers group
model.add(Dense(2048, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.summary()
However, it returns the following error:
11 model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
12
---> 13 shape = model.get_output_shape_at(0)
14 model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))
15
RuntimeError: The layer sequential_2 has never been called and thus has no defined output shape.
From my understanding, I see that the author of the code was trying to get the output shape of the first layer and reshape it such as to forward to the LSTM layer.
Found a similar post following which I made the following changes and the error was fixed.
shape = model.layers[-1].output_shape
# shape = model.get_output_shape_at(0)
Still I am confused as to what the code does to forward the input from the CNN layer to LSTM layer. Any help to make me understand the above is appreciated. Thank You!!
When you are passing the code from top to bottom then the inputs are flowing in the graph from top to bottom, you are getting this error because you can't call this function on eager mode, as Tensorflow 2.0 is fully transferred to eager mode, so, once you will fit the function and train it 1 epoch then you can use model.get_output_at(0) otherwise use mode.layers[-1].output.
The CNN Layer will extract the features locally then LSTM will sequentially extract and learn the feature, using CONV with LSTM is a good approach, but I will recommend you directly using tf.keras.layers.ConvLSTM3D. Check it here https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM3D
tf.keras.backend.clear_session()
model = Sequential()
# 1st layer group
model.add(Conv3D(32, (3, 3, 3), strides = 1, input_shape=(22, 100, 100, 1), activation='relu', padding='valid'))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
model.add(Conv3D(64, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
model.add(Conv3D(128, (3, 3, 3), activation='relu', strides=1))
model.add(MaxPooling3D(pool_size=(2, 2, 2), strides=2))
shape = model.layers[-1].output_shape
model.add(Reshape((shape[-1],shape[1]*shape[2]*shape[3])))
# LSTMS - Recurrent Network Layer
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(.5))
model.add((Flatten()))
# # FC layers group
model.add(Dense(2048, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.summary()

ValueError: Shapes (None, 1) and (None, 5) are incompatible in keras

model = Sequential()
model.add(Conv2D(128, (3, 3), activation='relu', input_shape=(64, 64, 3), padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation = 'softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=[tf.keras.metrics.Recall()])
This code works fine for metrics=['accuracy']), but it shows ValueError: Shapes (None, 1) and (None, 5) are incompatible for metrics=[tf.keras.metrics.Recall()])
Please help me. Thanks in advance.
Recall makes sense only for binary classification. Your final layer has 5 nodes, which essentially means you have 5 classes. You should change recall to another metric. Documentation should help you choose an appropriate metric for your model. Categorical accuracy should be good enough to get started.

how to improve the performance of the CNN model (Machine Learning - Deep Learning)

I am trying to train a model based on CNN,
Training data is the image of lunar lander, but the performance of model is not good, the accuracy is about 45%, I try to add more layer to improve it, but it still doesn't work well, could any one provide some ideas about how to improve it.
Label: up down left right (0,1,2,3)
Sample rate: 0.1
The Training data has converted from image to data
Label:
here is some of my code:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(CHANNELS, ROWS, COLS), activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation = 'softmax'))
.summary()
I would experiment a lower dropout rate. Dropouts are used to prevent overfitting. It looks like your model isn't even fitting in the first place.

Conv neural network to tell standard 52-card deck apart

I'm using the below keras model to train a neural network to tell 52 game cards 23456789TJQA each with Club, Diamond, Heart and Spade apart.
The model is working quite well but occasionally has problems telling Club and Diamond apart, as they are the most similar (and the difference is quite granular). I was wondering if anybody has some suggestions in what way I can improve the below model?
I've tried different things, like converting everything to black and white, grayscale, smoothing, augmentation etc, but nothing seems to solve that problem.
The pictures are all 15x50 pixels, with 1 channel, so the input shape is (15,50,1)
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape, activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

Feed CNN features to LSTM

I want to build an end-to-end trainable model with the following proprieties:
CNN to extract features from image
The features is reshaped to a matrix
Each row of this matrix is then fed to LSTM1
Each column of this matrix is then fed to LSTM2
The output of LSTM1 and LSTM2 are concatenated for the final output
(it's more or less similar to Figure 2 in this paper: https://arxiv.org/pdf/1611.07890.pdf)
My problem now is after the reshape, how can I feed the values of feature matrix to LSTM with Keras or Tensorflow?
This is my code so far with VGG16 net (also a link to Keras issues):
# VGG16
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
# reshape the feature 4096 = 64 * 64
model.add(Reshape((64, 64)))
# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct:
# model.add(LSTM(256, input_shape=(64, 1))) # 256 hidden units, sequence length = 64, feature dim = 1
Consider building your CNN model with Conv2D and MaxPool2D layers, until you reach your Flatten layer, because the vectorized output from the Flatten layer will be you input data to the LSTM part of your structure.
So, build your CNN model like this:
model_cnn = Sequential()
model_cnn.add(Conv2D...)
model_cnn.add(MaxPooling2D...)
...
model_cnn.add(Flatten())
Now, this is an interesting point, the current version of Keras has some incompatibility with some TensorFlow structures that will not let you stack your entire layers in just one Sequential object.
So it's time to use the Keras Model Object to complete you neural network with a trick:
input_lay = Input(shape=(None, ?, ?, ?)) #dimensions of your data
time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay) # keras.layers.Lambda is essential to make our trick work :)
lstm_lay = LSTM(?)(time_distribute)
output_lay = Dense(?, activation='?')(lstm_lay)
And finally, now it's time to put together our 2 separated models:
model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(...)
OBS: Note that you can substitute my model_cnn example by your VGG without including the top layers, once the vectorized output from the VGG Flatten layer will be the input of the LSTM model.