Different output shape of Conv2D between tf.keras and keras?

It might be a dumb question since I'm new to Keras and Tensorflow.
I have this simple model:
classifier.add(Convolution2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=2, activation='softmax'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
When running with tf.keras.* (like from tensorflow.keras.models import Sequential) classes, summary shows the first layer as:
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 21, 21, 32) 896
but when running with keras.* classes (like from keras.models import Sequential), summary shows the first layer as:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 62, 62, 32) 896
Why do they give different output shapes?
I'm using tensorflow 2.0.0 and keras 2.3.1

This is related to how kernel_size is passed. First, let's check:
Example 1:
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Note the small change in the Conv2D call: your code has classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu')), whereas this version passes kernel_size as the tuple (3, 3).
Example 1 gives the output shape as
(None, 62, 62, 32)
Example 2:
Let's change it to your version
classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
results in,
(None, 21, 21, 32) 896
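The conclusion is that the tf.keras API and standalone Keras interpret the positional arguments differently. In tf.keras, the third positional argument of Conv2D is strides, so Conv2D(32, 3, 3) means filters=32, kernel_size=3, strides=3, which gives 21. Standalone Keras 2.3.1 reads the same call as the legacy Keras 1 form, i.e. a 3x3 kernel with stride 1, which gives 62. Note that I got both results above with the tf.keras API.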
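To check the two shapes by hand, here is a minimal sketch of the usual output-size arithmetic for an unpadded ('valid') convolution. The helper name conv_output_size is just for illustration and is not part of Keras.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Output length along one axis for a 'valid' (unpadded) convolution."""
    return (size - kernel_size) // stride + 1

# Conv2D(32, (3, 3)): 3x3 kernel, default stride of 1
print(conv_output_size(64, kernel_size=3, stride=1))  # 62

# Conv2D(32, 3, 3) read as kernel_size=3, strides=3
print(conv_output_size(64, kernel_size=3, stride=3))  # 21
```

Both summaries report 896 parameters because the stride does not change the kernel itself: (3 * 3 * 3 + 1) * 32 = 896 either way.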


TensorFlow to PyTorch CNN (use nn.Conv1d)

input_size = [765, 500, 72]
model = Sequential()
add = model.add
add(l.Conv1D(256, kernel_size=3, strides=2, activation='relu'))
add(l.Conv1D(256, kernel_size=3, strides=2, activation='relu'))
add(l.Dense(100, activation="relu"))
add(l.Dense(3, activation="softmax"))
(None, 249, 256)
(None, 249, 256)
(None, 124, 256)
(None, 124, 256)
(None, 256)
(None, 100)
(None, 3)
This is the TensorFlow model structure and its summary.
How do I convert this CNN model to PyTorch using nn.Conv1d?
To jump-start your research, here is an example usage of nn.Conv1d:
>>> f = nn.Conv1d(72, 256, kernel_size=3, stride=2)
>>> f(torch.rand(765, 72, 500)).shape
torch.Size([765, 256, 249])
In this case, keep a few PyTorch-specific things in mind:
Unlike TensorFlow, nn.Conv1d expects channels-first data, i.e. (batch, channels, length), which is why the input above is shaped (765, 72, 500).
You have to provide the input feature size for each linear layer.
The activation function is not included in nn.Conv1d; you have to use a dedicated module for that (e.g. nn.ReLU).
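To work out the feature sizes you will need for the linear layers, here is a rough, library-free sketch that traces the shapes through the two strided convolutions in channels-first order. The helper conv1d_length is a name made up for this example. The step that takes (None, 124, 256) to (None, 256) is not shown in the question, so both pooling and flattening are listed as assumptions.

```python
def conv1d_length(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1

batch, length, channels = 765, 500, 72   # TensorFlow order: (batch, length, channels)
shape = (batch, channels, length)        # PyTorch order: (batch, channels, length)

shapes = []
for out_channels in (256, 256):          # two Conv1d layers with kernel_size=3, stride=2
    shape = (batch, out_channels, conv1d_length(shape[2], kernel_size=3, stride=2))
    shapes.append(shape)
print(shapes)  # [(765, 256, 249), (765, 256, 124)]

# in_features of the first linear layer depends on how the length axis is removed
pooled_features = shape[1]                 # assumption: pooling over the length axis
flattened_features = shape[1] * shape[2]   # assumption: flattening instead
print(pooled_features, flattened_features)  # 256 31744
```

The pooled value matches the (None, 256) row of the TensorFlow summary, so a global pooling over the length axis looks like the closer equivalent.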

What is the correct way to upsample a [32x32x6] layer in a CNN

I have a CNN that produces a [32x32] image with 6 channels, but I need to upsample it to 256x256. I'm doing:
def upsample(filters, size):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
    return result
Then I pass the layer like this:
up_stack = [
    upsample(6, 3),  # x2
    upsample(6, 3),  # x2
    upsample(6, 3)   # x2
]
for up in up_stack:
    finalLayer = up(finalLayer)
But this setup produces inaccurate results. Is there anything I'm doing wrong?
Your other option would be to use tf.keras.layers.UpSampling2D, but it does not learn a kernel to upsample (it uses fixed nearest-neighbor interpolation by default, or bilinear with interpolation='bilinear').
So, your approach is correct, but you have used a kernel_size of 3x3.
It should be 2x2: with strides=2 and the default 'valid' padding, a 2x2 kernel exactly doubles each spatial dimension. If you are still not satisfied with the results, increase the number of filters (somewhere in the range 32 to 256).
If you wish to use the up-convolution, I suggest the following to achieve what you want. The code below works; just change the number of filters based on your needs.
import tensorflow as tf
from tensorflow.keras import layers
# in = 32x32 out 256x256
inputs = layers.Input(shape=(32, 32, 6))
deconc01 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(inputs)
deconc02 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc01)
outputs = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc02)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="up-conv")
Model: "up-conv"
Layer (type)                  Output Shape              Param #
input_1 (InputLayer)          [(None, 32, 32, 6)]       0
conv2d_transpose (Conv2DTran  (None, 64, 64, 256)       6400
conv2d_transpose_1 (Conv2DTr  (None, 128, 128, 256)     262400
conv2d_transpose_2 (Conv2DTr  (None, 256, 256, 256)     262400
Total params: 531,200
Trainable params: 531,200
Non-trainable params: 0
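If you want to check the sizes and parameter counts without building the model, here is a small sketch of the arithmetic. It assumes no output_padding and no dilation, and the helper names are invented for this example. The padding in the question's own Conv2DTranspose call is cut off, so both 'valid' and 'same' are shown.

```python
def conv2d_transpose_size(size, kernel_size, stride, padding="valid"):
    """Spatial output size of a transposed convolution (no output_padding, dilation 1)."""
    if padding == "same":
        return size * stride
    return size * stride + max(kernel_size - stride, 0)

def conv2d_transpose_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

size, channels, total = 32, 6, 0
sizes = []
for filters in (256, 256, 256):
    total += conv2d_transpose_params(channels, filters, kernel_size=2)
    size = conv2d_transpose_size(size, kernel_size=2, stride=2)
    channels = filters
    sizes.append(size)
print(sizes)  # [64, 128, 256]
print(total)  # 531200

# A 3x3 kernel with stride 2 overshoots under 'valid' padding
print(conv2d_transpose_size(32, kernel_size=3, stride=2))                  # 65
print(conv2d_transpose_size(32, kernel_size=3, stride=2, padding="same"))  # 64
```

So a 3x3 kernel only gives a clean doubling with padding='same', while a 2x2 kernel doubles the size under the default padding as well.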

How to make input layer explicit in tf.keras

This question makes use of a pre-trained VGG network, whose summary shows an InputLayer being used. I like the clarity of the explicit input layer... even if functionally it does nothing (true?). But when I try to mimic this with something like:
model = Sequential()
model.add(Input(shape=(128, 128, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
the result displayed using print(model.summary()) is no different from:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
... and both show the first layer as being a Conv2D layer. Where did my Input layer go? And is it worth the hassle of getting it back?
In your example you're using a Sequential model; try using a keras.models.Model instead:
inp = keras.layers.Input((128, 128, 3))
op = keras.layers.Conv2D(32, (3, 3), activation='relu')(inp)
model = keras.models.Model(inputs=[inp], outputs=[op])
Model: "model_1"
Layer (type)                  Output Shape              Param #
input_2 (InputLayer)          [(None, 128, 128, 3)]     0
conv2d_4 (Conv2D)             (None, 126, 126, 32)      896
Total params: 896
Trainable params: 896
Non-trainable params: 0
To answer your last question: no, it is not worth the hassle. You can keep them separate; it does not make any difference.
As for the input_shape, that argument can be specified for each and every layer, yet Keras is smart enough to deduce on its own the shape of the next layers so we do not mention it explicitly.
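To illustrate why only the first shape has to be stated, here is a toy sketch of the propagation: each layer's output shape becomes the next layer's input shape. This is not how Keras is implemented internally, the helper names are made up, and the second layer is hypothetical (it is not in the question).

```python
def conv2d_output_shape(input_shape, filters, kernel_size):
    """Output (H, W, C) of a stride-1, 'valid' Conv2D on a channels-last input."""
    h, w, _ = input_shape
    kh, kw = kernel_size
    return (h - kh + 1, w - kw + 1, filters)

def conv2d_params(input_shape, filters, kernel_size):
    """Weights plus one bias per filter."""
    kh, kw = kernel_size
    return (kh * kw * input_shape[-1] + 1) * filters

shape = (128, 128, 3)                   # the only shape given explicitly
layers = [(32, (3, 3)), (64, (3, 3))]   # second layer is hypothetical

results = []
for filters, kernel in layers:
    params = conv2d_params(shape, filters, kernel)
    shape = conv2d_output_shape(shape, filters, kernel)
    results.append((shape, params))
print(results)  # [((126, 126, 32), 896), ((124, 124, 64), 18496)]
```

The first entry matches the conv2d_4 row in the summary above, and the input itself contributes no parameters.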

Error: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)

I'm trying to train a neural network on a dataset for liveness anti-spoofing. I have some videos in two folders named genuine and fake. I extracted 10 frames from each video and saved them in two folders with the aforementioned names under a new directory, training.
----/genuine/ # contains 10 frames * 300 videos = 3000 images
----/fake/ # contains 10 frames * 800 videos = 8000 images
I designed the following 3D ConvNet using Keras as my first try, but running it throws the following exception:
from keras.preprocessing.image import ImageDataGenerator
from keras import Model, optimizers, activations, losses, regularizers, backend, Sequential
from keras.layers import Dense, MaxPooling3D, AveragePooling3D, Conv3D, Input, Flatten, BatchNormalization
TARGET_SIZE = (224, 224)
train_datagen = ImageDataGenerator(rescale=1.0/255,
train_generator = train_datagen.flow_from_directory("./training/",
validation_generator = train_datagen.flow_from_directory("./training/",
SHAPE = (10, 224, 224, 3)
model = Sequential()
model.add(Conv3D(filters=128, kernel_size=(1, 3, 3), data_format='channels_last', activation='relu', input_shape=(10, 224, 224, 3)))
model.add(MaxPooling3D(data_format='channels_last', pool_size=(1, 2, 2)))
model.add(Conv3D(filters=64, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer=optimizers.adam(), loss=losses.binary_crossentropy, metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=train_generator.samples/train_generator.batch_size, epochs=5, validation_data=validation_generator, validation_steps=validation_generator.samples/validation_generator.batch_size)
Here is the error:
ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)
And this is the output of model.summary()
Model: "sequential_1"
Layer (type)                  Output Shape                 Param #
conv3d_1 (Conv3D)             (None, 10, 222, 222, 128)    3584
max_pooling3d_1 (MaxPooling3  (None, 10, 111, 111, 128)    0
conv3d_2 (Conv3D)             (None, 9, 109, 109, 64)      147520
max_pooling3d_2 (MaxPooling3  (None, 9, 54, 54, 64)        0
conv3d_3 (Conv3D)             (None, 8, 52, 52, 32)        36896
conv3d_4 (Conv3D)             (None, 7, 50, 50, 32)        18464
max_pooling3d_3 (MaxPooling3  (None, 7, 25, 25, 32)        0
conv3d_5 (Conv3D)             (None, 6, 23, 23, 16)        9232
conv3d_6 (Conv3D)             (None, 5, 21, 21, 16)        4624
average_pooling3d_1 (Average  (None, 2, 10, 10, 16)        0
batch_normalization_1 (Batch  (None, 2, 10, 10, 16)        64
dense_1 (Dense)               (None, 2, 10, 10, 32)        544
dense_2 (Dense)               (None, 2, 10, 10, 1)         33
Total params: 220,961
Trainable params: 220,929
Non-trainable params: 32
I'd appreciate any help to fix the exception. By the way, I'm using TensorFlow as backend if it helps to solve the problem.
As @thushv89 mentioned in the comments, Keras has no built-in video generator, which causes a lot of problems for those who work with big video datasets. Therefore, I wrote a simple VideoDataGenerator which is almost as simple to use as ImageDataGenerator. The script can be found on my GitHub in case someone needs it in the future.
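For context, the error message suggests that the image generator hands the model batches of single frames, shaped (batch, 224, 224, 3), while the Conv3D input expects whole clips, shaped (batch, 10, 224, 224, 3). The sketch below shows only the grouping idea with plain lists. It is not the VideoDataGenerator mentioned above, and the file names are hypothetical.

```python
def group_into_clips(frame_paths, frames_per_clip=10):
    """Split an ordered list of frame paths into consecutive fixed-length clips."""
    usable = len(frame_paths) - len(frame_paths) % frames_per_clip
    return [frame_paths[i:i + frames_per_clip]
            for i in range(0, usable, frames_per_clip)]

def clip_batch_shape(num_clips, frames_per_clip=10, height=224, width=224, channels=3):
    """Shape a batch of clips must have to feed a Conv3D with channels_last data."""
    return (num_clips, frames_per_clip, height, width, channels)

# Hypothetical file names: 3 videos with 10 extracted frames each
frames = [f"genuine/video{v:03d}_frame{f:02d}.jpg"
          for v in range(3) for f in range(10)]

clips = group_into_clips(frames)
print(len(clips), len(clips[0]))     # 3 10
print(clip_batch_shape(len(clips)))  # (3, 10, 224, 224, 3)
```

Each clip's frames would then be loaded and stacked along a new frame axis, so that one sample is a single video rather than a single image.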

CNN implementation using Keras and Tensorflow

I have created a CNN model using Keras and I am training it on the MNIST dataset. I got a reasonable accuracy of around 98%, which is what I expected:
model = Sequential()
model.add(Conv2D(64, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(64, 5, activation="relu"))
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(data.x_train, data.y_train,
batch_size=256, validation_data=(data.x_test, data.y_test))
Now I want to build the same model, but using vanilla Tensorflow, here is how I did that:
X = tf.placeholder(shape=[None, 784], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")
net = tf.reshape(X, [-1, 28, 28, 1])
net = tf.layers.conv2d(
net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dense(net, name="dense1", units=256, activation=tf.nn.relu)
model = tf.layers.dense(net, name="output", units=10)
And here is how I train/test it:
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32)
with tf.Session() as sess:
    for batch in range(data.get_number_of_train_batches(batch_size)):
        x, y = data.get_next_train_batch(batch_size)
        sess.run([loss, opt], feed_dict={X: x, Y: y})
    for batch in range(data.get_number_of_test_batches(batch_size)):
        x, y = data.get_next_test_batch(batch_size)
        sess.run(accuracy, feed_dict={X: x, Y: y})
But the resulting accuracy of the model dropped to ~80%. What are the principal differences between my Keras and TensorFlow implementations of that model? Why does the accuracy vary so much?
I don't see any mistakes in your code. Note that your current model is heavily parameterized for such a simple problem because of the Dense layers, which introduce over 260k trainable parameters:
Layer (type)                  Output Shape          Param #
conv2d_3 (Conv2D)             (None, 24, 24, 64)    1664
max_pooling2d_3 (MaxPooling2  (None, 12, 12, 64)    0
conv2d_4 (Conv2D)             (None, 8, 8, 64)      102464
max_pooling2d_4 (MaxPooling2  (None, 4, 4, 64)      0
flatten_2 (Flatten)           (None, 1024)          0
dense_2 (Dense)               (None, 256)           262400
dense_3 (Dense)               (None, 10)            2570
Total params: 369,098
Trainable params: 369,098
Non-trainable params: 0
Below, I will run your code with:
minor adaptations to make the code work with the MNIST dataset in keras.datasets
a simplified model: basically I remove the 256-node Dense layer, drastically reducing the number of trainable parameters, and introduce some dropout for regularization.
With these changes, both models achieve 90%+ validation set accuracy after the first epoch. So it seems the problem you encountered has to do with an ill-posed optimization problem which leads to highly variable outcomes, and not with a bug in your code.
# Import the datasets
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add channel dimension
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=None)
y_test = to_categorical(y_test, num_classes=None)
batch_size = 64
# Fit model using Keras
import keras
import numpy as np
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(Conv2D(32, 5, activation="relu"))
model.add(Dense(10, activation='softmax'))
loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=32, validation_data=(x_test, y_test), epochs=1)
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 35s 583us/step - loss: 1.5217 - acc: 0.8736 - val_loss: 0.0850 - val_acc: 0.9742
Note that the number of trainable parameters is now just a fraction of the amount in your model:
Layer (type)                  Output Shape          Param #
conv2d_3 (Conv2D)             (None, 24, 24, 32)    832
max_pooling2d_3 (MaxPooling2  (None, 12, 12, 32)    0
conv2d_4 (Conv2D)             (None, 8, 8, 32)      25632
max_pooling2d_4 (MaxPooling2  (None, 4, 4, 32)      0
flatten_2 (Flatten)           (None, 512)           0
dropout_1 (Dropout)           (None, 512)           0
dense_2 (Dense)               (None, 10)            5130
Total params: 31,594
Trainable params: 31,594
Non-trainable params: 0
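The two totals can be reproduced by hand, which also shows where the parameters go. A minimal sketch, with helper names made up for this example:

```python
def conv2d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter for a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return (in_features + 1) * units

# Original model: two 64-filter convolutions, Dense(256), Dense(10)
original = (
    conv2d_params(1, 64, 5)
    + conv2d_params(64, 64, 5)
    + dense_params(4 * 4 * 64, 256)
    + dense_params(256, 10)
)

# Simplified model: two 32-filter convolutions, Dense(10)
simplified = (
    conv2d_params(1, 32, 5)
    + conv2d_params(32, 32, 5)
    + dense_params(4 * 4 * 32, 10)
)

print(original)    # 369098
print(simplified)  # 31594
```

In the original model the Dense(256) layer alone accounts for 262,400 of the 369,098 parameters.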
Now, doing the same with TensorFlow:
# Fit model using TensorFlow
import tensorflow as tf
X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")
net = tf.layers.conv2d(
X, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
net, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dropout(net, rate=0.25)
model = tf.layers.dense(net, name="output", units=10)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32))
with tf.Session() as sess:
    # Variables must be initialized before the first training step
    sess.run(tf.global_variables_initializer())
    L = []
    l_ = 0
    for i in range(x_train.shape[0] // batch_size):
        x, y = x_train[i*batch_size:(i+1)*batch_size],\
            y_train[i*batch_size:(i+1)*batch_size]
        l, _ = sess.run([loss, opt], feed_dict={X: x, Y: y})
        l_ += np.mean(l)
    L.append(l_ / (x_train.shape[0] // batch_size))
    print('Training loss: {:.3f}'.format(L[-1]))
    acc = []
    for j in range(x_test.shape[0] // batch_size):
        x, y = x_test[j*batch_size:(j+1)*batch_size],\
            y_test[j*batch_size:(j+1)*batch_size]
        acc.append(sess.run(accuracy, feed_dict={X: x, Y: y}))
    print('Test set accuracy: {:.3f}'.format(np.mean(acc)))
Training loss: 0.519
Test set accuracy: 0.968
Possible improvements to your models:
I have used CNNs on different problems and always got good improvements from regularization techniques, the best ones from dropout.
I suggest using dropout on the Dense layers and, where needed, with a lower probability on the convolutional ones.
Data augmentation on the input data is also very important, but its applicability depends on the problem domain.
P.S.: In one case I had to change the optimizer from Adam to SGD with momentum, so experimenting with the optimizer makes sense. Gradient clipping can also be considered when your network stalls and stops improving, which may be a numerical issue.
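To make the gradient clipping idea concrete, here is a small framework-free sketch of clipping by L2 norm on a flat list of gradient values. The function name is made up for this example. In practice you would normally rely on the optimizer's own option (Keras optimizers accept clipnorm and clipvalue arguments) rather than doing this by hand.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm

clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [0.6, 0.8]

# Gradients already within the limit are returned unchanged
unchanged, _ = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(unchanged)  # [0.3, 0.4]
```

The direction of the update is preserved; only its length is capped, which is what keeps an occasional very large gradient from destabilizing training.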