Different output shape of Conv2D between tf.keras and keras? - tensorflow

It might be a dumb question since I'm new to Keras and TensorFlow.
I have this simple model:
classifier=Sequential()
classifier.add(Convolution2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=2, activation='softmax'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.summary()
When running with tf.keras.* classes (e.g. from tensorflow.keras.models import Sequential), the summary shows the first layer as:
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 21, 21, 32) 896
But when running with keras.* classes (e.g. from keras.models import Sequential), the summary shows the first layer as:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 62, 62, 32) 896
Why do they give different output shapes?
I'm using TensorFlow 2.0.0 and Keras 2.3.1.

It is actually related to how kernel_size is passed. First, let's check:
Example 1:
classifier=Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.summary()
Note the small change in the Conv2D call: your code had classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu')), while this one passes kernel_size=(3, 3).
Example 1 gives the output shape as
(None, 62, 62, 32)
Example 2:
Let's change it to your version
classifier=Sequential()
classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.summary()
results in:
(None, 21, 21, 32) 896
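The conclusion is that tf.keras and standalone Keras interpret the third positional argument differently. In tf.keras the signature is Conv2D(filters, kernel_size, strides=(1, 1), ...), so Conv2D(32, 3, 3) means a 3x3 kernel with stride 3, which gives (64 - 3) // 3 + 1 = 21. Standalone Keras 2.3.1 still accepts the legacy Keras 1 form Convolution2D(nb_filter, nb_row, nb_col) and converts it to kernel_size=(3, 3) with stride 1, which gives 64 - 3 + 1 = 62. Notice that I got both results with the tf.keras API.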
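Both shapes follow from the usual output-size formula for an unpadded ('valid') convolution, out = floor((n - k) / s) + 1. A minimal pure-Python sketch, assuming square inputs and kernels and no dilation (the helper names are hypothetical, not part of Keras):

```python
def conv_output_size(n, kernel, stride=1):
    """Spatial output size of a 'valid' convolution along one axis."""
    return (n - kernel) // stride + 1


def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Number of weights in a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)


# Conv2D(32, (3, 3)): 3x3 kernel, default stride 1
print(conv_output_size(64, 3, stride=1))  # 62
# tf.keras reads Conv2D(32, 3, 3) as kernel_size=3, strides=3
print(conv_output_size(64, 3, stride=3))  # 21
# The stride does not change the number of weights, hence 896 in both summaries
print(conv2d_params(3, 3, 32))  # 896
```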

Related

TensorFlow to PyTorch CNN (use nn.Conv1d)

from tensorflow.keras import layers as l
from tensorflow.keras.models import Sequential
input_size = [765, 500, 72]  # (samples, steps, channels)
model = Sequential()
add = model.add
add(l.Conv1D(256, kernel_size=3, strides=2, activation='relu', input_shape=tuple(input_size[1:])))
add(l.Dropout(0.5))
add(l.Conv1D(256, kernel_size=3, strides=2, activation='relu'))
add(l.Dropout(0.5))
add(l.GlobalAveragePooling1D())
add(l.Dense(100, activation="relu"))
add(l.Dense(3, activation="softmax"))
(None, 249, 256)
(None, 249, 256)
(None, 124, 256)
(None, 124, 256)
(None, 256)
(None, 100)
(None, 3)
This is the TensorFlow model structure and its summary.
How can I convert this CNN model to PyTorch using nn.Conv1d?
To jump-start your research, here is an example usage of nn.Conv1d:
>>> f = nn.Conv1d(72, 256, kernel_size=3, stride=2)
>>> f(torch.rand(765, 72, 500)).shape
torch.Size([765, 256, 249])
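In this case, keep a few PyTorch-specific things in mind:
Unlike TensorFlow, which defaults to channels-last data (batch, length, channels), nn.Conv1d expects channels-first data (batch, channels, length); that is why the example above feeds torch.rand(765, 72, 500).
You have to specify the input size of every layer yourself (in_channels for nn.Conv1d, in_features for nn.Linear); PyTorch does not infer it.
The activation function is not included in nn.Conv1d; you have to use a dedicated module for it (e.g. nn.ReLU).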
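Putting those points together, a possible PyTorch translation of the whole Keras model could look like the sketch below. It assumes the data is stored in the Keras (batch, steps, channels) layout and permutes it before the first layer; nn.AdaptiveAvgPool1d(1) followed by nn.Flatten stands in for GlobalAveragePooling1D, and the final nn.Softmax is kept only to mirror the Keras output layer.

```python
import torch
from torch import nn

# Layer-by-layer port of the Keras model (a sketch: default weight
# initialization differs between Keras and PyTorch).
model = nn.Sequential(
    nn.Conv1d(72, 256, kernel_size=3, stride=2),   # in_channels must be given explicitly
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Conv1d(256, 256, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.AdaptiveAvgPool1d(1),   # global average over the length axis -> (batch, 256, 1)
    nn.Flatten(),              # (batch, 256, 1) -> (batch, 256)
    nn.Linear(256, 100),
    nn.ReLU(),
    nn.Linear(100, 3),
    nn.Softmax(dim=1),         # drop this if you train with nn.CrossEntropyLoss
)

# Keras layout is (batch, steps, channels); nn.Conv1d wants (batch, channels, steps)
x_keras_layout = torch.rand(4, 500, 72)
x = x_keras_layout.permute(0, 2, 1)

model.eval()  # disable dropout for inference
with torch.no_grad():
    after_conv1 = model[0](x)
    y = model(x)

print(after_conv1.shape)  # torch.Size([4, 256, 249])
print(y.shape)            # torch.Size([4, 3])
```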

What is the correct way to upsample a [32x32x6] layer in a CNN

I have a CNN that produces a [32x32] image with 6 channels, but I need to upsample it to 256x256. I'm doing:
def upsample(filters, size):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                               padding='same',
                                               kernel_initializer=initializer,
                                               use_bias=False))
    return result
Then I pass the layer like this:
up_stack = [
upsample(6, 3), # x2
upsample(6, 3), # x2
upsample(6, 3) # x2
]
for up in up_stack:
    finalLayer = up(finalLayer)
But this setup produces inaccurate results. Is there anything I'm doing wrong?
Your other option would be tf.keras.layers.UpSampling2D, but it doesn't learn an upsampling kernel: it uses fixed interpolation (nearest-neighbour by default, or bilinear with interpolation='bilinear').
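So your approach is correct, but you used a 3x3 kernel_size.
It should be 2x2: with strides=2, a 2x2 kernel gives every output pixel the same number of contributions, whereas a 3x3 kernel overlaps unevenly (a common cause of checkerboard artifacts). If you are still not satisfied with the results, increase the number of filters (e.g. somewhere in the range 32 to 256).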
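To see why the kernel size relative to the stride matters, the sketch below counts, in 1D, how many input positions write to each output position of a transposed convolution (padding ignored). The helper is hypothetical and purely illustrative: with kernel 3 and stride 2 the counts alternate between 1 and 2, while with kernel 2 and stride 2 every output position gets exactly one contribution.

```python
def transposed_conv_overlap(n_inputs, kernel, stride):
    """Count how many input positions contribute to each output position
    of a 1D transposed convolution without padding."""
    length = (n_inputs - 1) * stride + kernel
    counts = [0] * length
    for i in range(n_inputs):
        for j in range(kernel):
            counts[i * stride + j] += 1
    return counts


print(transposed_conv_overlap(4, kernel=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(transposed_conv_overlap(4, kernel=2, stride=2))  # [1, 1, 1, 1, 1, 1, 1, 1]
```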
If you wish to use up-convolution, I suggest the following. The code below works; just change the number of filters to suit your needs.
import tensorflow as tf
from tensorflow.keras import layers
# in = 32x32 out 256x256
inputs = layers.Input(shape=(32, 32, 6))
deconc01 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(inputs)
deconc02 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc01)
outputs = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc02)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="up-conv")
model.summary()
Model: "up-conv"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 32, 32, 6)] 0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 64, 64, 256) 6400
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 128, 128, 256) 262400
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 256, 256, 256) 262400
=================================================================
Total params: 531,200
Trainable params: 531,200
Non-trainable params: 0
_________________________________________________________________
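The shapes and parameter counts in this summary can be checked by hand. The sketch below uses the output-length rule for a transposed convolution without output_padding or dilation (n * stride for padding='same', n * stride + max(kernel - stride, 0) for padding='valid'), which is my reading of how Keras sizes Conv2DTranspose outputs; the helper names are hypothetical. It also shows that the question's padding='same', kernel-3 setup reaches 256x256 as well, so the output shape was not the problem there.

```python
def deconv_output_length(n, kernel, stride, padding="valid"):
    """Output length of a transposed convolution along one axis
    (no output_padding, no dilation)."""
    if padding == "same":
        return n * stride
    if padding == "valid":
        return n * stride + max(kernel - stride, 0)
    raise ValueError(f"unsupported padding: {padding}")


def conv2d_transpose_params(kernel, in_channels, filters, use_bias=True):
    """Weights of a Conv2DTranspose layer with a square kernel."""
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)


# The answer's model: three kernel-2, stride-2, 'valid' layers with 256 filters
size, channels, total = 32, 6, 0
for _ in range(3):
    size = deconv_output_length(size, kernel=2, stride=2, padding="valid")
    total += conv2d_transpose_params(2, channels, 256)
    channels = 256
print(size, total)  # 256 531200

# The question's layers: kernel 3, stride 2, padding='same' -> 32, 64, 128, 256
print(deconv_output_length(32, kernel=3, stride=2, padding="same"))   # 64
# The same kernel with 'valid' padding would overshoot by one pixel
print(deconv_output_length(32, kernel=3, stride=2, padding="valid"))  # 65
```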

How to make input layer explicit in tf.keras

This question makes use of a pre-trained VGG network, whose summary shows an InputLayer being used. I like the clarity of the explicit input layer... even if functionally it does nothing (true?). But when I try to mimic this with something like:
model = Sequential()
model.add(Input(shape=(128, 128, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
the result displayed using print(model.summary()) is no different from:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
... and both show the first layer as being a Conv2D layer. Where did my Input layer go? And is it worth the hassle of getting it back?
In your example you're using a Sequential model; try the functional API with keras.models.Model instead:
inp = keras.layers.Input((128, 128, 3))
op = keras.layers.Conv2D(32, (3, 3), activation='relu')(inp)
model = keras.models.Model(inputs=[ inp ], outputs = [op] )
model.summary()
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 128, 128, 3)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 126, 126, 32) 896
=================================================================
Total params: 896
Trainable params: 896
Non-trainable params: 0
_________________________________________________________________
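As for whether it's worth the hassle: no. The InputLayer has no weights (0 params in the summary above) and only defines the input shape, so whether it shows up in the summary makes no functional difference.
As for input_shape, it can be passed to any layer, but Keras infers the input shape of each subsequent layer from the output of the previous one, so you only need to specify it for the first layer.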
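Conceptually, a Sequential model only needs the first shape because each layer derives its output shape from its input shape. A toy pure-Python illustration, using hypothetical shape-only layer classes (stride-1 'valid' convolutions and non-overlapping pooling), not Keras internals:

```python
class Conv2DShape:
    """Hypothetical shape-only stand-in for a 'valid', stride-1 Conv2D."""
    def __init__(self, filters, kernel):
        self.filters, self.kernel = filters, kernel

    def output_shape(self, shape):
        h, w, _ = shape
        return (h - self.kernel + 1, w - self.kernel + 1, self.filters)


class MaxPool2DShape:
    """Hypothetical shape-only stand-in for MaxPooling2D with pool_size == strides."""
    def __init__(self, pool):
        self.pool = pool

    def output_shape(self, shape):
        h, w, c = shape
        return (h // self.pool, w // self.pool, c)


def infer_shapes(input_shape, layers):
    """Propagate a shape through the layers; only the first shape is supplied."""
    shapes, shape = [], input_shape
    for layer in layers:
        shape = layer.output_shape(shape)
        shapes.append(shape)
    return shapes


print(infer_shapes((128, 128, 3), [Conv2DShape(32, 3), MaxPool2DShape(2)]))
# [(126, 126, 32), (63, 63, 32)]
```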

Error: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)

I'm trying to train a neural network on a dataset for liveness anti-spoofing. I have some videos in two folders named genuine and fake. I extracted 10 frames from each video and saved them in two folders with the aforementioned names under a new directory, training.
--/training/
----/genuine/ #contains 10 frames * 300 videos = 3000 images
----/fake/ #contains 10 frames * 800 videos = 8000 images
I designed the following 3D ConvNet using Keras as my first try, but running it throws the following exception:
from keras.preprocessing.image import ImageDataGenerator
from keras import Model, optimizers, activations, losses, regularizers, backend, Sequential
from keras.layers import Dense, MaxPooling3D, AveragePooling3D, Conv3D, Input, Flatten, BatchNormalization
BATCH_SIZE = 10
TARGET_SIZE = (224, 224)
train_datagen = ImageDataGenerator(rescale=1.0/255,
data_format='channels_last',
validation_split=0.2,
shear_range=0.0,
zoom_range=0,
horizontal_flip=False,
featurewise_center=False,
featurewise_std_normalization=False,
width_shift_range=False,
height_shift_range=False)
train_generator = train_datagen.flow_from_directory("./training/",
target_size=TARGET_SIZE,
batch_size=BATCH_SIZE,
class_mode='binary',
shuffle=False,
subset='training')
validation_generator = train_datagen.flow_from_directory("./training/",
target_size=TARGET_SIZE,
batch_size=BATCH_SIZE,
class_mode='binary',
shuffle=False,
subset='validation')
SHAPE = (10, 224, 224, 3)
model = Sequential()
model.add(Conv3D(filters=128, kernel_size=(1, 3, 3), data_format='channels_last', activation='relu', input_shape=(10, 224, 224, 3)))
model.add(MaxPooling3D(data_format='channels_last', pool_size=(1, 2, 2)))
model.add(Conv3D(filters=64, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=32, kernel_size=(2, 3, 3), activation='relu'))
model.add(MaxPooling3D(pool_size=(1, 2, 2)))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(Conv3D(filters=16, kernel_size=(2, 3, 3), activation='relu'))
model.add(AveragePooling3D())
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer=optimizers.adam(), loss=losses.binary_crossentropy, metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=train_generator.samples/train_generator.batch_size, epochs=5, validation_data=validation_generator, validation_steps=validation_generator.samples/validation_generator.batch_size)
model.save('3d.h5')
Here is the Error:
ValueError: Error when checking input: expected conv3d_1_input to have 5 dimensions, but got array with shape (10, 224, 224, 3)
And this is the output of model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv3d_1 (Conv3D) (None, 10, 222, 222, 128) 3584
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 10, 111, 111, 128) 0
_________________________________________________________________
conv3d_2 (Conv3D) (None, 9, 109, 109, 64) 147520
_________________________________________________________________
max_pooling3d_2 (MaxPooling3 (None, 9, 54, 54, 64) 0
_________________________________________________________________
conv3d_3 (Conv3D) (None, 8, 52, 52, 32) 36896
_________________________________________________________________
conv3d_4 (Conv3D) (None, 7, 50, 50, 32) 18464
_________________________________________________________________
max_pooling3d_3 (MaxPooling3 (None, 7, 25, 25, 32) 0
_________________________________________________________________
conv3d_5 (Conv3D) (None, 6, 23, 23, 16) 9232
_________________________________________________________________
conv3d_6 (Conv3D) (None, 5, 21, 21, 16) 4624
_________________________________________________________________
average_pooling3d_1 (Average (None, 2, 10, 10, 16) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 2, 10, 10, 16) 64
_________________________________________________________________
dense_1 (Dense) (None, 2, 10, 10, 32) 544
_________________________________________________________________
dense_2 (Dense) (None, 2, 10, 10, 1) 33
=================================================================
Total params: 220,961
Trainable params: 220,929
Non-trainable params: 32
__________________________________________________________
I'd appreciate any help to fix the exception. By the way, I'm using TensorFlow as backend if it helps to solve the problem.
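The exception occurs because flow_from_directory yields batches of individual frames with shape (batch_size, 224, 224, 3), while the model's input_shape=(10, 224, 224, 3) requires 5D batches of shape (batch_size, 10, 224, 224, 3). As @thushv89 mentioned in the comments, Keras has no built-in video generator, which causes a lot of problems for anyone working with large video datasets. Therefore, I wrote a simple VideoDataGenerator that is almost as easy to use as ImageDataGenerator. The script can be found on my GitHub in case someone needs it in the future.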
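Whatever generator you use, it has to stack consecutive frames into 5D clips of shape (batch, frames, height, width, channels) before they reach Conv3D. A minimal NumPy sketch of that grouping step, assuming the frames of each video are stored consecutively and every video has exactly 10 frames; the function names are hypothetical and this is not the VideoDataGenerator mentioned above:

```python
import numpy as np


def frames_to_clips(frames, frames_per_clip):
    """Group consecutive frames (N, H, W, C) into clips
    (N // frames_per_clip, frames_per_clip, H, W, C)."""
    n_clips = len(frames) // frames_per_clip
    usable = frames[: n_clips * frames_per_clip]  # drop an incomplete trailing clip
    return usable.reshape((n_clips, frames_per_clip) + frames.shape[1:])


def clip_batches(clips, labels, batch_size):
    """Yield (clips, labels) batches forever, as Keras generators are consumed."""
    n = len(clips)
    while True:
        for start in range(0, n, batch_size):
            yield clips[start:start + batch_size], labels[start:start + batch_size]


# Small stand-in data: 4 videos x 10 frames of 8x8 RGB instead of 224x224
frames = np.zeros((40, 8, 8, 3), dtype=np.float32)
clips = frames_to_clips(frames, frames_per_clip=10)
labels = np.array([1, 0, 0, 1])
x, y = next(clip_batches(clips, labels, batch_size=2))
print(clips.shape, x.shape, y.shape)  # (4, 10, 8, 8, 3) (2, 10, 8, 8, 3) (2,)
```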

CNN implementation using Keras and Tensorflow

I have created a CNN model using Keras and I am training it on the MNIST dataset. I got a reasonable accuracy of around 98%, which is what I expected:
model = Sequential()
model.add(Conv2D(64, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(64, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam',
loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(data.x_train, data.y_train,
batch_size=256, validation_data=(data.x_test, data.y_test))
Now I want to build the same model using vanilla TensorFlow; here is how I did that:
X = tf.placeholder(shape=[None, 784], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")
net = tf.reshape(X, [-1, 28, 28, 1])
net = tf.layers.conv2d(
net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dense(net, name="dense1", units=256, activation=tf.nn.relu)
model = tf.layers.dense(net, name="output", units=10)
And here is how I train/test it:
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for batch in range(data.get_number_of_train_batches(batch_size)):
        x, y = data.get_next_train_batch(batch_size)
        sess.run([loss, opt], feed_dict={X: x, Y: y})
    for batch in range(data.get_number_of_test_batches(batch_size)):
        x, y = data.get_next_test_batch(batch_size)
        sess.run(accuracy, feed_dict={X: x, Y: y})
But the resulting accuracy of the model dropped to ~80%. What are the principal differences between my Keras and TensorFlow implementations of this model? Why does the accuracy vary so much?
I don't see any mistakes in your code. Note that your current model is heavily parameterized for such a simple problem because of the Dense layers, which introduce over 260k trainable parameters:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 24, 24, 64) 1664
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 64) 102464
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 262400
_________________________________________________________________
dense_3 (Dense) (None, 10) 2570
=================================================================
Total params: 369,098
Trainable params: 369,098
Non-trainable params: 0
_________________________________________________________________
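These counts can be reproduced by hand: a Conv2D layer has kernel * kernel * in_channels * filters weights plus one bias per filter, and a Dense layer has in_features * units weights plus one bias per unit. A short sketch of that arithmetic (the helper names are illustrative), for both the original model and the simplified one used below:

```python
def conv2d_params(kernel, in_channels, filters):
    """Square-kernel Conv2D: weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_features, units):
    """Dense layer: weight matrix plus one bias per unit."""
    return in_features * units + units


# Original model: 28 -> 24 -> 12 -> 8 -> 4 spatially, so Flatten gives 4 * 4 * 64
conv1 = conv2d_params(5, 1, 64)          # 1664
conv2 = conv2d_params(5, 64, 64)         # 102464
dense1 = dense_params(4 * 4 * 64, 256)   # 262400
dense2 = dense_params(256, 10)           # 2570
print(conv1 + conv2 + dense1 + dense2)   # 369098
print(dense1 + dense2)                   # 264970, the "over 260k" in the Dense layers

# Simplified model: 32 filters and no 256-unit Dense layer
small = conv2d_params(5, 1, 32) + conv2d_params(5, 32, 32) + dense_params(4 * 4 * 32, 10)
print(small)                             # 31594
```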
Below, I will run your code with:
minor adaptations to make the code work with the MNIST dataset in keras.datasets
a simplified model: basically I remove the 256-node Dense layer, drastically reducing the number of trainable parameters, and introduce some dropout for regularization.
With these changes, both models achieve 90%+ validation set accuracy after the first epoch. So it seems the problem you encountered has to do with an ill-posed optimization problem which leads to highly variable outcomes, and not with a bug in your code.
# Import the datasets
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Add channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=None)
y_test = to_categorical(y_test, num_classes=None)
batch_size = 64
# Fit model using Keras
import keras
import numpy as np
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(32, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dropout(0.25))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam',
loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=32, validation_data=(x_test, y_test), epochs=1)
Result:
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 35s 583us/step - loss: 1.5217 - acc: 0.8736 - val_loss: 0.0850 - val_acc: 0.9742
Note that the number of trainable parameters is now just a fraction of the amount in your model:
model.summary()
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 24, 24, 32) 832
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 8, 8, 32) 25632
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 32) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 512) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 5130
=================================================================
Total params: 31,594
Trainable params: 31,594
Non-trainable params: 0
Now, doing the same with TensorFlow:
# Fit model using TensorFlow
import tensorflow as tf
X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")
net = tf.layers.conv2d(
X, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
net, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dropout(net, rate=0.25)  # note: training defaults to False, so this is a no-op unless training=True is passed
model = tf.layers.dense(net, name="output", units=10)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32))
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    L = []
    l_ = 0
    for i in range(x_train.shape[0] // batch_size):
        x, y = x_train[i*batch_size:(i+1)*batch_size],\
            y_train[i*batch_size:(i+1)*batch_size]
        l, _ = sess.run([loss, opt], feed_dict={X: x, Y: y})
        l_ += np.mean(l)
    L.append(l_ / (x_train.shape[0] // batch_size))
    print('Training loss: {:.3f}'.format(L[-1]))
    acc = []
    for j in range(x_test.shape[0] // batch_size):
        x, y = x_test[j*batch_size:(j+1)*batch_size],\
            y_test[j*batch_size:(j+1)*batch_size]
        acc.append(sess.run(accuracy, feed_dict={X: x, Y: y}))
    print('Test set accuracy: {:.3f}'.format(np.mean(acc)))
Result:
Training loss: 0.519
Test set accuracy: 0.968
Possible improvements to your models:
I have used CNNs on different problems and always got good improvements from regularization techniques, the best ones with dropout.
I suggest using Dropout on the Dense layers and, if needed, with a lower rate on the convolutional ones.
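For reference, this is roughly what a dropout layer does during training. The NumPy sketch below implements inverted dropout, the variant Keras uses, where kept activations are scaled by 1 / (1 - rate) so nothing has to change at inference time; the function name and signature are illustrative, not the Keras API.

```python
import numpy as np


def dropout(x, rate, training, rng=None):
    """Inverted dropout: zero out a fraction `rate` of activations while
    training and rescale the survivors; identity at inference time."""
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((1000, 256))
train_out = dropout(x, rate=0.5, training=True, rng=rng)
test_out = dropout(x, rate=0.5, training=False)
print(train_out.mean())               # close to 1.0: the expected activation is preserved
print(np.array_equal(test_out, x))    # True
```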
Data augmentation of the input data is also very important, but its applicability depends on the problem domain.
P.S.: in one case I had to switch the optimizer from Adam to SGD with momentum, so experimenting with the optimizer makes sense. Gradient clipping is also worth considering when your network stalls and stops improving; this may be a numerical issue.
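Gradient clipping rescales gradients whose norm exceeds a threshold. Below is a NumPy sketch of clipping by global norm, following the formula documented for tf.clip_by_global_norm (t * clip_norm / max(global_norm, clip_norm)); the function here is an illustrative re-implementation, not the TensorFlow one. Keras optimizers expose the same idea through their clipnorm and clipvalue arguments, though clipnorm there is, to my understanding, applied per gradient tensor rather than jointly.

```python
import numpy as np


def clip_by_global_norm(grads, clip_norm):
    """Scale a list of gradient arrays so their joint L2 norm is at most clip_norm.
    Returns the (possibly rescaled) gradients and the original global norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


grads = [np.array([3.0, 0.0]), np.array([[0.0], [4.0]])]  # joint norm = 5
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm)  # 5.0
print(clipped[0], clipped[1].ravel())  # rescaled by 1/5, joint norm now 1

# Gradients already below the threshold pass through unchanged
unchanged, _ = clip_by_global_norm(grads, clip_norm=10.0)
```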