input_shape with image_generator in Tensorflow - tensorflow2.0

I'm trying to use this approach in TensorFlow 2.x to load a large dataset that does not fit in memory.
I have a folder with X sub-folders that contain images. Each sub-folder is a class.
\dataset
    \class1
        img1_1.jpg
        img1_2.jpg
        ...
    \class2
        img2_1.jpg
        img2_2.jpg
        ...
I create my data generator from my folder like this:
train_data_gen = image_generator.flow_from_directory(directory="path\\to\\dataset",
batch_size=100,
shuffle=True,
target_size=(100, 100), # Image H x W
classes=list(CLASS_NAMES)) # list of folder/class names ["class1", "class2", ...., "classX"]
Found 629 images belonging to 2 classes.
I made a smaller dataset to test the pipeline: only 629 images in 2 classes.
Now I can create a dummy model like this:
model = tf.keras.Sequential()
model.add(Dense(1, activation=activation, input_shape=(100, 100, 3))) # only 1 layer of 1 neuron
model.add(Dense(2)) # 2classes
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['categorical_accuracy'])
Once it is compiled, I try to fit this dummy model:
STEPS_PER_EPOCH = np.ceil(image_count / batch_size) # 629 / 100
model.fit_generator(generator=train_data_gen , steps_per_epoch=STEPS_PER_EPOCH, epochs=2, verbose=1)
1/7 [===>..........................] - ETA: 2s - loss: 1.1921e-07 - categorical_accuracy: 0.9948
2/7 [=======>......................] - ETA: 1s - loss: 1.1921e-07 - categorical_accuracy: 0.5124
3/7 [===========>..................] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.3449
4/7 [================>.............] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.2662
5/7 [====================>.........] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.2130
6/7 [========================>.....] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.1808
2020-04-14 20:39:48.629203: W tensorflow/core/framework/op_kernel.cc:1610] Invalid argument: ValueError: generator yielded an element of shape (29, 100, 100, 3) where an element of shape (100, 100, 100, 3) was expected.
From what I understand, the last batch doesn't have the same shape as the previous batches, so it crashes. I've tried to specify a batch_input_shape:
model.add(Dense(1, activation=activation, batch_input_shape=(None, 100, 100, 3)))
I found here that I should put None so that the number of elements in the batch is not fixed and can be dynamic, but with no success.
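To see where the (29, 100, 100, 3) batch in the log comes from, here is a small plain-Python sketch (using the 629 images and batch size 100 from this question) of how one pass over the directory splits into batches:

```python
import math

def batch_sizes(num_images, batch_size):
    """Sizes of the batches yielded by one pass over num_images samples."""
    full_batches, remainder = divmod(num_images, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:  # the last, partial batch
        sizes.append(remainder)
    return sizes

sizes = batch_sizes(629, 100)
print(sizes)                 # [100, 100, 100, 100, 100, 100, 29]
print(math.ceil(629 / 100))  # 7, the STEPS_PER_EPOCH used above
```

Only the 7th step has a different batch size, which matches the error appearing right after 6/7 in the progress bar; a pipeline that declares a fixed batch dimension of 100 would reject that last batch.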
Edit: Thanks to the comments, I found that I had 2 mistakes:
1. The output shape was wrong: I was missing the Flatten layer in the model. With the Flatten layer added, the approach from the previous link does work.
2. Some code was missing from the question: I actually feed fit_generator a tf.data.Dataset.from_generator, but here I showed an image_generator.flow_from_directory.
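A minimal sketch of why the missing Flatten layer matters, assuming TensorFlow 2.x: a Dense layer only acts on the last axis of its input, so without Flatten the model produces an output per pixel instead of one vector per image. The two toy models below are illustrative, not the exact model from the question.

```python
import tensorflow as tf

# Dense on a 4-D input is applied to the last axis only (i.e. per pixel).
without_flatten = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 100, 3)),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Dense(2),
])

# Flatten first, so each image becomes a single 30000-long vector.
with_flatten = tf.keras.Sequential([
    tf.keras.Input(shape=(100, 100, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
    tf.keras.layers.Dense(2, activation="softmax"),
])

batch = tf.zeros((4, 100, 100, 3))  # dummy batch of 4 images
print(without_flatten(batch).shape)  # (4, 100, 100, 2): cannot match (4, 2) labels
print(with_flatten(batch).shape)     # (4, 2): one probability pair per image
```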
Here is the final code:
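import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
# image_generator, CLASS_NAMES, image_count and activation are defined earlier (not shown)
batch_size = 1000
train_data_gen = image_generator.flow_from_directory(directory="path\\to\\dataset",
batch_size=batch_size,
shuffle=True,
target_size=(100, 100),
classes=list(CLASS_NAMES))
train_dataset = tf.data.Dataset.from_generator(
lambda: train_data_gen,
output_types=(tf.float32, tf.float32),
output_shapes=([None, 100, 100, 3],  # None: the batch size may vary (the last batch is smaller)
[None, len(CLASS_NAMES)]))
model = tf.keras.Sequential()
model.add(Flatten(batch_input_shape=(None, 100, 100, 3)))  # flatten each image before the Dense layers
model.add(Dense(1, activation=activation))
model.add(Dense(2, activation="softmax"))  # softmax so categorical_crossentropy receives probabilities
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['categorical_accuracy'])
STEPS_PER_EPOCH = int(np.ceil(image_count / batch_size))  # ceil(629 / batch_size)
# fit_generator is deprecated since TF 2.1; model.fit accepts the dataset directly
model.fit(train_dataset, steps_per_epoch=STEPS_PER_EPOCH, epochs=2, verbose=1)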
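The key point of the final code is the None batch dimension in the declared shapes. Here is a self-contained sketch, assuming TF 2.4+ (where `output_signature` supersedes `output_types`/`output_shapes`). It uses a hypothetical `fake_batches` generator, yielding zeros instead of real images, as a stand-in for flow_from_directory, to show that a dataset declared this way accepts the final 29-image batch:

```python
import numpy as np
import tensorflow as tf

NUM_IMAGES, BATCH, H, W, NUM_CLASSES = 629, 100, 100, 100, 2

def fake_batches():
    # Hypothetical stand-in for flow_from_directory: one pass of (x, y) batches.
    for start in range(0, NUM_IMAGES, BATCH):
        n = min(BATCH, NUM_IMAGES - start)
        x = np.zeros((n, H, W, 3), dtype=np.float32)
        y = np.zeros((n, NUM_CLASSES), dtype=np.float32)
        yield x, y

ds = tf.data.Dataset.from_generator(
    fake_batches,
    output_signature=(
        tf.TensorSpec(shape=(None, H, W, 3), dtype=tf.float32),      # None = variable batch size
        tf.TensorSpec(shape=(None, NUM_CLASSES), dtype=tf.float32),
    ),
)

batch_dims = [int(x.shape[0]) for x, _ in ds]
print(batch_dims)  # [100, 100, 100, 100, 100, 100, 29]
```

Declaring the first dimension as 100 instead of None would make the last element fail with the same kind of "generator yielded an element of shape (29, ...)" error as in the question.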

For the benefit of the community, here I explain how to use an image generator in TensorFlow with input_shape (100, 100, 3), using the Dogs vs. Cats dataset.
If we don't choose the right batch size, there is a chance that the model gets stuck right after the first epoch, so I'll start my explanation with how to choose batch_size.
Batch sizes are generally chosen as powers of 2, because optimized matrix-operation libraries work most efficiently with such sizes. This is further elaborated in this research paper.
Check out this blog, which describes how to choose the right batch size and compares the effects of different batch sizes on accuracy on the CIFAR-10 dataset.
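The batch size also determines how many steps make up one epoch. As a quick plain-Python sketch, using the 2000 training images from the example below, here is how floor division (what the code below uses for steps_per_epoch) compares with rounding up for a few power-of-2 batch sizes:

```python
import math

def steps_per_epoch(num_samples, batch_size):
    floor_steps = num_samples // batch_size            # full batches only
    ceil_steps = math.ceil(num_samples / batch_size)   # also counts the partial last batch
    leftover = num_samples - floor_steps * batch_size  # samples in that partial batch
    return floor_steps, ceil_steps, leftover

for bs in (16, 32, 64, 128):
    print(bs, steps_per_epoch(2000, bs))
# 16 (125, 125, 0)
# 32 (62, 63, 16)
# 64 (31, 32, 16)
# 128 (15, 16, 80)
```

With batch_size = 32 this gives the 62 steps per epoch that appear in the training log below.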
Here is the end-to-end working code with its output:
# use tf.keras consistently; mixing standalone keras and tf.keras imports can break in TF 2.x
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Activation, BatchNormalization, Flatten, Conv2D
from tensorflow.keras.layers import MaxPooling2D, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
K.set_image_data_format('channels_last')
train_dir = '/content/drive/My Drive/Dogs_Vs_Cats/train'
test_dir = '/content/drive/My Drive/Dogs_Vs_Cats/test'
img_width, img_height = 100, 100
input_shape = img_width, img_height, 3
train_samples = 2000
test_samples = 1000
epochs = 30
batch_size = 32
train_datagen = ImageDataGenerator(
rescale = 1. /255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
test_datagen = ImageDataGenerator(
rescale = 1. /255)
train_data = train_datagen.flow_from_directory(
train_dir,
target_size = (img_width, img_height),
batch_size = batch_size,
class_mode = 'binary')
test_data = test_datagen.flow_from_directory(
test_dir,
target_size = (img_width, img_height),
batch_size = batch_size,
class_mode = 'binary')
model = Sequential()
model.add(Conv2D(32, (7, 7), strides = (1, 1), input_shape = input_shape))
model.add(BatchNormalization(axis = 3))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (7, 7), strides = (1, 1)))
model.add(BatchNormalization(axis = 3))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss = 'binary_crossentropy',
optimizer = 'rmsprop',
metrics = ['accuracy'])
model.fit(  # fit_generator is deprecated since TF 2.1; fit accepts generators directly
train_data,
steps_per_epoch = train_samples//batch_size,
epochs = epochs,
validation_data = test_data,
verbose = 1,
validation_steps = test_samples//batch_size)
Output:
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_8 (Conv2D) (None, 94, 94, 32) 4736
_________________________________________________________________
batch_normalization_8 (Batch (None, 94, 94, 32) 128
_________________________________________________________________
activation_8 (Activation) (None, 94, 94, 32) 0
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 47, 47, 32) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 41, 41, 64) 100416
_________________________________________________________________
batch_normalization_9 (Batch (None, 41, 41, 64) 256
_________________________________________________________________
activation_9 (Activation) (None, 41, 41, 64) 0
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 20, 20, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 25600) 0
_________________________________________________________________
dense_11 (Dense) (None, 64) 1638464
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_12 (Dense) (None, 1) 65
=================================================================
Total params: 1,744,065
Trainable params: 1,743,873
Non-trainable params: 192
_________________________________________________________________
Epoch 1/30
62/62 [==============================] - 14s 225ms/step - loss: 1.8307 - accuracy: 0.4853 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 2/30
62/62 [==============================] - 14s 226ms/step - loss: 0.7085 - accuracy: 0.4832 - val_loss: 0.6931 - val_accuracy: 0.5010
Epoch 3/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6955 - accuracy: 0.5300 - val_loss: 0.6894 - val_accuracy: 0.5292
Epoch 4/30
62/62 [==============================] - 14s 221ms/step - loss: 0.6938 - accuracy: 0.5407 - val_loss: 0.7309 - val_accuracy: 0.5262
Epoch 5/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6860 - accuracy: 0.5498 - val_loss: 0.6776 - val_accuracy: 0.5665
Epoch 6/30
62/62 [==============================] - 13s 216ms/step - loss: 0.7027 - accuracy: 0.5407 - val_loss: 0.6895 - val_accuracy: 0.5101
Epoch 7/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6852 - accuracy: 0.5528 - val_loss: 0.6567 - val_accuracy: 0.5887
Epoch 8/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6772 - accuracy: 0.5427 - val_loss: 0.6643 - val_accuracy: 0.5847
Epoch 9/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6709 - accuracy: 0.5534 - val_loss: 0.6623 - val_accuracy: 0.5887
Epoch 10/30
62/62 [==============================] - 14s 219ms/step - loss: 0.6579 - accuracy: 0.5711 - val_loss: 0.6614 - val_accuracy: 0.6058
Epoch 11/30
62/62 [==============================] - 13s 218ms/step - loss: 0.6591 - accuracy: 0.5625 - val_loss: 0.6594 - val_accuracy: 0.5454
Epoch 12/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6419 - accuracy: 0.5767 - val_loss: 1.1041 - val_accuracy: 0.5161
Epoch 13/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6479 - accuracy: 0.5783 - val_loss: 0.6441 - val_accuracy: 0.5837
Epoch 14/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6373 - accuracy: 0.5899 - val_loss: 0.6427 - val_accuracy: 0.6310
Epoch 15/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6203 - accuracy: 0.6133 - val_loss: 0.7390 - val_accuracy: 0.6220
Epoch 16/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6277 - accuracy: 0.6362 - val_loss: 0.6649 - val_accuracy: 0.5786
Epoch 17/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6155 - accuracy: 0.6316 - val_loss: 0.9823 - val_accuracy: 0.5484
Epoch 18/30
62/62 [==============================] - 14s 222ms/step - loss: 0.6056 - accuracy: 0.6408 - val_loss: 0.6333 - val_accuracy: 0.6048
Epoch 19/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6025 - accuracy: 0.6529 - val_loss: 0.6514 - val_accuracy: 0.6442
Epoch 20/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6149 - accuracy: 0.6423 - val_loss: 0.6373 - val_accuracy: 0.6048
Epoch 21/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6030 - accuracy: 0.6519 - val_loss: 0.6086 - val_accuracy: 0.6573
Epoch 22/30
62/62 [==============================] - 13s 217ms/step - loss: 0.5936 - accuracy: 0.6865 - val_loss: 1.0677 - val_accuracy: 0.5605
Epoch 23/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5964 - accuracy: 0.6728 - val_loss: 0.7927 - val_accuracy: 0.5877
Epoch 24/30
62/62 [==============================] - 13s 215ms/step - loss: 0.5866 - accuracy: 0.6707 - val_loss: 0.6116 - val_accuracy: 0.6421
Epoch 25/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5933 - accuracy: 0.6662 - val_loss: 0.8282 - val_accuracy: 0.6048
Epoch 26/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5705 - accuracy: 0.6885 - val_loss: 0.5806 - val_accuracy: 0.6966
Epoch 27/30
62/62 [==============================] - 14s 218ms/step - loss: 0.5709 - accuracy: 0.7017 - val_loss: 1.2404 - val_accuracy: 0.5333
Epoch 28/30
62/62 [==============================] - 13s 216ms/step - loss: 0.5691 - accuracy: 0.7104 - val_loss: 0.6136 - val_accuracy: 0.6442
Epoch 29/30
62/62 [==============================] - 13s 215ms/step - loss: 0.5627 - accuracy: 0.7048 - val_loss: 0.6936 - val_accuracy: 0.6613
Epoch 30/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5714 - accuracy: 0.6941 - val_loss: 0.5872 - val_accuracy: 0.6825
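The shapes and parameter counts in the summary above can be checked by hand. Here is a small plain-Python sketch, assuming the Keras defaults used in this model (stride 1, 'valid' padding, 2x2 max pooling that floors odd sizes):

```python
def conv_out(size, kernel, stride=1):
    # 'valid' padding (the Conv2D default): floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

def conv_params(kernel, in_ch, out_ch):
    return kernel * kernel * in_ch * out_ch + out_ch  # weights + one bias per filter

size = conv_out(100, 7)   # conv2d_8: 94
size //= 2                # max_pooling2d_8: 47
size = conv_out(size, 7)  # conv2d_9: 41
size //= 2                # max_pooling2d_9: 20 (floored)
flat = size * size * 64   # flatten_4: 25600

params = {
    "conv2d_8": conv_params(7, 3, 32),   # 4736
    "bn_8": 4 * 32,                      # gamma, beta, moving mean, moving variance
    "conv2d_9": conv_params(7, 32, 64),  # 100416
    "bn_9": 4 * 64,
    "dense_11": flat * 64 + 64,          # 1638464
    "dense_12": 64 * 1 + 1,              # 65
}
non_trainable = 2 * 32 + 2 * 64  # BatchNorm moving mean/variance only

print(flat, sum(params.values()), non_trainable)  # 25600 1744065 192
```

Almost all of the parameters sit in dense_11, because the flattened 20x20x64 feature map is large; that is where a change such as extra pooling would shrink the model the most.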

Related

Fine tuning in CNN using Tensor Flow - 2.0

I am currently working on a defect-classification problem for solar panels. It's a multi-class classification problem, currently with 3 classes. I have done the coding part, but my accuracy is very low. How can I improve my accuracy?
Total training images - 900
Testing/validation images - 300
Classes - 3
My code is given below -
import tensorflow as tf
import keras_preprocessing
from keras_preprocessing import image
from keras_preprocessing.image import ImageDataGenerator
TRAINING_DIR = "/content/drive/My Drive/solar_images/solar_images/train/"
training_datagen = ImageDataGenerator(
rescale = 1./255,
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
VALIDATION_DIR = "/content/drive/My Drive/solar_images/solar_images/test/"
validation_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = training_datagen.flow_from_directory(
TRAINING_DIR,
target_size=(150,150),
class_mode='categorical',
batch_size=64
)
validation_generator = validation_datagen.flow_from_directory(
VALIDATION_DIR,
target_size=(150,150),
class_mode='categorical',
batch_size=64
)
model = tf.keras.models.Sequential([
# Note the input shape is the desired size of the image 150x150 with 3 bytes color
# This is the first convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(150, 150, 3)),
tf.keras.layers.MaxPooling2D(2, 2),
# The second convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The third convolution
tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The fourth convolution
tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# Flatten the results to feed into a DNN
tf.keras.layers.Flatten(),
tf.keras.layers.Dropout(0.5),
# 512 neuron hidden layer
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(3, activation='softmax')
])
model.summary()
model.compile(loss = 'categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
batch_size=64
history = model.fit(train_generator,
epochs=20,
steps_per_epoch=int(894/batch_size),
validation_data = validation_generator,
verbose = 1,
validation_steps=int(289/batch_size))
model.save("solar_images_weight.h5")
My accuracy is -
Epoch 1/20
13/13 [==============================] - 1107s 85s/step - loss: 1.2893 - accuracy: 0.3470 - val_loss: 1.0926 - val_accuracy: 0.3594
Epoch 2/20
13/13 [==============================] - 1239s 95s/step - loss: 1.1037 - accuracy: 0.3566 - val_loss: 1.0954 - val_accuracy: 0.3125
Epoch 3/20
13/13 [==============================] - 1203s 93s/step - loss: 1.0964 - accuracy: 0.3904 - val_loss: 1.0841 - val_accuracy: 0.5625
Epoch 4/20
13/13 [==============================] - 1182s 91s/step - loss: 1.0980 - accuracy: 0.3750 - val_loss: 1.0894 - val_accuracy: 0.3633
Epoch 5/20
13/13 [==============================] - 1218s 94s/step - loss: 1.1086 - accuracy: 0.3386 - val_loss: 1.0874 - val_accuracy: 0.3125
Epoch 6/20
13/13 [==============================] - 1214s 93s/step - loss: 1.0953 - accuracy: 0.3257 - val_loss: 1.0763 - val_accuracy: 0.6094
Epoch 7/20
13/13 [==============================] - 1136s 87s/step - loss: 1.0851 - accuracy: 0.3831 - val_loss: 1.0754 - val_accuracy: 0.3164
Epoch 8/20
13/13 [==============================] - 1170s 90s/step - loss: 1.1005 - accuracy: 0.3940 - val_loss: 1.0545 - val_accuracy: 0.5039
Epoch 9/20
13/13 [==============================] - 1138s 88s/step - loss: 1.1294 - accuracy: 0.4337 - val_loss: 1.0130 - val_accuracy: 0.5703
Epoch 10/20
13/13 [==============================] - 1131s 87s/step - loss: 1.0250 - accuracy: 0.4531 - val_loss: 0.8911 - val_accuracy: 0.6055
Epoch 11/20
13/13 [==============================] - 1162s 89s/step - loss: 1.0243 - accuracy: 0.4735 - val_loss: 0.9160 - val_accuracy: 0.4727
Epoch 12/20
13/13 [==============================] - 1153s 89s/step - loss: 0.9978 - accuracy: 0.4783 - val_loss: 0.7754 - val_accuracy: 0.6406
Epoch 13/20
13/13 [==============================] - 1187s 91s/step - loss: 1.0080 - accuracy: 0.4687 - val_loss: 0.7701 - val_accuracy: 0.6602
Epoch 14/20
13/13 [==============================] - 1204s 93s/step - loss: 0.9851 - accuracy: 0.5048 - val_loss: 0.7450 - val_accuracy: 0.6367
Epoch 15/20
13/13 [==============================] - 1181s 91s/step - loss: 0.9699 - accuracy: 0.4892 - val_loss: 0.7409 - val_accuracy: 0.6289
Epoch 16/20
13/13 [==============================] - 1187s 91s/step - loss: 0.8884 - accuracy: 0.5241 - val_loss: 0.7169 - val_accuracy: 0.6133
Epoch 17/20
13/13 [==============================] - 1197s 92s/step - loss: 0.9372 - accuracy: 0.5084 - val_loss: 0.7464 - val_accuracy: 0.5859
Epoch 18/20
13/13 [==============================] - 1224s 94s/step - loss: 0.9230 - accuracy: 0.5229 - val_loss: 0.9198 - val_accuracy: 0.5156
Epoch 19/20
13/13 [==============================] - 1270s 98s/step - loss: 0.9161 - accuracy: 0.5192 - val_loss: 0.6785 - val_accuracy: 0.6289
Epoch 20/20
13/13 [==============================] - 1173s 90s/step - loss: 0.8728 - accuracy: 0.5193 - val_loss: 0.6674 - val_accuracy: 0.5781
Training and validation accuracy plot is given below -
You could use transfer learning: take a model pre-trained on a large dataset, such as MobileNet or Inception, and train it on your dataset. This is likely to improve your accuracy significantly.
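A minimal transfer-learning sketch, assuming TF 2.6+ (for `tf.keras.layers.Rescaling`) and picking MobileNetV2 as the pre-trained model. The function name `build_transfer_model`, the pooling/dropout head and the 0.2 dropout rate are illustrative choices, not part of the original answer:

```python
import tensorflow as tf

def build_transfer_model(num_classes=3, input_shape=(150, 150, 3), weights="imagenet"):
    # Pre-trained MobileNetV2 without its ImageNet classification head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # freeze the pre-trained feature extractor at first

    inputs = tf.keras.Input(shape=input_shape)
    # The generators in the question rescale pixels to [0, 1]; MobileNetV2 expects [-1, 1].
    x = tf.keras.layers.Rescaling(scale=2.0, offset=-1.0)(inputs)
    x = base(x, training=False)  # keep the base's BatchNormalization layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Usage with the generators from the question would be along the lines of `model = build_transfer_model()` followed by `model.fit(train_generator, epochs=20, validation_data=validation_generator)`. With 150x150 inputs Keras warns and loads the default 224x224 ImageNet weights, which is fine with `include_top=False`. Once the new head has trained, some of the top layers of `base` can optionally be unfrozen and trained with a low learning rate (fine-tuning).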

Validation Accuracy Does Not Improve in CNN

I have an AlexNet-like CNN that tries to predict the class of an ornament. The training accuracy monotonically increases and the training loss monotonically decreases, but the test accuracy fluctuates around 0.50.
I've tried changing various hyperparameters: I changed the batch size, used data augmentation, converted the data to grayscale (because they're just pictures of stones), added dropout, regularization and Gaussian noise, and changed the number of units in the dense layers, but the validation accuracy still does not change.
I don't know what to do or how to improve my model. Please help me.
from keras.preprocessing.image import ImageDataGenerator
train_datagen=ImageDataGenerator (rescale = 1/255,
featurewise_center =True,
shear_range= 0.2,
zoom_range=0.2,
rotation_range=90,
width_shift_range=0.1,
height_shift_range=0.1,
fill_mode = 'nearest',
vertical_flip = True,
horizontal_flip=True)
training_set=train_datagen.flow_from_directory('/content/drive/My Drive/DATASET1/train',
target_size= (224,224),
batch_size= 128,
color_mode='grayscale',
class_mode='categorical')
test_datagen=ImageDataGenerator ( rescale = 1/255,
featurewise_center =True,
#shear_range= 0.2,
#zoom_range=0.2,
#horizontal_flip=True
)
test_set=test_datagen.flow_from_directory('/content/drive/My Drive/DATASET1/val',
target_size= (224,224),
batch_size= 48,
color_mode='grayscale',
class_mode='categorical')
model = Sequential()
# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224,224,1), kernel_size=(11,11), strides=(4,4), padding="same", activation = "relu"))
# Max Pooling
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding="valid"))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())
# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11,11), strides=(1,1), padding="same", activation = "relu"))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding="valid"))
# Batch Normalisation
model.add(BatchNormalization())
# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding="same", activation = "relu"))
# Batch Normalisation
model.add(BatchNormalization())
# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3,3), strides=(1,1), padding="same", activation = "relu"))
# Batch Normalisation
model.add(BatchNormalization())
# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3,3), strides=(1,1), padding="same", activation = "relu"))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2), padding="valid"))
# Batch Normalisation
model.add(BatchNormalization())
# Passing it to a Fully Connected layer
model.add(Flatten())
# 1st Fully Connected Layer
regularizer =keras.regularizers.l2(l=0.0005)
model.add(GaussianNoise(0.1))
model.add(Dense(units = 4096, activation = "relu", kernel_regularizer = regularizer))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None))
# 2nd Fully Connected Layer
regularizer =keras.regularizers.l2(l=0.0005)
model.add(GaussianNoise(0.1))
model.add(Dense(units = 2048, activation = "relu", kernel_regularizer = regularizer ))
# Add Dropout
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())
# 3rd Fully Connected Layer
regularizer =keras.regularizers.l2(l=0.0005)
model.add(GaussianNoise(0.1))
model.add(Dense(2048, activation = "relu", kernel_regularizer = regularizer))
# Add Dropout
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())
# Output Layer
model.add(Dense(2, activation = "softmax")) #As we have two classes
Epoch 1/20
/usr/local/lib/python3.6/dist-packages/keras_preprocessing/image/image_data_generator.py:716: UserWarning: This ImageDataGenerator specifies `featurewise_center`, but it hasn't been fit on any training data. Fit it first by calling `.fit(numpy_data)`.
warnings.warn('This ImageDataGenerator specifies ')
5/5 [==============================] - 9s 2s/step - loss: 6.2275 - accuracy: 0.5244 - val_loss: 5.9162 - val_accuracy: 0.4985
Epoch 00001: val_accuracy improved from -inf to 0.49853, saving model to alexnet_1.h5
Epoch 2/20
5/5 [==============================] - 7s 1s/step - loss: 6.1302 - accuracy: 0.6031 - val_loss: 5.9220 - val_accuracy: 0.5103
Epoch 00002: val_accuracy improved from 0.49853 to 0.51032, saving model to alexnet_1.h5
Epoch 3/20
5/5 [==============================] - 5s 1s/step - loss: 6.1390 - accuracy: 0.6250 - val_loss: 6.0433 - val_accuracy: 0.4932
Epoch 00003: val_accuracy did not improve from 0.51032
Epoch 4/20
5/5 [==============================] - 6s 1s/step - loss: 6.0528 - accuracy: 0.6429 - val_loss: 5.9255 - val_accuracy: 0.4985
Epoch 00004: val_accuracy did not improve from 0.51032
Epoch 5/20
5/5 [==============================] - 7s 1s/step - loss: 6.0935 - accuracy: 0.6094 - val_loss: 5.9714 - val_accuracy: 0.4926
Epoch 00005: val_accuracy did not improve from 0.51032
Epoch 6/20
5/5 [==============================] - 5s 1s/step - loss: 6.0139 - accuracy: 0.6447 - val_loss: 5.5711 - val_accuracy: 0.4932
Epoch 00006: val_accuracy did not improve from 0.51032
Epoch 7/20
5/5 [==============================] - 5s 1s/step - loss: 6.0250 - accuracy: 0.6353 - val_loss: 5.9171 - val_accuracy: 0.5133
Epoch 00007: val_accuracy improved from 0.51032 to 0.51327, saving model to alexnet_1.h5
Epoch 8/20
5/5 [==============================] - 7s 1s/step - loss: 6.0012 - accuracy: 0.6422 - val_loss: 6.0526 - val_accuracy: 0.4749
Epoch 00008: val_accuracy did not improve from 0.51327
Epoch 9/20
5/5 [==============================] - 6s 1s/step - loss: 5.9814 - accuracy: 0.6635 - val_loss: 5.4898 - val_accuracy: 0.4966
Epoch 00009: val_accuracy did not improve from 0.51327
Epoch 10/20
5/5 [==============================] - 5s 906ms/step - loss: 5.9613 - accuracy: 0.6769 - val_loss: 6.1255 - val_accuracy: 0.4956
Epoch 00010: val_accuracy did not improve from 0.51327
Epoch 11/20
5/5 [==============================] - 6s 1s/step - loss: 5.9888 - accuracy: 0.6484 - val_loss: 6.2377 - val_accuracy: 0.4956
Epoch 00011: val_accuracy did not improve from 0.51327
Epoch 12/20
5/5 [==============================] - 5s 1s/step - loss: 6.0045 - accuracy: 0.6767 - val_loss: 5.4328 - val_accuracy: 0.4932
Epoch 00012: val_accuracy did not improve from 0.51327
Epoch 13/20
5/5 [==============================] - 5s 1s/step - loss: 5.9569 - accuracy: 0.6654 - val_loss: 5.9874 - val_accuracy: 0.4985
Epoch 00013: val_accuracy did not improve from 0.51327
Epoch 14/20
5/5 [==============================] - 7s 1s/step - loss: 5.8978 - accuracy: 0.6859 - val_loss: 6.2074 - val_accuracy: 0.4897
Epoch 00014: val_accuracy did not improve from 0.51327
Epoch 15/20
5/5 [==============================] - 5s 1s/step - loss: 6.0063 - accuracy: 0.6792 - val_loss: 5.3235 - val_accuracy: 0.4966
Epoch 00015: val_accuracy did not improve from 0.51327
Epoch 16/20
5/5 [==============================] - 6s 1s/step - loss: 5.8966 - accuracy: 0.7068 - val_loss: 6.1324 - val_accuracy: 0.5015
Epoch 00016: val_accuracy did not improve from 0.51327
Epoch 17/20
5/5 [==============================] - 7s 1s/step - loss: 5.9352 - accuracy: 0.6562 - val_loss: 6.2356 - val_accuracy: 0.4867
Epoch 00017: val_accuracy did not improve from 0.51327
Epoch 18/20
5/5 [==============================] - 6s 1s/step - loss: 5.9475 - accuracy: 0.6391 - val_loss: 7.9573 - val_accuracy: 0.4966
Epoch 00018: val_accuracy did not improve from 0.51327
Epoch 19/20
5/5 [==============================] - 5s 1s/step - loss: 5.9627 - accuracy: 0.6898 - val_loss: 6.0916 - val_accuracy: 0.4985
Epoch 00019: val_accuracy did not improve from 0.51327
Epoch 20/20
5/5 [==============================] - 6s 1s/step - loss: 5.8621 - accuracy: 0.6974 - val_loss: 6.3277 - val_accuracy: 0.4926
Epoch 00020: val_accuracy did not improve from 0.51327
As the comment on your output layer says, you have 2 classes, which means this model does binary classification (labels 0 and 1). For binary classification, the output layer is commonly defined as:
model.add(Dense(1, activation = "sigmoid"))
Use it together with class_mode='binary' in flow_from_directory and the binary_crossentropy loss function in your model.
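For two classes, a 2-unit softmax head and a 1-unit sigmoid head are mathematically interchangeable: the softmax probability of class 1 equals the sigmoid of the difference of the two logits. A quick NumPy check (the logits are arbitrary made-up values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

logits = np.array([[0.3, 1.7], [2.0, -1.0], [0.0, 0.0]])  # arbitrary 2-class logits
p_softmax = softmax(logits)[:, 1]                 # P(class 1) from a 2-unit softmax
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])  # the same quantity from one sigmoid unit
print(np.allclose(p_softmax, p_sigmoid))  # True
```

So the sigmoid head mainly simplifies the setup; what matters is that the label format (class_mode), the loss function and the output layer agree with each other.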
To test this, I removed the BatchNormalization() and Dropout() layers, since we are already using data augmentation on a large dataset.
model = Sequential()
model.add(Conv2D(16, input_shape=(224,224,1), kernel_size=(3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(3,3), ))
model.add(Conv2D(32, kernel_size=(3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2,2), ))
model.add(Conv2D(64, kernel_size=(3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(128, kernel_size=(3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, kernel_size=(3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation = "relu"))
model.add(Dense(64, activation = "relu" ))
model.add(Dense(32, activation = "relu"))
model.add(Dense(1, activation = "sigmoid")) #As we have two classes
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(training_set, epochs =50, validation_data=test_set)
Output:
Epoch 45/50
16/16 [==============================] - 23s 1s/step - loss: 0.5665 - accuracy: 0.7180 - val_loss: 0.5676 - val_accuracy: 0.6960
Epoch 46/50
16/16 [==============================] - 22s 1s/step - loss: 0.5678 - accuracy: 0.7120 - val_loss: 0.5528 - val_accuracy: 0.7160
Epoch 47/50
16/16 [==============================] - 22s 1s/step - loss: 0.5524 - accuracy: 0.7305 - val_loss: 0.5584 - val_accuracy: 0.7060
Epoch 48/50
16/16 [==============================] - 23s 1s/step - loss: 0.5651 - accuracy: 0.7100 - val_loss: 0.5554 - val_accuracy: 0.7120
Epoch 49/50
16/16 [==============================] - 22s 1s/step - loss: 0.5587 - accuracy: 0.7145 - val_loss: 0.5604 - val_accuracy: 0.7120
Epoch 50/50
16/16 [==============================] - 22s 1s/step - loss: 0.5522 - accuracy: 0.7265 - val_loss: 0.5281 - val_accuracy: 0.7260
You can find more details on overcoming overfitting and optimizing the model in this reference.

Convolutional Neural Network seems to be randomly guessing

So I am currently trying to build a race-recognition program using a convolutional neural network. I'm inputting 200px by 200px versions of the UTKFace dataset (I put my dataset on a Google Drive if you want to take a look). I'm using 8 different classes (4 races * 2 genders) with Keras and TensorFlow, each having about 700 images, but I have also tried it with 1000. The problem is that when I run the network it gets at best 13.5% accuracy and about 11-12.5% validation accuracy, with a loss around 2.079-2.081, and even after 50 epochs or so it won't improve at all. My current hypothesis is that it is randomly guessing/not learning, because 1/8 = 12.5%, which is about what it is getting, and on other models I have made with 3 classes it was getting about 33%.
I noticed that the validation accuracy is different on the first and sometimes the second epoch, but after that it stays constant. I've increased the pixel resolution, changed the number of layers, the types of layers and the neurons per layer, and tried different optimizers (SGD at the normal learning rate, and at very large and very small ones: 0.1 and 10^-6). I've also tried different loss functions such as KLDivergence, but nothing seems to have any effect, except KLDivergence, which on one run did pretty well (about 16%) but then flopped again. Some ideas I had are that maybe there's too much noise in the dataset, or maybe it has to do with the number of dense layers, but honestly I don't know why it is not learning.
Here's the code to make the tensors:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import cv2
import random
import pickle
WIDTH_SIZE = 200
HEIGHT_SIZE = 200
CATEGORIES = []
for CATEGORY in os.listdir('./TRAINING'):
    CATEGORIES.append(CATEGORY)
DATADIR = "./TRAINING"
training_data = []
def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path)[:700]:
            try:
                img_array = cv2.imread(os.path.join(path,img), cv2.IMREAD_COLOR)
                new_array = cv2.resize(img_array,(WIDTH_SIZE,HEIGHT_SIZE))
                training_data.append([new_array,class_num])
            except Exception as error:
                print(error)
create_training_data()
random.shuffle(training_data)
X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)
X = np.array(X).reshape(-1, WIDTH_SIZE, HEIGHT_SIZE, 3)
y = np.array(y)
pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)
Here's the model I built:
import tensorflow as tf  # needed for tf.keras.losses below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(256, (2,2), activation = 'relu', input_shape = X.shape[1:]))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(8, activation="softmax"))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
model.fit(X, y, batch_size=16,epochs=100,validation_split=.1)
Here's the log of the first 10 epochs:
Epoch 1/100
5040/5040 [==============================] - 55s 11ms/sample - loss: 2.0803 - accuracy: 0.1226 - val_loss: 2.0796 - val_accuracy: 0.1250
Epoch 2/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1147 - val_loss: 2.0798 - val_accuracy: 0.1161
Epoch 3/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1190 - val_loss: 2.0800 - val_accuracy: 0.1161
Epoch 4/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1173 - val_loss: 2.0799 - val_accuracy: 0.1107
Epoch 5/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1183 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 6/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1226 - val_loss: 2.0801 - val_accuracy: 0.1107
Epoch 7/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1238 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 8/100
5040/5040 [==============================] - 54s 11ms/sample - loss: 2.0797 - accuracy: 0.1169 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 9/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1212 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 10/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1177 - val_loss: 2.0802 - val_accuracy: 0.1107
So yeah, any help on why my network seems to be just guessing? Thank you!
The problem lies in the design of your network.
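The log in the question already shows that the network is not learning at all. The loader caps each class at 700 images (`os.listdir(path)[:700]`), and 8 classes x 700 = 5600 = 5040 + 560, so, assuming every image loaded, the classes are balanced. A model that outputs the same uniform distribution for every image then scores a cross-entropy of ln 8 and an accuracy of 1/8, which is exactly where the question's training is stuck (loss 2.0797, accuracy around 0.12). A quick check:

```python
import math

NUM_CLASSES = 8  # Dense(8, activation="softmax") in the question

# A model that outputs the uniform distribution assigns 1/8 to the true class,
# so sparse categorical cross-entropy is -ln(1/8) = ln(8) for every sample.
loss = -math.log(1.0 / NUM_CLASSES)
# With balanced classes, any fixed guess is right 1/8 of the time.
acc = 1.0 / NUM_CLASSES
print(f"loss={loss:.4f} accuracy={acc:.4f}")  # loss=2.0794 accuracy=0.1250
```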
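Typically you'd want the first layers to learn low-level features such as edges, using a larger kernel with an odd size. With 2x2 kernels you are essentially just interpolating neighbouring pixels. Why an odd size? An odd kernel has a central pixel, so each output value is centred on an input pixel and padding='same' pads both sides symmetrically.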
The number of filters typically increases from a small value (e.g. 16 or 32) to larger values as you go deeper into the network, whereas in your network every layer learns the same number (256). The reasoning is that deeper layers combine earlier features into more complex, abstract ones, so you want more of them; pooling also shrinks the feature maps, which keeps the extra filters affordable.
Each layer in your network also trims away information at the image borders: with the default padding='valid', every 2x2 convolution shrinks the feature map by one pixel in each dimension.
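To see how much the default valid padding trims, here is a small sketch that traces the spatial size through the question's layer stack. It assumes a 200x200 input, the size implied by the model summary further down; Dropout layers are left out because they do not change shapes.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a Keras Conv2D/MaxPooling2D along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid': no zero padding

# The question's stack: Conv2D(256, (2,2)) and MaxPooling2D(pool_size=(2,2)).
QUESTION_STACK = ["conv", "pool", "conv", "conv", "conv", "pool",
                  "conv", "conv", "pool", "conv", "conv", "pool",
                  "conv", "conv", "pool"]

def trace(size, conv_padding):
    """Return the feature-map size before the first layer and after each layer."""
    sizes = [size]
    for layer in QUESTION_STACK:
        if layer == "conv":
            size = out_size(size, kernel=2, stride=1, padding=conv_padding)
        else:  # pooling strides default to pool_size, padding stays 'valid'
            size = out_size(size, kernel=2, stride=2, padding="valid")
        sizes.append(size)
    return sizes

valid = trace(200, "valid")
same = trace(200, "same")
print(valid)  # [200, 199, 99, 98, 97, 96, 48, 47, 46, 23, 22, 21, 10, 9, 8, 4]
print(same)   # [200, 200, 100, 100, 100, 100, 50, 50, 50, 25, 25, 25, 12, 12, 12, 6]
```

Under these assumptions the valid-padded stack ends at 4x4x256 before Flatten, versus 6x6x256 with padding='same' on the convolutions.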
Here's a very basic network that gets me over 95% training accuracy within 10 epochs (about 6 seconds per epoch):
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
with open("X.pickle", "rb") as pickle_in:
    X = pickle.load(pickle_in)
with open("y.pickle", "rb") as pickle_in:
    y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(16, (5,5), activation = 'relu', input_shape = X.shape[1:], padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
Architecture:
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_19 (Conv2D) (None, 200, 200, 16) 1216
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 100, 100, 16) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 100, 100, 32) 4640
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 50, 50, 32) 0
_________________________________________________________________
conv2d_21 (Conv2D) (None, 50, 50, 64) 18496
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 25, 25, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 40000) 0
_________________________________________________________________
dense_7 (Dense) (None, 512) 20480512
_________________________________________________________________
dense_8 (Dense) (None, 8) 4104
=================================================================
Total params: 20,508,968
Trainable params: 20,508,968
Non-trainable params: 0
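The Param # column follows from a simple formula: a Conv2D layer has (kernel_h x kernel_w x in_channels + 1) x filters weights (the +1 is the bias), and a Dense layer has (inputs + 1) x units. A small sketch reproducing the summary above, assuming 3-channel input (cv2.IMREAD_COLOR), shows that almost all of the 20.5M parameters sit in the Dense(512) layer:

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return (inputs + 1) * units

layers = {
    "conv2d_19": conv2d_params(5, 3, 16),        # 1216
    "conv2d_20": conv2d_params(3, 16, 32),       # 4640
    "conv2d_21": conv2d_params(3, 32, 64),       # 18496
    "dense_7": dense_params(25 * 25 * 64, 512),  # 20480512
    "dense_8": dense_params(512, 8),             # 4104
}
total = sum(layers.values())
print(total)  # 20508968
print(f"dense_7 share: {layers['dense_7'] / total:.1%}")  # dense_7 share: 99.9%

# For comparison: each Conv2D(256, (2,2)) after the first one in the question
print(conv2d_params(2, 256, 256))  # 262400
```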
Training:
Train on 5040 samples, validate on 560 samples
Epoch 1/10
5040/5040 [==============================] - 7s 1ms/sample - loss: 2.2725 - accuracy: 0.1897 - val_loss: 1.8939 - val_accuracy: 0.2946
Epoch 2/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.7831 - accuracy: 0.3375 - val_loss: 1.8658 - val_accuracy: 0.3179
Epoch 3/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.4857 - accuracy: 0.4623 - val_loss: 1.9507 - val_accuracy: 0.3357
Epoch 4/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.1294 - accuracy: 0.6028 - val_loss: 2.1745 - val_accuracy: 0.3250
Epoch 5/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.8060 - accuracy: 0.7179 - val_loss: 3.1622 - val_accuracy: 0.3000
Epoch 6/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.5574 - accuracy: 0.8169 - val_loss: 3.7494 - val_accuracy: 0.2839
Epoch 7/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3756 - accuracy: 0.8813 - val_loss: 4.9125 - val_accuracy: 0.2643
Epoch 8/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3001 - accuracy: 0.9036 - val_loss: 5.6300 - val_accuracy: 0.2821
Epoch 9/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.2345 - accuracy: 0.9337 - val_loss: 5.7263 - val_accuracy: 0.2679
Epoch 10/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.1549 - accuracy: 0.9581 - val_loss: 7.3682 - val_accuracy: 0.2732
As you can see, the validation score is terrible (the model overfits heavily), but the point was to demonstrate that a poor architecture can prevent training altogether.

Keras NLP validation loss increases while training accuracy increases

I have looked at other posts with similar problems and it seems that my model is overfitting. However, I've tried regularization, dropout, reducing parameters, decreasing the learning rate and changing the loss function, but nothing seems to help.
Here is my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Bidirectional, GRU, GlobalMaxPooling1D, Dense
model = Sequential([
    Embedding(max_words, 64),
    Dropout(.5),
    Bidirectional(GRU(64, return_sequences=True), merge_mode='concat'),
    GlobalMaxPooling1D(),
    Dense(64),
    Dropout(.5),
    Dense(1, activation='sigmoid')
])
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=25, verbose=1, validation_data=(x_test, y_test), shuffle=True)
And my training output:
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_3 (Embedding) (None, None, 64) 320000
_________________________________________________________________
dropout_6 (Dropout) (None, None, 64) 0
_________________________________________________________________
bidirectional_3 (Bidirection (None, None, 128) 49920
_________________________________________________________________
global_max_pooling1d_3 (Glob (None, 128) 0
_________________________________________________________________
dense_3 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_7 (Dropout) (None, 64) 0
_________________________________________________________________
dense_4 (Dense) (None, 1) 65
=================================================================
Total params: 378,241
Trainable params: 378,241
Non-trainable params: 0
_________________________________________________________________
Epoch 1/25
229/229 [==============================] - 7s 32ms/step - loss: 0.6952 - accuracy: 0.4939 - val_loss: 0.6923 - val_accuracy: 0.5240
Epoch 2/25
229/229 [==============================] - 7s 30ms/step - loss: 0.6917 - accuracy: 0.5144 - val_loss: 0.6973 - val_accuracy: 0.4815
Epoch 3/25
229/229 [==============================] - 7s 30ms/step - loss: 0.6709 - accuracy: 0.5881 - val_loss: 0.7164 - val_accuracy: 0.4784
Epoch 4/25
229/229 [==============================] - 7s 30ms/step - loss: 0.6070 - accuracy: 0.6711 - val_loss: 0.7704 - val_accuracy: 0.4977
Epoch 5/25
229/229 [==============================] - 7s 30ms/step - loss: 0.5370 - accuracy: 0.7325 - val_loss: 0.8411 - val_accuracy: 0.4876
Epoch 6/25
229/229 [==============================] - 7s 30ms/step - loss: 0.4770 - accuracy: 0.7714 - val_loss: 0.9479 - val_accuracy: 0.4784
Epoch 7/25
229/229 [==============================] - 7s 30ms/step - loss: 0.4228 - accuracy: 0.8016 - val_loss: 1.0987 - val_accuracy: 0.4884
Epoch 8/25
229/229 [==============================] - 7s 30ms/step - loss: 0.3697 - accuracy: 0.8344 - val_loss: 1.2714 - val_accuracy: 0.4760
Epoch 9/25
229/229 [==============================] - 7s 30ms/step - loss: 0.3150 - accuracy: 0.8582 - val_loss: 1.4184 - val_accuracy: 0.4822
Epoch 10/25
229/229 [==============================] - 7s 31ms/step - loss: 0.2725 - accuracy: 0.8829 - val_loss: 1.6053 - val_accuracy: 0.4946
Epoch 11/25
229/229 [==============================] - 7s 31ms/step - loss: 0.2277 - accuracy: 0.9056 - val_loss: 1.8131 - val_accuracy: 0.4884
Epoch 12/25
229/229 [==============================] - 7s 31ms/step - loss: 0.1929 - accuracy: 0.9253 - val_loss: 1.9327 - val_accuracy: 0.4977
Epoch 13/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1717 - accuracy: 0.9318 - val_loss: 2.2280 - val_accuracy: 0.4900
Epoch 14/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1643 - accuracy: 0.9324 - val_loss: 2.2811 - val_accuracy: 0.4915
Epoch 15/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1419 - accuracy: 0.9439 - val_loss: 2.4530 - val_accuracy: 0.4830
Epoch 16/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1255 - accuracy: 0.9521 - val_loss: 2.6692 - val_accuracy: 0.4992
Epoch 17/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1124 - accuracy: 0.9558 - val_loss: 2.8106 - val_accuracy: 0.4892
Epoch 18/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1130 - accuracy: 0.9556 - val_loss: 2.6792 - val_accuracy: 0.4907
Epoch 19/25
229/229 [==============================] - 7s 30ms/step - loss: 0.1085 - accuracy: 0.9610 - val_loss: 2.8966 - val_accuracy: 0.5093
Epoch 20/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0974 - accuracy: 0.9656 - val_loss: 2.8636 - val_accuracy: 0.5147
Epoch 21/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0921 - accuracy: 0.9663 - val_loss: 2.9874 - val_accuracy: 0.4977
Epoch 22/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0888 - accuracy: 0.9685 - val_loss: 3.0295 - val_accuracy: 0.4969
Epoch 23/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0762 - accuracy: 0.9731 - val_loss: 3.0607 - val_accuracy: 0.4884
Epoch 24/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0842 - accuracy: 0.9692 - val_loss: 3.0552 - val_accuracy: 0.4900
Epoch 25/25
229/229 [==============================] - 7s 30ms/step - loss: 0.0816 - accuracy: 0.9693 - val_loss: 2.9571 - val_accuracy: 0.5015
My validation loss always seems to increase no matter what. I am trying to predict political affiliation from tweets. The dataset I am using has worked well with other models, so perhaps there is something wrong with my data preprocessing instead?
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
df = pd.read_csv('political_tweets.csv')
df.head()
dataset = df["tweet"].values
labels = df["dem_or_rep"].values
x_train, x_test, y_train, y_test = train_test_split(dataset, labels, test_size=0.15, shuffle=True)
print(x_train[0])
print(x_test[0])
max_words = 10000
max_len = 25
tokenizer = Tokenizer(num_words=max_words, filters='!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n1234567890', lower=False, oov_token="<OOV>")
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, max_len, padding='post', truncating='post')
tokenizer.fit_on_texts(x_test)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = pad_sequences(x_test, max_len, padding='post', truncating='post')
I am really stumped. Any help is appreciated.
You're doing binary classification and your validation accuracy is near 50%. That means your model has learnt nothing useful; it is equivalent to random guessing.
Your training accuracy, on the other hand, is very high, which means your model is badly overfitting.
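To put the losses in context: binary cross-entropy for one example is -ln of the probability the model gave to the true label. A model that always outputs 0.5 scores ln 2 = 0.693, which is where both losses start in the log. A validation loss near 3 while accuracy stays around 50% means many of the wrong predictions are made with high confidence. A minimal sketch:

```python
import math

def binary_cross_entropy(y_true, p):
    """Loss for one example: y_true is 0 or 1, p is the predicted probability of class 1."""
    return -math.log(p) if y_true == 1 else -math.log(1.0 - p)

# An uninformative model that always predicts 0.5:
print(round(binary_cross_entropy(1, 0.5), 4))   # 0.6931
# A confident wrong prediction (true label 1, model says 5%):
print(round(binary_cross_entropy(1, 0.05), 4))  # 2.9957
# A confident right prediction costs almost nothing:
print(round(binary_cross_entropy(1, 0.95), 4))  # 0.0513
```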
Don't apply Dropout directly after the Embedding layer; it can mess everything up.
Remove the Dense(64) layer after GlobalMaxPooling1D.
Use the GRU's recurrent_dropout argument instead.
Train for fewer epochs.
Reduce the vocabulary and remove stop words. There may be too much noise: with a sequence length of only 25, noisy stop words can fool the model.
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')  # one-time download of the stop-word lists
stop_words = set(stopwords.words('english'))
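A minimal sketch of applying a stop-word set to the tweets before tokenizing. The function name `remove_stop_words` and the small example set are hypothetical; in practice you would pass the NLTK set, e.g. `remove_stop_words(x_train, set(stopwords.words('english')))`, and do the same for `x_test`.

```python
def remove_stop_words(texts, stop_words):
    """Drop stop words from each text, keeping the remaining words in order."""
    cleaned = []
    for text in texts:
        # Compare in lowercase: the question's Tokenizer uses lower=False,
        # but NLTK's stop-word list is all lowercase.
        kept = [w for w in text.split() if w.lower() not in stop_words]
        cleaned.append(" ".join(kept))
    return cleaned

example_stop_words = {"the", "a", "is", "to", "of", "and"}  # hypothetical subset
print(remove_stop_words(["The bill is a step to the future"], example_stop_words))
# ['bill step future']
```

One caveat worth checking for this task: NLTK's English list includes negations such as "not", "no" and "nor", which can carry meaning in political tweets.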
If your model still overfits, try reducing the embedding output_dim and the number of GRU units, experimenting with many combinations.
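One more preprocessing issue worth checking: the question calls `tokenizer.fit_on_texts(x_test)` after `x_train` has already been converted. Keras' `Tokenizer` rebuilds its word index from the cumulative word counts every time it is fitted, so refitting can change the index assigned to a word, and the test sequences may no longer use the encoding the model was trained on (it also leaks test vocabulary). The usual rule is to fit on training text only and reuse that vocabulary for the test set. The sketch below combines this with the suggestions above. It swaps in `tf.keras.layers.TextVectorization` for the question's `Tokenizer` (TensorFlow's documentation marks `Tokenizer` as deprecated in its favour); note that its default standardization lowercases and strips punctuation, unlike `lower=False`. The layer sizes and dropout rate are illustrative guesses, not tuned values, and the data is hypothetical.

```python
import numpy as np
import tensorflow as tf

MAX_WORDS = 10000  # vocabulary cap, as in the question
MAX_LEN = 25       # sequence length, as in the question

def build_vectorizer(train_texts):
    """Learn the vocabulary from the training texts only."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=MAX_WORDS,
        output_mode="int",
        output_sequence_length=MAX_LEN,  # pads with 0 / truncates to MAX_LEN tokens
    )
    vectorizer.adapt(tf.constant(train_texts))
    return vectorizer

def build_model():
    """Smaller model following the answer's suggestions (sizes are guesses)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,), dtype="int64"),
        tf.keras.layers.Embedding(MAX_WORDS, 32),  # no Dropout right after it
        tf.keras.layers.Bidirectional(
            tf.keras.layers.GRU(32, return_sequences=True, recurrent_dropout=0.2)),
        tf.keras.layers.GlobalMaxPooling1D(),      # no Dense(64) after pooling
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

# Tiny hypothetical data, only to show the flow.
x_train = ["lower taxes now", "protect voting rights", "secure the border", "expand health care"]
x_test = ["cut taxes", "health care for all"]
y_train = np.array([1, 0, 1, 0])

vectorizer = build_vectorizer(x_train)                          # fitted once, on train only
train_ids = np.asarray(vectorizer(tf.constant(x_train)))
test_ids = np.asarray(vectorizer(tf.constant(x_test)))          # same vocabulary, no refit

model = build_model()
model.fit(train_ids, y_train, epochs=1, verbose=0)  # use few epochs on real data too
preds = model.predict(test_ids, verbose=0)
print(preds.shape)  # (2, 1)
```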

Why does a CNN with Keras not learn?

I am fairly new to deep learning, and especially to Keras, and I have a university assignment to train a CNN with Keras and learn about it. I am using the MURA dataset (skeletal radiographs).
So far I have gone over all the images in the dataset and split the training set into training and validation sets (90/10).
I am using the CNN given in the paper, and I am not allowed to modify it until the second task. The first task is to observe and understand the CNN.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dropout, Dense
from keras.preprocessing.image import ImageDataGenerator
# BATCH_SIZE, training_len and valid_len are assumed to be defined elsewhere in the script
def run():
    train_datagen = ImageDataGenerator(rescale=1./255)
    val_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory('train_data',
                                                        target_size=(227, 227),
                                                        batch_size=BATCH_SIZE,
                                                        class_mode='binary',
                                                        color_mode='grayscale')
    val_generator = val_datagen.flow_from_directory('test_data',
                                                    target_size=(227, 227),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='binary',
                                                    color_mode='grayscale')
    classifier = Sequential()
    classifier.add(Conv2D(64, (7, 7), strides=2, input_shape=(227, 227, 1)))
    classifier.add(Activation('relu'))
    classifier.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    classifier.add(Conv2D(128, (5, 5), strides=2))
    classifier.add(Activation('relu'))
    classifier.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    classifier.add(Conv2D(256, (3, 3), strides=1))
    classifier.add(Activation('relu'))
    classifier.add(Conv2D(384, (3, 3), strides=1))
    classifier.add(Activation('relu'))
    classifier.add(Conv2D(256, (3, 3), strides=1))
    classifier.add(Activation('relu'))
    classifier.add(Conv2D(256, (3, 3), strides=1))
    classifier.add(Activation('relu'))
    classifier.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    classifier.add(Flatten())
    classifier.add(Dropout(0.5))
    classifier.add(Dense(units=2048))
    classifier.add(Activation('relu'))
    classifier.add(Dropout(0.5))
    classifier.add(Dense(units=1))
    classifier.add(Activation('sigmoid'))
    classifier.summary()
    # from keras.optimizers import SGD
    # sg = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    classifier.compile(optimizer=keras.optimizers.SGD(), loss='binary_crossentropy', metrics=['accuracy'])
    classifier.fit_generator(train_generator,
                             steps_per_epoch=training_len // BATCH_SIZE,
                             epochs=10,
                             validation_data=val_generator,
                             validation_steps=valid_len // BATCH_SIZE,
                             shuffle=True,
                             verbose=1)
    classifier.save_weights('first_model_weights.h5')
    classifier.save('first_model.h5')
The problem is that when I run this, it just does not learn, or at least I think it doesn't.
The output looks like this:
Epoch 1/10
575/575 [==============================] - 693s 1s/step - loss: 0.6767 - acc: 0.5958 - val_loss: 0.6751 - val_acc: 0.5966
Epoch 2/10
575/575 [==============================] - 207s 359ms/step - loss: 0.6760 - acc: 0.5948 - val_loss: 0.6752 - val_acc: 0.5958
Epoch 3/10
575/575 [==============================] - 258s 448ms/step - loss: 0.6745 - acc: 0.5983 - val_loss: 0.6748 - val_acc: 0.5958
Epoch 4/10
575/575 [==============================] - 165s 287ms/step - loss: 0.6760 - acc: 0.5950 - val_loss: 0.6757 - val_acc: 0.5947
Epoch 5/10
575/575 [==============================] - 166s 288ms/step - loss: 0.6761 - acc: 0.5948 - val_loss: 0.6731 - val_acc: 0.6016
Epoch 6/10
575/575 [==============================] - 167s 290ms/step - loss: 0.6742 - acc: 0.5990 - val_loss: 0.6778 - val_acc: 0.5875
Epoch 7/10
575/575 [==============================] - 206s 359ms/step - loss: 0.6762 - acc: 0.5938 - val_loss: 0.6721 - val_acc: 0.6038
Epoch 8/10
575/575 [==============================] - 165s 286ms/step - loss: 0.6762 - acc: 0.5938 - val_loss: 0.6763 - val_acc: 0.5947
Epoch 9/10
575/575 [==============================] - 164s 286ms/step - loss: 0.6751 - acc: 0.5972 - val_loss: 0.6787 - val_acc: 0.5897
Epoch 10/10
575/575 [==============================] - 168s 292ms/step - loss: 0.6750 - acc: 0.5971 - val_loss: 0.6722 - val_acc: 0.6022
Am I doing something wrong in the code? Is it the dataset split? I'm stuck and can't seem to figure it out.
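One way to read this log: training and validation accuracy both hover around 0.595, and the loss stays near 0.675. If about 59.6% of the images belong to one class (an assumption read off the flat accuracy, not a figure from the MURA documentation), a model that ignores the image and always outputs the class frequency scores almost exactly that loss, because its cross-entropy equals the entropy of the label distribution. A quick check:

```python
import math

def constant_predictor_loss(class_fraction):
    """Binary cross-entropy of a model that always predicts `class_fraction`
    for a class that really makes up `class_fraction` of the samples."""
    p = class_fraction
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

majority = 0.596  # assumed majority-class share, taken from the logged accuracy
print(round(constant_predictor_loss(majority), 4))  # 0.6746
```

If the real class balance matches, the network is outputting roughly the class prior for every image, i.e. it has not yet learned anything from the pixels.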