Training CNN with Keras was fast, now it is many times slower - tensorflow

I was training a CNN with 120 thousand pictures, and it was ok. About 320 seconds per epoch.
3073/3073 [==============================] - 340s 110ms/step - loss: 0.4146 - accuracy: 0.8319 - val_loss: 0.3776 - val_accuracy: 0.8489
Epoch 2/20
3073/3073 [==============================] - 324s 105ms/step - loss: 0.3462 - accuracy: 0.8683 - val_loss: 0.3241 - val_accuracy: 0.8770
Epoch 3/20
3073/3073 [==============================] - 314s 102ms/step - loss: 0.3061 - accuracy: 0.8878 - val_loss: 0.2430 - val_accuracy: 0.9052
Epoch 4/20
3073/3073 [==============================] - 327s 107ms/step - loss: 0.2851 - accuracy: 0.8977 - val_loss: 0.2236 - val_accuracy: 0.9149
Epoch 5/20
3073/3073 [==============================] - 318s 104ms/step - loss: 0.2725 - accuracy: 0.9033 - val_loss: 0.2450 - val_accuracy: 0.9119
Epoch 6/20
3073/3073 [==============================] - 309s 101ms/step - loss: 0.2642 - accuracy: 0.9065 - val_loss: 0.2168 - val_accuracy: 0.9218
Epoch 7/20
3073/3073 [==============================] - 311s 101ms/step - loss: 0.2589 - accuracy: 0.9083 - val_loss: 0.1996 - val_accuracy: 0.9286
Epoch 8/20
3073/3073 [==============================] - 317s 103ms/step - loss: 0.2538 - accuracy: 0.9110 - val_loss: 0.2653 - val_accuracy: 0.9045
Epoch 9/20
3073/3073 [==============================] - 1346s 438ms/step - loss: 0.2497 - accuracy: 0.9116 - val_loss: 0.2353 - val_accuracy: 0.9219
Epoch 10/20
3073/3073 [==============================] - 1434s 467ms/step - loss: 0.2457 - accuracy: 0.9141 - val_loss: 0.1943 - val_accuracy: 0.9326`
Then, after a few tests, it became 12x (times) slower. With the same parameters. I thought it was because the CPU got too hot, but it never worked like before again. I tried reseting my keras session, tried reinstalling Ubuntu 18.04, tried setting up my GPU, tried installing on conda, but none of these worked. Also tried other codes, with different datasets, which are also slower. Quite frustrating.
No other symptom of a burnt CPU problem. Everything runs as always did.
Tensorflow 2.2.0
I would appreciate some help.
Edit: actually, if I wait one hour, it gets back to normal speed! Now I think it could be some problem with the first import from the disk into the memory.
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator
from keras.models import model_from_json
from keras.preprocessing import image
import os
#======= GPU
#os.environ["CUDA_VISIBLE_DEVICES"] = "-1" #disables GPU
#os.environ['TF_CPP_MIN_LOG_LEVEL'] = '4' #verbose
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
from tensorflow.compat.v1.keras import backend as K
K.clear_session()
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
K.set_session(session)
#======= GPU
print(tf.__version__)
callbacks = [
tf.keras.callbacks.EarlyStopping(
# Stop training when `val_loss` is no longer improving
monitor='val_accuracy',
# "no longer improving" being defined as "no better than 1e-2 less"
min_delta=1e-3,
# "no longer improving" being further defined as "for at least 2 epochs"
patience=2,
verbose=1)
]
train_datagen = ImageDataGenerator(rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
#flow_from_directory knows categories are divided by folders
training_set = train_datagen.flow_from_directory('dataset/training_set', #applies a method to the object
target_size = (64, 64),
batch_size = 32, #minibatch
class_mode = 'categorical')
print(training_set.class_indices)
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
target_size = (64, 64), #has to be the same as that of the training set
batch_size = 32,
class_mode = 'categorical')
cnn = tf.keras.models.Sequential() #an ANN base
#convolution
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3])) #convolutional layer
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2)) #maxPooling
#new convolution
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu')) #remove input_shape
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
cnn.add(tf.keras.layers.Flatten())
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
cnn.add(tf.keras.layers.Dense(units=2, activation='softmax'))
cnn.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
print("Got here.")
cnn.fit(x = training_set, validation_data = test_set, epochs = 20)
# serialize model to JSON
model_json = cnn.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
# serialize weights to HDF5
cnn.save_weights("model.h5")
print("Saved model to disk.")

Related

Transfer learning model not learning much, validation plateau at 45% while train go up to 90%

So it's been days i've been working on this model on image classification. I have 70000 images and 375 classes. I've tried training it with Vgg16, Xception, Resnet & Mobilenet ... and I always get the same limit of 45% on the validation.
As you can see here
I've tried adding dropout layers and regularization and it gets the same result for validation.
Data augmentation didn't do much to help either
Any ideas why this isn't working ?
Here's a snipped of the code of the last model I used:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras import regularizers
from PIL import Image
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
validation_datagen = ImageDataGenerator(rescale=1./255)
target_size = (height, width)
datagen = ImageDataGenerator(rescale=1./255,
validation_split=0.2)
train_generator = datagen.flow_from_directory(
path,
target_size=(height, width),
batch_size=batchSize,
shuffle=True,
class_mode='categorical',
subset='training')
validation_generator = datagen.flow_from_directory(
path,
target_size=(height, width),
batch_size=batchSize,
class_mode='categorical',
subset='validation')
num_classes = len(train_generator.class_indices)
xception_model = Xception(weights='imagenet',input_shape=(width, height, 3), include_top=False,classes=num_classes)
x = xception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)
opt = Adam()
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
n_epochs = 15
history = model.fit(
train_generator,
steps_per_epoch = train_generator.samples // batchSize,
validation_data = validation_generator,
validation_steps = validation_generator.samples // batchSize,
verbose=1,
epochs = n_epochs)
Yes, you may need a balanced dataset among each category in your dataset for better model training performance. Please try again by changing class_mode='sparse' and loss='sparse_categorical_crossentropy' because you are using the image dataset. Also freeze the pretrained model layers 'xception_model.trainable = False'.
Check the below code: (I have used a flower dataset of 5 classes)
xception_model = tf.keras.applications.Xception(weights='imagenet',input_shape=(width, height, 3), include_top=False,classes=num_classes)
xception_model.trainable = False
x = xception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(32, activation='relu')(x)
out = Dense(num_classes, activation='softmax')(x)
opt = tf.keras.optimizers.Adam()
model = keras.Model(inputs=xception_model.input, outputs=out)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_generator, epochs=10, validation_data=validation_generator)
Output:
Epoch 1/10
217/217 [==============================] - 23s 95ms/step - loss: 0.5945 - accuracy: 0.7793 - val_loss: 0.4610 - val_accuracy: 0.8337
Epoch 2/10
217/217 [==============================] - 20s 91ms/step - loss: 0.3439 - accuracy: 0.8797 - val_loss: 0.4550 - val_accuracy: 0.8419
Epoch 3/10
217/217 [==============================] - 20s 93ms/step - loss: 0.2570 - accuracy: 0.9150 - val_loss: 0.4437 - val_accuracy: 0.8384
Epoch 4/10
217/217 [==============================] - 20s 91ms/step - loss: 0.2040 - accuracy: 0.9340 - val_loss: 0.4592 - val_accuracy: 0.8477
Epoch 5/10
217/217 [==============================] - 20s 91ms/step - loss: 0.1649 - accuracy: 0.9494 - val_loss: 0.4686 - val_accuracy: 0.8512
Epoch 6/10
217/217 [==============================] - 20s 92ms/step - loss: 0.1301 - accuracy: 0.9589 - val_loss: 0.4805 - val_accuracy: 0.8488
Epoch 7/10
217/217 [==============================] - 20s 93ms/step - loss: 0.0966 - accuracy: 0.9754 - val_loss: 0.4993 - val_accuracy: 0.8442
Epoch 8/10
217/217 [==============================] - 20s 91ms/step - loss: 0.0806 - accuracy: 0.9806 - val_loss: 0.5488 - val_accuracy: 0.8372
Epoch 9/10
217/217 [==============================] - 20s 91ms/step - loss: 0.0623 - accuracy: 0.9864 - val_loss: 0.5802 - val_accuracy: 0.8360
Epoch 10/10
217/217 [==============================] - 22s 100ms/step - loss: 0.0456 - accuracy: 0.9896 - val_loss: 0.6005 - val_accuracy: 0.8360

Loss when starting fine tuning is higher than loss from transfer learning

Since I start fine tuning with the weights learned by transfer learning, I would expect the loss to be the same or less. However it looks like it starts fine tuning using a different set of starting weights.
Code to start transfer learning:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(units=3, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=epochs,
validation_data=val_generator,
validation_steps=len(val_generator),
callbacks=[callback],)
Output from last epoch:
Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937
Code to start fine tuning:
model.trainable = True
# Fine-tune from this layer onwards
fine_tune_at = -20
# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
loss='binary_crossentropy',
metrics=['accuracy'])
history_fine = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=epochs,
validation_data=val_generator,
validation_steps=len(val_generator),
callbacks=[callback],)
But this is what I see for the first few epochs:
Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881
Eventually the loss drops and passes the transfer learning loss:
Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563
Why was the loss in the first epoch of fine tuning higher than the last loss from transfer learning?

Significantly higher testing accuracy on mnist with keras than tensorflow.keras

I was verifying with a basic example my TensorFlow (v2.2.0), Cuda (10.1), and cudnn (libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb) and I'm getting weird results...
When running the following example in Keras as shown in https://keras.io/examples/mnist_cnn/ I get the ~99% acc #validation. When I adapt the imports run via the TensorFlow I get only 86%.
I might be forgetting something.
To run using tensorflow:
from __future__ import print_function
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Sadly, I get the following output:
Epoch 2/12
469/469 [==============================] - 3s 6ms/step - loss: 2.2245 - accuracy: 0.2633 - val_loss: 2.1755 - val_accuracy: 0.4447
Epoch 3/12
469/469 [==============================] - 3s 7ms/step - loss: 2.1485 - accuracy: 0.3533 - val_loss: 2.0787 - val_accuracy: 0.5147
Epoch 4/12
469/469 [==============================] - 3s 6ms/step - loss: 2.0489 - accuracy: 0.4214 - val_loss: 1.9538 - val_accuracy: 0.6021
Epoch 5/12
469/469 [==============================] - 3s 6ms/step - loss: 1.9224 - accuracy: 0.4845 - val_loss: 1.7981 - val_accuracy: 0.6611
Epoch 6/12
469/469 [==============================] - 3s 6ms/step - loss: 1.7748 - accuracy: 0.5376 - val_loss: 1.6182 - val_accuracy: 0.7039
Epoch 7/12
469/469 [==============================] - 3s 6ms/step - loss: 1.6184 - accuracy: 0.5750 - val_loss: 1.4296 - val_accuracy: 0.7475
Epoch 8/12
469/469 [==============================] - 3s 7ms/step - loss: 1.4612 - accuracy: 0.6107 - val_loss: 1.2484 - val_accuracy: 0.7719
Epoch 9/12
469/469 [==============================] - 3s 6ms/step - loss: 1.3204 - accuracy: 0.6402 - val_loss: 1.0895 - val_accuracy: 0.7945
Epoch 10/12
469/469 [==============================] - 3s 6ms/step - loss: 1.2019 - accuracy: 0.6650 - val_loss: 0.9586 - val_accuracy: 0.8097
Epoch 11/12
469/469 [==============================] - 3s 7ms/step - loss: 1.1050 - accuracy: 0.6840 - val_loss: 0.8552 - val_accuracy: 0.8216
Epoch 12/12
469/469 [==============================] - 3s 7ms/step - loss: 1.0253 - accuracy: 0.7013 - val_loss: 0.7734 - val_accuracy: 0.8337
Test loss: 0.7734305262565613
Test accuracy: 0.8337000012397766
Nowhere near 99.25% as when I import Keras.
What am I missing?
Discrepancy in optimiser parameters between keras and tensorflow.keras
So the crux of the issue lies in the different default parameters for the Adadelta optimisers in Keras and Tensorflow. Specifically, the different learning rates. We can see this with a simple check. Using the Keras version of the code, print(keras.optimizers.Adadelta().get_config()) outpus
{'learning_rate': 1.0, 'rho': 0.95, 'decay': 0.0, 'epsilon': 1e-07}
And in the Tensorflow version, print(tf.optimizers.Adadelta().get_config() gives us
{'name': 'Adadelta', 'learning_rate': 0.001, 'decay': 0.0, 'rho': 0.95, 'epsilon': 1e-07}
As we can see, there is a discrepancy between the learning rates for the Adadelta optimisers. Keras has a default learning rate of 1.0 while Tensorflow has a default learning rate of 0.001 (consistent with their other optimisers).
Effects of a higher learning rate
Since the Keras version of the Adadelta optimiser has a larger learning rate, it converges much faster and achieves a high accuracy within 12 epochs, while the Tensorflow Adadelta optimiser requires a longer training time. If you increased the number of training epochs, the Tensorflow model could potentially achieve a 99% accuracy as well.
The fix
But instead of increasing the training time, we can simply initialise the Tensorflow model to behave in a similar way to the Keras model by changing the learning rate of Adadelta to 1.0. i.e.
model.compile(
loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.optimizers.Adadelta(learning_rate=1.0), # Note the new learning rate
metrics=['accuracy'])
Making this change, we get the following performance running on Tensorflow:
Epoch 12/12
60000/60000 [==============================] - 102s 2ms/sample - loss: 0.0287 - accuracy: 0.9911 - val_loss: 0.0291 - val_accuracy: 0.9907
Test loss: 0.029134796149221757
Test accuracy: 0.9907
which is close to the desired 99.25% accuracy.
p.s. Incidentally, it seems that the different default parameters between Keras and Tensorflow is a known issue that was fixed but then reverted:
https://github.com/keras-team/keras/pull/12841 software development is hard.

Add learning rate to history object of fit_generator with Tensorflow

I want to check how my optimizer is changing my learning rate. I am using tensorflow 1.15.
I run my model with fit_generator:
hist = model.fit_generator(dat, args.onthefly[0]//args.batch, args.epochs,
validation_data=val, validation_steps=args.onthefly[1]//args.batch,verbose=2,
use_multiprocessing=True, workers=56)
I choose the optimizer using the compile function:
model.compile(loss=loss,
optimizer=Nadam(lr=learning_rate),
metrics=['binary_accuracy']
)
How can I get the value of the learning rate at the end of each epoch?
You can do that using callbacks argument of model.fit_generator. Below is the code on how to implement it. Here I am incrementing learning rate by 0.01 for every epoch using tf.keras.callbacks.LearningRateScheduler and also displaying it at end of every epoch using tf.keras.callbacks.Callback.
Full Code -
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
import os
import numpy as np
import matplotlib.pyplot as plt
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')
train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')
train_cats_dir = os.path.join(train_dir, 'cats') # directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs') # directory with our training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats') # directory with our validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs') # directory with our validation dog pictures
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))
num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))
total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val
batch_size = 128
epochs = 5
IMG_HEIGHT = 150
IMG_WIDTH = 150
train_image_generator = ImageDataGenerator(rescale=1./255,brightness_range=[0.5,1.5]) # Generator for our training data
validation_image_generator = ImageDataGenerator(rescale=1./255,brightness_range=[0.5,1.5]) # Generator for our validation data
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
directory=train_dir,
shuffle=True,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode='binary')
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
directory=validation_dir,
target_size=(IMG_HEIGHT, IMG_WIDTH),
class_mode='binary')
model = Sequential([
Conv2D(16, 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
MaxPooling2D(),
Conv2D(32, 3, padding='same', activation='relu'),
MaxPooling2D(),
Conv2D(64, 3, padding='same', activation='relu'),
MaxPooling2D(),
Flatten(),
Dense(512, activation='relu'),
Dense(1)
])
lr = 0.01
adam = Adam(lr)
# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
optimizer = self.model.optimizer
lr = K.eval(optimizer.lr)
Epoch_count = epoch + 1
print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))
printlr = printlearningrate()
def scheduler(epoch):
optimizer = model.optimizer
return K.eval(optimizer.lr + 0.01)
updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.compile(optimizer=adam,
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit_generator(
train_data_gen,
steps_per_epoch=total_train // batch_size,
epochs=epochs,
validation_data=val_data_gen,
validation_steps=total_val // batch_size,
callbacks = [printlr,updatelr])
Output -
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/5
15/15 [==============================] - ETA: 0s - loss: 40.9353 - accuracy: 0.5156
Epoch: 1 , LR: 0.02
15/15 [==============================] - 27s 2s/step - loss: 40.9353 - accuracy: 0.5156 - val_loss: 0.6938 - val_accuracy: 0.5067 - lr: 0.0200
Epoch 2/5
15/15 [==============================] - ETA: 0s - loss: 0.6933 - accuracy: 0.5021
Epoch: 2 , LR: 0.03
15/15 [==============================] - 27s 2s/step - loss: 0.6933 - accuracy: 0.5021 - val_loss: 0.6935 - val_accuracy: 0.4877 - lr: 0.0300
Epoch 3/5
15/15 [==============================] - ETA: 0s - loss: 0.6932 - accuracy: 0.4989
Epoch: 3 , LR: 0.04
15/15 [==============================] - 27s 2s/step - loss: 0.6932 - accuracy: 0.4989 - val_loss: 0.6933 - val_accuracy: 0.5056 - lr: 0.0400
Epoch 4/5
15/15 [==============================] - ETA: 0s - loss: 0.6932 - accuracy: 0.4947
Epoch: 4 , LR: 0.05
15/15 [==============================] - 27s 2s/step - loss: 0.6932 - accuracy: 0.4947 - val_loss: 0.6931 - val_accuracy: 0.4967 - lr: 0.0500
Epoch 5/5
15/15 [==============================] - ETA: 0s - loss: 0.6935 - accuracy: 0.5091
Epoch: 5 , LR: 0.06
15/15 [==============================] - 27s 2s/step - loss: 0.6935 - accuracy: 0.5091 - val_loss: 0.6935 - val_accuracy: 0.4978 - lr: 0.0600

Train accuracy improving but validation remains unchanged?

I am using TF 2.0. I was trying to train a network on my own data. It was not going well. The validation accuracy was close to 0 and stagnant. I tried many regularizations to no effect. Then I tried training a network on 3 classes of data where all images in each class are the same so as to eliminate the possibility of variability. But this is not working either. Since all in-class images are the same, I would expect the validation accuracy to perfectly match the training accuracy since there is no new data. Why is that not the case? Here is my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications.mobilenet import preprocess_input
import matplotlib.pyplot as plt
base_model = tf.keras.applications.MobileNet(weights='imagenet', include_top=False)
def turn_off(n):
for layer in model.layers[:n]:
layer.trainable = False
for layer in model.layers[n:]:
layer.trainable = True
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(
x) # we add dense layers so that the model can learn more complex functions and classify for better results.
x = Dense(1024, activation='relu')(x) # dense layer 2
x = Dense(512, activation='relu')(x) # dense layer 3
preds = Dense(3, activation='softmax')(x) # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
turn_off(87)
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input,
rescale=1. / 255,
validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='training',
shuffle=True) # set as training data
validation_generator = train_datagen.flow_from_directory(
'/users/josh.flori/desktop/colors/',
target_size=(224, 224),
batch_size=32,
color_mode='rgb',
class_mode='categorical',
subset='validation',
shuffle=True) # set as validation data
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
validation_data=validation_generator,
validation_steps=validation_generator.samples // train_generator.batch_size,
epochs=6)
Here is the training output
9/9 [==============================] - 19s 2s/step - loss: 0.2645 - accuracy: 0.9134 - val_loss: 1.6668 - val_accuracy: 0.3438
Epoch 2/6
9/9 [==============================] - 20s 2s/step - loss: 0.0417 - accuracy: 0.9567 - val_loss: 2.6176 - val_accuracy: 0.3438
Epoch 3/6
9/9 [==============================] - 17s 2s/step - loss: 0.4771 - accuracy: 0.9422 - val_loss: 4.0694 - val_accuracy: 0.3438
Epoch 4/6
9/9 [==============================] - 18s 2s/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 2.1304 - val_accuracy: 0.3125
Epoch 5/6
9/9 [==============================] - 18s 2s/step - loss: 9.7658e-07 - accuracy: 1.0000 - val_loss: 3.1633 - val_accuracy: 0.3125
Epoch 6/6
9/9 [==============================] - 18s 2s/step - loss: 2.2571e-05 - accuracy: 1.0000 - val_loss: 3.4949 - val_accuracy: 0.3125
My image folders look like this
where there are exactly 128 identical images per folder.
I've been reading all day, trying different images, I can't seem to get anywhere. What is causing this particular behavior? It has to be something obvious but I'm not sure.