TensorFlow BinaryCrossentropy loss quickly reaches NaN - tensorflow

TL;DR: when the model is retrained with new data, the loss quickly reaches NaN, and none of the "standard" solutions work.
Hello,
Recently, I (successfully) trained a CNN/dense-layered model to classify spectrograms (image representations of audio). I wanted to train this model again with new data and made sure the new data had the correct dimensions, etc.
However, for some reason, the BinaryCrossentropy loss steadily declines to around 1.000 and then suddenly becomes NaN within the first epoch. I have tried lowering the learning rate to 1e-8, and I am using ReLU throughout with a sigmoid for the last layer, but nothing seems to work. Even after simplifying the network to only dense layers, the problem still happens. I have manually normalized my data, and I am fairly confident all of it falls within [0, 1]. There might be a hole in that assumption, but I think it is unlikely.
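As a sanity check, something along these lines (a sketch; X_train is an illustrative name for my normalized training array) should rule out stray NaN, Inf, or out-of-range values:
import numpy as np

print("any NaN:", np.isnan(X_train).any())      # should be False
print("any Inf:", np.isinf(X_train).any())      # should be False
print("range:", X_train.min(), X_train.max())   # should stay within [0, 1]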
I attached my code for the model architecture here:
from tensorflow.keras import layers, models, regularizers

input_shape = (125, 128, 1)
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001), input_shape=input_shape),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(16, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(16, (2, 2), activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
Interestingly, though, I tried using this new data to fine-tune a VGG16 model, and it worked (no NaN loss problem). I've attached that code here, but I genuinely have no idea what difference, if any, is causing the problem:
from tensorflow import keras
from tensorflow.keras import regularizers

base_model = keras.applications.VGG16(
    weights="imagenet",
    input_shape=(125, 128, 3),
    include_top=False,
)

# Freeze the base_model
base_model.trainable = False

# Create a new model on top
inputs = keras.Input(shape=(125, 128, 3))
x = inputs
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = keras.layers.Dropout(0.5)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)
model.summary()
I think I've been through all of the "book" solutions, and still can't seem to find the source of the problem. Any help would be much appreciated.

Turns out it was an issue with some of my input data (a divide-by-zero during normalization). Sorry for all the trouble, and thanks for your help.
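For anyone who lands here with the same symptom: guarding the denominator during min-max normalization, along these lines (a sketch; the function name and epsilon value are illustrative), avoids the 0/0 that turns into NaN:
import numpy as np

def normalize(spec, eps=1e-8):
    # A constant (e.g. all-zero) spectrogram makes max - min == 0;
    # the eps guard keeps the division finite instead of producing NaN
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / max(hi - lo, eps)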

Remove the kernel_regularizers, BatchNormalization, and Dropout layers from the convolutional blocks, where they are not required. Keep kernel_regularizer and Dropout only on the Dense layers in your model definition, and change the number of kernels in the Conv2D layers.
Then try training your model again using the code below:
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Rescaling, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, Dense)

model = Sequential([
    Rescaling(1./255, input_shape=(img_h, img_w, 3)),
    Conv2D(16, (3, 3), activation='relu'),  # kernel_regularizer=regularizers.l2(0.001) removed
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    # BatchNormalization(),
    Conv2D(16, (3, 3), activation='relu'),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    # BatchNormalization(),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    # BatchNormalization(),
    Conv2D(32, (2, 2), activation='relu'),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    # BatchNormalization(),
    Conv2D(64, (2, 2), activation='relu'),
    MaxPooling2D((3, 3), strides=(2, 2), padding='same'),
    # BatchNormalization(),
    Conv2D(64, (2, 2), activation='relu'),
    MaxPooling2D((2, 2), strides=(1, 1), padding='same'),
    # BatchNormalization(),
    # Dropout(0.3),
    Flatten(),
    Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
model.fit(...
Output:
Epoch 1/20
63/63 [==============================] - 9s 97ms/step - loss: 1.0032 - accuracy: 0.5035 - val_loss: 0.8219 - val_accuracy: 0.6160
Epoch 2/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7575 - accuracy: 0.5755 - val_loss: 0.7256 - val_accuracy: 0.6120
Epoch 3/20
63/63 [==============================] - 6s 88ms/step - loss: 0.7181 - accuracy: 0.5805 - val_loss: 0.6917 - val_accuracy: 0.6360
Epoch 4/20
63/63 [==============================] - 6s 88ms/step - loss: 0.6749 - accuracy: 0.6190 - val_loss: 0.6671 - val_accuracy: 0.6300
Epoch 5/20
63/63 [==============================] - 6s 95ms/step - loss: 0.6571 - accuracy: 0.6500 - val_loss: 0.6850 - val_accuracy: 0.5980
Epoch 6/20
63/63 [==============================] - 5s 80ms/step - loss: 0.6319 - accuracy: 0.6720 - val_loss: 0.6243 - val_accuracy: 0.6730
Epoch 7/20
63/63 [==============================] - 6s 90ms/step - loss: 0.5923 - accuracy: 0.6935 - val_loss: 0.6144 - val_accuracy: 0.7120
Epoch 8/20
63/63 [==============================] - 6s 89ms/step - loss: 0.5643 - accuracy: 0.7205 - val_loss: 0.6136 - val_accuracy: 0.6700
Epoch 9/20
63/63 [==============================] - 6s 93ms/step - loss: 0.5552 - accuracy: 0.7380 - val_loss: 0.5669 - val_accuracy: 0.7080
Epoch 10/20
63/63 [==============================] - 4s 58ms/step - loss: 0.5423 - accuracy: 0.7400 - val_loss: 0.5819 - val_accuracy: 0.7120
Epoch 11/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4905 - accuracy: 0.7745 - val_loss: 0.6146 - val_accuracy: 0.7020
Epoch 12/20
63/63 [==============================] - 4s 57ms/step - loss: 0.4808 - accuracy: 0.7900 - val_loss: 0.6318 - val_accuracy: 0.7070
Epoch 13/20
63/63 [==============================] - 4s 60ms/step - loss: 0.4602 - accuracy: 0.7990 - val_loss: 0.5707 - val_accuracy: 0.7160
Epoch 14/20
63/63 [==============================] - 4s 61ms/step - loss: 0.4291 - accuracy: 0.8190 - val_loss: 0.6392 - val_accuracy: 0.6910
Epoch 15/20
63/63 [==============================] - 5s 69ms/step - loss: 0.4003 - accuracy: 0.8355 - val_loss: 0.7048 - val_accuracy: 0.7110
Epoch 16/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3658 - accuracy: 0.8430 - val_loss: 0.8027 - val_accuracy: 0.7180
Epoch 17/20
63/63 [==============================] - 4s 58ms/step - loss: 0.3069 - accuracy: 0.8750 - val_loss: 0.9428 - val_accuracy: 0.6970
Epoch 18/20
63/63 [==============================] - 4s 59ms/step - loss: 0.2601 - accuracy: 0.9005 - val_loss: 0.9420 - val_accuracy: 0.7170
Epoch 19/20
63/63 [==============================] - 4s 60ms/step - loss: 0.2061 - accuracy: 0.9230 - val_loss: 0.9134 - val_accuracy: 0.7290
Epoch 20/20
63/63 [==============================] - 4s 62ms/step - loss: 0.1770 - accuracy: 0.9330 - val_loss: 1.0805 - val_accuracy: 0.6930

Related

TensorFlow 2.4: loss: 0.0000e+00 but accuracy: 0.2682 only, does that make sense?

I am struggling to understand how the loss (sparse_categorical_crossentropy) can be zero while the accuracy is << 1. Also, the "trained" model does not produce good results. How can the loss be zero despite the model not being well trained:
Epoch 1/3
182/182 [==============================] - 496s 3s/step - loss: 0.0000e+00 - accuracy: 0.2682 - val_loss: 0.0000e+00 - val_accuracy: 0.2729
Epoch 2/3
147/182 [=======================>......] - ETA: 1:29 - loss: 0.0000e+00 - accuracy: 0.2645
The model:
number_of_categories = len(class_names)
loss = 'sparse_categorical_crossentropy'
metrics = ['accuracy']
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(30, (5, 5), activation=activation, input_shape=(pheight, pwidth, 3)))
model.add(tf.keras.layers.MaxPooling2D((5, 5)))
model.add(tf.keras.layers.Conv2D(40, (5, 5), activation=activation))
model.add(tf.keras.layers.MaxPooling2D((5, 5)))
model.add(tf.keras.layers.Conv2D(50, (5, 5), activation=activation))
model.add(tf.keras.layers.MaxPooling2D((5, 5)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation=activation))
model.add(tf.keras.layers.Dense(number_of_categories, activation='softmax'))
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
Any thoughts of what is going wrong here are highly welcome ...
And this is the model fit:
history = model.fit(x=training_generator,
                    validation_data=test_generator,
                    batch_size=batch_size,
                    use_multiprocessing=False,
                    workers=1,
                    epochs=epochs,
                    steps_per_epoch=len(training_generator),
                    max_queue_size=1)
Problem solved: I had accidentally used small float numbers like 0.01 instead of 1 as ground-truth values (y_true), which caused the wrong behaviour. After scaling them up and rounding to 1, it now works.
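A cheap guard against this class of bug is to assert before training that the labels really are integer class indices; a sketch (y_train is an illustrative name for the ground-truth array):
import numpy as np

# sparse_categorical_crossentropy expects integer class ids in
# [0, number_of_categories); fractional values like 0.01 slip through silently
assert np.array_equal(y_train, y_train.astype(int)), "non-integer labels"
assert y_train.min() >= 0 and y_train.max() < number_of_categories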

Tricks to improve CNN model performance

I am fitting a large CNN on my training data, validating on 20%. It appears the model performs better on the training set than on the validation set. What do you suggest to improve the model's performance?
CNN Architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
activ = 'relu'
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ, input_shape=(1, 100, 4)))
model.add(Conv2D(32, (1, 3), strides=(1, 1), padding='same', activation=activ))
#model.add(BatchNormalization(axis = 3))
model.add(MaxPooling2D(pool_size=(1, 2) ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(64, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ))
model.add(Conv2D(128, (1, 3), strides=(1, 1), padding='same', activation=activ ))
model.add(MaxPooling2D(pool_size=(1, 2)))
model.add(Dropout(.5))
model.add(Flatten())
A = model.output_shape
model.add(Dense(int(A[1] * 1/4.), activation=activ))
model.add(Dropout(.5))
model.add(Dense(5, activation='softmax'))
optimizer = Adam(lr=0.003, beta_1=0.9, beta_2=0.999, epsilon=1e-04, decay=0.0)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=100, batch_size=64, shuffle=False,
          validation_split=0.2)
However, the validation accuracy doesn't change over the epochs:
Epoch 1/100
1065/1065 [==============================] - 14s 13ms/step - loss: 1.4174 - accuracy: 0.5945 - val_loss: 1.4966 - val_accuracy: 0.4417
Epoch 2/100
1065/1065 [==============================] - 14s 13ms/step - loss: 1.1494 - accuracy: 0.6207 - val_loss: 1.4634 - val_accuracy: 0.4417
Epoch 3/100
1065/1065 [==============================] - 19s 18ms/step - loss: 1.1111 - accuracy: 0.6196 - val_loss: 1.4674 - val_accuracy: 0.4417
Epoch 4/100
1065/1065 [==============================] - 15s 14ms/step - loss: 1.1040 - accuracy: 0.6196 - val_loss: 1.4660 - val_accuracy: 0.4417
Epoch 5/100
1065/1065 [==============================] - 18s 17ms/step - loss: 1.1027 - accuracy: 0.6196 - val_loss: 1.4624 - val_accuracy: 0.4417
NOTE: I tried Adam's default learning rate of 0.001 as well as 0.003, but the output is the same (see the log).
Your model is working but improving very slowly. I would reduce the dropout value down to 0.1 initially, then run the model and see whether it overfits. If it does, slowly increase the dropout rate. Unless your data is already shuffled, I would set shuffle=True in model.fit. You might also try replacing the Flatten layer with a GlobalMaxPooling layer. I also recommend the EarlyStopping callback, which monitors validation loss and halts training if it fails to improve for 'patience' consecutive epochs. Setting restore_best_weights=True loads the weights from the epoch with the lowest validation loss, so you don't have to save and then reload them. Set epochs to a large number to ensure this callback activates. Also use ReduceLROnPlateau to automatically adjust the learning rate based on validation loss.
The code I use is shown below:
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
callbacks = [es, rlronp]
In model.fit, set callbacks=callbacks, and increase the number of epochs to, say, 100 so that the early-stopping callback has a chance to trigger; a usage sketch follows.
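Putting it together, the fit call would look roughly like this (a sketch; the data variables are the ones from the question):
history = model.fit(x_train, y_train,
                    epochs=100,            # large cap; EarlyStopping halts sooner
                    batch_size=64,
                    shuffle=True,
                    validation_split=0.2,
                    callbacks=callbacks)   # [es, rlronp] defined above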

tensorflow accuracy, val_accuracy remains the same while training

I built a CNN based on the Chest X-Ray Images (Pneumonia) dataset, and for some reason, when I train the model, I get the same accuracy and val_accuracy over the epochs.
train_ds = ImageDataGenerator()
traindata = train_ds.flow_from_directory(directory="../input/chest-xray-pneumonia/chest_xray/train",
                                         target_size=(IMG_HEIGHT, IMG_WIDTH), shuffle=True)
# Found 5216 images belonging to 2 classes.
test_ds = ImageDataGenerator()
testdata = test_ds.flow_from_directory(directory="../input/chest-xray-pneumonia/chest_xray/test",
                                       target_size=(IMG_HEIGHT, IMG_WIDTH), shuffle=True)
# Found 624 images belonging to 2 classes.
model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(224, 224, 3), filters=64, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(units=4096, activation="relu"),
    # keras.layers.Dropout(.5),
    keras.layers.Dense(units=4096, activation="relu"),
    # keras.layers.Dropout(.5),
    keras.layers.Dense(units=2, activation="softmax"),
])

opt = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=opt,
              loss="categorical_crossentropy",
              metrics=['accuracy'])

logdir = "logs\\training\\" + datetime.now().strftime("%Y%m%d-%H%M%S")
checkpoint = keras.callbacks.ModelCheckpoint("vgg16_1.h5", verbose=1, monitor='val_accuracy', save_best_only=True, mode='auto')
early = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

hist = model.fit(traindata,
                 steps_per_epoch=STEPS_PER_EPOCH,
                 epochs=100,
                 validation_data=testdata,
                 validation_steps=VALIDATION_STEPS,
                 callbacks=[early, tensorboard_callback])
Epoch 1/100
163/163 [==============================] - 172s 1s/step - loss: 62.6885 - accuracy: 0.7375 - val_loss: 0.6827 - val_accuracy: 0.6250
Epoch 2/100
163/163 [==============================] - 157s 961ms/step - loss: 0.5720 - accuracy: 0.7429 - val_loss: 0.7133 - val_accuracy: 0.6250
Epoch 3/100
163/163 [==============================] - 159s 975ms/step - loss: 0.5725 - accuracy: 0.7429 - val_loss: 0.6691 - val_accuracy: 0.6250
Epoch 4/100
163/163 [==============================] - 159s 973ms/step - loss: 0.5721 - accuracy: 0.7429 - val_loss: 0.7036 - val_accuracy: 0.6250
Epoch 5/100
163/163 [==============================] - 158s 971ms/step - loss: 0.5715 - accuracy: 0.7429 - val_loss: 0.7169 - val_accuracy: 0.6250
Epoch 6/100
163/163 [==============================] - 160s 983ms/step - loss: 0.5718 - accuracy: 0.7429 - val_loss: 0.6982 - val_accuracy: 0.6250
I've tried changing the activation function of the last layer, adding dropout layers, and toying with the number of neurons, but nothing seemed to work. Does anyone have an idea what causes this strange behaviour?
You only have a small dataset. From the look of your training loss, I would guess that your network actually already converged after one epoch (with a substantial amount of overfitting).
For the amount of data you have, I would suggest trying a much smaller network, or working with data augmentation techniques to regularize your model; a sketch of the latter follows.
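For the augmentation route, Keras's ImageDataGenerator is a common starting point; a sketch under the question's setup (the parameter values are illustrative, not tuned):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative settings; for chest X-rays, be conservative
# (e.g. think twice about horizontal_flip, since anatomy is not symmetric)
aug = ImageDataGenerator(rotation_range=10,
                         width_shift_range=0.1,
                         height_shift_range=0.1,
                         zoom_range=0.1)
traindata = aug.flow_from_directory(
    directory="../input/chest-xray-pneumonia/chest_xray/train",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    shuffle=True)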

Convolutional Neural Network seems to be randomly guessing

So I am currently trying to build a race recognition program using a convolutional neural network. I'm inputting 200px by 200px versions of the UTKFaceRegonition dataset (I put my dataset on a Google Drive if you want to take a look). I'm using 8 different classes (4 races * 2 genders) with Keras and TensorFlow, each having about 700 images, but I have done it with 1000. The problem is that when I run the network it gets at best 13.5% accuracy and about 11-12.5% validation accuracy, with a loss around 2.079-2.081, and even after 50 epochs or so it won't improve at all. My current hypothesis is that it is randomly guessing / not learning, because 1/8 = 12.5%, which is about what it is getting, and on other models I have made with 3 classes it was getting about 33%.
I noticed the validation accuracy is different on the first and sometimes second epoch, but after that it stays constant. I've increased the pixel resolution; changed the number of layers, the types of layers, and the neurons per layer; tried optimizers (SGD at the normal learning rate and at very large and small values (.1 and 10^-6)); and tried different loss functions like KLDivergence, but nothing seems to have any effect, except KLDivergence, which on one run did pretty well (about 16%) but then flopped again. Some ideas I had are that maybe there's too much noise in the dataset, or maybe it has to do with the number of dense layers, but honestly I don't know why it is not learning.
Here's the code to make the tensors:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import os
import cv2
import random
import pickle

WIDTH_SIZE = 200
HEIGHT_SIZE = 200

CATEGORIES = []
for CATEGORY in os.listdir('./TRAINING'):
    CATEGORIES.append(CATEGORY)

DATADIR = "./TRAINING"
training_data = []

def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path)[:700]:
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_COLOR)
                new_array = cv2.resize(img_array, (WIDTH_SIZE, HEIGHT_SIZE))
                training_data.append([new_array, class_num])
            except Exception as error:
                print(error)

create_training_data()
random.shuffle(training_data)

X = []
y = []
for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, WIDTH_SIZE, HEIGHT_SIZE, 3)
y = np.array(y)

pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out = open("y.pickle", "wb")
pickle.dump(y, pickle_out)
Here's the model I built:
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D

pickle_in = open("X.pickle", "rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle", "rb")
y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(256, (2,2), activation = 'relu', input_shape = X.shape[1:]))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Conv2D(256, (2,2), activation = 'relu'))
model.add(Dropout(0.4))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(8, activation="softmax"))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
model.fit(X, y, batch_size=16, epochs=100, validation_split=.1)
Here's a log of 10 epochs I ran:
Epoch 1/100
5040/5040 [==============================] - 55s 11ms/sample - loss: 2.0803 - accuracy: 0.1226 - val_loss: 2.0796 - val_accuracy: 0.1250
Epoch 2/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1147 - val_loss: 2.0798 - val_accuracy: 0.1161
Epoch 3/100
5040/5040 [==============================] - 53s 10ms/sample - loss: 2.0797 - accuracy: 0.1190 - val_loss: 2.0800 - val_accuracy: 0.1161
Epoch 4/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1173 - val_loss: 2.0799 - val_accuracy: 0.1107
Epoch 5/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1183 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 6/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1226 - val_loss: 2.0801 - val_accuracy: 0.1107
Epoch 7/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1238 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 8/100
5040/5040 [==============================] - 54s 11ms/sample - loss: 2.0797 - accuracy: 0.1169 - val_loss: 2.0802 - val_accuracy: 0.1107
Epoch 9/100
5040/5040 [==============================] - 52s 10ms/sample - loss: 2.0797 - accuracy: 0.1212 - val_loss: 2.0803 - val_accuracy: 0.1107
Epoch 10/100
5040/5040 [==============================] - 53s 11ms/sample - loss: 2.0797 - accuracy: 0.1177 - val_loss: 2.0802 - val_accuracy: 0.1107
So yeah, any help on why my network seems to be just guessing? Thank you!
The problem lies in the design of your network.
Typically, you'd want the first layers to learn high-level features, using a larger kernel with an odd size. Currently you're essentially interpolating neighbouring pixels. Why an odd size? Read e.g. here.
The number of filters typically increases from a small value (e.g. 16, 32) to larger values as you go deeper into the network. In your network, all layers learn the same number of filters. The reasoning is that the deeper you go, the more fine-grained the features you'd like to learn, hence the increase in the number of filters.
Each layer of your network also cuts away valuable information from the image (by default you are using valid padding).
Here's a very basic network that, after 40 seconds and 10 epochs, gets me over 95% training accuracy:
import pickle
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
pickle_in = open("X.pickle","rb")
X = pickle.load(pickle_in)
pickle_in = open("y.pickle","rb")
y = pickle.load(pickle_in)
X = X/255.0
model = Sequential()
model.add(Conv2D(16, (5,5), activation = 'relu', input_shape = X.shape[1:], padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(32, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3), activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),metrics=['accuracy'])
Architecture:
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_19 (Conv2D) (None, 200, 200, 16) 1216
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 100, 100, 16) 0
_________________________________________________________________
conv2d_20 (Conv2D) (None, 100, 100, 32) 4640
_________________________________________________________________
max_pooling2d_15 (MaxPooling (None, 50, 50, 32) 0
_________________________________________________________________
conv2d_21 (Conv2D) (None, 50, 50, 64) 18496
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 25, 25, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 40000) 0
_________________________________________________________________
dense_7 (Dense) (None, 512) 20480512
_________________________________________________________________
dense_8 (Dense) (None, 8) 4104
=================================================================
Total params: 20,508,968
Trainable params: 20,508,968
Non-trainable params: 0
Training:
Train on 5040 samples, validate on 560 samples
Epoch 1/10
5040/5040 [==============================] - 7s 1ms/sample - loss: 2.2725 - accuracy: 0.1897 - val_loss: 1.8939 - val_accuracy: 0.2946
Epoch 2/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.7831 - accuracy: 0.3375 - val_loss: 1.8658 - val_accuracy: 0.3179
Epoch 3/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.4857 - accuracy: 0.4623 - val_loss: 1.9507 - val_accuracy: 0.3357
Epoch 4/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 1.1294 - accuracy: 0.6028 - val_loss: 2.1745 - val_accuracy: 0.3250
Epoch 5/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.8060 - accuracy: 0.7179 - val_loss: 3.1622 - val_accuracy: 0.3000
Epoch 6/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.5574 - accuracy: 0.8169 - val_loss: 3.7494 - val_accuracy: 0.2839
Epoch 7/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3756 - accuracy: 0.8813 - val_loss: 4.9125 - val_accuracy: 0.2643
Epoch 8/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.3001 - accuracy: 0.9036 - val_loss: 5.6300 - val_accuracy: 0.2821
Epoch 9/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.2345 - accuracy: 0.9337 - val_loss: 5.7263 - val_accuracy: 0.2679
Epoch 10/10
5040/5040 [==============================] - 6s 1ms/sample - loss: 0.1549 - accuracy: 0.9581 - val_loss: 7.3682 - val_accuracy: 0.2732
As you can see, the validation score is terrible, but the point was to demonstrate that a poor architecture can prevent training altogether.

input_shape with image_generator in Tensorflow

I'm trying to use this approach in TensorFlow 2.x to load a large dataset that does not fit in memory.
I have a folder with X sub-folders that contain images. Each sub-folder is a class.
\dataset
    -\class1
        -img1_1.jpg
        -img1_2.jpg
        -...
    -\class2
        -img2_1.jpg
        -img2_2.jpg
        -...
I create my data generator from my folder like this:
train_data_gen = image_generator.flow_from_directory(directory="path\\to\\dataset",
batch_size=100,
shuffle=True,
target_size=(100, 100), # Image H x W
classes=list(CLASS_NAMES)) # list of folder/class names ["class1", "class2", ...., "classX"]
Found 629 images belonging to 2 classes.
I made a smaller dataset to test the pipeline: only 629 images in 2 classes.
Now I can create a dummy model like this:
model = tf.keras.Sequential()
model.add(Dense(1, activation=activation, input_shape=(100, 100, 3)))  # only 1 layer of 1 neuron
model.add(Dense(2))  # 2 classes
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['categorical_accuracy'])
Once compiled, I try to fit this dummy model:
STEPS_PER_EPOCH = np.ceil(image_count / batch_size)  # 629 / 100
model.fit_generator(generator=train_data_gen, steps_per_epoch=STEPS_PER_EPOCH, epochs=2, verbose=1)
1/7 [===>..........................] - ETA: 2s - loss: 1.1921e-07 - categorical_accuracy: 0.9948
2/7 [=======>......................] - ETA: 1s - loss: 1.1921e-07 - categorical_accuracy: 0.5124
3/7 [===========>..................] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.3449
4/7 [================>.............] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.2662
5/7 [====================>.........] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.2130
6/7 [========================>.....] - ETA: 0s - loss: 1.1921e-07 - categorical_accuracy: 0.1808
2020-04-14 20:39:48.629203: W tensorflow/core/framework/op_kernel.cc:1610] Invalid argument: ValueError: generator yielded an element of shape (29, 100, 100, 3) where an element of shape (100, 100, 100, 3) was expected.
From what I understand, the last batch doesn't have the same shape as the previous batches, so it crashes. I've tried to specify a batch_input_shape:
model.add(Dense(1, activation=activation, batch_input_shape=(None, 100, 100, 3)))
I found here that I should put None so as not to fix the number of elements in the batch, letting it be dynamic, but no success.
Edit: From the comments, I had made two mistakes:
The output shape was bad: I had missed the Flatten layer in the model.
The approach from the previous link does work once the Flatten layer is added.
Some code was missing: I actually feed fit_generator with a tf.data.Dataset.from_generator, but what I showed here was the image_generator.flow_from_directory.
Here is the final code:
train_data_gen = image_generator.flow_from_directory(directory="path\\to\\dataset",
batch_size=1000,
shuffle=True,
target_size=(100, 100),
classes=list(CLASS_NAMES))
train_dataset = tf.data.Dataset.from_generator(
lambda: train_data_gen,
output_types=(tf.float32, tf.float32),
output_shapes=([None, x, y, 3],
[None, len(CLASS_NAMES)]))
model = tf.keras.Sequential()
model.add(Flatten(batch_input_shape=(None, 100, 100, 3)))
model.add(Dense(1, activation=activation))
model.add(Dense(2))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['categorical_accuracy'])
STEPS_PER_EPOCH = np.ceil(image_count / batch_size) # 629 / 100
model.fit_generator(generator=train_data_gen , steps_per_epoch=STEPS_PER_EPOCH, epochs=2, verbose=1)
For the benefit of the community, I am explaining here how to use image_generator in TensorFlow with input_shape (100, 100, 3), using the dogs-vs-cats dataset.
If we don't choose the right batch size, there is a chance the model gets stuck right after the first epoch, hence I am starting my explanation with how to choose a batch_size.
We generally observe the batch size to be a power of 2; this is because optimized matrix-operation libraries work most effectively with such sizes. This is further elaborated in this research paper.
Check out this blog, which describes how to choose the right batch size by comparing the effects of different batch sizes on the accuracy of the CIFAR-10 dataset.
Here is the end-to-end working code with outputs:
import os
import numpy as np
from keras import layers
import pandas as pd
from tensorflow.keras.layers import Input, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
from tensorflow.keras.layers import AveragePooling2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras import regularizers, optimizers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import keras.backend as K
K.set_image_data_format('channels_last')
train_dir = '/content/drive/My Drive/Dogs_Vs_Cats/train'
test_dir = '/content/drive/My Drive/Dogs_Vs_Cats/test'
img_width, img_height = 100, 100
input_shape = img_width, img_height, 3
train_samples = 2000
test_samples = 1000
epochs = 30
batch_size = 32
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(
    rescale=1./255)

train_data = train_datagen.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

test_data = test_datagen.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
model = Sequential()
model.add(Conv2D(32, (7, 7), strides = (1, 1), input_shape = input_shape))
model.add(BatchNormalization(axis = 3))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (7, 7), strides = (1, 1)))
model.add(BatchNormalization(axis = 3))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit_generator(
    train_data,
    steps_per_epoch=train_samples // batch_size,
    epochs=epochs,
    validation_data=test_data,
    verbose=1,
    validation_steps=test_samples // batch_size)
Output:
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_8 (Conv2D) (None, 94, 94, 32) 4736
_________________________________________________________________
batch_normalization_8 (Batch (None, 94, 94, 32) 128
_________________________________________________________________
activation_8 (Activation) (None, 94, 94, 32) 0
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 47, 47, 32) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 41, 41, 64) 100416
_________________________________________________________________
batch_normalization_9 (Batch (None, 41, 41, 64) 256
_________________________________________________________________
activation_9 (Activation) (None, 41, 41, 64) 0
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 20, 20, 64) 0
_________________________________________________________________
flatten_4 (Flatten) (None, 25600) 0
_________________________________________________________________
dense_11 (Dense) (None, 64) 1638464
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_12 (Dense) (None, 1) 65
=================================================================
Total params: 1,744,065
Trainable params: 1,743,873
Non-trainable params: 192
_________________________________________________________________
Epoch 1/30
62/62 [==============================] - 14s 225ms/step - loss: 1.8307 - accuracy: 0.4853 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 2/30
62/62 [==============================] - 14s 226ms/step - loss: 0.7085 - accuracy: 0.4832 - val_loss: 0.6931 - val_accuracy: 0.5010
Epoch 3/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6955 - accuracy: 0.5300 - val_loss: 0.6894 - val_accuracy: 0.5292
Epoch 4/30
62/62 [==============================] - 14s 221ms/step - loss: 0.6938 - accuracy: 0.5407 - val_loss: 0.7309 - val_accuracy: 0.5262
Epoch 5/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6860 - accuracy: 0.5498 - val_loss: 0.6776 - val_accuracy: 0.5665
Epoch 6/30
62/62 [==============================] - 13s 216ms/step - loss: 0.7027 - accuracy: 0.5407 - val_loss: 0.6895 - val_accuracy: 0.5101
Epoch 7/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6852 - accuracy: 0.5528 - val_loss: 0.6567 - val_accuracy: 0.5887
Epoch 8/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6772 - accuracy: 0.5427 - val_loss: 0.6643 - val_accuracy: 0.5847
Epoch 9/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6709 - accuracy: 0.5534 - val_loss: 0.6623 - val_accuracy: 0.5887
Epoch 10/30
62/62 [==============================] - 14s 219ms/step - loss: 0.6579 - accuracy: 0.5711 - val_loss: 0.6614 - val_accuracy: 0.6058
Epoch 11/30
62/62 [==============================] - 13s 218ms/step - loss: 0.6591 - accuracy: 0.5625 - val_loss: 0.6594 - val_accuracy: 0.5454
Epoch 12/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6419 - accuracy: 0.5767 - val_loss: 1.1041 - val_accuracy: 0.5161
Epoch 13/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6479 - accuracy: 0.5783 - val_loss: 0.6441 - val_accuracy: 0.5837
Epoch 14/30
62/62 [==============================] - 13s 216ms/step - loss: 0.6373 - accuracy: 0.5899 - val_loss: 0.6427 - val_accuracy: 0.6310
Epoch 15/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6203 - accuracy: 0.6133 - val_loss: 0.7390 - val_accuracy: 0.6220
Epoch 16/30
62/62 [==============================] - 13s 217ms/step - loss: 0.6277 - accuracy: 0.6362 - val_loss: 0.6649 - val_accuracy: 0.5786
Epoch 17/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6155 - accuracy: 0.6316 - val_loss: 0.9823 - val_accuracy: 0.5484
Epoch 18/30
62/62 [==============================] - 14s 222ms/step - loss: 0.6056 - accuracy: 0.6408 - val_loss: 0.6333 - val_accuracy: 0.6048
Epoch 19/30
62/62 [==============================] - 14s 218ms/step - loss: 0.6025 - accuracy: 0.6529 - val_loss: 0.6514 - val_accuracy: 0.6442
Epoch 20/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6149 - accuracy: 0.6423 - val_loss: 0.6373 - val_accuracy: 0.6048
Epoch 21/30
62/62 [==============================] - 13s 215ms/step - loss: 0.6030 - accuracy: 0.6519 - val_loss: 0.6086 - val_accuracy: 0.6573
Epoch 22/30
62/62 [==============================] - 13s 217ms/step - loss: 0.5936 - accuracy: 0.6865 - val_loss: 1.0677 - val_accuracy: 0.5605
Epoch 23/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5964 - accuracy: 0.6728 - val_loss: 0.7927 - val_accuracy: 0.5877
Epoch 24/30
62/62 [==============================] - 13s 215ms/step - loss: 0.5866 - accuracy: 0.6707 - val_loss: 0.6116 - val_accuracy: 0.6421
Epoch 25/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5933 - accuracy: 0.6662 - val_loss: 0.8282 - val_accuracy: 0.6048
Epoch 26/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5705 - accuracy: 0.6885 - val_loss: 0.5806 - val_accuracy: 0.6966
Epoch 27/30
62/62 [==============================] - 14s 218ms/step - loss: 0.5709 - accuracy: 0.7017 - val_loss: 1.2404 - val_accuracy: 0.5333
Epoch 28/30
62/62 [==============================] - 13s 216ms/step - loss: 0.5691 - accuracy: 0.7104 - val_loss: 0.6136 - val_accuracy: 0.6442
Epoch 29/30
62/62 [==============================] - 13s 215ms/step - loss: 0.5627 - accuracy: 0.7048 - val_loss: 0.6936 - val_accuracy: 0.6613
Epoch 30/30
62/62 [==============================] - 13s 214ms/step - loss: 0.5714 - accuracy: 0.6941 - val_loss: 0.5872 - val_accuracy: 0.6825