Training and Loss not changing in Keras CNN model - tensorflow

I am running a CNN for left and right shoeprint classfication. I have 190,000 training images and I use 10% of it for validation. My model is setup as shown below. I get the paths of all the images, read them in and resize them. I normalize the image, and then fit it to the model. My issue is that I have stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong that I am doing? How can I stop this from happening?
Just some interesting points to note:
I first tested this on 10 images I was having the same issue but changing the optimizer to adam and batch size to 4 worked.
I then tested on more and more images, but each time I would need to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and optimizer rmsprop. However, the accuracy and loss only really began to change after epoch 10.
I am now training on 190,000 images and I cannot increase the batch size as my GPU is at is max.
imageWidth = 50
imageHeight = 150
def get_filepaths(directory):
file_paths = []
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths
def cleanUpPaths(fullFilePaths):
cleanPaths = []
for f in fullFilePaths:
if f.endswith(".png"):
cleanPaths.append(f)
return cleanPaths
def getTrainData(paths):
trainData = []
for i in xrange(1,190000,2):
im = image.imread(paths[i])
im = image.imresize(im, (150,50))
im = (im-255)/float(255)
trainData.append(im)
trainData = np.asarray(trainData)
right = np.zeros(47500)
left = np.ones(47500)
trainLabels = np.concatenate((left, right))
trainLabels = np_utils.to_categorical(trainLabels)
return (trainData, trainLabels)
#create the convnet
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (1, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
#prepare the training data*/
trainPaths = get_filepaths("better1/train")
trainPaths = cleanUpPaths(trainPaths)
(trainData, trainLabels) = getTrainData(trainPaths)
trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
trainData = (trainData-255)/float(255)
#train the convnet***
model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
#/save the model and weights*/
model.save('myConvnet_model5.h5');
model.save_weights('myConvnet_weights5.h5');

I've had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.
Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*
Confirming issue is occurring: Method 1: accuracy for model stays around 0.5 while training (or 1/n where n is number of classes). Method 2: Get the counts of each class in predictions and confirm it's predicting all one class.
Fixes/Checks (in somewhat of an order):
Double Check Model Architecture: use model.summary(), inspect the model.
Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
If any of you know of more fixes/checks that could possible get the model training properly then please do contribute and I'll try to update the list.
**Note that is common to make more of the pretrained model trainable, once the new layers have been initially trained "enough"
*Other names for the issue to help searches get here...
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output

You can try to add a BatchNornmalization() layer after MaxPooling2D(). It works for me.

I just have 2 things more to add to the great list of DBCerigo.
Check activation functions: some layers have linear activation function by default, if you do not insert some non linearity into your model it wont be able to generalize, so the net will try to learn how to separate linearly a feature space that is not linear. Making sure you have your non linearity set is a good checkpoint.
Check Model Complexity: if you have a relatively simple model and it learns only till the 1st or the 2nd epoch and then it stalls, it may be that it is trying to learn something too complex. Try making the model deeper. This usually happens when working with frozen models with only 1 or 2 layers unfrozen.
Although the 2nd one may be obvious, I run into his problem once and I lost lots of time checking everythin (data, batches, LR...) before figuring out.
Hope this helps

I would try a couple of things. A lower learning rate should help with more data. Generally, adapting the optimizer should help. Additionally your network seems really small, you might want to increase the capacity of the model by adding layers or increasing the number of filters in the layers.
A better description on how to apply deep learning in practice is given here.

in my case it is the activification function matters. I change from 'sgd' to 'a'

Related

Is it ok that my deep learning model shows nearly 100% accuracy and validation accuracy after one epoch?

I am trying to make a model for the signature verification problem, so the dataset contains nearly 800 (already augmented) samples. I am assuming that this is the core of the problem I am getting.
Just to clarify the weird choice of hyperparameters, I am writing a school report on the effect of CNN configurations on models performance (this is the first model I did).
pls correct me if you notice any misconception in my explanation/code
model = Sequential()
model.add(Conv2D(64, (1,1), input_shape = X.shape[1:] ))
model.add(Activation("relu"))
model.add(Conv2D(64, (1,1)))
model.add(Activation("relu"))
model.add(GlobalMaxPooling2D())
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(1))
model.add(Activation('sigmoid'))
here are the results
1st model: 1x1 kernel size and global pooling
2nd: 3x3 and average pooling
3rd: 5x5 and max-pooling
This can happen when you're dataset are not properly randomized and heterogeneous, maybe have a look there.

Binary classification not training correctly

I've been working on a neural network that can classify two sets of astronomical data. I believe that my neural network is struggling because the two sets of data are quite similar, but even with significant changes to the data, it still seems like the accuracy history doesn't behave how I think it would.
These are example images from each dataset:
I'm currently using 10,000 images of each type, with 20% going to validation data, so 16,000 training images and 4,000 validation images. Due to memory constraints, I can't increase the datasets much more than this.
This is my current model:
model.add(layers.Conv2D(64, (3, 3), padding="valid", activation='relu', input_shape=(192, 192, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (7, 7), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (9, 9), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (7, 7), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(2, activation="sigmoid"))
which I'm compiling with:
opt = SGD(lr=0.1)
model.compile(optimizer=opt,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
and fitting using:
history = model.fit(train, train_labels, batch_size=200, epochs=15, validation_data=(validation, validation_labels))
If I add something to the data to make the datasets unrealistically different (e.g. adding a random rectangle to the middle of the data or by adding a mask to one but not the other), I get an accuracy history that looks like this:
(Note that the accuracy history for the training data was shifted half an epoch to the left to account for the training accuracy being measured, on average, half an epoch before the validation accuracy.)
If I make the datasets very similar (e.g. adding nothing to the datasets or applying the same mask to both), I get an accuracy history that looks like this:
or occasionally with a big spike in validation accuracy for one epoch, like so:
Looking at different websites and other StackOverflow pages, I've tried:
changing the number and size of the filters
adding or subtracting convolutional layers
changing the optimizer function (it was originally "adam", so it had an adaptive learning rate and I switched it to the above so I could manually tune the learning rate)
increasing the batch size
increasing the dataset (originally had only 5,000 images of each instead of 10,000),
increasing the number of epochs (from 10 to 15)
adding or subtracting padding from the convolutional layer
changing the activation function in the last layer
Am I missing something? Are these dataset just too similar to achieve a binary classifcation network?
If this is a binary classification then you need to change:
model.add(layers.Dense(2, activation="sigmoid"))
into:
model.add(layers.Dense(1, activation="sigmoid"))
Sigmoid indicates that if the output is bigger than some threshold(most of the times 0.5) then it belongs to second class etc. Also you really don't need to use from_logits = True since you specified an activation in the last dense layer.
Recall that your loss also should be:
tf.keras.losses.BinaryCrossentropy(from_logits = False)
If you want to set from_logits = True, then your last dense layer should look like this:
model.add(layers.Dense(1)) # no activation, linear.
You can also use 2 neurons in the last dense layer but then you need to use softmax activation with categorical loss.

Stopping training at maximum validation accuracy. Is this a good practice?

I am training my model with a dataset of 200 images. I have created a binary classification CNN that looks like this one:
classifier = Sequential()
# Adding a first convolutional layer
classifier.add(Convolution2D(48, 3, input_shape = (320, 320, 3), activation = 'relu'))
classifier.add(MaxPooling2D())
# Adding a second convolutional layer
classifier.add(Convolution2D(48, 3, activation = 'relu'))
classifier.add(MaxPooling2D())
# Adding a third convolutional layer
classifier.add(Convolution2D(48, 3, activation = 'relu'))
classifier.add(MaxPooling2D())
#Flattening
classifier.add(Flatten())
#Full connected
classifier.add(Dense(256, activation = 'relu'))
#Full connected
classifier.add(Dense(256, activation = 'sigmoid'))
#Dropout
classifier.add(Dropout(0.5))
#Full connected
classifier.add(Dense(1, activation = 'sigmoid'))
# Compiling the CNN
opt = keras.optimizers.Adam(learning_rate=0.001)
classifier.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.summary()
I am also using Image Data Augmentation and Early Stopping based on val_accuracy with a patience of 10.
My results are the following:
Result graph validation accuracy
The best validation accuracy I get is 0.9231 at the 21st epoch. Should I stop the training with a custom callback once I surpass 92% or is it a bad practice?
Would it be a good practice to set a custom callback that stops training
The best practice here is to save the model every time the validation accuracy hits a maximum, but to keep training. Alternatively, you could save a model after each epoch, and choose the best one to use by checking the validation graph (I'd suggest epoch 11 here. After 11 the validation graph is just oscillating, which is mostly noise).
Finally, 200 images is rarely enough to get good results. You want thousands or tens of thousands at least. Even your validation set should have at least 100 images so that even minor changes to the model show smooth changes in the validation curve. You should also consider adding some data augmentation if you aren't doing it already.

(TensorFlow) TimeDistributed layer for image classification

I know that “Time Distributed” layers are used when we have several images that are chronologically ordered to detect movements, actions, directions etc.
However, I work on speech classification using spectrograms. Every speech is transformed into a spectrogram, which will be fed later to a neural network to perform classification. So my database is in the form of 2093 RGB images(100x100x3). For now I have used a CNN and the input is
x_train = np.array(x_train).reshape(2093,100,100, 3)
And every thing works just fine.
But now, I would like to use CNN+BLSTM (similar to the following picture, which is taken from this paper) , which means I am going to need time steps. So, every image should be divided into smaller frames.
The question is, how to prepare the data to do such a thing ?
Assuming that I want to divide every image into 10 frames (time steps). Should I just reshape the data
x_train = np.array(x_train).reshape(2093,10,10,100, 3)
Which works just fine but I'm not sure if it's the right thing , or there is another way to do that ?
This is the model that I'm using
model = tf.keras.Sequential([
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(100,100,3),name="conv1")),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=128, kernel_size=2, padding='same', activation='relu')),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=256, kernel_size=2, padding='same', activation='relu')),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(200, activation="relu"),
tf.keras.layers.Dense(10, activation= "softmax")
])
By using the previous model, I got 47% on train accuracy and 46% accuracy on validation accuracy, but with using only CNN I got 95% on train and 71% on validation, could anyone give me a hint how to solve this problem ?

How to create a "none of the above" label in a Keras model

I am working a simple image classification. Each object must fit in one of the categories based on its material (aluminum, iron, copper)
There is only one picture for each class, e.g. all aluminum materials don't appear in the same photo along with iron materials for example. The model is working pretty well and the accuracy is great. However I don't know how to handle images that don't fit any of these 3 categories. Let's say I submit a picture of a piece of wood. This obliviously don't fit in any of the 3 categories, but my model seems to "guess" one of them and give one of these random categories a false positive along with a high probability. I understand the result of model.predict() cannot be zero, the ideal scenario. I have tested both softmax and sigmoid activations to no avail. I also tried to create a bogus category called "none" and trained the model with random photos of objects that do not have any of the aforementioned materials. The result was the whole model to become unreliable and lost most of the accuracy I had before.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
activation='relu',
input_shape=(64, 64, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
checkpoint = ModelCheckpoint(filepath='c:/Users/data/models/model-{epoch:02d}-{val_loss:.2f}.hdf5',save_best_only=True)
callbacks_list = [checkpoint]
model.fit(x_train, y_train,
batch_size=75,
epochs=20,
verbose=1,
validation_data=(x_valid, y_valid), callbacks=[checkpoint])
From my understanding, this is almost impossible with supervised learning.
Supervised learning takes some dataset and let the machine learn. However the category that falls under "none" is simply too much. It is almost impossible to cover all other materials under the category of none. Worst is that supervised learning will mostly recognize what had been trained. so when a completely new stuff appears in your testing. it is very likely, the system will ignore or give one of the result based on the given training.
One approach that is more suitable in your application is unsupervised learning. There should be many resources and researches in using unsupervised learning for image classification. One of the sample paper:
https://www.cv-foundation.org/openaccess/content_cvpr_2013/papers/Siva_Looking_Beyond_the_2013_CVPR_paper.pdf
Any feedback is welcomed. I stand corrected. Thank you
You need to add yet another class which you will label items not matching your existing labels. So you have aluminum, iron, copper and none_of_the_above
Change your model in line:
model.add(Dense(classes, activation='sigmoid'))
to
model.add(Dense(classes+1, activation='sigmoid'))
And modify your data to label wood as none_of_the_above. You will need a lot of examples that do not match aluminum, iron, copper.
Now add custom accuracy:
def ignore_accuracy_of_class(class_to_ignore=0):
def ignore_acc(y_true, y_pred):
y_true_class = K.argmax(y_true, axis=-1)
y_pred_class = K.argmax(y_pred, axis=-1)
ignore_mask = K.cast(K.not_equal(y_pred_class, class_to_ignore), 'int32')
matches = K.cast(K.equal(y_true_class, y_pred_class), 'int32') * ignore_mask
accuracy = K.sum(matches) / K.maximum(K.sum(ignore_mask), 1)
return accuracy
return ignore_acc
and add it to compile:
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy', ignore_accuracy_of_class(4)])
4 in ignore_accuracy_of_class is a label to ignore. Now you have both accuracy of the whole model and accuracy for only your selected aluminum, iron, copper labels.