training model CNN KERAS - tensorflow

hello everyone i am trying to train a model using cnn and keras but the training don't finish and i got this warning and it stops training , i don't know why and i didn't understand where the problem is can anyone gives me a advice or what i should change in the code
def myModel():
no_Of_Filters=60
size_of_Filter=(5,5) # THIS IS THE KERNEL THAT MOVE AROUND THE IMAGE TO GET THE FEATURES.
# THIS WOULD REMOVE 2 PIXELS FROM EACH BORDER WHEN USING 32 32 IMAGE
size_of_Filter2=(3,3)
size_of_pool=(2,2) # SCALE DOWN ALL FEATURE MAP TO GERNALIZE MORE, TO REDUCE OVERFITTING
no_Of_Nodes = 500 # NO. OF NODES IN HIDDEN LAYERS
model= Sequential()
model.add((Conv2D(no_Of_Filters,size_of_Filter,input_shape=(imageDimesions[0],imageDimesions[1],1),activation='relu'))) # ADDING MORE CONVOLUTION LAYERS = LESS FEATURES BUT CAN CAUSE ACCURACY TO INCREASE
model.add((Conv2D(no_Of_Filters, size_of_Filter, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool)) # DOES NOT EFFECT THE DEPTH/NO OF FILTERS
model.add((Conv2D(no_Of_Filters//2, size_of_Filter2,activation='relu')))
model.add((Conv2D(no_Of_Filters // 2, size_of_Filter2, activation='relu')))
model.add(MaxPooling2D(pool_size=size_of_pool))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(no_Of_Nodes,activation='relu'))
model.add(Dropout(0.5)) # INPUTS NODES TO DROP WITH EACH UPDATE 1 ALL 0 NONE
model.add(Dense(noOfClasses,activation='softmax')) # OUTPUT LAYER
# COMPILE MODEL
model.compile(Adam(lr=0.001),loss='categorical_crossentropy',metrics=['accuracy'])
return model
############################### TRAIN
model = myModel()
print(model.summary())
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),steps_per_epoch=steps_per_epoch_val,epochs=epochs_val,validation_data=(X_validation,y_validation),shuffle=1)
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 20000 batches). You may need to use the repeat() function when building your dataset.

While using generators, you can either run the model without the step_per_epoch parameter and let the model figure out how many steps are there to cover an epoch.
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),
epochs=epochs_val,
validation_data=(X_validation,y_validation),
shuffle=1)
OR
you'll have to calculate steps_per_epoch and use it while training as follows;
history=model.fit_generator(dataGen.flow(X_train,y_train,batch_size=batch_size_val),
steps_per_epoch=(data_samples/batch_size)
epochs=epochs_val,
validation_data=(X_validation,y_validation),
shuffle=1)
Let us know if the issue still persists. Thanks!

Related

Stopping training at maximum validation accuracy. Is this a good practice?

I am training my model with a dataset of 200 images. I have created a binary classification CNN that looks like this one:
classifier = Sequential()
# Adding a first convolutional layer
classifier.add(Convolution2D(48, 3, input_shape = (320, 320, 3), activation = 'relu'))
classifier.add(MaxPooling2D())
# Adding a second convolutional layer
classifier.add(Convolution2D(48, 3, activation = 'relu'))
classifier.add(MaxPooling2D())
# Adding a third convolutional layer
classifier.add(Convolution2D(48, 3, activation = 'relu'))
classifier.add(MaxPooling2D())
#Flattening
classifier.add(Flatten())
#Full connected
classifier.add(Dense(256, activation = 'relu'))
#Full connected
classifier.add(Dense(256, activation = 'sigmoid'))
#Dropout
classifier.add(Dropout(0.5))
#Full connected
classifier.add(Dense(1, activation = 'sigmoid'))
# Compiling the CNN
opt = keras.optimizers.Adam(learning_rate=0.001)
classifier.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.summary()
I am also using Image Data Augmentation and Early Stopping based on val_accuracy with a patience of 10.
My results are the following:
Result graph validation accuracy
The best validation accuracy I get is 0.9231 at the 21st epoch. Should I stop the training with a custom callback once I surpass 92% or is it a bad practice?
Would it be a good practice to set a custom callback that stops training
The best practice here is to save the model every time the validation accuracy hits a maximum, but to keep training. Alternatively, you could save a model after each epoch, and choose the best one to use by checking the validation graph (I'd suggest epoch 11 here. After 11 the validation graph is just oscillating, which is mostly noise).
Finally, 200 images is rarely enough to get good results. You want thousands or tens of thousands at least. Even your validation set should have at least 100 images so that even minor changes to the model show smooth changes in the validation curve. You should also consider adding some data augmentation if you aren't doing it already.

Can this be considered overfitting?

I have 4 classes, each with 1350 images. The validation set has 20% of the total images (it is generated automatically). The training model uses MobilenetV2 network:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
The model is created:
model = tf.keras.Sequential([
base_model,
tf.keras.layers.Conv2D(32, 3, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.MaxPool2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(4, activation='softmax', kernel_regularizer=regularizers.l2(0.001))
])
The model is trained through 20 epochs and then fine tunning is done in 15 epochs. The result is as follows:
Image of the model trained before fine tunning
Image of the model trained after 15 epochs and fine tunning
A bit difficult to tell without the numeric values of validation loss but I would say the results before fine tuning are slightly over fitting and for after fine tuning less over fitting. A couple of additional things you could do. One is try using an adjustable learning rate using the callback tf.keras.callbacks.ReduceLROnPlateau. Set it up to monitor validation loss. Documentation is here. I set factor=.5 and patience=1. Second replace the Flatten layer with tf.keras.layers.GlobalMaxPool2D and see if it improves the validation loss.

Using batch size with TensorFlow Validation Monitor

I'm using tf.contrib.learn.Estimator to train a CNN having 20+ layers. I'm using GTX 1080 (8 GB) for training. My dataset is not so large but my GPU runs out of memory with a batch size greater than 32. So I'm using a batch size of 16 for training and Evaluating the classifier (GPU runs out of memory while evaluation as well if a batch_size is not specified).
# Configure the accuracy metric for evaluation
metrics = {
"accuracy":
learn.MetricSpec(
metric_fn=tf.metrics.accuracy, prediction_key="classes"),
}
# Evaluate the model and print results
eval_results = classifier.evaluate(
x=X_test, y=y_test, metrics=metrics, batch_size=16)
Now the problem is that after every 100 steps, I only get training loss printed on screen. I want to print validation loss and accuracy as well, So I'm using a ValidationMonitor
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
X_test,
y_test,
every_n_steps=50)
# Train the model
classifier.fit(
x=X_train,
y=y_train,
batch_size=8,
steps=20000,
monitors=[validation_monitor]
ActualProblem: My code crashes (Out of Memory) when I use ValidationMonitor, I think the problem might be solved if I could specify a batch size here as well and I can't figure out how to do that. I want ValidationMonitor to evaluate my validation data in batches, like I do it manually after training using classifier.evaluate, is there a way to do that?
The ValidationMonitor's constructor accepts a batch_size arg that should do the trick.
You need to add config=tf.contrib.learn.RunConfig( save_checkpoints_secs=save_checkpoints_secs) in your model definition. The save_checkpoints_secs can be changed to save_checkpoints_steps, but not both.

Keras BatchNorm: Training accuracy increases while Testing accuracy decreases

I am trying to use BatchNorm in Keras. The training accuracy increases over time. From 12% to 20%, slowly but surely.
The test accuracy however decreases from 12% to 0%. Random baseline is 12%.
I very much assume this is due to the batchnorm layer (removing the batchnorm layer results in ~12% test accuracy), which maybe does not initialize parameters gamma and beta well enough. Do I have to regard anything special when applying batchnorm? I don't really understand what else could have gone wrong. I have the following model:
model = Sequential()
model.add(BatchNormalization(input_shape=(16, 8)))
model.add(Reshape((16, 8, 1)))
#1. Conv (64 filters; 3x3 kernel)
model.add(default_Conv2D())
model.add(BatchNormalization(axis=3))
model.add(Activation('relu'))
#2. Conv (64 filters; 3x3 kernel)
model.add(default_Conv2D())
model.add(BatchNormalization(axis=3))
model.add(Activation('relu'))
...
#8. Affine (NUM_GESTURES units) Output layer
model.add(default_Dense(NUM_GESTURES))
model.add(Activation('softmax'))
sgd = optimizers.SGD(lr=0.1)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
default_Conv2D and default_Dense are defined as follows:
def default_Conv2D():
return Conv2D(
filters=64,
kernel_size=3,
strides=1,
padding='same',
# activation=None,
# use_bias=True,
# kernel_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None), #RandomUniform(),
kernel_regularizer=regularizers.l2(0.0001),
# bias_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None), # RandomUniform(),
# bias_regularizer=None
)
def default_Dense(units):
return Dense(
units=units,
# activation=None,
# use_bias=True,
# kernel_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),#RandomUniform(),
# bias_initializer=RandomNormal(mean=0.0, stddev=0.01, seed=None),#RandomUniform(),
kernel_regularizer=regularizers.l2(0.0001),
# bias_regularizer=None
)
The issue is overfitting.
This is supported by your first 2 observations :
The training accuracy increases over time. From 12% to 20%,.. test accuracy however decreases from 12% to 0%
removing the batchnorm layer results in ~12% test accuracy
The first statement tells me that your network is memorizing the training set. The second statement tells me that when you prevent the network from memorizing the training set (or even learning) then it stops making error to do with memorization.
There are a few solutions to overfitting, but it is a problem large than this post. Please treat the following list as a "top" list and not exhaustive:
add a regularizer like Dropout just before your final fully connected layer.
add a L1 or L2 regularizer on matrix weights
add a regularizer like Dropout between CONV
your network may have too many free parameters. try reducing the layers to just 1 CONV, and add one more layer at a time retraining and testing each time.
slow increase in accuracy
As a side note, you hinted that your accuracy isn't increasing as fast as you like by saying slowly but surely. I've had great success when I've done all of the following steps
change your loss function to be the average loss of all predictions for all items in the mini-batch. This makes your loss function independent of your batch size which you'll discover that if you change your batch size and your loss function changes with it then you'll have to change your learning rate in SGD.
your loss is a single number that is the average of the loss for all predicted classes and all samples, so use a learning rate of 1.0. No need to scale it anymore.
use tf.train.MomentumOptimizer with learning_rate = 1.0 and momentum = 0.5. MomentumOptimizer has been shown to be much more robust than GradientDescent.
It seems that there was something broken with Keras itself.
A naive
pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
did the trick.
#wontonimo, thanks a lot for your really great answer!

Training and Loss not changing in Keras CNN model

I am running a CNN for left and right shoeprint classfication. I have 190,000 training images and I use 10% of it for validation. My model is setup as shown below. I get the paths of all the images, read them in and resize them. I normalize the image, and then fit it to the model. My issue is that I have stuck at a training accuracy of 62.5% and a loss of around 0.6615-0.6619. Is there something wrong that I am doing? How can I stop this from happening?
Just some interesting points to note:
I first tested this on 10 images I was having the same issue but changing the optimizer to adam and batch size to 4 worked.
I then tested on more and more images, but each time I would need to change the batch size to get improvements in the accuracy and loss. With 10,000 images I had to use a batch size of 500 and optimizer rmsprop. However, the accuracy and loss only really began to change after epoch 10.
I am now training on 190,000 images and I cannot increase the batch size as my GPU is at is max.
imageWidth = 50
imageHeight = 150
def get_filepaths(directory):
file_paths = []
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath) # Add it to the list.
return file_paths
def cleanUpPaths(fullFilePaths):
cleanPaths = []
for f in fullFilePaths:
if f.endswith(".png"):
cleanPaths.append(f)
return cleanPaths
def getTrainData(paths):
trainData = []
for i in xrange(1,190000,2):
im = image.imread(paths[i])
im = image.imresize(im, (150,50))
im = (im-255)/float(255)
trainData.append(im)
trainData = np.asarray(trainData)
right = np.zeros(47500)
left = np.ones(47500)
trainLabels = np.concatenate((left, right))
trainLabels = np_utils.to_categorical(trainLabels)
return (trainData, trainLabels)
#create the convnet
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(imageWidth,imageHeight,1),strides=1))#32
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu',strides=1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(1, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (1, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
#prepare the training data*/
trainPaths = get_filepaths("better1/train")
trainPaths = cleanUpPaths(trainPaths)
(trainData, trainLabels) = getTrainData(trainPaths)
trainData = np.reshape(trainData,(95000,imageWidth,imageHeight,1)).astype('float32')
trainData = (trainData-255)/float(255)
#train the convnet***
model.fit(trainData, trainLabels, batch_size=500, epochs=50, validation_split=0.2)
#/save the model and weights*/
model.save('myConvnet_model5.h5');
model.save_weights('myConvnet_weights5.h5');
I've had this issue a number of times now, so thought to make a little recap of it and possible solutions etc. to help people in the future.
Issue: Model predicts one of the 2 (or more) possible classes for all data it sees*
Confirming issue is occurring: Method 1: accuracy for model stays around 0.5 while training (or 1/n where n is number of classes). Method 2: Get the counts of each class in predictions and confirm it's predicting all one class.
Fixes/Checks (in somewhat of an order):
Double Check Model Architecture: use model.summary(), inspect the model.
Check Data Labels: make sure the labelling of your train data hasn't got mixed up somewhere in the preprocessing etc. (it happens!)
Check Train Data Feeding Is Randomised: make sure you are not feeding your train data to the model one class at a time. For instance if using ImageDataGenerator().flow_from_directory(PATH), check that param shuffle=True and that batch_size is greater than 1.
Check Pre-Trained Layers Are Not Trainable:** If using a pre-trained model, ensure that any layers that use pre-trained weights are NOT initially trainable. For the first epochs, only the newly added (randomly initialised) layers should be trainable; for layer in pretrained_model.layers: layer.trainable = False should be somewhere in your code.
Ramp Down Learning Rate: Keep reducing your learning rate by factors of 10 and retrying. Note you will have to fully reinitialize the layers you are trying to train each time you try a new learning rate. (For instance, I had this issue that was only solved once I got down to lr=1e-6, so keep going!)
If any of you know of more fixes/checks that could possible get the model training properly then please do contribute and I'll try to update the list.
**Note that is common to make more of the pretrained model trainable, once the new layers have been initially trained "enough"
*Other names for the issue to help searches get here...
keras tensorflow theano CNN convolutional neural network bad training stuck fixed not static broken bug bugged jammed training optimization optimisation only 0.5 accuracy does not change only predicts one single class wont train model stuck on class model resetting itself between epochs keras CNN same output
You can try to add a BatchNornmalization() layer after MaxPooling2D(). It works for me.
I just have 2 things more to add to the great list of DBCerigo.
Check activation functions: some layers have linear activation function by default, if you do not insert some non linearity into your model it wont be able to generalize, so the net will try to learn how to separate linearly a feature space that is not linear. Making sure you have your non linearity set is a good checkpoint.
Check Model Complexity: if you have a relatively simple model and it learns only till the 1st or the 2nd epoch and then it stalls, it may be that it is trying to learn something too complex. Try making the model deeper. This usually happens when working with frozen models with only 1 or 2 layers unfrozen.
Although the 2nd one may be obvious, I run into his problem once and I lost lots of time checking everythin (data, batches, LR...) before figuring out.
Hope this helps
I would try a couple of things. A lower learning rate should help with more data. Generally, adapting the optimizer should help. Additionally your network seems really small, you might want to increase the capacity of the model by adding layers or increasing the number of filters in the layers.
A better description on how to apply deep learning in practice is given here.
in my case it is the activification function matters. I change from 'sgd' to 'a'