CNN model's val_loss go down well but val_loss change a lot - tensorflow

I'm using keras(tensorflow) to train my own CNN model.
As shown in the chart, the train_loss goes down well, but val_loss has big change among each Epoch.
What can be the reason, and what should I do to improve it?

This is typical behavior when training in deep learning. Think about it, your target loss is the training loss, so it is directly affected by the training process and as you said "goes down well". The validation loss is only affected indirectly, so naturally it will be more volatile in comparison.
When you are training, the model is attempting to estimate the real distribution of the data, however all it got is the distribution of the training dataset to rely on (which is similar but not the same).
The big spike at the end of your loss curve might be the result of over-fitting. If you are not using a decaying learning rate during training, I would suggest it.

Related

How to improve the performance of CNN Model for a specific Dataset? Getting Low Accuracy on both training and Testing Dataset

We were given an assignment in which we were supposed to implement our own neural network, and two other already developed Neural Networks. I have done that and however, this isn't the requirement of the assignment but I still would want to know that what are the steps/procedure I can follow to improve the accuracy of my Models?
I am fairly new to Deep Learning and Machine Learning as a whole so do not have much idea.
The given dataset contains a total of 15 classes (airplane, chair etc.) and we are provided with about 15 images of each class in training dataset. The testing dataset has 10 images of each class.
Complete github repository of my code can be found here (Jupyter Notebook file): https://github.com/hassanashas/Deep-Learning-Models
I tried it out with own CNN first (made one using Youtube tutorials).
Code is as follows,
X_train = X_train/255.0
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape = X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(16)) # added 16 because it model.fit gave error on 15
model.add(Activation('softmax'))
For the compiling of Model,
from tensorflow.keras.optimizers import SGD
model.compile(loss='sparse_categorical_crossentropy',
optimizer=SGD(learning_rate=0.01),
metrics=['accuracy'])
I used sparse categorical crossentropy because my "y" label was intenger values, ranging from 1 to 15.
I ran this model with following way,
model_fit = model.fit(X_train, y_train, batch_size=32, epochs=30, validation_split=0.1)
It gave me an accuracy of 0.2030 on training dataset and only 0.0733 on the testing dataset (both the datasets are present in the github repository)
Then, I tried out the AlexNet CNN (followed a Youtube tutorial for its code)
I ran the AlexNet on the same dataset for 15 epochs. It improved the accuracy on training dataset to 0.3317, however accuracy on testing dataset was even worse than my own CNN, at only 0.06
Afterwards, I tried out the VGG16 CNN, again following a Youtube Tutorial.
I ran the code on Google Colab for 10 Epochs. It managed to improve to 100% accuracy on training dataset in the 8th epoch. But this model gave the worst accuracy of all three on testing dataset with only 0.0533
I am unable to understand this contrasting behavior of all these models. I have tried out different epoch values, loss functions etc. but the current ones gave the best result relatively. My own CNN was able to get to 100% accuracy when I ran it on 100 epochs (however, it gave very poor results on the testing dataset)
What can I do to improve the performance of these Models? And specifically, what are the few crucial things that one should always try to follow in order to improve efficiency of a Deep Learning Model? I have looked up multiple similar questions on Stackoverflow but almost all of them were working on datasets provided by the tensorflow like mnist dataset and etc. and I didn't find much help from those.
Disclaimer: it's been a few years since I've played with CNNs myself, so I can only pass on some general advice and suggestions.
First of all, I would like to talk about the results you've gotten so far. The first two networks you've trained seem to at least learn something from the training data because they perform better than just randomly guessing.
However: the performance on the test data indicates that the network has not learned anything meaningful because those numbers suggest the network is as good as (or only marginally better than) a random guess.
As for the third network: high accuracy for training data combined with low accuracy for testing data means that your network has overfitted. This means that the network has memorized the training data but has not learned any meaningful patterns.
There's no point in continuing to train a network that has started overfitting. So once the training accuracy increases and testing accuracy decreases for a few epochs consecutively, you can stop training.
Increase the dataset size
Neural networks rely on loads of good training data to learn patterns from. Your dataset contains 15 classes with 15 images each, that is very little training data.
Of course, it would be great if you could get hold of additional high-quality training data to expand your dataset, but that is not always feasible. So a different approach is to artificially expand your dataset. You can easily do this by applying a bunch of transformations to the original training data. Think about: mirroring, rotating, zooming, and cropping.
Remember to not just apply these transformations willy-nilly, they must make sense! For example, if you want a network to recognize a chair, do you also want it to recognize chairs that are upside down? Or for detecting road signs: mirroring them makes no sense because the text, numbers, and graphics will never appear mirrored in real life.
From the brief description of the classes you have (planes and chairs and whatnot...), I think mirroring horizontally could be the best transformation to apply initially. That will already double your training dataset size.
Also, keep in mind that an artificially inflated dataset is never as good as one of the same size that contains all authentic, real images. A mirrored image contains much of the same information as its original, we merely hope it will delay the network from overfitting and hope that it will learn the important patterns instead.
Lower the learning rate
This is a bit of side note, but try lowering the learning rate. Your network seems to overfit in only a few epochs which is very fast. Obviously, lowering the learning rate will not combat overfitting but it will happen more slowly. This means that you can hopefully find an epoch with better overall performance before overfitting takes place.
Note that a lower learning rate will never magically make a bad-performing network good. It's just one way to locate a set of parameters that performs a tad bit better.
Randomize the training data order
During training, the training data is presented in batches to the network. This often happens in a fixed order over all iterations. This may lead to certain biases in the network.
First of all, make sure that the training data is shuffled at least once. You do not want to present the classes one by one, for example first all plane images, then all chairs, etc... This could lead to the network unlearning much of the first class by the end of each epoch.
Also, reshuffle the training data between epochs. This will again avoid potential minor biases because of training data order.
Improve the network design
You've designed a convolutional neural network with only two convolution layers and two fully connected layers. Maybe this model is too shallow to learn to differentiate between the different classes.
Know that the convolution layers tend to first pick up small visual features and then tend to combine these in higher level patterns. So maybe adding a third convolution layer may help the network identify more meaningful patterns.
Obviously, network design is something you'll have to experiment with and making networks overly deep or complex is also a pitfall to watch out for!

Does it make sense to maximize both training and validation accuracy?

While training my CNNs I usually aim to maximize the validation accuracy to 1.0 (i.e. 100%). I know that on the other hand it would not make much sense to aim for a training accuracy of 1.0, because we don't want our model to memorize the training data itself.
However, what about a "mixed" approach --
wouldn't it make sense to maximize both training and validation accuracy?
Let's first address what the purpose of validation is:
When we're training a neural net, we are trying to teach the neural net to perform well at a given task for the entire population of input/output pairs in the task. However, it is unrealistic to have the entire dataset, especially for high dimensional inputs such as images. Therefore, we create a training dataset that contains a (hopefully) large amount of that data. We hope when we're training a neural net that by maximizing performance on the training dataset, we maximize performance on the entire dataset. This is called generalization.
How do we know that the neural net is generalizing well? As you mentioned, we don't want to simply memorize the training data. That is where validation accuracy comes in. We feed data that the neural net did not train on through the network to evaluate its performance. Therefore, the purpose of the validations set is to measure the generalization.
You should watch both the training and validation accuracy. The difference between the validation and training accuracy is called the generalization gap, which will tell you how well your neural net is generalizing to new inputs. You want both the training and validation accuracy to be high, and the difference between them to be minimal.
Technically if you could do so, that would be awesome, you wouldn't say a model is over fitting unless there is a gap between validation accuracy and training accuracy, if their values are close, both high or both low, then the model is not over fitting. ideally you want high accuracy on all samples, training, validation and testing, but as I said "IDEALLY". you just don't care as much about training samples.

Neural network classification loss from validation set: Does it update anything dynamically

I'm attempting t study up a bit on the theory of training neural networks, and right now i have gotten to validation sets.
Now, I can understand that a validation set gives us a loss-index, which helps us in knowing whether we are overfitting or not. But when I read in books and and watch videos, everyone seems to express themselves in a manner that is a bit ambiguous.
Does the model update itself manually when the validation set is being run? can layers, weights, biases, or net-amounts of neurons be updated "automatically" when it is being validated?
Thank you very much
Loss and Accuracy refer to the current loss and accuracy of the training set.
The Loss is being fixed during back propagation of the epoch to improve the Accuracy.
At the end of each epoch your trained Neural Network is evaluated against your validation set. This is what Validation Loss and Validation Accuracy refer to.

TensorFlow model saving to be approached differently during training Vs. deployment?

Assume that I have a CNN which I am training on some dataset. The most important part of the model is the CNN architecture.
Now when I write a code, I define the model structure in a Python class. However, outside that class, I define a number of other nodes such as loss, accuracy, tf.Variable to keep count of epochs and so on.
When I am training, for properly resuming the training, I'd like to save all these nodes (e.g - loss, epoch variable etc), and not just the CNN structure.
However, once I am done with training, I would like to save only the CNN architecture and no nodes for loss, accuracy etc. This is because it will enable people using my model to exercise freedom in writing their own finetuning codes.
How to achieve this in TF code ? Can someone show an example ?
Is this approach towards saving followed by others also ? I just want to know if my approach is right.

Training object detectors from scratch leads to really bad performance

I am trying to train a Faster-RCNN network with Inception-v3 architecture (reference paper: Google's paper) as my fixed feature extractor using keras on my own dataset (number of classes = 4) which is very different compared to the Image-net. Still I initialized it with Image-net weights because this paper gives evidence that initializing with pre-trained weights is always better compared to random initialization.
Upon Training for 60 Epochs my Training accuracy is at 96% and my validation accuracy is at 84% ,Over-fit! (severe maybe?). But what is more worrying is that my loss did not converge at all. Upon testing the network it failed miserably! like, it didn't even detect.
Then I took a slightly different approach. I did a two step training. First I trained the Inception-v3 on my dataset like a classification problem (Still initialized it with Image-net weights) it converged well. Then I used those weights to initialize the Faster-RCNN network. This worked! But, I am confused why this two staged approach works but Training from scratch didn't work. Given I initialized both the methods with the pre-trained image-net weights initially.
Is there a way to train Faster RCNN from scratch?