Tensorflow classifier is not overfitting, accuracy on train set stays low - tensorflow

This might be a weird question, but here goes: I have a DNNClassifier in TensorFlow that I have run with all sorts of parameters and numbers of epochs. The weird thing is that, whatever I do, the accuracy on my training set stays quite low (it's a binary classifier, and accuracy stays around 65%). I would expect that with at least some of the configurations I've tried (ones without regularization/dropout), I would have overfitted and reached a higher accuracy on my training set, but alas.
So, what does this mean? That I don't have enough data? That my network is not big enough to fit the data set? I tried doubling the size of my hidden layers, but it still stayed pretty much the same.
I have 23000 training examples, and currently have 3 layers of 100 neurons each.
Code can be found here: https://www.pastebucket.com/563290

Related

How to improve the performance of CNN Model for a specific Dataset? Getting Low Accuracy on both training and Testing Dataset

We were given an assignment in which we had to implement our own neural network plus two existing neural network architectures. I have done that. It isn't a requirement of the assignment, but I would still like to know what steps I can follow to improve the accuracy of my models.
I am fairly new to deep learning and machine learning as a whole, so I don't have much of an idea where to start.
The given dataset contains a total of 15 classes (airplane, chair etc.) and we are provided with about 15 images of each class in training dataset. The testing dataset has 10 images of each class.
Complete github repository of my code can be found here (Jupyter Notebook file): https://github.com/hassanashas/Deep-Learning-Models
I tried my own CNN first (built by following YouTube tutorials).
The code is as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense
X_train = X_train/255.0  # scale pixel values to [0, 1]
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape = X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(16)) # 16 units because model.fit gave an error with 15
model.add(Activation('softmax'))
To compile the model:
from tensorflow.keras.optimizers import SGD
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),
              metrics=['accuracy'])
I used sparse categorical crossentropy because my "y" labels were integer values, ranging from 1 to 15.
I trained the model as follows:
model_fit = model.fit(X_train, y_train, batch_size=32, epochs=30, validation_split=0.1)
It gave me an accuracy of 0.2030 on the training dataset and only 0.0733 on the testing dataset (both datasets are in the GitHub repository).
Then I tried the AlexNet CNN (following a YouTube tutorial for its code).
I ran AlexNet on the same dataset for 15 epochs. It improved the accuracy on the training dataset to 0.3317; however, the accuracy on the testing dataset was even worse than with my own CNN, at only 0.06.
Afterwards, I tried the VGG16 CNN, again following a YouTube tutorial.
I ran the code on Google Colab for 10 epochs. It reached 100% accuracy on the training dataset in the 8th epoch, but this model gave the worst accuracy of all three on the testing dataset, at only 0.0533.
I am unable to understand the contrasting behavior of these models. I have tried different epoch values, loss functions, etc., but the current settings gave the best results, relatively speaking. My own CNN was able to reach 100% training accuracy when I ran it for 100 epochs (however, it gave very poor results on the testing dataset).
What can I do to improve the performance of these models? And specifically, what are the few crucial things that one should always try in order to improve the accuracy of a deep learning model? I have looked up multiple similar questions on Stack Overflow, but almost all of them worked with datasets provided by TensorFlow, such as MNIST, and I didn't find much help in them.
Disclaimer: it's been a few years since I've played with CNNs myself, so I can only pass on some general advice and suggestions.
First of all, I would like to talk about the results you've gotten so far. The first two networks you've trained seem to at least learn something from the training data because they perform better than just randomly guessing.
However: the performance on the test data indicates that the network has not learned anything meaningful because those numbers suggest the network is as good as (or only marginally better than) a random guess.
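To put numbers on "as good as a random guess", here is a small sketch that compares the reported test accuracies with chance level. It assumes the test set really is balanced (10 images for each of the 15 classes, as described in the question) and that the reported accuracies were measured on all 150 test images.

```python
# Chance-level accuracy for a balanced 15-class problem, compared with the
# test accuracies reported in the question.
num_classes = 15
test_images_per_class = 10  # taken from the question
num_test_images = num_classes * test_images_per_class  # 150

chance_accuracy = 1 / num_classes  # about 0.0667

reported_test_accuracy = {
    "own CNN": 0.0733,
    "AlexNet": 0.06,
    "VGG16": 0.0533,
}

# Convert each accuracy back to an approximate count of correct predictions.
correct_counts = {
    name: round(acc * num_test_images)
    for name, acc in reported_test_accuracy.items()
}

expected_by_chance = chance_accuracy * num_test_images
for name, count in correct_counts.items():
    print(f"{name}: about {count} of {num_test_images} correct "
          f"(chance would be about {expected_by_chance:.0f})")
```

Under those assumptions, all three models get within one or two images of what uniform random guessing would be expected to produce.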
As for the third network: high accuracy for training data combined with low accuracy for testing data means that your network has overfitted. This means that the network has memorized the training data but has not learned any meaningful patterns.
There's no point in continuing to train a network that has started overfitting. So once the training accuracy increases and testing accuracy decreases for a few epochs consecutively, you can stop training.
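The stopping rule can be sketched in a few lines of plain Python. This is only an illustration of the idea, with made-up accuracy values and a hypothetical helper name; in practice the check should ideally run on a held-out validation split rather than on the final test set. Keras also ships an `EarlyStopping` callback in `tf.keras.callbacks` that does this kind of bookkeeping for you.

```python
def early_stopping_epoch(val_accuracy, patience=3):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the validation accuracy has failed to improve on
    its best value for `patience` consecutive epochs.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracy):
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation accuracies, purely for illustration.
history = [0.10, 0.18, 0.22, 0.21, 0.20, 0.19, 0.18]
stop_at = early_stopping_epoch(history, patience=3)
print("stop after epoch index:", stop_at)
```

The best epoch in this toy history is index 2, so the weights from that epoch are the ones worth keeping.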
Increase the dataset size
Neural networks rely on loads of good training data to learn patterns from. Your dataset contains 15 classes with 15 images each, which is very little training data.
Of course, it would be great if you could get hold of additional high-quality training data to expand your dataset, but that is not always feasible. So a different approach is to artificially expand your dataset. You can easily do this by applying a bunch of transformations to the original training data. Think about: mirroring, rotating, zooming, and cropping.
Remember not to apply these transformations willy-nilly; they must make sense! For example, if you want a network to recognize a chair, do you also want it to recognize chairs that are upside down? Or for detecting road signs: mirroring them makes no sense because the text, numbers, and graphics will never appear mirrored in real life.
From the brief description of the classes you have (planes and chairs and whatnot...), I think mirroring horizontally could be the best transformation to apply initially. That will already double your training dataset size.
Also, keep in mind that an artificially inflated dataset is never as good as one of the same size that contains only authentic, real images. A mirrored image contains much of the same information as its original; we merely hope that it will delay overfitting and help the network learn the important patterns instead.
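Horizontal mirroring is easy to do with NumPy alone. The sketch below assumes the images are stored as an array of shape (num_images, height, width, channels), which is the usual layout for Keras `Conv2D` input; the function name and the toy arrays are made up for illustration. Apply this to the training portion only, after any validation or test images have been set aside, so that mirrored copies of training images do not leak into the evaluation data.

```python
import numpy as np


def add_horizontal_mirrors(images, labels):
    """Return the dataset with a left-right mirrored copy of every image appended.

    Assumes `images` has shape (num_images, height, width, channels).
    """
    mirrored = images[:, :, ::-1, :]  # reverse the width axis
    augmented_images = np.concatenate([images, mirrored], axis=0)
    augmented_labels = np.concatenate([labels, labels], axis=0)
    return augmented_images, augmented_labels


# Tiny fake dataset standing in for X_train / y_train.
X = np.arange(2 * 2 * 3 * 1, dtype=np.float32).reshape(2, 2, 3, 1)
y = np.array([1, 2])

X_aug, y_aug = add_horizontal_mirrors(X, y)
print(X_aug.shape, y_aug.shape)
```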
Lower the learning rate
This is a bit of a side note, but try lowering the learning rate. Your network seems to overfit in only a few epochs, which is very fast. Lowering the learning rate will not prevent overfitting, but overfitting will set in more slowly. This means that you can hopefully find an epoch with better overall performance before overfitting takes place.
Note that a lower learning rate will never magically make a badly performing network good. It's just one way to locate a set of parameters that performs slightly better.
Randomize the training data order
During training, the training data is presented in batches to the network. This often happens in a fixed order over all iterations. This may lead to certain biases in the network.
First of all, make sure that the training data is shuffled at least once. You do not want to present the classes one by one, for example first all plane images, then all chairs, etc... This could lead to the network unlearning much of the first class by the end of each epoch.
Also, reshuffle the training data between epochs. This will again avoid potential minor biases because of training data order.
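A one-off shuffle that keeps images and labels aligned could look like the sketch below (the function name and toy arrays are illustrative). As far as I know from the Keras documentation, `model.fit` already reshuffles the training data before each epoch by default (`shuffle=True`), but `validation_split` takes the last fraction of the arrays before any shuffling. If the data is sorted by class, the validation set would then contain only the last class or two, so shuffling once up front matters.

```python
import numpy as np


def shuffle_together(images, labels, seed=0):
    """Shuffle images and labels with the same random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    return images[order], labels[order]


# Toy data in which label i belongs to image i, so alignment is easy to check.
X = np.arange(6).reshape(6, 1)
y = np.arange(6)

X_shuf, y_shuf = shuffle_together(X, y)
print(y_shuf)
```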
Improve the network design
You've designed a convolutional neural network with only two convolution layers and two fully connected layers. Maybe this model is too shallow to learn to differentiate between the different classes.
Know that the convolution layers tend to first pick up small visual features and then tend to combine these in higher level patterns. So maybe adding a third convolution layer may help the network identify more meaningful patterns.
Obviously, network design is something you'll have to experiment with and making networks overly deep or complex is also a pitfall to watch out for!

Unusual behavior of ADAM optimizer with AMSGrad

I am trying some 1, 2, and 3 layer LSTM networks to classify land cover of some selected pixels from a Landsat time-series spectral data. I tried different optimizers (as implemented in Keras) to see which of them is better, and generally found AMSGrad variant of ADAM doing a relatively better job in my case. However, one strange thing to me is that for the AMSGrad variant, the training and test accuracies start at a relatively high value from the first epoch (instead of increasing gradually) and it changes only slightly after that, as you see in the below graph.
[Figure: performance of ADAM optimizer with AMSGrad on]
[Figure: performance of ADAM optimizer with AMSGrad off]
I have not seen this behavior in any other optimizer. Does it show a problem in my experiment? What can be the explanation for this phenomenon?
Pay attention to the number of LSTM layers. LSTMs are notorious for easily overfitting the data. Try a smaller model initially (fewer layers), and gradually increase the number of units in a layer. If you notice poor results, then try adding another LSTM layer, but only after the previous step has been done.
As for the optimizers, I have to admit I have never used AMSGrad. However, the accuracy plot does seem to be much better with AMSGrad off. You can see that when you use AMSGrad, the accuracy on the training set is much better than that on the test set, which is a strong sign of overfitting.
Remember to keep things simple, experiment with simple models and generic optimizers.

High variability loss of neural networks

I'm getting really high variability in both the accuracy and loss between each epoch, as high as 10%. It happens to my accuracy all the time, and my loss when I start adding in dropout. However I really need the dropout, any ideas on how to smooth it out?
It is hard to say anything concrete without knowing what you do. But because you mentioned that your dataset is very small: 500 samples, I say that your 10% performance jumps are not surprising. Still a few ideas:
Definitely use a bigger dataset if you can. If it is not possible to collect a bigger dataset, try to augment whatever you have.
Try a smaller dropout rate and see how it goes; also try different regularizers (dropout is not the only option).
Your dataset is small, so you can afford to run more than 200 iterations.
See how your model performs on the test set; it is possible that it has just severely overfitted the data.
Besides the fact that the dataset is very small, when training with dropout regularization the loss function is no longer well defined, and I presume the accuracy is biased as well. Any tracked metric should therefore be evaluated without dropout. It seems that Keras does not switch dropout off while calculating the accuracy during training.
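To see why metrics computed with dropout active are noisy, here is a minimal NumPy sketch of inverted dropout (the common variant in which the surviving units are rescaled). It is an illustration under that assumption, not Keras's actual implementation, and the function name is made up.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training and
    rescale the survivors; return the input unchanged at inference time."""
    if not training:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones(1000)

# Same input, two modes: a random mask during training, identity at inference.
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("training-mode mean:", train_out.mean())
print("inference-mode mean:", eval_out.mean())
```

Every training batch sees a different random mask, so a loss or accuracy computed in training mode fluctuates even when the weights barely change. Evaluating on a held-out set in inference mode removes that source of variance.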

Tensorflow Autoencoder with 0 hidden units learns something

I am currently running some tests with simple Autoencoders. I wrote an Autoencoder myself entirely in Tensorflow and in addition copied and pasted the code from this keras blog entry: https://blog.keras.io/building-autoencoders-in-keras.html (just to have a different Autoencoder implementation).
When I was testing different architectures, I started with a single layer and a couple of hidden units in this layer. I noticed that when I reduce the number of hidden units to only a single (!) hidden unit, I still get the same training and test losses I get with bigger architectures (up to a couple of thousand hidden units). In my data, the worst loss is 0.5. Any architecture I've tried achieves ~ 0.15.
Just out of curiosity, I reduced the number of hidden units in the only existing hidden layer to zero (which I know doesn't make any sense). However, I still get a training and test loss of 0.15. I assumed that this strange behavior might be due to the bias in the decoding layer (when I reconstruct the input). Initially, I've set the bias variable (in TF) to trainable=True. So now I guess even without any hidden units, the model still learns the bias in the decoding layer which might lead to the reduction of my loss from 0.5 to 0.15.
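The bias-only hypothesis can be checked without any framework. The sketch below assumes mean squared error as the reconstruction loss (the loss actually used is not stated here) and uses made-up toy data; under MSE, the best constant output is the per-feature mean of the training data, which is exactly what a trainable decoder bias can learn with no hidden units at all.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 500 samples, 4 features, each feature with a non-zero mean.
data = rng.normal(loc=[0.2, 0.5, 0.8, 0.3], scale=0.1, size=(500, 4))


def mse(reconstruction, target):
    """Mean squared reconstruction error over all samples and features."""
    return float(np.mean((reconstruction - target) ** 2))


# Zero hidden units, bias frozen at zero: the output is always 0.
loss_no_bias = mse(np.zeros_like(data), data)

# Zero hidden units, trainable bias: under MSE the best constant output
# is the per-feature mean of the training data.
best_bias = data.mean(axis=0)
loss_bias_only = mse(np.broadcast_to(best_bias, data.shape), data)

print("loss with frozen zero bias:", loss_no_bias)
print("loss with learned bias only:", loss_bias_only)
```

In this toy setup the bias-only loss equals the average per-feature variance. If most of the error in the real data comes from the feature means rather than from structure between features, a large drop in loss from the bias alone would be expected.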
In the next step, I set the bias in the decoding layer to trainable=False. Now the model (with no hidden units) doesn't learn anything, just as I would have expected (loss=0.5). With one hidden unit, however, I again get test and training losses of around 0.15.
Following this line of thought, I set the bias in the encoding layer to trainable=False, since I wanted to avoid that my architecture only learns the bias. So now, only the weights of my autoencoder are trainable. This still works for a single hidden unit (and of course just a single hidden layer). Surprisingly, this only works in case of a single-layer network. As soon as I increase the number of layers (independent of the numbers of hidden units), the network again doesn't learn anything (in case only the weights get updated).
All the things I reported are true for the training loss as well as for the test loss (in a completely independent dataset the machine never sees). This makes it even more curious to me.
My question is: How can it happen that I learn as much from a 1-node "network" as from a bigger one (both for training and testing)? Second, how can it be that even larger nets seem to never overfit (training and test errors change slightly, but are always comparable)? Any suggestions would be very helpful!
Thanks a lot!
Nils

Odd results for Image Recognition using AlexNet in Deep Learning

I am using a modified AlexNet (cifar-10-model) available in the TensorFlow tutorials to do image recognition on some mechanical part images, but I am getting very weird results.
The training accuracy reaches 100% very quickly, but the testing accuracy starts as high as 45% and then decreases very fast to as low as 9%.
I am using a training set of 20,000 images and a testing set of 2,500 images with 8 categories. I do training and testing in batches of size 1024.
The accuracy and training loss are shown below, and you can see that:
The testing accuracy starts at as high as 45%, which doesn't make sense.
The mechanical images are always classified as 'left bracket'
[Figure: accuracy]
[Figure: classification results]
Your testing accuracy is decreasing, and I think this happens because of overfitting. Try using a simpler model or a regularization method.
You might want to check your data or feature extraction for errors. I did a protein structure prediction for 3-labels, but I was using a wrong extraction method. My validation accuracy starts at 45% too and then falls quickly.
Once I knew where my errors were, I started from scratch: now I do protein structure prediction for 8 labels. The accuracy after the first epoch is 60% and rises steadily to 64.9% (the current Q8 world record for CB513 is 68.9%).
So validation accuracy starting at 45% is not a problem, but falling quickly is. I'm afraid that you have an error somewhere in your data/extraction rather than just overfitting.
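One quick check that helps separate a data problem from plain overfitting is to look at how the predictions are distributed and to compare the accuracy with a trivial baseline. The sketch below uses plain Python; the helper name and the toy labels are made up for illustration ('left bracket' is the only class name taken from the question).

```python
from collections import Counter


def prediction_summary(true_labels, predicted_labels):
    """Return accuracy, the most-predicted class and its share, and the
    accuracy of always guessing the most common true class."""
    n = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    top_class, top_count = Counter(predicted_labels).most_common(1)[0]
    majority_count = Counter(true_labels).most_common(1)[0][1]
    return {
        "accuracy": correct / n,
        "most_predicted_class": top_class,
        "most_predicted_share": top_count / n,
        "majority_baseline": majority_count / n,
    }


# Toy example: the model predicts 'left bracket' for every image.
true_labels = ["left bracket"] + ["gear"] * 3 + ["bolt"] * 4
predicted_labels = ["left bracket"] * 8

summary = prediction_summary(true_labels, predicted_labels)
print(summary)
```

If one class accounts for nearly all predictions and the accuracy simply equals that class's share of the test set, the model has collapsed to a constant output. In that case it is worth checking the labels, the preprocessing, and whether the training and test images are processed identically before reaching for regularization.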