Odd results for Image Recognition using AlexNet in Deep Learning - tensorflow

I am using a modified AlexNet (the cifar-10 model) available in the TensorFlow tutorials to do image recognition on mechanical part images, but I am getting very weird results.
The training accuracy reaches 100% very quickly, but the testing accuracy starts as high as 45% and then decreases rapidly to as low as 9%.
I am working with a training set of 20,000 images and a testing set of 2,500 images across 8 categories. I do training and testing in batches of size 1024.
The accuracy and training loss are shown below, and you can see that:
The testing accuracy starts as high as 45%, which doesn't make sense.
The mechanical images are always classified as 'left bracket'.
[Figure: accuracy and training loss curves]
[Figure: classification results]

Your testing accuracy is decreasing; I think this happens because of overfitting. Try using a simpler model or a regularization method to tune the model.
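As a rough illustration of that suggestion, here is a minimal sketch of adding dropout and L2 weight decay in Keras. The layer sizes are placeholders rather than the asker's actual cifar-10-style model; only the 8-class output matches the question.

```python
import tensorflow as tf

# Minimal sketch: dropout + L2 weight decay on a small CNN.
# Layer sizes and input shape are illustrative, not the asker's exact model.
l2 = tf.keras.regularizers.l2(1e-4)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', kernel_regularizer=l2,
                           input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu', kernel_regularizer=l2),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu', kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),                    # randomly drop units during training
    tf.keras.layers.Dense(8, activation='softmax'),  # 8 categories, as in the question
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```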

You might want to check your data or feature extraction for errors. I did protein structure prediction with 3 labels, but I was using a wrong extraction method. My validation accuracy started at 45% too and then fell quickly.
Knowing where my errors were, I started from scratch: now I do protein structure prediction with 8 labels. The accuracy from the first epoch is 60% and rises steadily to 64.9% (the current Q8 world record for CB513 is 68.9%).
So a validation accuracy starting at 45% is not a problem, but falling quickly is. I'm afraid that you have an error somewhere in your data/extraction rather than just overfitting.
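If you suspect a data or extraction error like the one described here, a few generic sanity checks often reveal it. This sketch assumes NumPy arrays `x_train`/`y_train` (placeholder names), not any particular pipeline.

```python
import numpy as np

def sanity_check(x_train, y_train, num_classes):
    # Class balance: a heavily skewed distribution can explain a classifier
    # that collapses onto a single label (e.g. everything becomes 'left bracket').
    counts = np.bincount(y_train.astype(int).ravel(), minlength=num_classes)
    print("samples per class:", counts)

    # Degenerate features: constant or NaN inputs usually point to an
    # extraction bug rather than a modelling problem.
    print("any NaNs:", np.isnan(x_train).any())
    print("feature std (should not be ~0):", x_train.std())

    # Spot-check that features and labels are still paired correctly,
    # e.g. by inspecting a few random (image, label) pairs by hand.
    idx = np.random.choice(len(x_train), size=5, replace=False)
    print("spot-check indices:", idx, "labels:", y_train[idx])
```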

Related

Tensorflow reducing learning rates of saved model

I am working on a CNN model which has 4 conv layers and 3 dense layers. The dataset has around 28,000 training images and 7,000 test images. The model has saved checkpoints, and I have trained it several times, reaching 60% accuracy so far; during training the learning rate was reduced to 2.6214403e-07 (as I used ReduceLROnPlateau with factor 0.4). My question: if I increased the learning rate to, say, 1e-4 and resumed training, how would it affect my model? Is it a good idea?
[Figure: accuracy vs. epoch]
If your learning curve plateaus immediately and doesn't change much beyond the initial few epochs (as in your case), then your learning rate is too low. While you can resume training with a higher learning rate, it would likely render any progress from the initial epochs meaningless. Since you typically only decrease the learning rate between epochs, and given the slow initial progress of your network, you should simply retrain with an increased initial learning rate until you see larger changes in the first few epochs. You can then identify the point of convergence as the point where overfitting sets in (test accuracy goes down while train accuracy goes up) and stop there. If this point is still "unnecessarily late", you can additionally reduce how much the learning rate decays so that you make faster progress between epochs.
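A hedged sketch of what that retraining setup could look like in Keras, assuming ReduceLROnPlateau as in the question. The model, datasets, and concrete values (the 1e-3 initial rate, factor 0.5, min_lr, patience) are illustrative choices, not prescriptions.

```python
import tensorflow as tf

# Retrain from scratch with a larger initial learning rate and a gentler
# decay schedule. `model`, `train_ds` and `val_ds` are assumed to exist.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # Decay less aggressively (factor 0.5 instead of 0.4) and never below 1e-5,
    # so the learning rate cannot collapse to ~2.6e-7 again.
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                         patience=3, min_lr=1e-5),
    # Stop once validation accuracy starts dropping while training accuracy
    # keeps rising, i.e. at the onset of overfitting.
    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5,
                                     restore_best_weights=True),
]

history = model.fit(train_ds, validation_data=val_ds,
                    epochs=100, callbacks=callbacks)
```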

Does it make sense to maximize both training and validation accuracy?

While training my CNNs I usually aim to maximize the validation accuracy to 1.0 (i.e. 100%). I know that on the other hand it would not make much sense to aim for a training accuracy of 1.0, because we don't want our model to memorize the training data itself.
However, what about a "mixed" approach --
wouldn't it make sense to maximize both training and validation accuracy?
Let's first address what the purpose of validation is:
When we're training a neural net, we are trying to teach the neural net to perform well at a given task for the entire population of input/output pairs in the task. However, it is unrealistic to have the entire dataset, especially for high dimensional inputs such as images. Therefore, we create a training dataset that contains a (hopefully) large amount of that data. We hope when we're training a neural net that by maximizing performance on the training dataset, we maximize performance on the entire dataset. This is called generalization.
How do we know that the neural net is generalizing well? As you mentioned, we don't want to simply memorize the training data. That is where validation accuracy comes in. We feed data that the neural net did not train on through the network to evaluate its performance. Therefore, the purpose of the validation set is to measure generalization.
You should watch both the training and validation accuracy. The difference between the validation and training accuracy is called the generalization gap, which will tell you how well your neural net is generalizing to new inputs. You want both the training and validation accuracy to be high, and the difference between them to be minimal.
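As a small illustration, the generalization gap can be read straight off a Keras History object, assuming the model was compiled with `metrics=['accuracy']` and fit with validation data; the function name here is made up for the example.

```python
# Print the per-epoch generalization gap from a Keras `History` object.
def generalization_gap(history):
    train_acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    for epoch, (t, v) in enumerate(zip(train_acc, val_acc), start=1):
        # A small, stable gap with high accuracies is what you want;
        # a growing gap means the model is starting to memorize.
        print(f"epoch {epoch:3d}: train={t:.3f} val={v:.3f} gap={t - v:.3f}")
```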
Technically, if you could do so, that would be awesome. You wouldn't say a model is overfitting unless there is a gap between validation accuracy and training accuracy; if their values are close, both high or both low, then the model is not overfitting. Ideally you want high accuracy on all samples (training, validation, and testing), but as I said, "ideally". You just don't care as much about the training samples.

Training SSD-MOBILENET V1 and the loss does not decrease

I'm new to everything about CNNs and TensorFlow. I've been training a pretrained ssd-mobilenev1-pets.config to detect columns of buildings for about a day, but the loss has stayed between 1 and 2 and hasn't decreased in the last 10 hours.
I realized that my input images are 128x128 and SSD resizes the images to 300x300.
Does the size of the input images affect the training?
If that is the case, should I retrain the network with larger input images? Or what would be another option to decrease the loss? My training dataset has 660 images and my test set has 166; I don't know if those are enough images.
I really appreciate your help.
Loss values of ssd_mobilenet can be different from faster_rcnn. From EdjeElectronics' TensorFlow Object Detection Tutorial:
For my training on the Faster-RCNN-Inception-V2 model, it started at about 3.0 and quickly dropped below 0.8. I recommend allowing your model to train until the loss consistently drops below 0.05, which will take about 40,000 steps, or about 2 hours (depending on how powerful your CPU and GPU are). Note: The loss numbers will be different if a different model is used. MobileNet-SSD starts with a loss of about 20, and should be trained until the loss is consistently under 2.
For more information: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#6-run-the-training
The SSD Mobilnet architecture demands additional training to suffice the loss accuracy values of the R-CNN model, however, offers practicality, scalability, and easy accessibility on smaller devices which reveals the SSD model as a promising candidate for further assessment (Fleury and Fleury, 2018).
For more information: Fleury, D. & Fleury, A. (2018). Implementation of Regional-CNN and SSD machine learning object detection architectures for the real time analysis of blood borne pathogens in dark field microscopy. MDPI AG.
I would recommend taking 15%-20% of the images for testing, covering all the variety present in the training data. As you said, you have 650+ images for training and 150+ for testing, which is roughly 25% relative to the training set. It looks like you have enough images to start with. I know the more, the merrier, but make sure your model also has sufficient data to learn from!
Resizing the images does not contribute to the loss. It makes sure there is consistency across all images for the model to recognize them without bias. The loss has nothing to do with image resizing as long as every image is resized identically.
You have to make stops and recover checkpoints again and again if you want your model to be well fit. Usually, you can get good accuracy by re-training the SSD MobileNet until the loss consistently stays under 1. Ideally we want the loss to be as low as possible, but we also want to make sure the model is not over-fitting. It is all about trial and error. (A loss between 0.5 and 1 seems to do the job well, but again it all depends on you.)
The reason I think your model is underperforming is that you have a variety of testing data and not enough training data to cover it.
The model has not been given enough knowledge in the training data to learn the new variety in the testing data. (For example: your test data has images of buildings from new angles which are not sufficiently present in the training data.) In that case, I recommend putting the full variety of images into the training data and then picking test images while making sure you still have sufficient training data for the new views. That's why I recommend taking 15%-20% as test data.
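One hedged way to get that kind of 80/20 split is scikit-learn's train_test_split, stratified by class so both sets cover the same variety. `image_paths` and `labels` below are placeholder names for however the dataset is stored.

```python
from sklearn.model_selection import train_test_split

# Stratified 80/20 split; `image_paths` and `labels` are placeholders.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels,
    test_size=0.2,      # hold out ~20% for testing
    stratify=labels,    # keep the class mix identical in both splits
    random_state=42,
)
```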

TensorFlow DNN intermittent error

I am using a TensorFlow DNN classifier to recognize emotions in images. The training accuracy I am getting is around 80%. However, if I run the application again (fully train and test), I occasionally get a really low test accuracy, around 25%. This is without changing any code or the dataset.
I understand that the initial weights are randomized in DNN classifiers but that would not give such a large difference in test accuracy.
I am using 23 features and have tested with varying sizes of datasets (50 - 1000 images). The intermittent low test accuracy always exists.
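One quick thing to rule out is run-to-run randomness from weight initialization and data shuffling. This is a generic sketch of pinning the seeds (assuming TF 2.x; the TF 1.x equivalent is noted in the comment), not a diagnosis of the underlying cause.

```python
import random
import numpy as np
import tensorflow as tf

# Pin every seed that affects initialization and shuffling before building
# the model. If the large accuracy swings persist with fixed seeds, the
# variance is more likely coming from the data pipeline than from the DNN.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)  # TF 2.x; in TF 1.x use tf.set_random_seed(SEED)
```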

Tensorflow classifier is not overfitting, accuracy on train set stays low

This might be a weird question, but here goes: I have a DNNClassifier in TensorFlow that I have run with all sorts of parameters and numbers of epochs. The weird thing is that, whatever I do, the accuracy on my training set stays quite low (it's a binary classifier, and accuracy stays around 65%). I would expect that with at least some of the configurations I've tried (the ones without regularization/dropout), I would have overfitted and reached a higher accuracy on my training set, but alas.
So, what does this mean? That I don't have enough data? That my network is not big enough to fit the data set? I tried doubling the size of my hidden layers, but it still stayed pretty much the same.
I have 23000 training examples, and currently have 3 layers of 100 neurons each.
Code can be found here: https://www.pastebucket.com/563290
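One way to probe the "network not big enough" hypothesis is to give the classifier much more capacity and check whether it can overfit a small subset at all. This is a rough sketch using tf.estimator.DNNClassifier, where `feature_cols` and `small_subset_input_fn` are hypothetical placeholders for the asker's own feature columns and input function (see the linked code).

```python
import tensorflow as tf

# Rough capacity probe: a much wider DNNClassifier with no dropout.
# `feature_cols` and `small_subset_input_fn` are placeholders.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_cols,
    hidden_units=[512, 512, 256],  # much wider than 3 x 100
    n_classes=2,
    dropout=None,                  # no regularization while probing capacity
)

# Train on only a few hundred examples. If training accuracy still cannot
# climb well above ~65%, the features/labels are a more likely culprit
# than the network size.
classifier.train(input_fn=small_subset_input_fn, steps=5000)
```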