Tensor flow DNN intermitent error - tensorflow

I am using a Tensor Flow DNN classifier to recognize emotions in images. The training accuracy I am getting is around 80%. However, if I run the application again (fully train and test) I occasionally get a really low test accuracy, around 25%. This is without changing any code or dataset.
I understand that the initial weights are randomized in DNN classifiers but that would not give such a large difference in test accuracy.
I am using 23 features and have tested with varying sizes of datasets (50 - 1000 images). The intermittent low test accuracy always exists.

Related

Tensorflow reducing learning rates of saved model

I am working on cnn model which has 4 conv layers and 3 dense layers. dataset have around 28000 images and 7000 test images. The model has saved checkpoints and I have trained it several times and achieved 60 % accuracy so far, and while training learning rate is reduced to 2.6214403e-07 (as i used ReduceLROnPlateau factor 0.4). I have question if I increased the learning rate say 1e-4. and resumed the training how will it effect my model? Is It a good idea?
accuracy vs epoch
If your learning curve plateaus immediately and doesn't change much beyond the initial few epochs (as in your case), then your learning rate is too low. While you can resume training with higher learning rates, it would likely render any progress of the initial epochs meaningless. Since you typically only decrease the learning rate between epochs, and given the slow initial progress of your network, you should simply retrain with an increased initial learning rate until you see larger changes in the first few epochs. You can then identify the point of convergence by whenever overfitting happens (test accuracy goes down while train accuracy goes up) and stop there. If this point is still "unnecessarily late", you can additionally reduce the amount that the learning rate decays to make faster progress between epochs.

Training SSD-MOBILENET V1 and the loss does not deacrease

I'm new in everithing about CNN and tensorflow. Im training a pretrained ssd-mobilenev1-pets.config to detect columns of buildings, about one day but the loss is between 2-1 and doesnt decrease since 10 hours ago.
I realized that my input images are 128x128 and SSD resize de image to 300*300.
Does the size of the input images affect the training?
If that is the case, should I retrain the network with larger input images? or what would be another option to decrease the loss? my train dataset has 660 images and test 166 I dont Know if there are enough images
I really aprecciate your help ....
Loss values of ssd_mobilenet can be different from faster_rcnn. From EdjeElectronics' TensorFlow Object Detection Tutorial:
For my training on the Faster-RCNN-Inception-V2 model, it started at
about 3.0 and quickly dropped below 0.8. I recommend allowing your
model to train until the loss consistently drops below 0.05, which
will take about 40,000 steps, or about 2 hours (depending on how
powerful your CPU and GPU are). Note: The loss numbers will be
different if a different model is used. MobileNet-SSD starts with a
loss of about 20, and should be trained until the loss is consistently
under 2.
For more information: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#6-run-the-training
The SSD Mobilnet architecture demands additional training to suffice
the loss accuracy values of the R-CNN model, however, offers
practicality, scalability, and easy accessibility on smaller devices
which reveals the SSD model as a promising candidate for further
assessment (Fleury and Fleury, 2018).
For more information: Fleury, D. & Fleury, A. (2018). Implementation of Regional-CNN and SSD machine learning object detection architectures for the real time analysis of blood borne pathogens in dark field microscopy. MDPI AG.
I would recommend you to take 15%-20% images for testing which cover all the variety present in training data. As you said you have 650+ images for training and 150+ for testing. That is roughly 25% of testing images. It looks like you have enough images to start with. I know the more, the merrier but make sure your model also has sufficient data to learn from!
Resizing the images does not contribute to the loss. It makes sure there is consistency across all images for the model to recognize them without bias. The loss has nothing to do with image resizing as long as every image is resized identically.
You have to make stops and recover checkpoints again and again if you want your model to be perfectly fit. Usually, you can get away with good accuracy by re-training the ssd mobilenet until the loss consistently becomes under 1.Ideally we want the loss to be as lower as possible but we want to make sure the model is not over-fitting. It is all about trial and error. (Loss between 0.5 and 1 seems to be doing the job well but again it all depends on you.)
The reason I think your model is underperforming is due to the fact that you have variety of testing data and not enough training data to suffice.
The model has not been given enough knowledge in training data to make the model learn for new variety of testing data. (For example : Your test data has some images of new angles of buildings which are not sufficiently present in training data). In that case, I recommend you to put variety of all images in training data and then picking images to test making sure you still have sufficient training data of new postures. That's why I recommend you to take 15%-20% test data.

Indication of overfitting

I'm training an image recognition model using Inception and transfer learning, based on the Tensorflow of Poets tutorial.
I have it running for 500k steps, looking to see the optimum number of steps before overtraining strats. The below tensorboard image displays my training accuracy steadily rising but validation accuracy has plateaued around 70K steps. My understanding was validation accuracy would start going down when it started overtraining.
What would be my optimum number of steps in the below chart? 70k steps or 260k?
It is crystal clear that you are overfitting your model. To solve the overfitting problem there are several solutions:
1) Early stopping.
2) Regularization.
3) Reducing your model VC dimension by reducing the number of layers or number of units per layer.
4) Augmenting your dataset.
5) Applying transfer learning.
For your case, you can try early stopping. The best number of iterations according to your graph is 60K.

Tensorflow classifier is not overfitting, accuracy on train set stays low

This might be a weird question, buy here goes: I have a DNNClassifier in tensorflow that I ran with all sorts of parameters, and numbers of epochs. The weird thing that is happening is that, whatever I do, my accuracy on my train set stays quite low (It's a binary classifier, accuracy stays around 65%). I would expect that with at least some of the configurations I've tried (ones without regularization/dropout), I would've overfitted and reached a higher accuracy on my training set, but alas.
So, what does this mean? That I don't have enough data? That my network is not big enough to fit the data set? I tried doubling the size of my hidden layers, but it still stayed pretty much the same.
I have 23000 training examples, and currently have 3 layers of 100 neurons each.
Code can be found here: https://www.pastebucket.com/563290

Odd results for Image Recognition using AlexNet in Deep Learning

I am using a modified AlexNet (cifar-10-model) available in the tensorflow tutorials to do some image recognition of some mechanic part images but getting very wierd results.
The training accuracy is very soon to achieve 100%. But the testing accuracy is starting as high as 45% decreasing very fast to as low as 9%.
I am doing my test on a training set of 20,000 images and testing set of 2,500 images with 8 categories. I do training and testing by batch with size of 1024.
The accuracy and training loss is showed below and you can see that:
The testing accuracy starts at as high as 45%, which doesn't make sense.
The mechanical images are always classified as 'left bracket'
Accuracy
Classification results
your testing accuracy is decreasing, I think it happens because of Overfitting. Try to use simpler model or regularization method to tune the model.
You might want to check your data or feature extraction for errors. I did a protein structure prediction for 3-labels, but I was using a wrong extraction method. My validation accuracy starts at 45% too and then falls quickly.
Knowing where my errors are, I started from scratch: now I do protein structure prediction for 8-labels. The accuracy from the first epoch is 60% and able to rise steadily to 64.9% (the current Q8 world record for CB513 is 68.9%).
So validation accuracy starting at 45% is not a problem, but falling quickly is. I'm afraid that you have an error somewhere in your data/extraction rather than just overfitting.