I need a TensorFlow model that recognizes a dog's breed. I downloaded the Stanford Dogs Dataset: 20,580 images in 120 categories (breeds). I followed the procedure described in TensorFlow For Poets to retrain mobilenet_1.0_224. I used --how_many_training_steps=4000 and the defaults for everything else. I got this TensorBoard graph:
[Figure: Training and validation accuracy]
The validation accuracy is only about 80%.
What can I do to improve it?
In the research paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, the test accuracy using the 'MobileNet_1.0_224' architecture on the Stanford Dogs dataset is 83.3%, which seems in line with your results.
When you visually examine the Stanford Dogs Dataset, you will find that many of the breeds look similar, which makes it hard to reach a higher accuracy even with state-of-the-art image classifiers. You might improve your results by merging similar-looking breeds into larger categories.
Alternatively, you might tweak the training settings of the retrain.py script in the TensorFlow For Poets tutorial, but the gains will likely be marginal.
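Grouping similar-looking breeds into larger categories can be sketched as a label mapping applied to both the true and the predicted labels. This is only an illustration: the grouping in BREED_TO_GROUP, the group names, and the sample predictions are made up for the example, and which breeds you merge is your own call.

```python
# Hypothetical grouping: which breeds to merge is an assumption for illustration.
BREED_TO_GROUP = {
    "siberian_husky": "sled_dog",
    "malamute": "sled_dog",
    "norfolk_terrier": "small_terrier",
    "norwich_terrier": "small_terrier",
}


def to_group(breed):
    """Map a breed label to its merged group; unlisted breeds keep their own label."""
    return BREED_TO_GROUP.get(breed, breed)


def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels: two of the four predictions confuse look-alike breeds.
true_breeds = ["siberian_husky", "malamute", "norfolk_terrier", "pug"]
pred_breeds = ["malamute", "malamute", "norwich_terrier", "pug"]

breed_acc = accuracy(true_breeds, pred_breeds)
group_acc = accuracy(
    [to_group(b) for b in true_breeds],
    [to_group(b) for b in pred_breeds],
)
print(breed_acc, group_acc)
```

Confusions inside a group no longer count as errors, so the group-level accuracy is at least as high as the breed-level one. The price is that the model can no longer tell the merged breeds apart.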
Related
I have been learning neural networks for a while now, predominantly with respect to natural language processing. I have been using Kaggle notebooks since I am a beginner. Recently I was working on a Tamil News Classification dataset I found on Kaggle. The model uses an LSTM RNN to classify the news into appropriate news groups. The code in the notebook has an accuracy of over 90%. (Notebook for reference: https://www.kaggle.com/sagorsemantics/tamil-nlp-lstm) When I tried to create an LSTM model myself, my accuracy was around 34%, despite using the same layers, activation function, optimizer, hyperparameters, etc., which I thought was strange. After asking around, I was advised to use hyperparameter tuning to achieve a higher accuracy. I did so. (My code here: https://github.com/Vijeeguna/Tamil-News-Article-Classification/blob/main/tamil_news_classification_LSTM_RNN_CNN.py) But my accuracy continues to be low at 34%. I have played around with layers, dropout, etc., but the accuracy won't budge.
I am at a loss. I don't understand how/why this is. Any nudge in the right direction would be most welcome.
Code on Colab with the accuracy I got: https://colab.research.google.com/drive/1P7H6J98GGizrGpMXl8QtTAzWsdgIvGAw?usp=sharing
[Also, I am a true novice. I have been learning through Kaggle notebooks almost exclusively. Please be patient and dumb things down for me.]
For my ML project I want to use the faster_rcnn_resnet101_kitti model from the TensorFlow model zoo. As the number of images in the KITTI dataset is extremely small for deep learning (about 7000 images), I was wondering how this small amount of data leads to decent inference performance (mAP@0.5 = 87). One answer I can imagine is that the network was first trained on a different, richer dataset and then fine-tuned on KITTI, but I am not sure about it.
How can I find out the exact underlying training procedure (apart from pipeline.config) for the models published in the TF model zoo?
Thanks
I am not able to understand the purpose of a pre-trained network. From what I read, it is used for the RPN and the classification network, but I don't understand how.
CNNs take a notoriously long time to train, especially for more complex models with higher resolutions. In order to avoid the days of training on a high-end GPU, pre-trained models have been made available. You then just have to train on your specific data (assuming your data is similar to the pre-trained data). For instance, if you want to train a CNN to recognize cats in high resolution images, you might want to start with a pre-trained model that recognizes dogs. The training should take a lot, lot less time due to the fact that a lot of the same underlying patterns have already been learned and all your training needs to do is differentiate cats from dogs.
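The idea that training starts closer to the goal when you begin from pre-trained weights can be shown with a deliberately tiny toy, not a CNN. In this sketch everything is an assumption made for illustration: the "model" is a single weight w in y = w * x, the "cat" task has a true weight of 2.0, and the "pre-trained dog model" is simply a starting weight of 1.8.

```python
def steps_to_converge(w_init, data, lr=0.1, tol=1e-3, max_steps=10_000):
    """Run gradient descent on mean squared error for y = w * x.

    Returns the number of update steps taken before the loss drops below tol.
    """
    w = w_init
    for step in range(max_steps):
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if loss < tol:
            return step
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return max_steps


# Toy "cat" task: targets generated with a true weight of 2.0 (assumed).
cat_data = [(k / 10, 2.0 * k / 10) for k in range(1, 11)]

steps_from_scratch = steps_to_converge(0.0, cat_data)     # untrained start
steps_from_pretrained = steps_to_converge(1.8, cat_data)  # start from a related task

print(steps_from_scratch, steps_from_pretrained)
```

The start at 1.8 needs fewer steps than the start at 0.0 because it begins nearer the solution. A real pre-trained CNN gives the same kind of head start, only with millions of weights that already encode edges, textures, and shapes.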
I am re-training SSD MobileNet with 900 images from the Berkeley Deep Drive dataset, and evaluating on 100 images from that dataset.
The problem is that after about 24 hours of training, the TotalLoss seems unable to go below 2.0:
And the corresponding mAP score is quite unstable:
In fact, I have tried training for about 48 hours, and the TotalLoss still cannot go below 2.0; it stays somewhere in the range 2.5 to 3.0. During that time, the mAP is even lower.
So here is my question: given my situation (I really don't need a "high-precision" model; as you can see, I picked 900 images for training and would simply like to train a PoC model, run predictions, and that's it), when should I stop the training to obtain a reasonably performing model?
Indeed, for detection you need to fine-tune the network. Since you are using SSD, there are already some sources out there:
https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html (This one specifically for an SSD Model, uses mxnet but you can use the same with TF)
You can watch a very nice finetuning intro here
This repo has a nice fine tuning option enabled as long as you write your dataloader, check it out here
In general, your error can be attributed to many factors: the learning rate you are using, or the characteristics of the images themselves (are they normalized?). If the SSD network you are using was trained with normalized data and you don't normalize when retraining, then you'll get stuck while learning. Also, what learning rate are they using?
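As a minimal sketch of the normalization point: assuming the pre-trained network expected pixel values scaled to [-1, 1] (common for MobileNet-style feature extractors, but check the preprocessing of the exact model you use), 8-bit images would need to be rescaled the same way before retraining. The function names here are made up for illustration.

```python
def scale_to_unit_range(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [-1.0, 1.0].

    pixels is a list of rows, each row a list of numbers.
    The [-1, 1] target range is an assumption; use whatever range
    the pre-trained model was trained with.
    """
    return [[v / 127.5 - 1.0 for v in row] for row in pixels]


def looks_normalized(pixels, low=-1.0, high=1.0):
    """Sanity check: True if every value lies inside the expected range."""
    return all(low <= v <= high for row in pixels for v in row)


raw = [[0, 51, 255], [127.5, 204, 10]]
scaled = scale_to_unit_range(raw)

print(looks_normalized(raw), looks_normalized(scaled))
```

Running a check like looks_normalized on a batch right before it enters the network is a cheap way to catch a mismatch between your input pipeline and the one the checkpoint was trained with.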
From the model zoo I can see that for SSD there are models trained on COCO
And models trained on Open Images:
If, for example, you are using ssd_inception_v2_coco, there is a truncated_normal_initializer in the input layers, so take that into consideration. Also make sure the input sizes are the same as the ones you provide to the model.
You can get very good detections even with little data if you also include many augmentations and take into account the rest of the things I mentioned. More details on your code would help to see where the problem lies.
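One thing to watch with augmentations for detection is that the bounding boxes have to be transformed together with the image. Here is a minimal sketch of a horizontal flip, assuming boxes are stored as (ymin, xmin, ymax, xmax) in normalized [0, 1] coordinates and the image is a plain list of pixel rows; both the function name and the data layout are assumptions for illustration.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left to right and adjust its bounding boxes.

    image: list of rows, each row a list of pixel values.
    boxes: list of (ymin, xmin, ymax, xmax) tuples in normalized coordinates.
    """
    flipped_image = [row[::-1] for row in image]
    # After mirroring, the old right edge becomes the new left edge.
    flipped_boxes = [
        (ymin, 1.0 - xmax, ymax, 1.0 - xmin)
        for ymin, xmin, ymax, xmax in boxes
    ]
    return flipped_image, flipped_boxes


image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(0.0, 0.0, 0.5, 0.25)]  # an object in the top-left corner

flipped_image, flipped_boxes = flip_horizontal(image, boxes)
print(flipped_image, flipped_boxes)
```

If you flip only the image and keep the original boxes, the labels point at the wrong region, which by itself can keep the loss from going down.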
I'm working on a medical X-ray image dataset, trying to do binary classification.
After many tries, I have found a model that can overfit my training set with > 99% accuracy, but from the look of the validation curve it seems my model has only learned irrelevant details.
What do you think?
When I try to introduce dropout, training becomes incredibly slow, with bad accuracy.
If I try image augmentation, the results are more promising, but of course training is much slower.
I wonder what to look at next:
try running more epochs on the image-augmented model
try some medical pre-trained model (do you know where to look?)
What would you use as parameters for image augmentation (preferably in Keras) with X-ray images?