Why is my object detection training loss very low (<1) but the precision still 0? - object-detection

I am using Colab to train an object detection model (please check the Colab notebook). I found that the loss is very low, but the precision is still zero. What is wrong?

Fine, I should be patient: after training 10,000 steps, the prediction boxes come out. In my case, the precision is low probably because I applied many tricks to reduce the model's complexity and squeeze the model file size.

Related

Tensorflow Object Detection API Evaluation mAP randomly goes to 0

I am using the Tensorflow Object Detection API to train an SSDLite (MobileNet V2) object detection model. During training, the evaluation results spontaneously drop to 0 at unpredictable points (in the picture below, after ~35k iterations). Notably, the training loss is not affected, which makes me think this is not just an exploding gradient problem.
EDIT: The issue only seems to happen when I use a larger input image resolution, in this case 640x480. If I keep the original 300x300 input resolution, everything works correctly.
Here is a link to my config file in case that helps.
Any help would be greatly appreciated. Thanks!

How to improve recall of faster rcnn object detection model

I'm retraining a Faster R-CNN Inception COCO model for detecting brands of products on shelves.
I stopped training around 400k steps, when the total loss had stayed under 0.1 for a period of time. The recall was around 65% and the precision was 40% at a 95% confidence cut-off threshold.
Learning rate started at 0.00001 and configured to reduce to 0.000005 after 200k steps.
The dataset has 15 classes with at least 100 annotated boxes per class. The total number of images is 300.
How to improve recall of the model?
Should I switch to faster rcnn ras (which has a higher mAP, but I don't think precision is as important as recall in my use case)?
Another question: what is a typical recall for an object detection model? Is it very challenging to reach higher than 90%?
Many thanks in advance!
You could try using image augmentation to expand your training dataset. 300 images is not much. Try looking at https://github.com/aleju/imgaug.
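For instance, a minimal sketch with imgaug that augments an image together with its bounding boxes (the image, box coordinates, and augmentation ranges below are made-up placeholders, not values from the question):

    import numpy as np
    import imgaug.augmenters as iaa
    from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

    # Placeholder image and box purely for illustration; use your own data here.
    image = np.zeros((480, 640, 3), dtype=np.uint8)
    bbs = BoundingBoxesOnImage(
        [BoundingBox(x1=100, y1=120, x2=260, y2=300, label="brand_a")],
        shape=image.shape)

    # A mild augmentation pipeline; tune the ranges to what is realistic for shelf photos.
    seq = iaa.Sequential([
        iaa.Fliplr(0.5),                               # horizontal flip half the time
        iaa.Affine(rotate=(-5, 5), scale=(0.9, 1.1)),  # small rotation and zoom
        iaa.Multiply((0.8, 1.2)),                      # brightness jitter
    ])

    # imgaug transforms the boxes together with the image.
    image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)

Each augmented image/box pair can then be exported back into whatever annotation format your training pipeline consumes.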
As you are asking about the Faster RCNN model, you can track two different metrics:
Precision and recall for the Region Proposal Network (RPN).
Precision and recall for the final RCNN output.
These two metrics give a better understanding of how the model is performing (a small sketch for computing box-level precision and recall follows these notes).
Case 1: When the recall of the RPN is high but the recall of the RCNN output is low, it is clear that you don't have enough positive labels for the classification network to learn from.
Case 2: When the recall of the RPN is low but the recall of the RCNN output is high, you might not have enough training data or enough classes.
Case 3: When both recalls are low, try a larger dataset, as your model is already converging.
-- Experimenting with the learning rate always helps.
-- Simple hack: you can use multiple aspect ratios (close to your original aspect ratios) so that you get more labels for training (not sure how well this helps in your case).
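As a rough, framework-agnostic illustration of the metrics discussed above (this is not the TF Object Detection API's own evaluator), here is a minimal sketch of box-level precision and recall with greedy IoU matching; the [xmin, ymin, xmax, ymax] box format and the 0.5 threshold are assumptions:

    import numpy as np

    def iou(box_a, box_b):
        """IoU of two boxes given as [xmin, ymin, xmax, ymax]."""
        xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, xb - xa) * max(0.0, yb - ya)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def precision_recall(pred_boxes, gt_boxes, iou_thresh=0.5):
        """Greedy one-to-one matching of predictions to ground truth at a fixed IoU threshold."""
        matched, tp = set(), 0
        for p in pred_boxes:
            best, best_iou = None, 0.0
            for i, g in enumerate(gt_boxes):
                if i in matched:
                    continue
                v = iou(p, g)
                if v > best_iou:
                    best, best_iou = i, v
            if best is not None and best_iou >= iou_thresh:
                matched.add(best)
                tp += 1
        precision = tp / max(len(pred_boxes), 1)
        recall = tp / max(len(gt_boxes), 1)
        return precision, recall

Running it separately on the RPN proposals and on the final detections (against the same ground truth) gives the two recalls the cases above refer to.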

Neural Network High Confidence Inaccurate Predictions

I have trained a neural network on a classification task, and it is learning, although its accuracy is not high. I am trying to figure out which test examples it is not confident about, so that I can gain some more insight into what is happening.
In order to do this, I decided to use the standard softmax probabilities in Tensorflow. I called tf.nn.softmax(logits) and used the probabilities it provided. I noticed that many times the probability was 99%, but the prediction was still wrong. As a result, even when I only consider examples whose prediction probability is higher than 99%, I get poor accuracy, only 2-3 percent higher than my original accuracy.
Does anyone have any ideas as to why the network is so confident about wrong predictions? I am still new to deep learning, so am looking for some ideas to help me out.
Also, is using the softmax probabilities the right way to determine the confidence of a neural network's predictions? If not, is there a better way?
Thanks!
Edit: From the answer below, it seems like my network is just performing poorly. Is there another way to identify which of the network's predictions are likely to be wrong, besides looking at the confidence (since the confidence doesn't seem to work well)?
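For reference, here is a small NumPy sketch of the procedure described in the question: take the softmax of the logits, treat the maximum probability as the confidence, and measure accuracy only on examples above a 99% cut-off. The random logits and labels are placeholders for illustration only:

    import numpy as np

    def softmax(logits):
        """Row-wise softmax over a (num_examples, num_classes) array of logits."""
        z = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # Dummy data standing in for the network's outputs and the true labels.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10))
    labels = rng.integers(0, 10, size=1000)

    probs = softmax(logits)
    predictions = probs.argmax(axis=1)
    confidence = probs.max(axis=1)

    overall_acc = (predictions == labels).mean()
    mask = confidence > 0.99
    high_conf_acc = (predictions[mask] == labels[mask]).mean() if mask.any() else float("nan")
    print(f"overall accuracy: {overall_acc:.3f}, "
          f"accuracy on >99%-confident examples: {high_conf_acc:.3f} "
          f"({mask.sum()} examples)")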
Imagine your samples are separable by a vertical line, but your NN classifier learned a horizontal line. In that case any prediction given by your classifier can only reach 50% accuracy. However, the NN will still assign higher confidence to the samples that are further away from the horizontal line.
In short, when your model is doing poor classification, higher confidence contributes little to nothing to accuracy.
Suggestion: check whether the information needed for correct classification is actually in the data, then improve the overall accuracy first.

Tensorflow Estimator self repair training on overfitting

I'm getting some hands-on experience with TensorFlow's Estimator API. While doing a classification task on a small dataset with TensorFlow's tf.contrib.learn.DNNClassifier (I know there is tf.estimator.DNNClassifier, but I have to work with TensorFlow 1.2), I get the accuracy graph on my test dataset. I wonder why there are these negative peaks.
I thought they could occur because of overfitting and some self-repairing mechanism. The data point right after each peak seems to have the same value as the point before it.
I tried to look into the code to find any proof that the Estimator's train function has such a mechanism, but did not find any.
So, is there such a mechanism or are there other possible explanations?
I don't think that the Estimator's train function has any such mechanism.
Some possible theories:
Does your training restart at any point? It's possible that if you have an exponential moving average (EMA) in your model, the moving average has to be recomputed after a restart.
Is your input data randomized? If not, it's possible that a contiguous patch of input data is all misclassified, and the EMA then smooths this out (a small sketch of turning on input shuffling follows below).
This is pretty mysterious to me. If you do find out what the real issue is please do share!
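On the input-randomization theory: with the newer tf.estimator API (TF 1.3+; tf.contrib.learn.DNNClassifier in 1.2 also accepts an input_fn, though the helper names differ), shuffling is just a flag on the input function. A minimal sketch under those assumptions, with made-up placeholder data:

    import numpy as np
    import tensorflow as tf

    # Placeholder data; in practice x_train/y_train come from your dataset.
    x_train = np.random.rand(500, 4).astype(np.float32)
    y_train = np.random.randint(0, 3, size=500).astype(np.int32)

    feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

    # shuffle=True is the important part: it randomizes the order of examples,
    # so a contiguous patch of hard examples cannot dominate a training window.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": x_train}, y=y_train,
        batch_size=32, num_epochs=None, shuffle=True)

    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns, hidden_units=[16, 16], n_classes=3)
    classifier.train(input_fn=train_input_fn, steps=200)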

Selecting tensorflow object detection API training hyper parameters

I am setting up an object detection pipeline based on the recently released TensorFlow Object Detection API, using the arXiv paper as guidance. I am looking to understand the points below for training on my own dataset.
It is not clear how they selected the learning rate schedules and how those would change based on the number of GPUs available for training. How does the learning rate schedule change with the number of GPUs available for training? The paper mentions that 9 GPUs are used. How should I change the learning rate if I only want to use 1 GPU?
The released sample training config file for Pascal VOC using Faster R-CNN has an initial learning rate of 0.0001. This is 10x lower than what was published in the original Faster R-CNN paper. Is this due to an assumption about the number of GPUs available for training, or is there a different reason?
When I start training from the COCO detection checkpoint, how should the training loss decrease? Looking at TensorBoard, the training loss on my dataset is low - between 0.8 and 1.2 per iteration (with a batch size of 1). The image below shows the various losses from TensorBoard. Is this expected behavior?
For questions 1 and 2: our implementation differs in a few small details compared to the original paper and internally we train all of our detectors with asynchronous SGD with ~10 GPUs. Our learning rates are calibrated for this setting (which you will also have if you decide to train via Cloud ML Engine as in the Pets walkthrough). If you use another setting, you will have to do a bit of hyperparameter exploration. For a single GPU, leaving the learning rate alone probably won't hurt performance, but you may be able to get faster convergence by increasing it.
For question 3: Training losses decrease erratically and you can only see the decrease if you smooth the plots quite a bit over time. Moreover, it's hard to explicitly say how well you are doing with respect to eval metrics just by looking at the training losses. I recommend looking at the mAP plots over time as well as the image visualizations to really get an idea of whether your model has "lifted off".
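For example, one way to see the trend through the noise is to smooth the raw loss values with an exponential moving average, which is roughly what TensorBoard's smoothing slider does. A minimal sketch with a synthetic loss curve (the numbers are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    def smooth(values, weight=0.9):
        """Exponential moving average, similar in spirit to TensorBoard's smoothing slider."""
        smoothed, last = [], values[0]
        for v in values:
            last = weight * last + (1 - weight) * v
            smoothed.append(last)
        return np.array(smoothed)

    # Synthetic noisy loss curve standing in for the values exported from TensorBoard.
    steps = np.arange(10000)
    loss = 1.0 + 0.5 * np.exp(-steps / 3000) + 0.3 * np.random.randn(len(steps))

    plt.plot(steps, loss, alpha=0.3, label="raw loss")
    plt.plot(steps, smooth(loss), label="smoothed loss")
    plt.xlabel("step")
    plt.ylabel("total loss")
    plt.legend()
    plt.show()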
Hope this helps.