Tensorflow Object Detection API Evaluation mAP randomly goes to 0

I am using the TensorFlow Object Detection API to train an SSDLite (MobileNet V2) object detection model. During training, the evaluation results spontaneously drop to 0 at unpredictable points (in the pic below, after ~35k iterations). Notably, the training loss is not affected, which makes me think this is not just an exploding gradient problem.
EDIT: The issue only seems to happen when I use larger input image resolution, in this case 640x480. If I keep the original 300x300 input resolution, everything works correctly.
Here is a link to my config file in case that helps.
Any help would be greatly appreciated. Thanks!


Is there any way to resize input shapes in SNPE (dlc)?

I have trained a model with TensorFlow. The model is supposed to run on a mobile phone, but I have a problem when converting the frozen graph (pb) to a deep learning container (dlc): I have to set the input size to a constant, which means the model cannot work with arbitrary input sizes.
I am trying to find a way to resize the input shapes of a DLC model without re-initializing the model with "snpe-tensorflow-to-dlc --input_dims 1,512,512,3", because that approach is time-consuming.
In short, I want to resize the input shapes of a dlc model. Can anybody help me?
Deployment solutions usually work with fixed input shapes because they assume a widely used workflow: resize every picture to the same fixed size, then run inference. Because of this, their developers usually prioritize inference time over model loading time. The same is true of SNPE, OpenVINO, TFLite, etc.
To illustrate the timings, here are some results from a Snapdragon 820. Loading Inception v3 to the CPU takes 715 ms, and loading it to the DSP takes 3 seconds. Inference on the CPU takes 1 second, and inference on the DSP takes 100 ms. Loading takes longer on the DSP than on the CPU, but inference is about ten times faster.
That said, you are usually allowed to change the input shape before loading the model, on the assumption that all input pictures will have a size different from the one the model was trained with (but, again, the same size for all pictures). In SNPE this is done with SNPEBuilder::setInputDimensions.
If the model allows reshaping and there are no bugs in the SNPE implementation, the model can be reshaped and loaded.
I am not sure whether your use case fits the workflow described in the first paragraph. Also, to benefit from different input sizes you would need to design a special topology, which is unlikely to be supported by SNPE. If you take a regular SSD, reshape it to different sizes, and measure accuracy on a validation set, you will most likely get the best result at the shape the model was trained with.
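The timings quoted above make the trade-off easy to work out. The following is a minimal sketch in Python that uses only the Snapdragon 820 figures from this answer. The function name and the worst-case assumption that the model is loaded again for every image with a new shape are mine and are not part of SNPE.

```python
# Timings quoted in the answer (Inception v3 on a Snapdragon 820), in milliseconds.
TIMINGS_MS = {
    "cpu": {"load": 715, "infer": 1000},
    "dsp": {"load": 3000, "infer": 100},
}


def total_time_ms(target, num_images, reload_every_image=False):
    """Total time to process num_images on the given target.

    reload_every_image=True models the hypothetical worst case in which every
    image has a different shape and the model is loaded again each time.
    """
    timings = TIMINGS_MS[target]
    loads = num_images if reload_every_image else 1
    return loads * timings["load"] + num_images * timings["infer"]


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, total_time_ms("cpu", n), total_time_ms("dsp", n))
```

With a fixed shape the DSP is ahead from the third image on (3300 ms against 3715 ms). If the model had to be reloaded for every image, each image would cost 3100 ms on the DSP against 1715 ms on the CPU, which is why a fixed input size is the usual choice.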

Issue with Custom object detection using tensorflow when Training on a single type of object

I am training a pre-built TensorFlow-based model for custom object detection.
I want to detect only one type of object. I have taken a lot of images from different angles and in different lighting conditions, and I am training on an Nvidia K80 GPU. Everything runs, and during training I can see the loss falling to 0.3. However, the loss drops very quickly to under 1 as soon as training starts. I am using SSD MobileNet as the base configuration for the model. When I test the model, it just draws one big square on the input image instead of detecting the desired object. Basically, it fails to detect the object.
I also tried training the model on a different set of images, of mac n cheese, which had a lot of variation. In that case the model worked fine and detected mac n cheese in the input image. But when I use pictures of a single object, the model fails to detect it. Please help me understand what I am doing wrong here.
The issue was with my training dataset: I was not properly cropping the object from the original image. I also needed around 300 images to train the model properly. SSD worked well once I gave it well-cropped images.

Tensorflow Object Detection API - What's actually test.record being used for?

I have a few doubts about the TensorFlow Object Detection API. Hopefully someone can help me out. First, I should mention that I am following what sentdex is doing, so the steps basically come from him.
First doubt: Why do we need test.record for training? What does it do during training?
Second doubt: sentdex takes images from test.record to test the newly trained model. Doesn't the model already know those images, since they come from test.record?
Third doubt: In what situations do we need to activate drop_out (in the .config file)?
1) It does nothing during training, and you don't need it for the training itself. However, at some point the model begins to overfit: the loss on the training images continues to go down, but the accuracy on the test images stops improving and begins to decline. That is the moment to stop training, and to recognise it you need test.record.
2) The images were only used to evaluate the model during training, not to train the network.
3) You do not need to activate it, but with dropout you can often achieve higher accuracy, because it helps prevent the network from overfitting.
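The stopping rule in point 1 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the patience parameter, and the example scores are hypothetical, and the evaluation metric is assumed to be one where higher is better (such as mAP on test.record).

```python
def find_stopping_point(eval_scores, patience=3):
    """Return (best_index, stop_index) for a list of evaluation scores.

    best_index is the evaluation with the highest score seen so far.
    stop_index is the first evaluation at which the score has failed to
    improve for `patience` evaluations in a row, or None if that never happens.
    """
    best_index = 0
    for i, score in enumerate(eval_scores):
        if score > eval_scores[best_index]:
            best_index = i
        elif i - best_index >= patience:
            return best_index, i
    return best_index, None


if __name__ == "__main__":
    # Hypothetical accuracy on the test images, measured at regular intervals.
    scores = [0.40, 0.55, 0.63, 0.66, 0.65, 0.64, 0.62, 0.60]
    print(find_stopping_point(scores))
```

Here the score peaks at the fourth evaluation and then declines, so training would be stopped three evaluations later and the checkpoint from the peak would be kept.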

tensorflow object detection api for object detection, but the result is not good

I used the TensorFlow Object Detection API with RFCN_resnet101 to detect small objects, but the detection results are sometimes poor: some detected boxes are offset from the object, and sometimes an object is detected by mistake. Does anyone know how to deal with this?
Debugging object detection can be tricky. I recommend first checking that the input data makes sense:
Are object bounding boxes displayed correctly when overlaid on the images?
Are you accidentally using pixel box coordinates instead of normalized ones when preparing the training data? (The TFRecord format used by the API expects box coordinates normalized to [0, 1].)
Do you have boxes that are too small or outside the image boundary, which can cause tensor NaN errors?
Do you have images that are too large and cause CUDA out of memory errors?
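Some of the box checks above can be automated before the TF record files are generated. The following is a minimal sketch, not part of the Object Detection API: the function name, the (ymin, xmin, ymax, xmax) box order, and the 2-pixel minimum side are assumptions chosen for illustration.

```python
def find_box_problems(box, image_width, image_height, min_side_px=2):
    """Return a list of problems found in one normalized bounding box.

    box is assumed to be (ymin, xmin, ymax, xmax) with values in [0, 1].
    """
    ymin, xmin, ymax, xmax = box
    problems = []
    # Values above 1 usually mean pixel coordinates were stored by mistake;
    # values below 0 mean the box leaves the image.
    if min(box) < 0.0 or max(box) > 1.0:
        problems.append("coordinates outside [0, 1]")
    if ymax <= ymin or xmax <= xmin:
        problems.append("empty or inverted box")
    elif ((xmax - xmin) * image_width < min_side_px
          or (ymax - ymin) * image_height < min_side_px):
        problems.append("box too small")
    return problems


if __name__ == "__main__":
    print(find_box_problems((0.1, 0.2, 0.5, 0.6), 640, 480))   # a valid box
    print(find_box_problems((48, 128, 240, 384), 640, 480))    # pixel coordinates
```

Running such a check over every annotation and printing the offending file names is usually enough to find bad labels quickly.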
Once you are satisfied with the input data and can successfully generate TF record files for training and evaluation, I recommend asking the following questions:
Are you training the network for a sufficient number of global iterations (e.g. 200K), preferably on multiple GPUs, with a sufficiently large batch size?
Are you getting reasonable detections when evaluating on a few images, e.g. by specifying the following in the config file:
eval_config: {
  num_examples: 1000
  num_visualizations: 16
  min_score_threshold: 0.15
  # Note: The line below limits the evaluation process to max_evals evaluations (1 here).
  # Remove it to evaluate indefinitely.
  max_evals: 1
}
Here num_visualizations creates an Images tab in TensorBoard when you run the eval.py script, so you can inspect the detections, and min_score_threshold sets the minimum detection score (not an IoU value) a box needs in order to be shown.
Are you fine-tuning a pre-trained model? If so, check that your config contains:
fine_tune_checkpoint: "/path/to/model.ckpt"
from_detection_checkpoint: true
Finally, the beauty of the TensorFlow Object Detection API is that you can try different object detection models, such as Faster R-CNN, R-FCN, and SSD, which have different speed-accuracy tradeoffs, without much extra work. You may find that a different object detector works better for your application.
Why not use faster_rcnn_inception_resnet_v2_atrous_coco ... For small objects it's my go-to option.

small object detection with faster-RCNN in tensorflow-models

I'm attempting to train a Faster R-CNN model for small digit detection. I'm using the newly released TensorFlow Object Detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) in each image, only ~20 are ever detected, but when a digit is detected the classification is accurate. (Also, I am able to train a simple convnet from scratch on my cropped images with high accuracy, so the problem is in the detection aspect of the model.) Each digit is on average 60x30 in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes of what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or does my task look doable with Faster R-CNN and/or SSD?
In the end, the immediate problem was that I was not using the visualizer correctly. By updating the parameters for visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
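To show how visualization settings alone can hide detections, here is a simplified stand-in written in plain Python; it is not the library's code. The post does not say which parameters were changed. My assumption is that they were max_boxes_to_draw (default 20) and min_score_thresh (default 0.5) of visualize_boxes_and_labels_on_image_array, and the scores below are hypothetical.

```python
def indices_drawn(scores, max_boxes_to_draw=20, min_score_thresh=0.5):
    """Return the indices of the detections that would be drawn.

    Simplified model of the filtering: only the first max_boxes_to_draw
    detections are considered (all of them if it is None), and a detection
    is drawn only if its score exceeds min_score_thresh.
    """
    if max_boxes_to_draw is None:
        limit = len(scores)
    else:
        limit = min(max_boxes_to_draw, len(scores))
    return [i for i in range(limit) if scores[i] > min_score_thresh]


if __name__ == "__main__":
    # 120 hypothetical detections, all with a confident score.
    scores = [0.9] * 120
    print(len(indices_drawn(scores)))                         # defaults
    print(len(indices_drawn(scores, max_boxes_to_draw=200)))  # higher cap
```

With the defaults only 20 of the 120 detections are drawn, which matches the ~20 boxes per image described in the question; raising the cap shows the rest.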
I checked your config file: you are reducing the resolution of your images to 1024. The region containing each digit will then not have many pixels, so you are losing some information. I suggest training the model on another dataset with smaller images. For example, you can crop each image into four areas.
If you have a good GPU, increase the max dimension in the image_resizer, but I guess you will run out of memory.
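Cropping each image into four areas also means moving the box annotations into each crop's coordinate system. The following is a minimal sketch under my own assumptions: boxes are (xmin, ymin, xmax, ymax) in pixels, a box is assigned to the tile that contains its center and clipped to that tile, and the image size and boxes are hypothetical.

```python
def split_into_quadrants(width, height):
    """Return the four tiles of an image as (x0, y0, x1, y1) pixel rectangles."""
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0, mid_x, mid_y),           # top-left
        (mid_x, 0, width, mid_y),       # top-right
        (0, mid_y, mid_x, height),      # bottom-left
        (mid_x, mid_y, width, height),  # bottom-right
    ]


def boxes_for_tile(boxes, tile):
    """Return the boxes whose center lies in the tile, in tile-local pixels."""
    x0, y0, x1, y1 = tile
    local_boxes = []
    for xmin, ymin, xmax, ymax in boxes:
        center_x = (xmin + xmax) / 2
        center_y = (ymin + ymax) / 2
        if x0 <= center_x < x1 and y0 <= center_y < y1:
            # Clip to the tile, then shift so the tile's corner becomes (0, 0).
            local_boxes.append((max(xmin, x0) - x0, max(ymin, y0) - y0,
                                min(xmax, x1) - x0, min(ymax, y1) - y0))
    return local_boxes


if __name__ == "__main__":
    tiles = split_into_quadrants(2000, 1000)
    boxes = [(100, 100, 130, 160), (1500, 700, 1530, 760)]
    for tile in tiles:
        print(tile, boxes_for_tile(boxes, tile))
```

Each tile is half as wide and half as tall as the original, so it can pass through the same image_resizer with less downscaling and each digit keeps more of its pixels.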