I'm training an object detector using tensorflow and the faster_rcnn_inception_v2_coco model and am experiencing a lot of false positives when classifying on a video.
After some research I've figured out that I need to add negative images to the training process.
How do I add these to tfrecord files? I used the csv to tfrecord file code provided in the tutorial here.
Also, it seems that SSD has a hard_example_miner option in its config that lets you configure this behaviour, but this doesn't seem to be the case for Faster R-CNN. Is there a way to achieve something similar with Faster R-CNN?
I was facing the same issue with Faster R-CNN. Although you cannot use hard_example_miner with the Faster R-CNN model, you can add some background images, i.e. images with no objects. Everything else remains the same, except that there is no object tag in the XML for that particular picture.
One more thing that worked wonders for me was the imgaug library: you can augment the images and the bounding boxes with the same script. Try to increase the training data 10 to 15 times, and then I would suggest training again for around 150,000 to 200,000 steps.
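As a rough illustration of what such a background annotation could look like, here is a minimal sketch that assumes a Pascal VOC-style layout (the kind LabelImg writes). The file name and image size are made up. One caveat worth checking: if your XML-to-CSV script only emits one row per object tag, a file like this produces no rows at all, so the image may be dropped before it reaches the TFRecord.

```python
import xml.etree.ElementTree as ET


def background_annotation(filename, width, height, depth=3):
    """Build a Pascal VOC-style annotation that contains no <object> elements."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    # Deliberately no <object> children: the whole image counts as background.
    return root


# Hypothetical background image; the name and size are placeholders.
annotation = background_annotation("street_001.jpg", 800, 600)
print(ET.tostring(annotation, encoding="unicode"))
```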
These two steps helped me reduce the number of false positives effectively.
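The key point of augmenting for detection is that every geometric change to the image must be applied to the boxes too. The sketch below is not imgaug; it is a plain-Python horizontal flip on a toy image, only meant to show the idea. A library such as imgaug handles this bookkeeping for many more transforms.

```python
def hflip_image(rows):
    """Mirror an image stored as a list of pixel rows, left to right."""
    return [list(reversed(row)) for row in rows]


def hflip_box(box, image_width):
    """Mirror a pixel-coordinate box (xmin, ymin, xmax, ymax) left to right."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - xmax, ymin, image_width - xmin, ymax)


# Toy 4x2 image with the "object" (value 9) in the right half.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9]]
box = (2, 0, 4, 2)

flipped_image = hflip_image(image)
flipped_box = hflip_box(box, image_width=4)
print(flipped_image, flipped_box)
```

After the flip the object and its box both sit in the left half, so the label still matches the pixels.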
I am training pre-trained SSD on my custom dataset which is working fine with test images. But when a new image comes with no object(hard negative), it detects false positive. I was looking for a way for adding these hard negative examples in training but did not find an exact procedure. Could someone help me out here?
How do I add hard negative examples to training?
Do I have to create XML files/ bounding boxes for hard negative images?
How do I create tf records for these hard negative images?
Do I have to edit code files to generate tf records or config file?
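One way to think about these questions: a negative image is simply an example whose box and label lists are empty. The sketch below is plain Python and assumes a CSV-like list of rows, as produced by typical xml-to-csv scripts. The function name and record layout are hypothetical. In a real TFRecord writer each list would be wrapped in tf.train.Feature values, which is omitted here.

```python
def group_boxes(rows, all_images):
    """Map every image to normalized box lists; negatives keep empty lists.

    rows: (filename, width, height, label, xmin, ymin, xmax, ymax) in pixels.
    all_images: {filename: (width, height)} for every image, boxed or not.
    """
    per_image = {
        name: {"width": w, "height": h, "labels": [],
               "xmins": [], "ymins": [], "xmaxs": [], "ymaxs": []}
        for name, (w, h) in all_images.items()
    }
    for name, w, h, label, xmin, ymin, xmax, ymax in rows:
        rec = per_image[name]
        rec["labels"].append(label)
        # Normalize pixel coordinates to the [0, 1] range.
        rec["xmins"].append(xmin / w)
        rec["ymins"].append(ymin / h)
        rec["xmaxs"].append(xmax / w)
        rec["ymaxs"].append(ymax / h)
    return per_image


# Hypothetical data: one labelled image and one hard negative with no rows.
rows = [("robot_01.jpg", 200, 100, "robot", 50, 20, 150, 80)]
all_images = {"robot_01.jpg": (200, 100), "empty_desk.jpg": (640, 480)}
records = group_boxes(rows, all_images)
print(records["empty_desk.jpg"])
```

The design choice is to start from the full list of image files rather than from the annotation rows, so that images without any row are not silently skipped.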
I've followed an object detection tutorial from pythonprogramming.net to recognize a small robot (my custom object) based on the ssd_mobilenet_v1_coco model.
I've about 450 labelled images of my robot.
I used the official sample config for ssd_mobilenet_v1_coco and made only the necessary changes, such as setting num_classes to 1 and reducing the batch size to 7, then trained until the loss was consistently between 1 and 2 (about 10000 epochs).
The problem is, the model detects everything it used to know from its pre-trained state as my small robot. So it identifies everything as being a robot even though they aren't.
I faced this issue before and fixed it by adding images that contain the pre-trained objects as negative examples. Another way to fix it is to train longer. If you do both, I think that will fix the problem. Also try increasing your dataset (I was training with 6000 images).
I have used the tensorflow API to detect the Guinness harp using the process described here - https://pythonprogramming.net/introduction-use-tensorflow-object-detection-api-tutorial/.
I have mostly good results, whenever the logo is clear in the image it finds it nicely -
However, after retraining from a coco checkpoint, it still detects what I think are coco objects with a very high confidence rating, e.g. people, magazines. I cannot work out why this is.
(see below)
I am using the faster_rcnn_inception_v2_coco.config found here - https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_inception_v2_coco.config
Training for more steps does not seem to help as the total loss averages out. The above screenshots were from 10,000 training steps. I am training on a cpu.
I am augmenting my training images using imgaug, and an example training image can be seen below (I have included the debug bounding box around the target) -
However, if the training images were the problem, wouldn't the graph have trouble detecting the target altogether?
I had a similar issue recently. From what I can see, it looks somewhat like a case of underfitting. I tried multiple things to improve the results.
The thing that worked for me was augmenting the data using the imgaug library. You can augment the images as well as the bounding boxes using a simple script; try to increase the dataset, say, 10- to 12-fold.
I would also suggest adding some background images, i.e. images with no object; this was recommended by a few people in the discussions on the TensorFlow issue tracker.
Try and train the dataset again and monitor it using tensorboard. I think you will be able to reduce the number of false positives significantly.
I am training a model with the TensorFlow Object Detection API, using faster_rcnn_inception_v2_coco as the pretrained model. I'm on Windows 10, with tensorflow-gpu 1.6 on an NVIDIA GeForce GTX 1080, CUDA 9.0 and cuDNN 7.0.
My dataset contains only one object class, "Pistol", and 3000 images (2700 in the train set, 300 in the test set). The image sizes range from ~100x200 to ~800x600.
I trained this model for 55k iterations, where the mAP was ~0.8 and the TotalLoss seemed to converge to 0.001. However, looking at the evaluation, there are many overlapping bounding boxes on the same detected object (e.g. this and this), and a lot of false positives (a house detected as a pistol). For example, in this photo taken by me (the blur filter was applied later), the model detects a person and a car as pistols, as well as the correct detection.
The dataset is uploaded here, together with the tfrecords and the label map.
I used this config file, where the only things that I changed are: num_classes to 1, the fine_tune_checkpoint, input_path and label_map_path for train and eval, and num_examples.
Since I thought that the multiple boxes are a non-max-suppression problem, I changed the score_threshold (line 73) from 0 to 0.01 and the iou_threshold (line 74) from 1 to 0.6. With the standard values the outcome was much worse than this.
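To see why these two values matter, here is a minimal pure-Python sketch of greedy non-max suppression. It is not the TensorFlow implementation, and the detections are invented. With an IoU threshold of 1 no pair of boxes can ever exceed the threshold, so nothing is suppressed. With a score threshold of 0 even near-zero detections survive.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold, iou_threshold):
    """Greedy NMS over (score, box) pairs; returns the kept pairs."""
    candidates = sorted(
        (d for d in detections if d[0] > score_threshold),
        key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept


# Hypothetical detections: two near-duplicates and one very weak box.
detections = [(0.95, (10, 10, 110, 110)),
              (0.90, (15, 12, 115, 112)),
              (0.005, (300, 300, 320, 320))]

print(len(nms(detections, score_threshold=0.0, iou_threshold=1.0)))
print(len(nms(detections, score_threshold=0.01, iou_threshold=0.6)))
```

The first call keeps all three boxes, the second keeps only the strongest one.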
What can I do to get good detections? What should I change? Maybe I'm missing something about parameter tuning...
Thanks
I think that before diving into parameter tuning (i.e. the mentioned score_threshold) you will have to review your dataset.
I didn't check the entire dataset you shared, but from a high-level view the main problem I found is that most of the images are really small and have a highly variable aspect ratio.
In my opinion this conflicts with this part of your configuration file:
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
If you take one of the images of your dataset and manually apply that transformation, you will see that the result is very noisy for small images and very deformed for many images that have a different aspect ratio.
I would highly recommend that you rebuild your dataset with higher-resolution images, and maybe preprocess the images that have unusual aspect ratios with padding, cropping or other strategies.
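To get a feel for how much upscaling this resizer implies, the sketch below computes the output size for a given input. It follows my understanding of keep_aspect_ratio_resizer (scale the short side up to min_dimension unless the long side would then exceed max_dimension) and is not the library's own code. The function name is hypothetical.

```python
def keep_aspect_ratio_size(width, height, min_dimension=600, max_dimension=1024):
    """Return (new_width, new_height, scale) for an aspect-preserving resize."""
    # First try to bring the shorter side to min_dimension.
    scale = min_dimension / min(width, height)
    # If the longer side would overshoot, cap it at max_dimension instead.
    if round(max(width, height) * scale) > max_dimension:
        scale = max_dimension / max(width, height)
    return round(width * scale), round(height * scale), scale


# Image sizes taken from the question: ~100x200 and ~800x600.
for w, h in [(100, 200), (800, 600)]:
    print((w, h), "->", keep_aspect_ratio_size(w, h))
```

Under these assumptions a 100x200 image is enlarged about five times in each dimension, which is where the noise in small images comes from, while an 800x600 image is left at its original size.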
If you want to stick with the small images you'd have to at least change the min and max dimensions of the image_resizer but, from my experience, the biggest problem here is the dataset and I would invest the time in trying to fix that.
P.S.
I don't see the house false positive as a big problem if we consider that it's from a totally different domain of your dataset.
You could probably raise the minimum confidence required to count a detection as a true positive, and that would remove it.
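As a minimal sketch of that idea, assuming detections are available as a list of dicts with a "score" key (a made-up structure, not the API's output format), filtering by a minimum confidence could look like this:

```python
def filter_by_score(detections, min_score):
    """Keep only detections whose confidence is at least min_score."""
    return [d for d in detections if d["score"] >= min_score]


# Hypothetical detections from a single image.
detections = [
    {"label": "pistol", "score": 0.97},   # the correct detection
    {"label": "pistol", "score": 0.55},   # e.g. a house mistaken for a pistol
]

confident = filter_by_score(detections, min_score=0.8)
print(confident)
```

The threshold of 0.8 is arbitrary here. In practice it would be chosen by looking at the scores of true and false detections on held-out images.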
If you take the current winner of COCO and feed it with strange images like from a cartoon you will see that it generates a lot of false positives.
So it's more a problem with the current object detection approaches, which are not robust to domain changes.
A lot of people I see online have been running into the same issue using the TensorFlow API. I think there are some inherent problems with the idea/process of using the pretrained models with custom classifier(s) at home. For example, people want to use SSD Mobile or Faster RCNN Inception to detect objects like "Person w/ helmet," "pistol," or "tool box," etc. The general process is to feed in images of that object, but most of the time, no matter how many images you use (200 to 2000), you still end up with false positives when you actually run it at your desk.
The object classifier works great when you show it the object in its own context, but you end up getting a 99% match on everyday items like your bedroom window, your desk, your computer monitor, keyboard, etc. People have mentioned the strategy of introducing negative images or soft images. I think the problem has to do with the limited context in the images that most people use.
The pretrained models were trained on more than a dozen classes in a wide variety of environments. One example could be a car on the street: the CNN sees the car, and everything in that image that is not a car is effectively a negative, which includes the street, buildings, sky, etc. In another image it sees a bottle, and everything else in that image, such as desks, tables and windows, is a negative. I think the problem with training custom classes is that it is a negative image problem. Even if you have enough images of the object itself, there isn't enough data of that same object in different contexts and backgrounds. So in a sense there are not enough negative images, even if conceptually you shouldn't need negative images. When you run the algorithm at home, you get false positives all over the place, identifying objects around your own room.
I think the idea of transfer learning in this way is flawed. We end up seeing a lot of great tutorials online of people identifying playing cards, Millennium Falcons, etc., but none of those models are deployable in the real world, as they would all generate a bunch of false positives when they see anything outside their image pool. The best strategy would be to retrain the CNN from scratch with multiple classes and add the desired ones in there as well. I suggest re-introducing an existing dataset such as ImageNet or Pascal with 10-20 pre-existing classes, adding your own, and retraining.
I used the TensorFlow Object Detection API with RFCN_resnet101 for small objects, but sometimes the detection result is not good: it detects the object with an offset, and sometimes it detects an object by mistake. Does anyone know how to deal with this?
Debugging object detection can be tricky. I recommend checking the input data (does it make sense):
Do object bounding boxes get displayed correctly when overlaid on the images?
Are you using pixel box coordinates (vs normalized) when preparing the training data?
Do you have boxes that are too small or outside the image boundary that cause tensor NaN errors?
Do you have images that are too large and cause CUDA out of memory errors?
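Some of these input checks can be automated before the TFRecord files are written. The sketch below is plain Python and assumes boxes are meant to be normalized to [0, 1]. The function name, the messages and the min_size value are arbitrary choices for illustration.

```python
def box_problems(box, min_size=0.01):
    """Return a list of problems found in a (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    problems = []
    if max(box) > 1.0:
        # Either pixel coordinates were stored, or the box leaves the image.
        problems.append("value above 1.0: pixel coordinates or out of bounds")
    if min(box) < 0.0:
        problems.append("negative value: box extends outside the image")
    if xmax <= xmin or ymax <= ymin:
        problems.append("zero or negative area")
    elif (xmax - xmin) < min_size or (ymax - ymin) < min_size:
        problems.append("very small box")
    return problems


# Hypothetical boxes: one fine, one in pixels, one degenerate, one tiny.
for box in [(0.1, 0.1, 0.5, 0.5), (10, 20, 200, 300),
            (0.4, 0.4, 0.4, 0.6), (0.5, 0.5, 0.505, 0.9)]:
    print(box, box_problems(box))
```

Running a check like this over the whole dataset and printing only the offending file names is usually enough to catch the NaN-causing boxes mentioned above.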
Once you are satisfied with the input data and are able to successfully generate TFRecord files for training and evaluation, I recommend asking the following questions:
Are you training the network for a sufficient number of global iterations (e.g. 200K), preferably on multiple GPUs with a sufficiently large batch size?
Are you getting reasonable detections when evaluating on a few images, e.g. by specifying the following eval_config in the config file:
eval_config: {
  num_examples: 1000
  num_visualizations: 16
  min_score_threshold: 0.15
  # Note: The below line limits the evaluation process to a single evaluation.
  # Remove the below line to evaluate indefinitely.
  max_evals: 1
}
Here num_visualizations will create an images tab in TensorBoard when you run the eval.py script, and you'll be able to visualize detections and vary min_score_threshold, the minimum score a detection needs in order to be drawn.
Are you fine-tuning a pre-trained model? If so, check to make sure your config has
fine_tune_checkpoint: "/path/to/model.ckpt"
from_detection_checkpoint: true
Finally, the beauty of the TensorFlow Object Detection API is that you can try different object detection models, such as Faster R-CNN, R-FCN and SSD, which have different speed-accuracy tradeoffs, without much extra work. You may find that a different object detector works better for your application.
Why not use faster_rcnn_inception_resnet_v2_atrous_coco? For small objects it's my go-to option.