I am training a pre-trained SSD model on my custom dataset, and it works fine on test images. But when a new image with no object (a hard negative) comes in, the model produces false positives. I was looking for a way to add these hard negative examples to training but could not find an exact procedure. Could someone help me out here?
How do I add hard negative examples to training?
Do I have to create XML files / bounding boxes for hard negative images?
How do I create TFRecords for these hard negative images?
Do I have to edit the TFRecord generation code or the config file?
Related
I'm training an object detector using tensorflow and the faster_rcnn_inception_v2_coco model and am experiencing a lot of false positives when classifying on a video.
After some research I've figured out that I need to add negative images to the training process.
How do I add these to tfrecord files? I used the csv to tfrecord file code provided in the tutorial here.
Also, it seems that SSD has a hard_example_miner option in the config that allows configuring this behaviour, but this doesn't seem to be the case for Faster R-CNN. Is there a way to achieve something similar with Faster R-CNN?
I was facing the same issue with Faster R-CNN. Although you cannot actually use hard_example_miner with the Faster R-CNN model, you can add some background images, i.e. images with no objects (everything remains the same, except there is no object tag in the XML for that particular picture). A sketch of the corresponding TFRecord entry is below.
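For the TFRecord side, here is a minimal sketch of what such a background example could look like, assuming the usual generate_tfrecord.py pattern built on object_detection.utils.dataset_util and TF 1.x (the function name and arguments are just illustrative placeholders); the only difference from a positive example is that every box/class list is empty:

```python
import tensorflow as tf
from object_detection.utils import dataset_util


def create_negative_tf_example(image_path, width, height, image_format=b'jpg'):
    """Build a tf.train.Example for a background image with no objects."""
    with tf.gfile.GFile(image_path, 'rb') as fid:
        encoded_image = fid.read()
    filename = image_path.encode('utf8')

    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image),
        'image/format': dataset_util.bytes_feature(image_format),
        # Empty lists: no ground-truth boxes or classes for this image.
        'image/object/bbox/xmin': dataset_util.float_list_feature([]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([]),
        'image/object/class/text': dataset_util.bytes_list_feature([]),
        'image/object/class/label': dataset_util.int64_list_feature([]),
    }))
```

You write these examples into the same TFRecord file as your positive examples; no config change is needed just to include them.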
One more thing that actually worked wonders for me was using the imgaug library: you can augment the images and the bounding boxes with the same script. Try to increase the training data by 10 or 15 times, and then I would suggest training again for around 150,000-200,000 steps.
These two steps helped me reduce the number of false positives effectively.
I'm currently working on a project at university, where we are using Python + TensorFlow and Keras to train an image object detector to detect different parts of the root system of Arabidopsis.
Our current results are pretty bad, as we only have about 100 images to train the model with at the moment, but we are currently cultivating more plants in order to get more images (more data) to train the TensorFlow model.
We have implemented the following Mask_RCNN model: Github - Mask_RCNN tensorflow
We are looking to detect three object classes: stem, main root and secondary root.
But the model detects main roots incorrectly where the secondary roots are located.
It should be able to detect something like this: Root detection example
Training root data set that we are using right now: training images
What is the usual sample size that is needed to train a neural network for accurate results?
First off: I think there is no simple rule to estimate the sample size, but it depends at least on:
1. Quality of your images
I downloaded the images and I think you need to preprocess them before you can use them, to reduce the "problem complexity". In some projects in which I worked with biological data, a background removal (image - low pass filter) was the key to getting better results. You should also definitely remove/crop the area outside the region of interest (like the tape and the ruler). I would try to get as clean a data set as possible (including manual adjustments with cv2/GIMP/etc.) to focus the network on solving "the right problem". After that you could apply some random distortion so it works on fuzzy/bad/realistic images as well.
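As a rough sketch of that kind of preprocessing with OpenCV (the file name, crop coordinates and blur strength below are placeholders you would tune for your own scans):

```python
import cv2

# Placeholder file name; use one of your own root scans.
img = cv2.imread('root_scan.png', cv2.IMREAD_GRAYSCALE)

# Crop away the tape/ruler border first (coordinates are placeholders).
roi = img[100:1900, 200:1400]

# "image - low pass filter": subtract a heavily blurred copy so slowly
# varying background illumination disappears and the thin roots remain.
background = cv2.GaussianBlur(roi, (0, 0), sigmaX=25)
foreground = cv2.subtract(background, roi)  # roots are darker than the paper
foreground = cv2.normalize(foreground, None, 0, 255, cv2.NORM_MINMAX)

cv2.imwrite('root_scan_clean.png', foreground)
```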
2. The way you work with your data
There are a few tricks that enable you to "expand" your dataset.
Sometimes it's very helpful to let a generator method crop random small patches from your input data. This allows you to work with more batches (on small GPUs) and gives your network more "variety" (just think about the conv2d task: if you don't use random cropping, your filters will slide over the same areas over and over again on the same image). For the same reason: apply random distortion, and flip and rotate your images.
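A minimal sketch of such a generator (it assumes your images and masks are already loaded as equally sized numpy arrays; patch size and batch size are arbitrary):

```python
import numpy as np


def random_patch_generator(images, masks, patch_size=128, batch_size=8):
    """Yield batches of random patches with random flips and rotations."""
    while True:
        batch_x, batch_y = [], []
        for _ in range(batch_size):
            idx = np.random.randint(len(images))
            img, msk = images[idx], masks[idx]
            h, w = img.shape[:2]
            y0 = np.random.randint(0, h - patch_size + 1)
            x0 = np.random.randint(0, w - patch_size + 1)
            patch = img[y0:y0 + patch_size, x0:x0 + patch_size]
            label = msk[y0:y0 + patch_size, x0:x0 + patch_size]
            # Random flips and 90-degree rotations for extra "variety".
            if np.random.rand() < 0.5:
                patch, label = np.fliplr(patch), np.fliplr(label)
            if np.random.rand() < 0.5:
                patch, label = np.flipud(patch), np.flipud(label)
            k = np.random.randint(4)
            patch, label = np.rot90(patch, k), np.rot90(label, k)
            batch_x.append(patch)
            batch_y.append(label)
        yield np.stack(batch_x), np.stack(batch_y)
```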
3. Network architecture
In your case I would prefer a U-Net architecture with a final conv2d output of 3 feature maps (one per class), a softmax activation and a categorical_crossentropy loss. This lets you play with the depth: sometimes you need sophisticated architectures to solve a problem (close to 100%), but in your case you just want to see a first working result, so fewer layers and a simple architecture could also help you get things working. Maybe there are some trained network weights for a U-Net which meet your requirements (search on Kaggle, for example). It is also helpful (to reduce the data you need) to use "transfer learning": reuse the first layers (weights) of a network that is already trained. For semantic segmentation, the first filters become something like an edge detector for most problems/images.
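To make the idea concrete, here is a deliberately small U-Net sketch in Keras (layer widths and depth are arbitrary starting points, not a recommendation):

```python
from tensorflow import keras
from tensorflow.keras import layers


def small_unet(input_shape=(128, 128, 1), n_classes=3):
    """Tiny U-Net: two down/up steps, one softmax feature map per class."""
    inputs = keras.Input(shape=input_shape)

    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(c1)
    p1 = layers.MaxPooling2D()(c1)

    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(c2)
    p2 = layers.MaxPooling2D()(c2)

    b = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)

    u2 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, activation='relu', padding='same')(u2)

    u1 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)

    # 3 feature maps, softmax over the channel axis, categorical crossentropy.
    outputs = layers.Conv2D(n_classes, 1, activation='softmax')(c4)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model
```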
4. Your mental model of "accurate results"
This is the hardest part, because it evolves during your project. E.g. the moment your network starts to perform well on preprocessed input images, you will start to think about architecture/data changes to make it work on fuzzy images as well. This is why you should start with a feasible problem, then keep improving your dataset (including rare kinds of roots) and tune your network architecture step by step.
I have used the tensorflow API to detect the Guinness harp using the process described here - https://pythonprogramming.net/introduction-use-tensorflow-object-detection-api-tutorial/.
I have mostly good results; whenever the logo is clear in the image it finds it nicely -
However, after retraining from a COCO checkpoint, it still detects what I think are COCO objects (e.g. people, magazines) with a very high confidence rating. I cannot work out why this is.
(see below)
I am using the faster_rcnn_inception_v2_coco.config found here - https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_inception_v2_coco.config
Training for more steps does not seem to help as the total loss averages out. The above screenshots were from 10,000 training steps. I am training on a cpu.
I am augmenting my training images using imgaug, and an example training image can be seen below (I have included the debug bounding box around the target) -
However, if the training images were the problem, wouldn't the graph have trouble detecting the target altogether?
I had a similar issue recently; it somewhat looks like a case of underfitting. I tried multiple things to improve the results.
The thing that actually worked for me was augmenting the data using the imgaug library. You can augment the images as well as the bounding boxes with a simple script (a sketch is below); try to increase the dataset by, say, 10-12 fold.
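A rough sketch of what such a script could look like (the file name and box coordinates are placeholders; adjust the augmenters to whatever distortions make sense for your footage):

```python
import cv2
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

# Placeholder image and box; in practice you would read these from your CSV/XML.
image = cv2.imread('train_image.jpg')
bbs = BoundingBoxesOnImage(
    [BoundingBox(x1=120, y1=80, x2=340, y2=260, label='logo')],
    shape=image.shape)

# A mild augmentation pipeline; applying it repeatedly multiplies the dataset.
seq = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Affine(rotate=(-10, 10), scale=(0.9, 1.1)),
    iaa.Multiply((0.8, 1.2)),        # brightness
    iaa.GaussianBlur(sigma=(0, 1.0)),
])

augmented = []
for _ in range(10):  # e.g. a 10-fold increase
    img_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)
    bbs_aug = bbs_aug.remove_out_of_image().clip_out_of_image()
    augmented.append((img_aug, bbs_aug))
```

Each augmented image plus its transformed boxes can then be written back out to images/CSV and converted to TFRecords the same way as the originals.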
I would also suggest adding some background images, i.e. images with no objects; this was recommended by a few people in the TensorFlow discussions in the issues.
Try and train the dataset again and monitor it using tensorboard. I think you will be able to reduce the number of false positives significantly.
I'm attempting to train a Faster R-CNN model for small digit detection. I'm using the newly released TensorFlow object detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) on each image only ~20 objects are ever detected, but when detected the classification is accurate. (Also, I am able to train a simple convnet from scratch on my cropped images with high accuracy, so the problem is in the detection aspect of the model.) Each digit is on average 60x30 in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes of what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or does my task look doable with Faster R-CNN and/or SSD?
In the end the immediate problem was that I was not using the visualizer correctly. By updating the parameters for visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
I checked your config file: you are decreasing the resolution of your image to 1024. The region of each digit will not contain a lot of pixels and you are losing some information. What I suggest is to train the model on another dataset (smaller images). You can, for example, crop the images into four areas (a rough sketch is below).
If you have a good GPU, increase the max dimension in the image_resizer, but I guess you will run out of memory.
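A rough sketch of the four-crop idea (overlap size and file names are placeholders; remember that the box annotations also have to be shifted into each crop's coordinates):

```python
import cv2

# Placeholder input; a full-resolution image whose digits are roughly 60x30 px.
img = cv2.imread('meter_image.jpg')
h, w = img.shape[:2]

# Overlapping quadrants so digits on the seams are not cut in half; each crop
# keeps the digits at a larger relative size after the image_resizer shrinks it.
overlap = 100
crops = [
    img[0:h // 2 + overlap, 0:w // 2 + overlap],
    img[0:h // 2 + overlap, w // 2 - overlap:w],
    img[h // 2 - overlap:h, 0:w // 2 + overlap],
    img[h // 2 - overlap:h, w // 2 - overlap:w],
]
for i, crop in enumerate(crops):
    cv2.imwrite('crop_%d.jpg' % i, crop)
```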
There is a set of images, each of which contains different shape entities, such as shown in the following figure. I am trying to localize and recognize these different shapes. For instance, adding a bounding box for each different shape and maybe even label it. What are the major research papers/deep learning models that have been able to solve this kind of problem?
Object detection papers such as R-CNN, Faster R-CNN, YOLO and SSD would help you solve this if you were bent on using a deep learning approach.
It’s easy to say this is a trivial problem that can be solved with tools in OpenCV and deep learning is overkill, but I can see many reasons to use deep learning tools and that does not answer your question.
We assume that your shapes have different scales and rotations. Your main image shown above is actually very large for the training process, and it would need a lot of training samples to reach good accuracy on test samples. In this case it is better to train a Convolutional Neural Network on small images (like 128x128) with only one shape per image and then use the slide trick (a sliding window)!
This project will have three main steps:
Generate train and test samples; each image should contain only one shape
Train a classifier to recognize a single shape within each input image
Use the slide trick: break your original image containing many shapes into overlapping blocks of size 128x128 and pass each block to the model trained in the second step
In this way you will end up with a label for each shape from your trained model and a location for each shape from the slide trick (see the sketch below).
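A minimal sketch of the slide trick at inference time (it assumes a Keras-style classifier returning per-class probabilities; window, stride and threshold are placeholders):

```python
import numpy as np


def sliding_windows(image, window=128, stride=64):
    """Yield (x, y, patch) for overlapping window-by-window blocks."""
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            yield x, y, image[y:y + window, x:x + window]


def detect_shapes(image, model, window=128, stride=64, threshold=0.9):
    """Classify every block and keep only confident hits as (box, class, score)."""
    detections = []
    for x, y, patch in sliding_windows(image, window, stride):
        probs = model.predict(patch[np.newaxis, ...], verbose=0)[0]
        cls = int(np.argmax(probs))
        if probs[cls] >= threshold:
            detections.append(((x, y, x + window, y + window), cls, float(probs[cls])))
    return detections
```

Overlapping detections of the same shape can then be merged with a simple non-maximum suppression step.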
For the classifier you can use exactly the CNN structure from TensorFlow's MNIST tutorial.
Here is a paper with exactly the same method applied to fingerprint images to extract local features:
A direct fingerprint minutiae extraction approach based on convolutional neural networks