training a network to do a detection task - object-detection

I have approximately 100,000 images for a detection task.
On average there are 10 target objects in each individual image. However, not all of the objects are labeled; for example, out of 10 objects, 5 have bounding boxes and the other 5 do not. Do you think training the network with this data would be a good idea, or do all the objects in an image need to have bounding boxes?

Related

What should the output layer of a deep learning network look like for multi-object bounding box regression?

I am building a neural network on the back of MobileNet SSD v2, specifically for bounding box regression. I have had a difficult time finding clear resources indicating how the output of the model should be shaped. My data generally has 1-4 boxes present in any given image, and I could simply concatenate the coordinates so the output is Dense(16), but what about the case when there are more than 4 objects present in the image? I am unsure how to handle a dynamic multi-object output layer. How can I do this, and are there any detailed resources that can be shared?
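One way to make the fixed-size concatenation idea from the question work when the number of objects varies is to pick a maximum number of boxes, zero-pad the targets for images with fewer objects, and predict a per-slot confidence so the padded slots can be masked out of the loss. Below is a minimal sketch of that output shape; MAX_BOXES and the MobileNetV2 backbone settings are illustrative assumptions, not anything from the original post.

import tensorflow as tf

MAX_BOXES = 10  # assumed upper bound on objects per image (illustrative)

# Any feature extractor works here; MobileNetV2 is used only to keep the sketch short.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")

x = backbone.output
# 4 coordinates per slot, reshaped to (MAX_BOXES, 4)
boxes = tf.keras.layers.Dense(MAX_BOXES * 4, name="box_coords")(x)
boxes = tf.keras.layers.Reshape((MAX_BOXES, 4))(boxes)
# one "does this slot contain a real box" score per slot
conf = tf.keras.layers.Dense(MAX_BOXES, activation="sigmoid", name="box_conf")(x)

model = tf.keras.Model(inputs=backbone.input, outputs=[boxes, conf])
model.summary()

Targets with fewer than MAX_BOXES objects are padded with zero boxes and confidence 0, and the box-regression loss is computed only on slots whose confidence target is 1. Detector architectures such as SSD sidestep the padding trick by predicting a box per anchor, which is how MobileNet SSD v2 itself handles a variable number of objects.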

Why am I getting multiple detection boxes on an image?

Hello, I'm new to the TensorFlow object detection area. I labeled my images with the labelImg program and then trained a model, but in the results I got multiple detections on a single object. What can I do to prevent this?
It is normal to get multiple detections with different scores. Post-process your results: apply a score threshold and merge detections that sit at the same position (non-maximum suppression). Check out this guide.
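A rough sketch of that post-processing, assuming TensorFlow 2 and detections given as [ymin, xmin, ymax, xmax] boxes with per-box scores; the threshold values are illustrative, not prescriptive.

import tensorflow as tf

def filter_detections(boxes, scores, max_boxes=20, iou_thresh=0.5, score_thresh=0.5):
    # Drop low-scoring boxes and merge overlapping ones with non-max suppression.
    keep = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_boxes,
        iou_threshold=iou_thresh, score_threshold=score_thresh)
    return tf.gather(boxes, keep), tf.gather(scores, keep)

# Two heavily overlapping detections of the same object collapse to a single box.
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5], [0.12, 0.11, 0.52, 0.49]])
scores = tf.constant([0.9, 0.6])
print(filter_detections(boxes, scores))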

TensorFlow Object Detection API faster_rcnn_resnet101 training image resizing

I am currently using the TensorFlow Object Detection API to train my own classes. I am retraining using the faster_rcnn_resnet101_coco model.
To create the training data, I used RectLabel to put bounding boxes around objects in approx 100 images. Each image has approx 30 classes in it, for a total of 40 classes present across all the images.
My images are 1920 × 1080 in size. The images are produced by pulling random frames from videos of the objects I would like to detect.
My issue is that I am not getting any detections (TensorBoard is not showing any) and I think it is because the training images are being resized and the objects in the images are getting too small. I am using the default faster_rcnn_resnet101_coco.config file with no changes (except for the paths to the data).
Would it be a good idea to perform a random crop of the images (instead of resizing as below) so as to keep the object size the same for training?
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
Could there be another issue I am overlooking?
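As for the random-crop idea above, a minimal sketch of what it could look like, assuming the boxes are stored as pixel [xmin, ymin, xmax, ymax]; the crop size mirrors the resizer dimensions, and objects cut by the crop border are simply dropped. Both are illustrative choices, not anything from the original post.

import random

def random_crop(image, boxes, crop_w=1024, crop_h=600):
    # image: HxWxC array; boxes: list of [xmin, ymin, xmax, ymax] in pixels.
    h, w = image.shape[:2]
    x0 = random.randint(0, w - crop_w)
    y0 = random.randint(0, h - crop_h)
    crop = image[y0:y0 + crop_h, x0:x0 + crop_w]
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        # Keep a box only if it lies fully inside the crop, so object size is preserved.
        if xmin >= x0 and ymin >= y0 and xmax <= x0 + crop_w and ymax <= y0 + crop_h:
            kept.append([xmin - x0, ymin - y0, xmax - x0, ymax - y0])
    return crop, kept

If I remember correctly, the Object Detection API config also offers a random_crop_image data-augmentation option that does something similar on the fly during training.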
I used to deal with an object detection problem and got nothing at first. After training the model for two more days, I got the right results.
More training and more data may be helpful.
If you're worried that the resizing is making the objects too small to detect, you can use a larger input resolution. Theoretically you can do it only on your training data, but I'm not sure it would give good results with such a tiny training set.
Instead, you can first fine-tune the pre-trained model with the same dataset (COCO?) on the larger input resolution, and only then fine-tune it on your training data with the larger resolution.
This way, the model will theoretically first learn to adapt to the larger resolution, and then will learn your classes.
I would also like to side with Friday2013 and suggest getting more training data, possibly more augmentation, and then more training time. Only training longer might not help if you still train on the same small number of images, since you would get overfitting.

convolutional neural network image recognition

Currently I am working on a project with a convolutional network using TensorFlow. I have set up the network and now I need to train it. I don't have a clue what the images should look like for training, for example what percentage of the image the object should take up.
It's a cigarette that I have to detect, and I have tried around 280 individual pictures where the cigarette is about 2-5% of the image. I'm thinking of scrapping those pictures and taking new ones where the cigarette is about 30-50% of the image.
All the pictures are taken outside in a street environment.
So my question is: are there any rules regarding good pictures in a training set?
I will report back when I have tried my own solution.
The object you are trying to recognise is too small. Of the samples, I think the first one will be the best bet for you. A convolutional neural network works by doing convolution operations on image pixels. In the second picture, the background is too large compared to the object you are trying to recognise. Training on such data will not help you.
Just trying to answer your rule question:
Make sure that the cigarette occupies most of the image, say 50% to 90% (from experience). You can still identify cigarettes that take up only 2-3% of the area, but you would need millions of images with varying backgrounds.
A CNN learns from the input image. Looking at the sample images you shared (I guess they are all taken on roadside pavements and grass areas), the CNN may not learn to find the cigarette; instead it will learn to detect the common background if the background occupies most of the image. Please make sure to include different background patterns.

small object detection with faster-RCNN in tensorflow-models

I'm attempting to train a faster-rcnn model for small digit detection. I'm using the newly released TensorFlow Object Detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the model zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) on each image only ~20 objects are ever detected, but when detected the classification is accurate. (Also, I am able to train a simple convnet from scratch on my cropped images with high accuracy, so the problem is in the detection aspect of the model.) Each digit is on average 60x30 pixels in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes of what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or does my task look doable with faster-rcnn and/or SSD?
In the end the immediate problem was that I was not using the visualizer correctly. By updating the parameters for visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
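For anyone hitting the same thing: the standard Object Detection API visualization utility draws only the top-scoring boxes by default (max_boxes_to_draw=20 and min_score_thresh=0.5, if I recall the defaults correctly), so raising the former and lowering the latter reveals the rest. A sketch of the call with placeholder detection outputs; the variable names and values are only illustrative.

import numpy as np
from object_detection.utils import visualization_utils as vis_util

# Placeholder outputs just to keep the example self-contained; in the real eval
# loop these come from the detection graph.
image_np = np.zeros((600, 1024, 3), dtype=np.uint8)
boxes = np.array([[0.1, 0.1, 0.3, 0.2]])   # [ymin, xmin, ymax, xmax], normalized
classes = np.array([1], dtype=np.int32)
scores = np.array([0.9])
category_index = {1: {"id": 1, "name": "digit"}}

vis_util.visualize_boxes_and_labels_on_image_array(
    image_np, boxes, classes, scores, category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,   # default of 20 hides most of the ~120 digits
    min_score_thresh=0.3)    # default is 0.5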
I checked your config file; you are decreasing the resolution of your images to 1024. The region of each digit will not contain many pixels, so you are losing information. What I suggest is to train the model on another dataset (smaller images). You can, for example, crop the images into four areas, as in the sketch below.
If you have a good GPU, increase the max dimension in the image_resizer, but I guess you will run out of memory.
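A minimal sketch of the four-way crop suggested above, assuming plain NumPy image arrays; each crop keeps the digits at their original pixel size while halving both input dimensions. Ground-truth boxes would have to be shifted into each quadrant's coordinates in the same way. The frame size below is a placeholder.

import numpy as np

def split_into_quadrants(image):
    # image: HxWxC array; returns the four corner crops at the original resolution.
    h, w = image.shape[:2]
    return [image[:h // 2, :w // 2],   # top-left
            image[:h // 2, w // 2:],   # top-right
            image[h // 2:, :w // 2],   # bottom-left
            image[h // 2:, w // 2:]]   # bottom-right

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # dummy frame for illustration
crops = split_into_quadrants(frame)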