If I resize images using Tensorflow Object Detection API, are the bboxes automatically resized too?

Tensorflow's Object Detection API has an option in the .config file to add a keep_aspect_ratio_resizer. If I resize my training data using this, will the corresponding bounding boxes be resized as well? If they don't match up, then the network is seeing incorrect examples.

Yes, the boxes will be resized to be compatible with the images as well!
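For anyone wondering why this works: the Object Detection API keeps box coordinates in normalized form, so they remain valid no matter how the image is resized. A rough Python sketch (the image size and box values here are made up for illustration, not taken from the API internals):

```python
import tensorflow as tf

# Boxes are stored as normalized [ymin, xmin, ymax, xmax] in [0, 1], so
# resizing the image does not invalidate them; you only rescale them when
# converting to pixel coordinates.
image = tf.zeros([720, 1280, 3])                      # stand-in for an original image
boxes = tf.constant([[0.1, 0.2, 0.5, 0.6]])           # normalized coordinates

resized = tf.image.resize(image, [300, 300])          # the image changes shape...
# ...but the normalized boxes stay the same; scale by the new size only to draw them:
pixel_boxes = boxes * tf.constant([300.0, 300.0, 300.0, 300.0])
print(pixel_boxes.numpy())                            # [[ 30.  60. 150. 180.]]
```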

Related

Resize images for object detection

I want to train images with Mask R-CNN, and my understanding is that all the images need to be the same size. I also read that you can add "padding" to images so that you can retain the right aspect ratio.
Does anyone know how to add padding to the images and resize them? Does anyone have code for that, or an online tool which can do that?
Thanks
The OpenCV library has a padding function that can add borders to your images, and a resize function as well.
Refer to this webpage.
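For example, a minimal letterbox helper with OpenCV might look like the sketch below (the target size and the gray padding color are arbitrary choices, not Mask R-CNN requirements):

```python
import cv2

def letterbox(image, target_size=512):
    # Scale the longer side to target_size, keeping the aspect ratio.
    h, w = image.shape[:2]
    scale = target_size / max(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(image, (new_w, new_h))

    # Pad the shorter side with constant borders so the result is square.
    top = (target_size - new_h) // 2
    bottom = target_size - new_h - top
    left = (target_size - new_w) // 2
    right = target_size - new_w - left
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(114, 114, 114))

padded = letterbox(cv2.imread("example.jpg"))  # "example.jpg" is a placeholder path
```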

Change the input image size for mobilenet_ssd using tensorflow

I am using tensorflow and tflite to detect objects. The model I use is mobilenet_ssd (version 2) from https://github.com/tensorflow/models/tree/master/research/object_detection
The input image size for detection is fixed at 300*300, which is hard-coded in the model.
I want to input a 1280*720 image for detection; how can I do this? I do not have a training image dataset with resolution 1280*720. I only have the Pascal and COCO datasets.
How can I modify the model to accept a 1280*720 image (without scaling the image) for detection?
To change the input size of the image, you need to redesign the anchor box positions, because the anchors are tied to the input image resolution. Once you change the anchor positions to 720p, the mobilenet can accept 720p as input.
The common practice is to scale the input image before feeding the data into TensorFlow / TensorFlow Lite.
Note: the images in the training dataset aren't 300*300 originally. The original resolution may be bigger and non-square, and it's downscaled to 300*300. That means it's totally fine to downscale a 1280*720 image to 300*300, and it should work fine.
Would you mind trying scaling to see if it works?
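A rough sketch of that "scale before feeding" practice with the TF Lite interpreter (the model path and the 720p placeholder frame are assumptions, not taken from the question):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssd_mobilenet_v2.tflite")  # assumed path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
_, in_h, in_w, _ = input_details["shape"]            # typically (1, 300, 300, 3)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)     # stand-in for a 1280*720 frame
resized = tf.image.resize(frame, (in_h, in_w))       # downscale to the model's input size
input_data = tf.cast(resized, input_details["dtype"])[tf.newaxis, ...].numpy()

interpreter.set_tensor(input_details["index"], input_data)
interpreter.invoke()
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```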

Why put the whole image in a tfrecord file? Why not just crop according to the bounding-box and put the cropped object in the tfrecord file?

Why do we put the whole image in a tfrecord file? Why not just crop the image according to the bounding-box and put the cropped object in the tfrecord file? This should greatly reduce the size of that file.
Because you want the network to learn where the object is in the image. In image classification, you would crop out the objects as you proposed and the network would output "car" or "not car". In object detection, the network outputs the bounding boxes for the objects along with the class ("car is at x1-x2-y1-y2"). It learns by seeing the whole picture, with the bounding boxes used in the loss function.
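As a hedged sketch of what such a record looks like (the feature keys follow the Object Detection API convention as I understand it; the file names, class, and coordinates are placeholders):

```python
import tensorflow as tf

def _bytes(v):  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))
def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
def _ints(v):   return tf.train.Feature(int64_list=tf.train.Int64List(value=v))

with open("image.jpg", "rb") as f:                    # placeholder image file
    encoded = f.read()

# The full encoded image goes in, together with normalized box coordinates,
# so the detector can learn *where* the object is, not just what it is.
example = tf.train.Example(features=tf.train.Features(feature={
    "image/encoded": _bytes(encoded),
    "image/format": _bytes(b"jpeg"),
    "image/object/bbox/xmin": _floats([0.25]),
    "image/object/bbox/xmax": _floats([0.75]),
    "image/object/bbox/ymin": _floats([0.10]),
    "image/object/bbox/ymax": _floats([0.60]),
    "image/object/class/text": _bytes(b"car"),
    "image/object/class/label": _ints([1]),
}))

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(example.SerializeToString())
```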

Tensorflow object detection API how to add background class samples?

I am using the tensorflow object detection API. I have two classes of interest. In the first trial, I got reasonable results, but I found it was easy to get false positives for both classes on pure background images. These background images (i.e., images without any class bbox) had not been included in the training set.
How can I add them to the training set? It doesn't seem to work if I simply add samples without a bbox.
Your goal is to add negative images to your training dataset to strengthen the background class (id 0 in the detection API). You can achieve this with the Pascal VOC XML annotation format. For a normal image, the XML file contains the object name plus the coordinates and the width and height of each object; for a negative image, the XML file contains only the width and height of the image, with no objects. If you use labelImg, you can generate an XML file corresponding to your negative image with the Verify button. Roboflow can also generate XML files with and without objects.
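If you'd rather script it than use a GUI, a minimal sketch that writes such a VOC-style XML with no object entries could look like this (file names and sizes are placeholders):

```python
import xml.etree.ElementTree as ET

def write_negative_annotation(image_name, width, height, out_path):
    # An annotation with size information but no <object> elements marks the
    # image as pure background, so the converter emits empty box/class lists.
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = image_name
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    ET.ElementTree(ann).write(out_path)

write_negative_annotation("background_001.jpg", 1280, 720, "background_001.xml")
```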

TensorFlow: Collecting my own training data set & Using that training dataset to find the location of object

I'm trying to collect my own training data set for image detection (well, recognition for now). Right now, I have 4 classes and 750 images for each. Each image is just a regular image of its class; however, some of the images are blurry or contain outside elements, such as different backgrounds or other factors (but nothing distinguishable). Using that training data set, image recognition is really bad.
My questions are:
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's just say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way I can find the location using image recognition alone, so if I use bounding boxes, how/where in the code can I see the location of the bounding box?
Thank you in advance!
It is difficult to know in advance what features your program will learn for each class. But then again, if your unseen images will have the same background, the background will play no role. I would suggest data augmentation in training: random color distortion, random flipping, random cropping.
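For instance, a small tf.image-based augmentation sketch (the ranges, crop size, and file layout here are illustrative assumptions, not recommendations for your specific data):

```python
import tensorflow as tf

def augment(image):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_crop(image, size=[200, 200, 3])   # assumes images >= 200x200
    return image

# Hypothetical directory layout: one folder of training JPEGs.
dataset = tf.data.Dataset.list_files("train/*.jpg")
dataset = dataset.map(lambda p: tf.io.decode_jpeg(tf.io.read_file(p), channels=3))
dataset = dataset.map(augment)
```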
You can't see the bounding box location in the code. You have to label/annotate the boxes yourself first in your collected data, using a tool such as LabelMe, for example. Then comes training the object detector.