Bounding box coordinates via video object detection - tensorflow

Is it possible to get the coordinates of the detected object through video object detection? I used OpenCV to split the video into images and LabelImg to draw the bounding boxes on them. For the output, I want to be able to read (for a video file) the coordinates of the bounding box so I can get the centre of the box.

Object detection works on a per-image or per-frame basis.
The basic object detection model takes an image as input and returns the detected bounding boxes.
Once your model is trained, read the video frame by frame (or skip every 'n' frames), pass each frame to the object detector, get the box coordinates from it, and draw them onto the output video frame.
That is how object detection works for a video.
Please refer to the links below for reference:
https://github.com/tensorflow/models/tree/master/research/object_detection
https://www.edureka.co/blog/tensorflow-object-detection-tutorial/
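As a rough sketch of that loop (assuming a TF2 Object Detection API model already exported as a SavedModel; the model path, skip interval, and score threshold below are placeholders to adapt):

import cv2
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("exported_model/saved_model")  # hypothetical export path

cap = cv2.VideoCapture("input.mp4")
frame_idx, frame_skip = 0, 5  # run the detector on every 5th frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    if frame_idx % frame_skip == 0:
        # The exported model expects a batched uint8 RGB tensor of shape [1, H, W, 3].
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        detections = detect_fn(tf.convert_to_tensor(rgb[np.newaxis, ...], dtype=tf.uint8))
        boxes = detections["detection_boxes"][0].numpy()   # normalized [ymin, xmin, ymax, xmax]
        scores = detections["detection_scores"][0].numpy()
        h, w = frame.shape[:2]
        for box, score in zip(boxes, scores):
            if score < 0.5:
                continue
            ymin, xmin, ymax, xmax = box
            cx, cy = int((xmin + xmax) / 2 * w), int((ymin + ymax) / 2 * h)  # box centre in pixels
            cv2.rectangle(frame, (int(xmin * w), int(ymin * h)), (int(xmax * w), int(ymax * h)), (0, 255, 0), 2)
            cv2.circle(frame, (cx, cy), 3, (0, 0, 255), -1)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    frame_idx += 1
cap.release()
cv2.destroyAllWindows()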

Related

Mask R-CNN mask coordinates

Is it possible to get the results or coordinates of the mask detection or the bounding box surrounding the image? I am using Mask R-CNN from matterport and the visualization of the masks on the image works quite well, but I would like to save the coordinates.
I am not sure how you are using this model. But when you import their model and use the detect method (which is the straightforward way to use it), the coordinates are returned immediately.
See this documentation for an explanation of what model.detect returns.
In short, per image you get a dict, and your coordinates will be in the_dict["rois"] (one row per detected instance).
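As a minimal sketch of that usage (assuming the Matterport repo is installed and you already have trained weights; the weights path, image path, and NUM_CLASSES below are placeholders):

from mrcnn.config import Config
from mrcnn import model as modellib
import skimage.io

class InferenceConfig(Config):
    NAME = "demo"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 1  # background + your classes; adjust to your dataset

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="logs")
model.load_weights("mask_rcnn_weights.h5", by_name=True)  # hypothetical weights file

image = skimage.io.imread("example.jpg")
r = model.detect([image], verbose=0)[0]   # one result dict per input image
print(r["rois"])          # bounding boxes, one [y1, x1, y2, x2] row per instance
print(r["masks"].shape)   # boolean masks of shape [H, W, num_instances]
print(r["class_ids"], r["scores"])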

Turi Create rescales and moves my object annotations coordinates

I created and merged an images SFrame with an Annotations SFrame.
I have verified that the coordinates of the annotation boxes matches the location of the features measured in Photoshop.
However the models I create are non-functional, so I explored the merged data set with
data['image_with_ground_truth'] = tc.object_detector.util.draw_bounding_boxes(data['image'], data['annotations'])
and I find that all the annotations are squashed into the top-left corner in Turi Create, despite them actually being widely distributed on the source image as in the second image. The annotations list column shows that the coordinates are read correctly into TC, but they are mapped badly into what the model sees as bounding boxes.
Where should I look to find the scaling problem in Turi Create?
It turned out that the version of ml-annotate I was using output coordinates with a different scale factor for each image in the set; some were close, some were off by as much as 3.3x.
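One way to spot this kind of inconsistency (a sketch, assuming the standard Turi Create annotation dicts with pixel-space centre coordinates) is to print how far each image's boxes extend relative to the image size; wildly different ratios across images suggest mismatched scale factors:

import turicreate as tc

def report_annotation_scale(data):
    # Print each box's extent relative to its image so inconsistent scaling stands out.
    for row in data:
        img = row['image']
        for ann in row['annotations']:
            c = ann['coordinates']   # {'x': centre_x, 'y': centre_y, 'width': w, 'height': h}
            x_extent = (c['x'] + c['width'] / 2) / img.width
            y_extent = (c['y'] + c['height'] / 2) / img.height
            print(ann.get('label'), 'extends to', round(x_extent, 2), 'x', round(y_extent, 2), 'of the image')

report_annotation_scale(data)  # 'data' is the merged SFrame from the question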

Tensorflow object detection API how to add background class samples?

I am using the tensorflow object detection API. I have two classes of interest. In the first trial, I got reasonable results, but I found it was easy to get false positives of both classes on pure background images. These background images (i.e., images without any class bounding box) had not been included in the training set.
How can I add them to the training set? It does not seem to work if I simply add samples without bounding boxes.
Your goal is to add negative images to your training dataset to strengthen the background class (id 0 in the detection API). You can achieve this with the Pascal VOC XML annotation format. For a negative image, the XML file contains only the image's height and width, with no object entries; normally the XML would also hold each object's name and its box coordinates. If you use labelImg, you can generate such an XML file for a negative image with the verify button. Roboflow can also generate XML files with and without objects.
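For reference, a negative-image annotation in Pascal VOC XML is just the usual header with no <object> elements at all (the filename and dimensions below are placeholders):

<annotation>
  <folder>images</folder>
  <filename>background_001.jpg</filename>
  <size>
    <width>1024</width>
    <height>768</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>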

TensorFlow: Collecting my own training data set & Using that training dataset to find the location of object

I'm trying to collect my own training data set for image detection (recognition, for now). Right now, I have 4 classes and 750 images for each. The images are just regular photos of each class; however, some of them are blurry or contain outside elements, such as different backgrounds or other factors (but nothing distinguishable). Using that training data set, image recognition is really bad.
My question is,
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's just say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way I can find the location just using image recognition, so if I use bounding boxes, how/where in the code can I see the location of the bounding box?
Thank you in advance!
It is difficult to know in advance which features your program will learn for each class. But then again, if your unseen images will have the same background, the background will play no role. I would suggest data augmentation during training: random color distortion, random flipping, random cropping.
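As an illustration of that augmentation (a minimal sketch using tf.image; the ranges are arbitrary and would need tuning for your data):

import tensorflow as tf

def augment(image):
    # Random flip, colour distortion, and crop for a single training image.
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    # Crop to 90% of the original size, then resize back.
    shape = tf.shape(image)
    crop = tf.cast(tf.cast(shape[:2], tf.float32) * 0.9, tf.int32)
    image = tf.image.random_crop(image, size=[crop[0], crop[1], 3])
    image = tf.image.resize(image, [shape[0], shape[1]])
    return image

# dataset = dataset.map(augment)  # apply only to the training split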
You can't see the bounding box in the code. You have to label/annotate the objects yourself first in your collected data, using a tool such as LabelMe, for example. Then comes training the object detector.

If I resize images using Tensorflow Object Detection API, are the bboxes automatically resized too?

Tensorflow's Object Detection API has an option in the .config file to add a keep_aspect_ratio_resizer. If I resize my training data using this, will the corresponding bounding boxes be resized as well? If they don't match up, the network is seeing incorrect examples.
Yes, the boxes will be resized to stay consistent with the images as well; the input pipeline transforms the images and their ground-truth boxes together, so you don't have to adjust anything yourself.
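For reference, the option in the pipeline .config typically looks like the block below (600/1024 are the values used in many of the sample configs; adjust them for your model):

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}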