How do I classify an image into multiple classes? - object-detection

I want to perform object detection on images using a Convolutional Neural Network (CNN). My problem is that I want to draw a bounding box for the cat in each image, and in addition tag each detected box as 'white', 'black', or 'other'.
How do I add these tags to the detected objects?
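One common pattern (a sketch on my part, not from the question) is to attach the tag as a second stage after detection: either fold the colour into the class label ('white cat', 'black cat', 'other cat') so the detector predicts it directly, or classify each detected crop separately. The snippet below illustrates the second option with a crude mean-brightness heuristic; `tag_box`, the `(xmin, ymin, xmax, ymax)` box format, and the thresholds are all assumptions for illustration. A small CNN classifier on the crop would normally replace the heuristic.

```python
import numpy as np

def tag_box(image, box, bright=190, dark=65):
    """Tag a detected box 'white'/'black'/'other' by mean crop brightness.

    box = (xmin, ymin, xmax, ymax); thresholds are arbitrary guesses."""
    xmin, ymin, xmax, ymax = box
    crop = image[ymin:ymax, xmin:xmax]
    mean = crop.mean()
    if mean >= bright:
        return 'white'
    if mean <= dark:
        return 'black'
    return 'other'

# toy image: dark background with one bright square standing in for a cat
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[10:40, 10:40] = 255
print(tag_box(img, (10, 10, 40, 40)))  # white
print(tag_box(img, (50, 50, 90, 90)))  # black
```

The first option (separate classes per colour) is simpler if the colours are visually distinct, but it multiplies the number of classes and splits the training examples per class.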

Related

Weakly supervised object detection R-CNN of screen images

I have a set of icons and a screen recording. The icons are not annotated and have no bounding boxes; they are just PNG icons with image-level labels, e.g. "instagram", "facebook", "chrome".
The task is to find the icons within the screen recording and draw a bounding box around each of them, given the above prerequisites.
My idea of an approach so far is:
1. Use selective search to find ROIs
2. Use a CNN to classify the regions
3. Filter out non-icon regions
4. Draw bounding boxes around positively labelled ROIs
5. Use the resulting screen images with bounding boxes to train a Fast R-CNN
But I am stuck at step 2: I have no idea how to train the CNN with the image-level labelled icons.
If I make a dataset of all the possible icon images, with no background or context information, is it possible to train the CNN to answer the question "Does the ROI include a possible icon?"
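One way to bootstrap step 2 (an assumption on my part, not something stated in the question) is to synthesise training pairs: composite the labelled icon PNGs onto random screen patches as positives, and use plain patches as negatives, then train a binary "icon / not icon" CNN on them. A minimal numpy sketch of the data synthesis, with random toy arrays standing in for the real PNGs and frames:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_example(icon_rgba, background, positive):
    """Return one training patch: the icon alpha-composited onto a
    background crop (positive) or the plain crop (negative)."""
    h, w = icon_rgba.shape[:2]
    patch = background[:h, :w].copy()
    if positive:
        alpha = icon_rgba[..., 3:] / 255.0  # (h, w, 1), broadcasts over RGB
        patch = (alpha * icon_rgba[..., :3] + (1 - alpha) * patch).astype(np.uint8)
    return patch

# toy stand-ins for a real 32x32 RGBA icon and a screen frame
icon = rng.integers(0, 256, (32, 32, 4), dtype=np.uint8)
screen = rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8)

X = np.stack([make_example(icon, screen, p) for p in (True, False)])
y = np.array([1, 0])  # 1 = contains an icon, 0 = background only
```

Varying icon scale, position, and background crop per example would make the synthetic set less trivially separable; the resulting binary classifier can then score the selective-search ROIs in step 3.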

Does the TensorFlow object detection API accept data with 3D bounding boxes?

I have data labelled with 3D bounding boxes (KITTI 3D object detection), and I want to apply one of the TensorFlow object detection models (such as Faster R-CNN). So far, all of the examples I have seen use 2D bounding boxes. Does this mean that 3D boxes are not supported by the API?
If not, does anyone have an example model that uses 3D bounding boxes?

Is it possible to use polygon data annotation to perform tensorflow object detection?

My problem is not exactly how to annotate data using polygons, circles, or lines; it's how to use these annotated data to generate a ".tfrecord" file and perform object detection. The tutorials I have seen use rectangle annotation, like these: taylor swift detection, raccoon detection.
Rectangles would work for me if the objects I want to detect (pipelines) were not so close to each other.
Example of rectangle drawn in PASCAL VOC format:
<bndbox>
<xmin>82</xmin>
<xmax>172</xmax>
<ymin>108</ymin>
<ymax>146</ymax>
</bndbox>
Is there a way to add a "mask" to highlight some part of this bounding box?
If anything is unclear, please let me know.
You can go for instance segmentation instead of object detection if your objects are very close to each other; there you can use the polygons to generate the masks and bounding boxes needed to train the model.
Consider this well-presented and easy-to-use repository for Mask R-CNN (a kind of instance segmentation):
https://github.com/matterport/Mask_RCNN
Check this for a lightweight Mask R-CNN.
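To connect the polygon annotation to the VOC-style box shown above: both the `<bndbox>` values and a binary mask can be derived from the polygon points. A small numpy sketch (the function names and the even-odd rasterisation are my own illustration, not part of any annotation tool):

```python
import numpy as np

def polygon_to_bbox(points):
    """VOC-style (xmin, xmax, ymin, ymax) from an (N, 2) array of (x, y)."""
    xs, ys = points[:, 0], points[:, 1]
    return float(xs.min()), float(xs.max()), float(ys.min()), float(ys.max())

def polygon_to_mask(points, height, width):
    """Rasterise the polygon into a boolean mask via even-odd ray casting."""
    yy, xx = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        crossing = (y1 <= yy) != (y2 <= yy)  # edge spans this scan row
        with np.errstate(divide='ignore', invalid='ignore'):
            x_int = x1 + (yy - y1) * (x2 - x1) / (y2 - y1)
            inside ^= crossing & (xx < x_int)  # horizontal edges are masked out
    return inside

pts = np.array([(2.0, 3.0), (8.0, 3.0), (8.0, 7.0), (2.0, 7.0)])
print(polygon_to_bbox(pts))  # (2.0, 8.0, 3.0, 7.0)
mask = polygon_to_mask(pts, 10, 10)
```

The box feeds the usual `<bndbox>` fields, while the mask is what an instance-segmentation model such as Mask R-CNN consumes.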

Object detection when the object occupies the full region of the image?

I am working on object detection using TensorFlow, with a mix of 7-8 classes. Initially we had an image classification model, and we are now moving to an object detection model. For one class alone, the object to be detected occupies the entire image. Can we have the bounding box dimensions be the entire width and height of the image? Will it hinder the performance?
It shouldn't hinder the performance as long as there are enough such examples in the training set.
The OD API clips detections that extend beyond the image, so in these cases the resulting bounding box would be the entire image (or one axis would span the entire size and the other less, depending on the object's extent).
Assuming your OD model uses anchors, make sure you have anchors that are responsible for such cases (i.e. with a scale of about the entire image).
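As an illustration of the anchor advice, here is a sketch of the relevant `pipeline.config` fragment for a Faster R-CNN model in the TF OD API; the field names follow the `grid_anchor_generator` proto, but every value below is illustrative, not a recommendation:

```protobuf
first_stage_anchor_generator {
  grid_anchor_generator {
    # base anchor size in pixels (illustrative)
    height: 256
    width: 256
    # include a large scale so some anchors cover roughly the whole image
    scales: [0.25, 0.5, 1.0, 2.0, 4.0]
    aspect_ratios: [0.5, 1.0, 2.0]
    height_stride: 16
    width_stride: 16
  }
}
```

With a 256-pixel base anchor, a scale of 4.0 yields anchors of roughly 1024 pixels, i.e. about the full extent of a typical input after resizing.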

TensorFlow grouping detection boxes for stacked/nearby objects

I am doing object detection on sample toy sets using TensorFlow R-CNN on my own dataset.
Sometimes when two toys are close to each other, TensorFlow bounds both in one box: instead of returning two bounding boxes, it returns one bigger bounding box
(as in the attached image, where both the red and blue cars have been bounded together).
Does anyone know what the issue is here and how to get individual detections?
Detected Output image