Ignore some classes during training - TensorFlow

I'm using the TensorFlow Object Detection API for my use case, and I have some boxes/classes that I would like to ignore during training because their quality is poor.
I don't want to cover the box areas with black rectangles, because that would change the image,
and I don't want them to be treated as negative examples during training.
Is there an easy way to do that?
I'm using the Faster R-CNN implementation from the TensorFlow Object Detection API, with the data in PASCAL VOC format.

Related

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most 1 object among my 10 classes (i.e. the same image cannot contain 2 classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially because my objects have different sizes. This is what I have tried:
1. Trained a classifier with images that only contain my objects/classes, i.e. every image is the object itself with the background pre-removed. Since the objects/images have different shapes (aspect ratios), I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose was only to build a classifier, but since I also need to detect the objects, it didn't work so well.
2. The second approach was similar to (1), except that I didn't reshape the objects naively, but kept the aspect ratios by padding the image with 0 (black). This completely destroyed my classifier's ability to perform well (accuracy < 5%).
3. Mask RCNN - I followed this blogpost to try to build a detector + classifier in the same model. The approach took forever and I wasn't sure it was the right one. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
1. Should I build 2 separate models? (One for detection/localization and one for classification?)
2. Should I be annotating my images using an annotations file, as in approach (3)?
3. Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
1. No, you don't have to build two separate models. What you are describing is called object detection, which is classification along with localization. Many models do this: Mask_RCNN, Yolo, Detectron, SSD, etc.
2. Yes, you do need to annotate your images to train a model on your custom classes. Each of the models mentioned above needs a different annotation format.
3. No, you don't need to do any image resizing yourself. Most of the time it is done when the model loads the data for training or inference.
You are on the right track with trying MaskRCNN.
Other than MaskRCNN, you could also try Yolo. There is also an accompanying easy-to-use annotating tool Yolo-Mark.
If you go through this tutorial, it should cover what you need:
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small, so training it should not take very long.
There are some object detection models.
On RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can export TFRecord for Tensorflow.
https://rectlabel.com/help#tf_record

Understanding exactly what the pretrained model does on the Tensorflow object detection API

I am trying to understand what I need from a pre-trained model used in the API, independent of any additional code in the TensorFlow Object Detection API.
For example, take ssd_mobilenet_v1_coco_2017_11_17. As I understand it, this is a model that is already trained to detect objects: it does classification to determine the category of each object, plus regression to bound the objects with rectangles, where each rectangle is given by x,y,w,h coordinates on the object.
How can we take the regression output of that model (the x,y,w,h coordinates) and use it in another model?
Suppose we just want to print out the x,y,w,h coordinates of a detected object in an image, without needing any of the TensorFlow Object Detection API code. How can we do that?
Certainly, you can use a pretrained model from the TensorFlow object detection model zoo without installing the Object Detection API. One alternative is to use OpenCV.
OpenCV provides both C++ and Python APIs for loading .pb models generated by TensorFlow. Here is a nice tutorial.
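As a rough sketch of that OpenCV route (not taken from the linked tutorial): the first function loads a frozen graph with cv2.dnn.readNetFromTensorflow, and the second converts the result to x,y,w,h in pixels. It assumes the usual OpenCV SSD output layout, where each row is [batch, class, score, left, top, right, bottom] with normalized corners, a 300x300 input size, and that you have a matching .pbtxt text graph for your model. All file names and function names here are hypothetical.

```python
def load_detections(pb_path, pbtxt_path, image_path):
    """Run a frozen TensorFlow detection graph through OpenCV's dnn module."""
    import cv2  # imported here so the helper below also works without OpenCV

    net = cv2.dnn.readNetFromTensorflow(pb_path, pbtxt_path)
    image = cv2.imread(image_path)
    height, width = image.shape[:2]
    # Assumption: the SSD graph expects a 300x300 RGB input.
    blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(), width, height


def detections_to_xywh(detections, width, height, min_score=0.5):
    """Convert rows of [batch, class, score, left, top, right, bottom]
    (normalized corners, assumed layout) to (class_id, score, x, y, w, h)."""
    boxes = []
    for det in detections[0][0]:
        score = float(det[2])
        if score < min_score:
            continue  # skip low-confidence detections
        left, top = float(det[3]) * width, float(det[4]) * height
        right, bottom = float(det[5]) * width, float(det[6]) * height
        boxes.append((int(det[1]), score, round(left), round(top),
                      round(right - left), round(bottom - top)))
    return boxes
```

With a model on disk, something like `detections, w, h = load_detections("frozen_inference_graph.pb", "graph.pbtxt", "image.jpg")` followed by `print(detections_to_xywh(detections, w, h))` would print the boxes; the resulting tuples can then be passed to any other model.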

Tensorflow object detection: why is the location in image affecting detection accuracy when using ssd mobilenet v1?

I'm training a model to detect meteors within a picture of the night sky. I have a fairly small dataset of about 85 images, and each image is annotated with a bounding box. I'm using transfer learning, starting from the ssd_mobilenet_v1_coco_11_06_2017 checkpoint, with Tensorflow 1.4. I'm resizing images to 600x600 pixels during training. I'm using data augmentation in the pipeline configuration to randomly flip the images horizontally and vertically and to rotate them by 90 degrees. After 5000 steps, the model converges to a loss of about 0.3 and will detect meteors, but it seems to matter where in the image the meteor is located. Do I have to train the model by giving examples of every possible location? I've attached a sample of a detection run where I tiled a meteor over the entire image and received various levels of detection (filtered to 50%). How can I improve this?
(Image: detected meteors in image example)
It could very well be your data and I think you are making a prudent move by improving the heterogeneity of your dataset, BUT it could also be your choice of model.
It is worth noting that ssd_mobilenet_v1_coco has the lowest COCO mAP relative to the other models in the TensorFlow Object Detection API model zoo. You aren't trying to detect a COCO object, but the mAP numbers are a reasonable approximation of generic model accuracy.
At the highest possible level, the choice of model is largely a tradeoff between speed and accuracy. The model you chose, ssd_mobilenet_v1_coco, favors speed over accuracy. Consequently, I would recommend you try one of the Faster RCNN models (e.g., faster_rcnn_inception_v2_coco) before you spend a significant amount of time preprocessing images.

What to expect from deep learning object detection on black and white pictures?

With TensorFlow, I want to train an object detection model on my own images, based on the ssd_inception_v2_coco model. The problem I have is that all my pictures are black and white. What performance can I expect? Should I try to colorize my B&W pictures first? Or, conversely, should I try to retrain the base network on "uncolorized" images? Are there general guidelines for processing B&W images for deep learning object detection?
I wouldn't go through the trouble of colorizing if you are planning on using a pretrained model. I would expect that explicitly colorizing your images as a pre-processing step would help very little (if at all) since in theory the features that a colorizing network learns can also be learned by the detection network.
If you are planning on starting from a detection network that was pretrained on an RGB dataset, make sure you either (i) replace the first convolution in the network with a convolutional layer that expects a single-channel input, or (ii) pad your image with two all-zero channels.
You may get slightly worse detection performance simply because you lose two thirds of the image's pixel information when using BW instead of RGB.
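Option (ii) can be sketched with NumPy as below. This is only an illustration, assuming the grayscale image is already loaded as a 2-D (height, width) array; the function name is hypothetical and not part of any TensorFlow API.

```python
import numpy as np


def pad_gray_to_three_channels(gray):
    """Turn a (height, width) grayscale image into (height, width, 3)
    by keeping the intensities in channel 0 and zero-filling the rest."""
    gray = np.asarray(gray)
    if gray.ndim != 2:
        raise ValueError("expected a 2-D (height, width) grayscale image")
    zeros = np.zeros_like(gray)  # same shape and dtype as the input
    return np.stack([gray, zeros, zeros], axis=-1)
```

The result has the three-channel shape an RGB-pretrained network expects, so the first convolution does not need to be replaced.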

Using Tensorflow object detection API to detect objects and classify objects by color

I am able to use Tensorflow to train the model on my own dataset. For example, I have trained a model to only detect the safety helmet and the result is good.
My plan for the next step is to classify the detected safety helmets by color, but I am still looking for a method.
Should I retrain the model with a different label map, like [item1 red_helmet] [item2 blue_helmet], and label my training dataset accordingly? Or is there some other trick to achieve the same outcome?
You already have the region of interest in the picture.
All you need to do is extract the helmets from the picture and pass the cropped images to an OpenCV routine that can detect colours.
That's it, you are done :)
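A minimal sketch of that crop-then-classify idea follows. It uses NumPy and the standard-library colorsys module in place of an OpenCV routine (cv2.cvtColor with cv2.COLOR_BGR2HSV would be the OpenCV equivalent for the colour conversion). The hue ranges, the saturation cutoff, and the function names are assumptions for illustration, and averaging the crop is a crude stand-in for a proper colour detector.

```python
import colorsys

import numpy as np

# Hypothetical hue ranges in degrees; tune them for your helmets and lighting.
HUE_RANGES = {
    "red": [(0, 20), (340, 360)],
    "yellow": [(40, 70)],
    "green": [(80, 160)],
    "blue": [(200, 260)],
}


def crop_box(image, box):
    """Cut an (x, y, w, h) pixel box out of a (height, width, 3) RGB array."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]


def dominant_color_name(rgb_crop):
    """Name the colour of a crop from the hue of its mean RGB value."""
    pixels = np.asarray(rgb_crop, dtype=float).reshape(-1, 3)
    r, g, b = pixels.mean(axis=0) / 255.0
    hue, saturation, _ = colorsys.rgb_to_hsv(r, g, b)
    if saturation < 0.2:
        return "unknown"  # too grey or white to name a hue reliably
    hue_deg = hue * 360.0
    for name, ranges in HUE_RANGES.items():
        if any(low <= hue_deg < high for low, high in ranges):
            return name
    return "unknown"
```

Each box returned by the detector would be passed through crop_box and then dominant_color_name, so the detector itself keeps a single helmet class.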