Background images in one class object detection - tensorflow

When training a single-class object detector in TensorFlow, I want to pass in images where no instance of the signal class exists, so the model doesn't learn that every image contains at least one instance of that class. E.g. if my signal were cats, I'd want to pass pictures of other animals/landscapes as background; this should also reduce false positives.
I can see that a class id (0) is reserved in the Object Detection API for background, but I am unsure how to encode this in the TFRecords for my background images: the class could be 0, but what would the bounding box coordinates be? Or do I need a simpler classifier on top of this model to detect whether there is any signal in the image at all, prior to detecting its position?

The latter approach (a simple classifier) makes sense. I don't think there is a way to do the first part. Apart from checking whether an object is present, you can also apply a threshold on the confidence score.

It is good practice to include images with no objects of interest in the dataset. To do so, use the same labelling tool (e.g. labelImg) that you used for adding the boxes; an image with no bounding boxes will produce an XML file containing only the image details and no box entries. The create-tf-record script will then build the TFRecord from the XML files. See the links below for more information:
Create tf record example -
https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py
Using your own dataset-
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md
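For reference, a background example can be built with exactly the same feature keys as a labelled one, just with empty object lists; below is a minimal sketch, assuming JPEG images and the feature keys used in create_pet_tf_record.py (you do not write class id 0 anywhere, you simply leave the object lists empty):

    import tensorflow as tf

    def create_background_example(image_path, height, width):
        # Read the raw encoded image bytes (the real scripts also read
        # height/width from the image itself).
        with tf.io.gfile.GFile(image_path, 'rb') as fid:
            encoded_jpg = fid.read()
        # Same keys as a labelled example, but every object-level list is empty.
        feature = {
            'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
            'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
            'image/filename': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_path.encode('utf8')])),
            'image/source_id': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_path.encode('utf8')])),
            'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
            'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
            'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[])),
            'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=[])),
            'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=[])),
            'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=[])),
            'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[])),
            'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[])),
        }
        return tf.train.Example(features=tf.train.Features(feature=feature))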

DeepLabV3, segmentation and classification/detection on coral

I am trying to use DeepLabV3 for image segmentation and object detection/classification on Coral.
I was able to successfully run the semantic_segmentation.py example using DeepLabV3 on the Coral, but that only shows an image with an object segmented.
I see that it assigns labels to colors. How do I associate the labels.txt file that I made, based on the model's label info, with these colors? (How do I know which color corresponds to which label?)
When I try to run the
engine = DetectionEngine(args.model)
using the deeplab model, I get the error
ValueError: Dectection model should have 4 output tensors!This model
has 1.
I guess this way is the wrong approach?
Thanks!
I believe you have reached out to us regarding the same query. I just wanted to paste the answer here for others to reference:
"The detection model usually have 4 output tensors to specifies the locations, classes, scores, and number and detections. You can read more about it here. In contrary, the segmentation model only have a single output tensor, so if you treat it the same way, you'll most likely segfault trying to access the wrong memory region. If you want to do all three tasks on the same image, my suggestion is to create 3 different engines and feed the image into each. The only problem with this is that each time you switch the model, there will likely be data transfer bottleneck for the model to get loaded onto the TPU. We have here an example on how you can run 2 models on a single TPU, you should be able to modify it to take 3 models."
On the last note, I just saw that you added:
how do i associate the labels.txt file that I made based off of the label info of the model to these colors
I just don't think this is something you can do for a segmentation model, but maybe I'm just confused about your query?
Take an object detection model, for example: there are 4 output tensors, and the second tensor gives you an array of ids, each associated with a certain class, that you can map to a label file. Segmentation models only give you the pixels surrounding an object.
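As a rough illustration of that mapping, with made-up values standing in for the 4 output tensors and a hypothetical labels.txt that has one class name per line (line index == class id):

    import numpy as np

    # Illustrative values standing in for the 4 detection output tensors:
    # boxes [N, 4], class ids [N], scores [N], and the number of detections.
    boxes = np.array([[0.12, 0.34, 0.56, 0.78]])
    class_ids = np.array([2])
    scores = np.array([0.83])
    num_detections = 1

    with open('labels.txt') as f:
        labels = [line.strip() for line in f]

    for i in range(int(num_detections)):
        print(labels[int(class_ids[i])], scores[i], boxes[i])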
[EDIT]
Apologies, it looks like I'm the one confused about segmentation models.
Quote from my colleague :)
"You are interested to know the name of the label, you can find the corresponding integer to that label from result array in Semantic_segmentation.py. Where result is classification data of each pixel.
For example;
if you print result array in the with bird.jpg as input you would find few pixel's value as 3 which is corresponding 4th label in pascal_voc_segmentation_labels.txt (as indexing starts at 0 )."
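Concretely, a minimal sketch of that lookup, assuming result is the per-pixel class array from semantic_segmentation.py and the label file has one class name per line:

    import numpy as np

    def summarize_segmentation(result, labels_path='pascal_voc_segmentation_labels.txt'):
        # result: array of per-pixel class indices returned by the model.
        with open(labels_path) as f:
            labels = [line.strip() for line in f]
        class_ids, pixel_counts = np.unique(np.asarray(result).flatten(), return_counts=True)
        # Print every class present in the image together with its pixel count.
        for class_id, count in zip(class_ids, pixel_counts):
            print(labels[int(class_id)], 'pixels:', count)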

Tensorflow Object Detection Unusually large bounding boxes and wrong results

I am building an object detector in TensorFlow to detect motorbike riders with and without helmets. I have 1000 images each for riders with helmets, riders without helmets, and pedestrians (3000 images in total). My last checkpoint was at 35,267 steps. I have tested on a traffic video, but I see unusually large bounding boxes with wrong results. Can someone please explain the reason for such detections? Do I need to wait for at least 50,000 steps, or do I need to add more data (images from the angle of traffic cameras)?
Model - SSD Mobilenet COCO - Custom Object Detection,
Training Platform - Google Colab
Please find the images attached:
Video Snapshot 1
Video Snapshot 2
Day 2 - 10/30/2018
I have tested with images today and got different results; they seem to be correct on the 2nd day when I test with a single object in an image. Please find the results:
Single Object Image Test 1
Single Object Image Test 2
Tested Checkpoint - 52,000 steps
But if I test with images of multiple objects on a road, the detection is wrong and the bounding boxes are weirdly big. Is it because of the dataset, since I am training with one motorbike rider (with or without helmet) per image?
Please find the wrong results
Multi Object Image Test
Multi Object Image Test
I have also tested with images where the whole scene is motorbikes; in this case, I did not get any results. Please find the images:
No Result Image
No Result Image
The results are very confusing. Is there anything I am missing?
There is no need to wait until 50,000 steps; you should get decent results at 35k or even 10k. I would suggest the following (a sketch for the first point follows this list):
go through your dataset again and check all the bounding boxes (data cleaning)
check your model with the inference code for changes such as batch normalization
add some more data with different features, angles and color complexities
I would check these points before going further.
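For the data-cleaning point, one approach is to scan the labelImg XML files for boxes with invalid or implausible coordinates; a rough sketch, assuming Pascal VOC-style annotations in an annotations/ folder (paths and thresholds are only illustrative):

    import glob
    import xml.etree.ElementTree as ET

    for xml_path in glob.glob('annotations/*.xml'):
        root = ET.parse(xml_path).getroot()
        img_w = int(root.find('size/width').text)
        img_h = int(root.find('size/height').text)
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            xmin = int(float(box.find('xmin').text))
            ymin = int(float(box.find('ymin').text))
            xmax = int(float(box.find('xmax').text))
            ymax = int(float(box.find('ymax').text))
            # Flag boxes with inverted/out-of-range coordinates or boxes
            # covering almost the whole image (often labelling mistakes).
            area_ratio = (xmax - xmin) * (ymax - ymin) / float(img_w * img_h)
            if xmin >= xmax or ymin >= ymax or xmax > img_w or ymax > img_h or area_ratio > 0.9:
                print('Check', xml_path, obj.find('name').text, (xmin, ymin, xmax, ymax))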

Does the object location in training affect the results for Faster RCNN?

Has anyone tried the effect of the per-class object location in Faster R-CNN?
Suppose my training data always has one of the object classes in one area of the frame, let's say the top right of the image, and in the evaluation dataset I have an image where this object is in another area, say the bottom left.
Is Faster R-CNN capable of handling this case?
Or, if I want my network to find all of the classes in all areas of the frame, do I need to provide training examples that cover all the areas?
Quoting the Faster R-CNN paper:
An important property of our approach is that it is translation invariant, both in terms of the anchors and the functions that compute proposals relative to the anchors. If one translates an object in an image, the proposal should translate and the same function should be able to predict the proposal in either location. This translation-invariant property is guaranteed by our method*
*As is the case of FCNs [7], our network is translation invariant up to the network's total stride
So the short answer is that you'll probably be OK if the object is mostly at a certain location in the train set and somewhere else in the test set.
A slightly longer answer is that the location may have side effects that affect accuracy, and it will probably be better to have the object in different locations. However, for testing purposes, you can try adding N test samples to the train set and see how the accuracy changes on the remaining test samples.
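A minimal sketch of that experiment, where the sample ids and the value of N are purely illustrative:

    import random

    # Hypothetical ids of annotated images in each split.
    train_ids = ['train_%d' % i for i in range(1000)]
    test_ids = ['test_%d' % i for i in range(200)]

    # Move N test samples into the train set, retrain, and compare the
    # accuracy on the remaining test samples with the original run.
    N = 50
    moved = set(random.sample(test_ids, N))
    train_ids += list(moved)
    test_ids = [t for t in test_ids if t not in moved]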

Feature Pyramid Network with tensorflow/models/object_detection

If I want to implement k = k0 + log2(√(w*h)/224) from Feature Pyramid Networks for Object Detection, where and in which file should I make the change?
Note: this formula is for ROI pooling; w and h are the width and height of the ROI, and k is the level of the feature pyramid this ROI should be pooled from.
Pointing to the Faster R-CNN meta_architecture file in object_detection might be helpful, but please tell me which method I can change.
Take a look at this document for a rough overview of the process. In a nutshell, you'll have to create a "FeatureExtractor" sub-class for your desired meta-architecture. For Faster R-CNN, you can probably start with a copy of our Resnet101 feature extractor.
The short answer is that the change won't be trivial, as we don't currently support cropping regions from multiple layers. Here is an outline of what would need to change if you would like to pursue this anyway:
Generating a new anchor set
Currently Faster R-CNN uses a "GridAnchorGenerator" as the first_stage_anchor_generator; instead you will have to use a MultipleGridAnchorGenerator (the same one we use in the SSD pipeline).
You will have to use a 32^2 anchor box: for the scales field of the anchor generator, you will basically have to add a .125 entry.
You will have to modify the code to generate and crop from multiple layers: to start, look for a function in the faster_rcnn_meta_arch file called "_extract_rpn_feature_maps", which is suggestively named but currently returns just a single tensor! You will also have to add some logic to determine which layer to crop from based on the size of the proposal (Eqn 1 from the paper; a sketch of that logic follows this list).
Finally, you will have to create a new feature extractor following the directions that Derek linked to.
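For reference, a minimal sketch of the Eqn 1 level assignment, assuming the conventional k0 = 4 and 224-pixel canonical size (the function name and the level bounds are placeholders):

    import math

    def fpn_level_for_roi(width, height, k0=4, k_min=2, k_max=5, canonical_size=224.0):
        # Eqn 1: k = floor(k0 + log2(sqrt(w*h) / 224)), clipped to the
        # range of pyramid levels that actually exist.
        k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical_size))
        return int(min(max(k, k_min), k_max))

    print(fpn_level_for_roi(112, 112))  # a 112x112 ROI maps to level 3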

How to count objects detected in an image using Tensorflow?

I have trained my custom object detector using faster_rcnn_inception_v2 and tested it using object_detection_tutorial.ipynb, and it works perfectly: I can find bounding boxes for the objects inside the test image. My problem is how to actually count those bounding boxes, or simply how to count the number of objects detected for each class.
Because of low reputation I cannot comment.
As far as I know, the Object Detection API unfortunately has no built-in function for this.
You have to write this function yourself. I assume you run eval.py for evaluation? To access the individual detected objects for each image, you have to follow this chain of scripts:
eval.py -> evaluator.py ->object_detection_evaluation.py -> per_image_evaluation.py
In the last script you can count the detected objects and bounding boxes per image. You just have to save the numbers and sum them up over your entire dataset.
Does this already help you?
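Alternatively, if you run inference yourself the way object_detection_tutorial.ipynb does, a minimal counting sketch could look like this; it assumes the output dictionary returned by that notebook's run_inference_for_single_image and the usual category_index built from the label map, and the score threshold is arbitrary:

    import collections

    def count_detections(output_dict, category_index, score_threshold=0.5):
        # Count confident detections per class name.
        num = int(output_dict['num_detections'])
        classes = output_dict['detection_classes'][:num]
        scores = output_dict['detection_scores'][:num]
        counts = collections.Counter()
        for class_id, score in zip(classes, scores):
            if score >= score_threshold:
                counts[category_index[int(class_id)]['name']] += 1
        return counts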
I solved this using the Tensorflow Object Counting API. There is an example of counting objects in an image in single_image_object_counting.py. I just replaced ssd_mobilenet_v1_coco_2017_11_17 with my own model containing the inference graph:
input_video = "image.jpg"
detection_graph, category_index = backbone.set_model(MODEL_DIR)