What is the difference between the Faster R-CNN and RPN & Fast R-CNN models offered in Detectron2 model zoo? - object-detection

I am trying to use a pretrained model from the Detectron2 library for object detection, and it seems that the Faster R-CNN models outperform the RetinaNet models. However, when browsing the model zoo, I came across Faster R-CNN models and RPN Faster R-CNN models. I scoured the internet but am struggling to find the difference between these models. Doesn't Faster R-CNN already use an RPN?
Model Zoo: https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md

You're right - Faster R-CNN already uses RPN.
But you're likely misreading the title of the other table. It is "RPN & Fast R-CNN".
Fast R-CNN is the predecessor of Faster R-CNN. It takes as input an entire image and a set of object proposals. These proposals therefore have to be pre-computed, which in the original paper was done using Selective Search.
Since the object proposal process is not part of the network architecture itself, it could use any other method, including an RPN. This is what you see in the Detectron2 model zoo - the pre-trained Fast R-CNN model uses an independently pre-trained RPN to generate the proposals. See the config that specifies separate proposal files as part of the dataset.
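To make the distinction concrete, here is a toy sketch in plain Python. It is not Detectron2 code: `toy_rpn` and `toy_box_head` are hypothetical stand-ins for a proposal generator and a detection head. The only point is where the proposals come from: Fast R-CNN receives them as an input, while Faster R-CNN computes them inside the model.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)
Size = Tuple[int, int]           # (width, height)


def toy_rpn(image_size: Size) -> List[Box]:
    """Hypothetical proposal generator: proposes the four image quadrants."""
    w, h = image_size
    return [
        (0, 0, w // 2, h // 2),
        (w // 2, 0, w, h // 2),
        (0, h // 2, w // 2, h),
        (w // 2, h // 2, w, h),
    ]


def toy_box_head(image_size: Size, proposals: List[Box]) -> List[Tuple[float, Box]]:
    """Hypothetical detection head: 'scores' each proposal by its relative area."""
    w, h = image_size
    return [((x2 - x1) * (y2 - y1) / (w * h), (x1, y1, x2, y2))
            for (x1, y1, x2, y2) in proposals]


def fast_rcnn(image_size: Size, proposals: List[Box]) -> List[Tuple[float, Box]]:
    # Fast R-CNN: proposals are an INPUT, computed beforehand by any method
    # (Selective Search in the paper, a separately trained RPN in the model zoo).
    return toy_box_head(image_size, proposals)


def faster_rcnn(image_size: Size) -> List[Tuple[float, Box]]:
    # Faster R-CNN: the RPN is part of the model, so only the image is needed.
    proposals = toy_rpn(image_size)
    return toy_box_head(image_size, proposals)


image_size = (640, 480)
precomputed = toy_rpn(image_size)  # stands in for proposals loaded from a file
detections_fast = fast_rcnn(image_size, precomputed)
detections_faster = faster_rcnn(image_size)
print(detections_fast == detections_faster)
```

In this toy setup both paths give the same result because the same proposal generator is used. In the real models the results differ, since the Fast R-CNN head is trained on fixed, pre-computed proposals rather than jointly with the RPN.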

Related

Object detector with multiple datasets

I am interested in building a YOLO detector trained on multiple datasets, where each dataset has its own detection head. It is a multi-task learning approach. I am not sure how to convert the YOLO architecture to support multiple heads.
I came across the following projects, but I need your help to implement a similar approach.
https://github.com/xingyizhou/UniDet
https://link.springer.com/chapter/10.1007/978-981-16-6963-7_27
This approach has some difficulties. First, the article you linked uses a two-stage detection model with separate classification "branches". YOLO, by contrast, is a one-stage detector and is fully convolutional: there are no fully connected layers, and the (1D) class predictions are taken from the whole 3D output tensor (see the image).
You can take a look at the YOLO9000 paper: the model was trained on detection and classification datasets at the same time, and only the loss function changed.
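As a rough illustration of why the class predictions live inside the output tensor, the sketch below computes the number of output channels per grid cell, assuming the YOLOv2/v3-style layout in which every anchor predicts 4 box offsets, 1 objectness score, and one score per class. The dataset names, class counts, and anchor count are illustrative assumptions, not taken from the linked projects.

```python
def head_channels(num_anchors: int, num_classes: int) -> int:
    """Output channels per grid cell for a YOLOv2/v3-style head.

    Each anchor predicts 4 box offsets + 1 objectness score + num_classes scores.
    """
    return num_anchors * (5 + num_classes)


# Hypothetical multi-dataset setup: one head per dataset on a shared backbone.
NUM_ANCHORS = 3
datasets = {"coco": 80, "voc": 20}  # dataset name -> number of classes

heads = {name: head_channels(NUM_ANCHORS, n) for name, n in datasets.items()}
print(heads)
```

Under these assumptions, a per-dataset head would be a convolution over the shared feature map whose output channel count depends on that dataset's class count, so the heads cannot share their final layer.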

How to optimize a pre-trained TF2.0 model for inference

My goal is to optimize a pre-trained model from TFHub for inference. Therefore I would like to use an object detection model with multiple outputs:
https://tfhub.dev/tensorflow/ssd_mobilenet_v2/fpnlite_640x640/1
where the archive contains a SavedModel file
https://tfhub.dev/tensorflow/ssd_mobilenet_v2/fpnlite_640x640/1?tf-hub-format=compressed
I came across the methods optimize_for_inference and freeze_graph, but read in the following thread that they are no longer supported in TF2:
https://stackoverflow.com/a/56384808/11687201
So how is optimization for inference done with TF2?
The plan is to use one of these pre-trained networks for transfer learning and later run it on a hardware accelerator. The converter for this hardware requires a frozen graph as input.

Object Detection API, COCO model

I just started using the TensorFlow API and trained a few models. I then realised that the COCO model names differ and that the accuracy is also poor. What is the main difference between faster_rcnn_inception_resnet_v2_atrous_coco, faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco, and faster_rcnn_resnet50_coco? Why the terms "atrous" and "low proposals", while the ResNet-50 model uses neither?
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md
The naming reflects the variations submitted to the COCO competition and their respective papers.
They are versions of Faster R-CNN, which originally used VGG-16 for feature extraction.
Without going too deep: the ResNet variation of Faster R-CNN, as the name implies, uses ResNet for feature extraction. Atrous and low proposals are further variations of the model.
Atrous:
Atrous Region Proposal Network (ARPN) is proposed to explore object
contexts at multiple scales by sliding a set of atrous filters with
increasing dilation rates over the last convolutional feature map.
As for "low proposals", I'm not sure where the name comes from, but I would guess the model simply generates fewer proposals in the Region Proposal Network (RPN) and is therefore faster at inference time (as you can see in the model zoo table).
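To make "atrous" (dilated) filtering concrete, here is a minimal 1D sketch in plain Python. It is not the Object Detection API implementation; the function names are made up for illustration. A dilation rate r inserts r - 1 gaps between kernel taps, so a kernel of size k covers k + (k - 1)(r - 1) input positions without adding weights.

```python
from typing import List


def effective_kernel_size(k: int, rate: int) -> int:
    """Input span covered by a size-k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(signal: List[float], kernel: List[float], rate: int) -> List[float]:
    """'Valid' 1D atrous cross-correlation: kernel taps are spaced `rate` apart."""
    span = (len(kernel) - 1) * rate  # distance between first and last tap
    return [
        sum(kernel[j] * signal[i + j * rate] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # taps at i, i+1, i+2
dilated = atrous_conv1d(signal, kernel, rate=2)  # taps at i, i+2, i+4
print(effective_kernel_size(3, 1), dense)
print(effective_kernel_size(3, 2), dilated)
```

With rate 2 the same three weights see a window of five samples, which is how atrous filters enlarge the receptive field while keeping the parameter count fixed.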

Customize MobileNet model architecture with Tensorflow Object Detection API

Tensorflow object detection API provides a number of pretrained object detection models to choose from. However, I would like to introduce modifications to the architecture of those models.
In particular, I would like to make Faster R-CNN shallower and use it to train my model. I want to gain speed even at the cost of some accuracy. MobileNet is too inaccurate for my application.
Is it possible to achieve this without having to implement everything from scratch?
Thank you.

Faster RCNN for TensorFlow

Has anyone implemented Faster R-CNN for TensorFlow?
I found some related repos, as follows:
1. Implement roi pool layer
2. Implement fast RCNN based on py-faster-rcnn repo
For 1: assuming the ROI pooling layer works (I haven't tried it), the following still needs to be implemented:
ROI data layer e.g. roidb.
Linear Regression e.g. SmoothL1Loss
ROI pool layer post-processing for end-to-end training, which should convert the ROI pooling layer's results so they can be fed into the CNN for the classifier.
For 2: it seems to be based on py-faster-rcnn, which relies on Caffe for the pre-processing (e.g. roidb) and then feeds the data into TensorFlow to train the model. That seems weird, so I have not tried it.
So what I want to know is: will TensorFlow support Faster R-CNN in the future? If not, have I misunderstood anything above? Or does any repo or person already support this?
TensorFlow has just released an official Object Detection API here, which can be used, for instance, with their various slim models.
This API contains implementations of various object detection pipelines, including the popular Faster R-CNN, along with pre-trained models.
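For reference, the SmoothL1Loss mentioned in the question is small enough to sketch directly. The version below is plain Python following the definition in the Fast R-CNN paper (quadratic for errors below 1, linear otherwise). It is not the Object Detection API's implementation, and the example box values are made up.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Smooth L1 as defined in the Fast R-CNN paper."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def box_regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of smooth L1 over the box regression coordinates."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


# Made-up regression outputs and targets for a single box.
pred = [0.5, 0.0, 2.0, -3.0]
target = [0.0, 0.0, 0.0, 0.0]
loss = box_regression_loss(pred, target)
print(loss)
```

The linear branch keeps large errors from dominating the gradient, which is why it is preferred over a plain squared error for box regression.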