How to train the bigger version of SSD (600x600?) in the TensorFlow Object Detection API?

The given config files for the SSD models have 300x300 as the input size.
I would like to train the larger version to try to get better accuracy, as reported in the paper. How do I do this? Do I simply change the values in the config file, or is there a specific way to do it?

Yes, you just have to modify that parameter in the config file before training.
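For example, in an SSD pipeline config the resizer block would change roughly as follows (a sketch of the relevant section only; depending on the model you may also want to revisit the anchor generator settings for the larger input):

    model {
      ssd {
        image_resizer {
          fixed_shape_resizer {
            height: 600   # was 300
            width: 600    # was 300
          }
        }
        # ... rest of the model config unchanged
      }
    }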

Related

Image Detector with tensorflow

I want to build a simple image detector for custom binary shapes on images.
I could train and use the models from the object detection zoo, such as ssd_inception_v2 and so on, but that would be extremely inefficient, as those models are hundreds of megabytes in size and I can't imagine using them in my simple app. Can anybody suggest how to solve this?
I have already built excellent small classifiers for my images, but I can't build a small, efficient detector (one that also returns their positions as detection boxes).
I think what you need is transfer learning. I would take one of the lightweight models such as MobileNetV2 and retrain it on my dataset; it should be pretty quick. If you want to decrease your model size even further, feel free to take only the first few layers of the CNN and retrain those. That is a bit more work, since you need to rewrite the part of the network you want to use and load it with the pre-trained weights; see the sketch below.
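A minimal Keras sketch of both ideas (the input shape, the cut-off layer index and the single-output head are arbitrary choices for illustration; for boxes you would still need a localization head or the Object Detection API on top):

    import tensorflow as tf

    # Load MobileNetV2 with ImageNet weights and without the classification head.
    base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                             include_top=False,
                                             weights="imagenet")

    # Optionally keep only the first few layers to shrink the model further
    # (the cut-off index 50 is an arbitrary example).
    truncated = tf.keras.Model(inputs=base.input, outputs=base.layers[50].output)
    truncated.trainable = False  # freeze the pre-trained part for transfer learning

    # Add a small task-specific head on top of the truncated backbone.
    x = tf.keras.layers.GlobalAveragePooling2D()(truncated.output)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # e.g. shape present / absent
    model = tf.keras.Model(truncated.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")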

Faster RCNN + inception v2 input size

What is the input size of the Faster R-CNN RPN?
I'm using the TensorFlow Object Detection API, which uses Faster R-CNN with a region proposal network (RPN) and Inception as the feature extractor (according to the config file). The API uses the online approach in the prediction phase and detects every input image individually. However, I'm now trying to feed images to the network in batches using the TensorFlow Dataset API.
As you know, to make a batch out of the data we first need to resize all of the images to the same size. I think the best way to resize the images is to resize them exactly to the input size of Faster R-CNN, to avoid duplicate resizing. Now my question is: what is the input size of the Faster R-CNN RPN?
Thanks in advance.
It depends on the input resolution specified in the pipeline config file, under image_resizer.
For example, for Faster R-CNN over InceptionV2 trained on the COCO dataset, see this config file.
The specified resolution is 600x1024.
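The relevant block in that config looks roughly like this (images are resized keeping their aspect ratio, so the shorter side becomes at least 600 and the longer side at most 1024):

    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }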
On a side note, fully convolutional architectures (such as R-FCN, SSD and YOLO) are not restricted to a single resolution, i.e. you can apply them to different input resolutions without modifying the architecture.
But that doesn't mean the model will be robust to other resolutions if you train on a single one.

Tensorflow RGB-D Training

I have RGB-D (color and depth) images of a given scene. I would like to use TensorFlow to train a classification model based on a pre-trained network such as Inception. As far as I understand, these pre-trained models were built using 3-channel RGB images, so a 4th channel cannot be handled.
How do I use RGB-D images directly? Do I need to pre-process the images and separate RGB and D? If so, how do I use the D (1-channel) part alone?
Thank you!
If you want to use a pre-trained model you can only use RGB, as those models were only trained to understand RGB. In this case, it is as you said: separate the channels and discard the depth.
To use a 4-channel image like this you would need to retrain the network from scratch rather than loading a pre-trained set of weights.
You will probably get good results using the same architecture as is used for 3-channel images (apart from the minor change required to support the 4-channel input), so retraining shouldn't be terribly hard.
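A minimal sketch of that minor change in Keras (the layer sizes and class count are arbitrary; the only point is the 4-channel input trained from scratch):

    import tensorflow as tf

    # The only real difference from an RGB model is the 4-channel input;
    # everything downstream is a standard CNN trained from scratch.
    inputs = tf.keras.Input(shape=(224, 224, 4))  # RGB + depth stacked as channels
    x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # 10 classes as a placeholder
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")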

TF implementation of Fast R-CNN: why can't I get the same result as Caffe?

I am trying to reimplement in TensorFlow the Fast R-CNN network that is already implemented in Caffe, in order to use it for face/license-plate detection.
For that purpose, I converted the Caffe weights into npy using this script.
Here is how I define my model, into which I load the converted weights.
PS: I used the roi_pooling implementation by zplizzi.
Does anyone have any idea why I wouldn't get the same result when testing the same images with the same Selective Search bboxes? I was thinking it might be the flattening process, which could differ between Caffe and TF, maybe?
Edit:
Here is an example of the results I get in Caffe, while I get no car detection in TF.
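For what it's worth, the flattening order really can differ: Caffe stores activations as NCHW and flattens in that order, while a TensorFlow port typically works in NHWC. A minimal sketch of matching Caffe's ordering before a converted fully connected layer (the tensor name and RoI-pool shape are hypothetical):

    import tensorflow as tf

    # NHWC output of the RoI pooling layer (shape is an example).
    roi_features = tf.placeholder(tf.float32, [None, 7, 7, 512])

    # Transpose to NCHW so the flattened vector has the same element order
    # that the Caffe fc6 weights were trained on, then flatten.
    nchw = tf.transpose(roi_features, [0, 3, 1, 2])
    flat = tf.reshape(nchw, [-1, 512 * 7 * 7])
    # `flat` can now be matmul'd with the fc6 weights converted from Caffe.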

How can I use a Torch model?

I have a Torch model that was trained on a large-scale dataset (the Places dataset), and its authors uploaded it to GitHub. I am working on a similar project and want to make use of it by reusing its trained weights instead of training on the large dataset myself, to save time and effort. Is that possible? How can I extract only the trained filter weights? I don't want to copy the code, I only want to reuse the weights.
NOTE: I use TensorFlow in my implementation.
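One common approach is to export the Torch weights to numpy arrays first (e.g. by loading the checkpoint in PyTorch, or with a .t7 reader) and then assign them to equivalent TensorFlow layers. A sketch under those assumptions (the .npy file names are hypothetical); note that Torch stores convolution kernels as (out_channels, in_channels, kH, kW) while TensorFlow expects (kH, kW, in_channels, out_channels):

    import numpy as np
    import tensorflow as tf

    # Hypothetical: weights already exported from the Torch checkpoint as .npy files.
    torch_kernel = np.load("conv1_weight.npy")  # (out_ch, in_ch, kH, kW), Torch layout
    torch_bias = np.load("conv1_bias.npy")      # (out_ch,)

    # Rebuild the same layer in TensorFlow/Keras and copy the weights across.
    conv = tf.keras.layers.Conv2D(filters=torch_kernel.shape[0],
                                  kernel_size=torch_kernel.shape[2:],
                                  padding="same")
    conv.build((None, 224, 224, torch_kernel.shape[1]))

    # Torch layout (out_ch, in_ch, kH, kW) -> TF layout (kH, kW, in_ch, out_ch).
    conv.set_weights([np.transpose(torch_kernel, (2, 3, 1, 0)), torch_bias])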