Faster RCNN + inception v2 input size - tensorflow

What is the input size of faster RCNN RPN?
I'm using the TensorFlow Object Detection API with Faster R-CNN as the region proposal network (RPN) and Inception as the feature extractor (according to the config file). The API uses an online approach in the prediction phase and runs detection on each input image individually. However, I'm now trying to feed images to the network in batches by using the TensorFlow Dataset API.
As you know, to make batches out of the data, we first need to resize all of the images to the same size. I think the best way is to resize them exactly to the input size of Faster R-CNN, to avoid resizing twice. So my question is: what is the input size of the Faster R-CNN RPN?
Thanks in advance.

It depends on the input resolution specified in the pipeline config file, under image_resizer.
For example, for Faster R-CNN over InceptionV2 trained on the COCO dataset, see this config file.
The specified resolution is 600x1024.
On a side note, fully convolutional architectures (such as RFCN, SSD, YOLO) are not restricted to a single resolution, i.e. you can apply them at different input resolutions without modifying the architecture.
But this doesn't mean that the model will be robust to other resolutions if you train it on a single resolution.
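For the batching use case in the question, it can help to see what size each image would end up at. The sketch below is an assumption-laden approximation, not the API's own code: it assumes the config uses keep_aspect_ratio_resizer with min_dimension 600 and max_dimension 1024 (check your own pipeline config), and the function name is made up for illustration.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Approximate output (height, width) of an aspect-preserving resize.

    Assumed rule: scale the shorter side to min_dimension, unless that would
    push the longer side past max_dimension; in that case scale the longer
    side to max_dimension instead.
    """
    scale = min_dimension / min(height, width)
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / max(height, width)
    return round(height * scale), round(width * scale)


if __name__ == "__main__":
    for h, w in [(480, 640), (1080, 1920), (600, 1024)]:
        print((h, w), "->", keep_aspect_ratio_size(h, w))
```

Under this rule the output size depends on each image's aspect ratio, so images with different aspect ratios would still come out with different shapes. Batching them would then need padding or a fixed-size resize.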

Related

How to train large datasets in Colab free

I have to train on 70,000 images for my face verification project on the free tier of Google Colab.
First, it gets stuck on the 1st epoch, and even when training does start, after some time it throws an out-of-RAM error.
The code I use is:
<https://nbviewer.org/github/nicknochnack/FaceRecognition/blob/main/Facial%20Verification%20with%20a%20Siamese%20Network%20-%20Final.ipynb>
If I have to split my dataset into mini-batches to fit it in Colab's GPU memory, how can I do that?
Also, I want to train on the whole dataset because it contains the images of 5 different people as anchors and positives.
You can try the following options to train on larger datasets:
Add more pooling layers to the model.
Lower the input size of your model.
Store the images in a binary format at a lower image size for image classification models.
Lower the batch size while training and validating your model.
You can also use the tf.data API for operations such as batching, slicing, processing, and shuffling to build a data pipeline. You can further constrain GPU memory usage to avoid out-of-memory issues.
A sample Colab notebook is available here: https://colab.sandbox.google.com/github/tensorflow/docs/blob/master/site/en/guide/data.ipynb
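To illustrate the mini-batching idea behind the options above, here is a minimal, framework-free sketch. It is not the tf.data API itself; it only shows the principle that images are loaded lazily, one batch at a time, instead of all 70,000 up front. The names iter_minibatches and fake_load_image are hypothetical, and a real loader would decode and resize the image file.

```python
import random


def iter_minibatches(paths, batch_size, load_fn, shuffle=True, seed=0):
    """Yield one batch of loaded samples at a time.

    Only `batch_size` samples are held in memory per step, because
    `load_fn` is called lazily inside the loop rather than up front.
    """
    order = list(paths)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [load_fn(p) for p in order[start:start + batch_size]]


def fake_load_image(path):
    # Hypothetical stand-in for real image decoding and resizing.
    return {"path": path}


if __name__ == "__main__":
    paths = [f"img_{i:05d}.jpg" for i in range(10)]
    for batch in iter_minibatches(paths, batch_size=4, load_fn=fake_load_image):
        print(len(batch))
```

In TensorFlow, the same pattern would typically be expressed with a tf.data.Dataset built from file paths and then shuffled, mapped, batched, and prefetched.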

Is there any way to resize input shapes in SNPE (dlc)?

I have trained a model with TensorFlow. The model is supposed to run on a mobile phone, but I ran into a problem when converting the frozen graph (pb) to a deep learning container (DLC): I have to set the input size to a constant, which means the model can't work with arbitrary input sizes.
I am trying to find a way to resize the input shapes of a DLC model without re-converting the model with "snpe-tensorflow-to-dlc --input_dims 1,512,512,3", because that approach is time-consuming.
In short, I want to resize the input shapes of a DLC model. Can anybody help me?
Deployment solutions usually work with fixed input shapes because they assume a widely accepted usage model: resize all pictures to the same fixed size, then run inference. Because of this usage model, developers of deployment solutions usually prioritize inference time over model loading time. The same is true of SNPE, OpenVINO, TFLite, etc.
To illustrate the times, here are some results from a Snapdragon 820. Loading Inception v3 on the CPU takes 715 ms; loading it on the DSP takes 3 seconds. Inference on the CPU takes 1 second; inference on the DSP takes 100 ms. As you can see, loading time on the DSP is longer than on the CPU, but inference time is much better.
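As a rough worked example of that trade-off, the sketch below uses only the four timings quoted above and assumes a simple cost model: the model is loaded once and every image takes the same inference time. The function names are made up for illustration.

```python
import math


def total_time(load_s, infer_s, n_images):
    """Total seconds to load a model once and run inference on n_images."""
    return load_s + infer_s * n_images


def break_even_images(load_a, infer_a, load_b, infer_b):
    """Smallest image count from which option B is at least as fast as A.

    Assumes B loads more slowly but infers faster than A. Returns None if
    B is not faster per image, in which case there is no such crossover.
    """
    if infer_b >= infer_a:
        return None
    return max(0, math.ceil((load_b - load_a) / (infer_a - infer_b)))


# Figures quoted above for Inception v3 on a Snapdragon 820 (seconds).
CPU_LOAD, CPU_INFER = 0.715, 1.0
DSP_LOAD, DSP_INFER = 3.0, 0.1

if __name__ == "__main__":
    n = break_even_images(CPU_LOAD, CPU_INFER, DSP_LOAD, DSP_INFER)
    print("DSP catches up from", n, "images")
```

With these numbers the DSP's longer load time is recovered after only a handful of images. Under this cost model, that is consistent with optimizing for inference time rather than loading time.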
At the same time, it is usually possible to change the shape before loading the model, on the assumption that all input pictures will have a different size (but, again, the same size for all pictures) from the shapes the model was trained with. For SNPE, this is done with SNPEBuilder::setInputDimensions.
If the model allows reshaping and there are no bugs in the SNPE implementation, the model can be reshaped and loaded.
I'm not sure whether your usage model fits the picture described in the first paragraph. Also, to benefit from different input sizes you would need to develop a special topology that is unlikely to be supported by SNPE. If you take a regular SSD, reshape it to a different size, and measure accuracy on a validation set, you will most likely get the best result at the shapes the model was trained on.

production - What is the best way to load a file for fast computation?

I'm deploying a deep learning model and saved the Keras model as an .h5 file. I think a complex model will make the file large and therefore slow to work with on the server. Is there anything I can do other than reducing the number of layers in the model? Is there some way to compress the .h5 file so that the server can load it faster?
Thank you
There is a way to do that.
What you are looking for is called quantization.
Unlike reducing the number of layers, which is closer to model pruning, quantization reduces both the size and the latency of the model by lowering the precision of the weights (or even the activations in some cases).
For more detailed information, read this page in the official TensorFlow documentation: https://www.tensorflow.org/lite/performance/post_training_quantization
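To make the idea concrete, here is a minimal sketch of the arithmetic behind weight quantization. It is an illustration only and not TensorFlow Lite's implementation: it assumes symmetric per-tensor quantization of float32 weights to int8, and the function names are made up.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to int8 values in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate float weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    float_bytes = len(array("f", weights).tobytes())  # 4 bytes per weight
    int8_bytes = len(array("b", q).tobytes())         # 1 byte per weight
    print(float_bytes, int8_bytes)
    print(max(abs(w - r) for w, r in zip(weights, dequantize(q, scale))))
```

Each weight shrinks from 4 bytes to 1, at the cost of a rounding error of at most half a quantization step per weight. This is why quantization can reduce file size without removing any layers.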

Large input image limitations for VGG19 transfer learning

I'm using TensorFlow (through the Keras API) in Python 3.0. I'm using the pre-trained VGG19 network to perform style transfer on an Nvidia RTX 2070.
The largest input image that I have is 4500x4500 pixels. (I have removed the fully-connected layers in VGG19 to get a fully-convolutional network that handles arbitrary image sizes.) If it helps, my batch size is currently just 1 image at a time.
1.) Is there an option for parallelizing the evaluation of the model on the image input given that I am not training the model, but just passing data through the pre-trained model?
2.) Is there any increase in capacity for handling larger images in going from 1 GPU to 2 GPUs? Is there a way for the memory to be shared across the GPUs?
I'm unsure whether larger images make my GPU compute-bound or memory-bound. I'm speculating that it's a compute issue, which is what started my search for discussions of parallel CNN evaluation. I've seen some papers on tiling methods that seem to allow for larger images.
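A back-of-the-envelope estimate may help with the compute-bound versus memory-bound question. The sketch below is only a rough approximation under stated assumptions: float32 activations, "same" padding so convolutions keep the spatial size, and no accounting for weights, workspace buffers, or framework overhead. The helper names are made up for illustration.

```python
def activation_bytes(height, width, channels, bytes_per_value=4):
    """Bytes needed to store one feature map (float32 by default)."""
    return height * width * channels * bytes_per_value


# Output channels of the five VGG19 convolutional blocks; each block is
# followed by a 2x2 max-pool that halves the spatial size.
VGG19_BLOCK_CHANNELS = [64, 128, 256, 512, 512]


def vgg19_block_activation_sizes(height, width):
    """Return (block_index, height, width, bytes) for one conv output per block."""
    sizes = []
    for index, channels in enumerate(VGG19_BLOCK_CHANNELS, start=1):
        sizes.append((index, height, width, activation_bytes(height, width, channels)))
        height, width = height // 2, width // 2
    return sizes


if __name__ == "__main__":
    for index, h, w, nbytes in vgg19_block_activation_sizes(4500, 4500):
        print(f"block{index}: {h}x{w}, {nbytes / 1e9:.2f} GB per conv output")
```

Under these assumptions, a single 64-channel feature map at 4500x4500 already takes about 5.2 GB, a large share of the 8 GB on an RTX 2070. Memory pressure is therefore at least worth ruling out before treating this as a pure compute problem.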

Tensorflow object detection: why is the location in image affecting detection accuracy when using ssd mobilenet v1?

I'm training a model to detect meteors in pictures of the night sky. I have a fairly small dataset of about 85 images, each annotated with a bounding box. I'm using transfer learning, starting from the ssd_mobilenet_v1_coco_11_06_2017 checkpoint, with TensorFlow 1.4. I'm resizing images to 600x600 pixels during training. I'm using data augmentation in the pipeline configuration to randomly flip the images horizontally and vertically and to rotate them by 90 degrees. After 5000 steps, the model converges to a loss of about 0.3 and will detect meteors, but it seems to matter where in the image the meteor is located. Do I have to train the model by giving examples of every possible location? I've attached a sample of a detection run where I tiled a meteor over the entire image and received various levels of detection (filtered to 50%). How can I improve this?
(Image: detected meteors in image example)
It could very well be your data, and I think you are making a prudent move by improving the heterogeneity of your dataset, but it could also be your choice of model.
It is worth noting that ssd_mobilenet_v1_coco has the lowest COCO mAP relative to the other models in the TensorFlow Object Detection API model zoo. You aren't trying to detect a COCO object, but the mAP numbers are a reasonable approximation of generic model accuracy.
At the highest possible level, the choice of model is largely a tradeoff between speed and accuracy. The model you chose, ssd_mobilenet_v1_coco, favors speed over accuracy. Consequently, I would recommend you try one of the Faster RCNN models (e.g., faster_rcnn_inception_v2_coco) before you spend a significant amount of time preprocessing images.