Mobilenet SSD Input Image Size - tensorflow

I would like to train a Mobilenet SSD Model on a custom dataset.
I have looked into the workflow of retraining a model and noticed the image_resizer{} block in the config file:
https://github.com/tensorflow/models/blob/d6d0868209833e014074d6cb4f32558e7acf2a6d/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config#L43
Does the aspect ratio here have to be 1:1 like 300x300 or can I specify a custom ratio?
All my dataset images are 960x256 - so could I just enter this size for height and width, or do I need to resize all the images to a 1:1 aspect ratio?

Choose the height and width in the model config file (the one in your link) to be the input shape at which you want your model to train and operate. The model will resize input images to the specified size if it has to.
So this could be the size of your input images (if your hardware can train and operate a model at that size):
image_resizer {
  fixed_shape_resizer {
    height: 256
    width: 960
  }
}
The choice will depend on the size of the training images and the resources required to train (and use) that size of model.
I typically use 512x288, as a model of this size runs happily on a Raspberry Pi. I prepare training images, at a variety of scales, at exactly this size, so the image resizer does no work during training.
For inference, I input images at 1920x1080, so the image resizer scales them to 512x288 before they pass into the Mobilenet, maintaining the aspect ratio.
However, the aspect ratio is not important in my domain since such distortions occur naturally.
So yes, just use your training image dimensions.
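For concreteness, here is a rough Python sketch of what a fixed_shape_resizer configured this way effectively does (an approximation built on tf.image.resize, not the Object Detection API's actual code):

import tensorflow as tf

# Approximation of fixed_shape_resizer (not the Object Detection API's
# actual implementation): every input is scaled to the configured
# height x width, ignoring the original aspect ratio.
def fixed_shape_resize(image, height=256, width=960):
    resized = tf.image.resize(image, [height, width])  # returns float32
    return tf.cast(resized, tf.uint8)

# A 1920x1080 frame is stretched/squashed to 256x960 (height x width).
frame = tf.zeros([1080, 1920, 3], dtype=tf.uint8)
print(fixed_shape_resize(frame).shape)  # (256, 960, 3)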

If you keep it as it is, the network will resize your input images to 300x300 regardless of their actual size. Another thing you can try is replacing the image_resizer block with the following:
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
This will feed the network your input images undistorted, scaled so that their dimensions fall within [min_dimension, max_dimension]. I don't know whether this will work, though, because I believe the SSD detector requires square input images, i.e. equal height and width, e.g. 224x224 or 128x128. Either way, you don't have to make any manual changes to your images on disk.
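For reference, here is my reading of the keep_aspect_ratio_resizer scaling rule as a Python sketch (an approximation, not the API's exact code):

import tensorflow as tf

# My reading of keep_aspect_ratio_resizer (an approximation): one scale
# factor is applied to both sides, chosen so the shorter side reaches
# min_dimension, unless that would push the longer side past max_dimension,
# in which case the longer side is pinned to max_dimension instead.
def keep_aspect_ratio_resize(image, min_dimension=600, max_dimension=1024):
    shape = tf.cast(tf.shape(image)[:2], tf.float32)  # [height, width]
    shorter = tf.reduce_min(shape)
    longer = tf.reduce_max(shape)
    scale = tf.minimum(min_dimension / shorter, max_dimension / longer)
    new_size = tf.cast(tf.round(shape * scale), tf.int32)
    return tf.image.resize(image, new_size)

# A 960x256 image scales by min(600/256, 1024/960) ~= 1.067 -> about 273x1024.
image = tf.zeros([256, 960, 3])
print(keep_aspect_ratio_resize(image).shape)  # (273, 1024, 3)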

Related

Does Tensorflow resize bounding boxes when training an object detection model?

I'm wondering about image resizing and the corresponding bounding box resizing that should follow from it.
For instance, when I use a 640x640 image in my dataset, and the model has a fixed_shape_resizer of 320x320, will the original bounding box be scaled down to match the smaller 320x320 size?
Yes, Tensorflow will automatically resize the bounding boxes to match the smaller input size.
Here is a link to the code that changes the bounding box sizes.
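For intuition, here is a hypothetical plain-Python illustration of the scaling involved (this is not the Object Detection API's own code; the API keeps boxes normalized to [0, 1], so this bookkeeping is handled for you):

# Hypothetical illustration: if a box were stored in absolute pixels,
# resizing the image would mean scaling the box by the same factors.
# The Object Detection API avoids this by keeping boxes normalized to
# [0, 1], so they stay valid at any resolution.
def scale_box(box, src_size=(640, 640), dst_size=(320, 320)):
    ymin, xmin, ymax, xmax = box       # absolute pixel coordinates
    sy = dst_size[0] / src_size[0]     # height scale factor
    sx = dst_size[1] / src_size[1]     # width scale factor
    return (ymin * sy, xmin * sx, ymax * sy, xmax * sx)

print(scale_box((100, 200, 300, 400)))  # (50.0, 100.0, 150.0, 200.0)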

EfficientDet D0 512x512 image size

What is the best image size to use for training an EfficientDet D0 512x512 for object detection?
My image sizes vary from 500x500 to 2000x2000. Is this okay for training the EfficientDet D0 512x512?
In the appendix of https://arxiv.org/pdf/1911.09070.pdf, there is a section describing the image resolution for training.
Depending on the speed/performance trade-off, you may choose a smaller resolution for detecting your objects, like 640x640, or increase the resolution to 1024x1024 if your objects are small; in either case, preserve the aspect ratio when you resize your images.
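One possible way to preserve the aspect ratio while still producing the fixed square input is letterboxing, which tf.image.resize_with_pad provides; a minimal sketch, assuming the D0 512x512 input shape:

import tensorflow as tf

# Sketch of letterboxing for a fixed square input, assuming 512x512:
# scale the longer side down to the target, keep the aspect ratio, and
# pad the remainder. tf.image.resize_with_pad does exactly this.
def letterbox(image, target=512):
    return tf.image.resize_with_pad(image, target, target)

# A 1500x2000 image becomes 384x512 content padded out to 512x512.
image = tf.zeros([1500, 2000, 3])
print(letterbox(image).shape)  # (512, 512, 3)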

Dimensions of images as an input of LeNet_5

I am still a beginner in deep learning. I am wondering: is it necessary for the input images to have a size of 32*32 (or X*X)? The dimensions of my images are 457*143.
Thank you.
If you want to implement a LeNet and train it from scratch, you don't have to resize your images. However, if you want to do transfer learning, you had better resize your images to match the image size of the dataset on which your network was pre-trained.
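To illustrate the train-from-scratch case, here is a minimal LeNet-5-style Keras sketch built for 457*143 inputs (the layer choices approximate LeNet-5; they are not the paper's exact configuration):

import tensorflow as tf

# A minimal LeNet-5-style model built for 457*143 inputs instead of 32*32.
# When training from scratch, nothing forces a particular input size as
# long as the feature maps stay larger than the kernels after each
# pooling step.
def lenet5(input_shape=(143, 457, 1), num_classes=10):
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(6, 5, activation="tanh"),
        tf.keras.layers.AveragePooling2D(),
        tf.keras.layers.Conv2D(16, 5, activation="tanh"),
        tf.keras.layers.AveragePooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="tanh"),
        tf.keras.layers.Dense(84, activation="tanh"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

lenet5().summary()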

change the input image size for mobilenet_ssd using tensorflow

I am using tensorflow and tflite to detect objects. The model I use is mobilenet_ssd (version 2) from https://github.com/tensorflow/models/tree/master/research/object_detection
The input image size for detection is fixed at 300*300, which is hard-coded in the model.
I want to input a 1280*720 image for detection. How do I do this? I do not have a training image dataset at 1280*720 resolution; I only have the Pascal and COCO datasets.
How can I modify the model to accept a 1280*720 image (without scaling the image) for detection?
To change the input size of the image, you need to redesign the anchor box positions, because the anchors are fixed to the input image resolution. Once you change the anchor positions for 720p, the MobileNet can accept 720p as input.
The common practice is scaling the input image before feeding the data into TensorFlow / TensorFlow Lite.
Note: The images in the training dataset aren't 300*300 originally. The original resolution may be bigger and non-square, and it is downscaled to 300*300. This means it's totally fine to downscale a 1280*720 image to 300*300, and it should work fine.
Would you mind trying the scaling to see if it works?
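As a sketch of that common practice, something like the following downscales a 1280*720 frame to the fixed 300*300 input before invoking a TFLite interpreter ("detect.tflite" is a placeholder for your converted model, and the uint8 cast assumes a quantized model; check your own model's input details):

import numpy as np
import tensorflow as tf

# Sketch: scale the frame to the model's fixed input size, then run the
# TFLite interpreter. Model path and uint8 input dtype are assumptions.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # your 1280*720 image
small = tf.image.resize(frame, (300, 300))         # downscale to model input
batch = tf.cast(small, tf.uint8)[tf.newaxis, ...].numpy()

interpreter.set_tensor(input_details["index"], batch)
interpreter.invoke()
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])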

How to not resize input image while running Tensorflow SSD's inference

From what I understand of the Single Shot MultiBox Detector paper, it is a fully convolutional network. As such, it shouldn't require the rescaling (to 300x300) that tensorflow performs during inference. How can I remove this resizing during inference in tensorflow?
You can configure this in the model config file:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
If you remove the image_resizer, it should work fine. But to answer your question with a question: why do you want to remove the resizing?
Removing resizing would seriously impact training time and performance. And since both your training images and your inference images are resized by the tensorflow model config, the model still 'sees' them the same way, in case you were worried about information loss. Also, SSD was trained on COCO, and for the aforementioned reasons of training time and performance, the authors chose to resize the images.
Still, you could try the following alternatives to resizing if, for some reason, that is not what you want to do.
Multiple crops. For example AlexNet was originally trained on 5 different crops: center, top-left, top-right, bottom-left, bottom-right.
Random crops. Just take a number of random crops from the image and hope that the neural network will not be biased (see the sketch after this list).
Resize and deform. Resize the image to a fixed size without considering the aspect ratio. This deforms the image contents, but you can be sure that no content is cut off.
Variable-sized inputs. Do not crop; instead, train the network on variable-sized images, using something like Spatial Pyramid Pooling to extract a fixed-size feature vector that can be used with the fully connected layers.
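As an example, here is a minimal sketch of the random-crops alternative (the 300x300 crop size is an assumption chosen to match the SSD input):

import tensorflow as tf

# Sketch of the "random crops" alternative: take fixed-size crops at
# random positions instead of resizing, so each crop keeps the original
# pixel scale. The 300x300 crop size is an assumption.
def random_crops(image, num_crops=5, size=(300, 300, 3)):
    return [tf.image.random_crop(image, size) for _ in range(num_crops)]

image = tf.zeros([1080, 1920, 3])
print([tuple(c.shape) for c in random_crops(image)])  # five (300, 300, 3) crops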