Some questions about the required 300x300 input of the quantized Mobilenet-SSD V2 - tensorflow

I want to retrain the quantized Mobilenet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I succeeded in retraining it once on pictures of a different size and it worked (poorly, but it worked).
Also, the code that uses the retrained model resizes the input from the camera to 500x500 and it works. So my question is: why is it written that the required input is 300x300 if it works with other sizes too? Do I need to resize the whole dataset to 300x300 before I label it? I know the model runs convolutions over the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolutions run until we reach the end of the input.
Thanks for helping!

If I understand correctly, you are using the TF Object Detection API.
A given model, such as mobilenet-v2-ssd, contains 3 main blocks:
[preprocessing (normalizing and resizing)] --> [detector (backbone + detection heads)] --> [postprocessing (bbox decoding + NMS)]
When they talk about the required input, it refers to the detector. The checkpoint itself contains the full pipeline, which means that the preprocessing unit will do the work for you, so there is no need to resize to 300x300 beforehand.
If for some reason you intend to feed the input yourself directly into the detector, you have to apply the same preprocessing that was done during training.
BTW, in the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config)
you can see the resizer that was defined:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
- the normalization is the MobileNet normalization (changing the dynamic range of the input from [0,255] to [-1,1])
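For illustration, a minimal sketch of that MobileNet-style normalization (my own NumPy example, not the API's actual code):

import numpy as np

# hypothetical uint8 frame; the checkpoint's preprocessing block does this for you
image = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)

# map the dynamic range [0, 255] -> [-1, 1], as in MobileNet preprocessing
normalized = image.astype(np.float32) / 127.5 - 1.0
print(normalized.min(), normalized.max())  # roughly -1.0 and 1.0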

Related

VGG19 .h5 file modifying

I'm using a pretrained VGG19 in my modified neural style transfer code (the Gatys algorithm), but my PC doesn't allow me to use the input image at its original size (the original height is 2499 px, but with 20 GB of RAM I can only go up to 1000 px).
As I read, the solution will be to decrease batch_size. So my question is: how can I modify the VGG19 .h5 file to change the batch_size inside it? Or can I override its batch_size in my code?
Assuming the pretrained model was trained on ImageNet, the expected input size for a single sample is 224*224.
If you try to pass a larger input, it's possible your deep learning framework will reshape it into many images to be classified at once.
If you resize your input data to 224*224, you will run with a single image (batch size of 1).
You could make a custom implementation of your model that takes larger input sizes. However, sizing down to 224*224 generally gets good results, depending on the task.
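A minimal Keras sketch of that resize-then-feed approach (assuming tf.keras with ImageNet weights; the image path "content.jpg" is purely illustrative):

import tensorflow as tf

# pretrained VGG19 feature extractor; no classification head is needed for style features
vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)

# hypothetical content image, resized down to 224x224 as suggested above
img = tf.keras.utils.load_img("content.jpg", target_size=(224, 224))
x = tf.keras.utils.img_to_array(img)[tf.newaxis, ...]  # batch of 1
x = tf.keras.applications.vgg19.preprocess_input(x)    # VGG19 input normalization

features = vgg(x)
print(features.shape)  # (1, 7, 7, 512) for a 224x224 input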

Why do we have a target_size for DeepLab while a CNN can accept any size?

I still have not understood a concept. One reason we use fully convolutional layers at the end of a CNN is to handle different image sizes during training. My question is: if this is the case, why do we always crop or squeeze images into square sizes at the input? Please do not say the question is a duplicate, that we use square images to make things easier, that I should check pyramid pooling, and so on.
For example, here's a link.
DeepLab can accept images of different sizes. But in its code there is a target_size of 513. Now, if a CNN can accept images of different sizes, why do we need a target_size? If this is for converting images into a standard format, why 513?
During training, we should specify a batch size. What is our batch shape in this case: (5, None, None, None)? Is it possible to have images with different sizes in a batch?
I read many posts and still I am confused by these questions:
- How can we train a model on images with different sizes (imagine that the sizes are standard)? I see some code uses a batch size of one. I don't think that is a solution.
- Is there any code snippet that shows how we can define batches for a model like FCN so it accepts a dataset with different sizes?
- In this paper (here's a link) my problem was explained, but the authors again resized images into square format. If we can use batches made up of images with different sizes, why did they propose the idea of using square images between 180 by 180 and 224 by 224?
Has DeepLab used this part (link) to convert images into a standard format, or for some other reason?
width, height = image.size
# scale so that the longer side becomes 513 while preserving the aspect ratio
resize_ratio = 1.0 * 513 / max(width, height)
target_size = (int(resize_ratio * width), int(resize_ratio * height))
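For example (my own illustration, assuming Pillow and a hypothetical 1600x1200 image), this is what that snippet computes:

from PIL import Image  # assuming Pillow is installed

image = Image.new("RGB", (1600, 1200))  # hypothetical 1600x1200 input
width, height = image.size
resize_ratio = 1.0 * 513 / max(width, height)  # 513 / 1600 ≈ 0.3206
target_size = (int(resize_ratio * width), int(resize_ratio * height))
print(target_size)  # (513, 384): longer side capped at 513, aspect ratio kept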
I could not find the place in their code where they train the model on the PASCAL dataset.
I expected to find a simple piece of code for Keras or TensorFlow showing that we can apply a CNN model such as FCN or DeepLab to a dataset such as PASCAL VOC2012 (for segmentation) with images of different sizes, without any resizing or cropping. I am still looking.
Thank you in advance for detailed answers. Please do not repeat answers like "you can use batch size one", "square images are common and better", "you can add black margins to the images", "the fully connected layer is the problem", "you can use global max pooling", and so on. I am looking for code that works on images with different sizes.
I could not find the place in the DeepLab model on the TensorFlow GitHub where it accepts batches with different sizes (here).
Also, here FCN is trained on the COCO dataset with a target_size of 320 by 320. Why? It should accept any size for FCN.
Also, could someone explain to me how we can have a batch of images with different sizes? Could we have an np array of differently sized images? Batch = [5, None, None, 3], each of the 5 with a different size.
I also found another confusing part in semantic segmentation: using Keras augmentation we cannot augment an image with more than 4 channels. That means that with Keras augmentation we cannot train on the PASCAL dataset with 21 channels. ??

Image normalization before using ssd_mobilenet

I am trying to train ssd-mobilenet on my own dataset:
training images: 3400, size 1600x1200
test set: 800 images, size 1600x1200; tensorflow-gpu 1.13.1; GPU: 4 GB; CUDA 10.0; cuDNN 7
objects: road damage such as alligator cracks. But after 197000 steps my training loss will not go below 2.
I have 2 questions:
Should I normalize my training and test images before using a pretrained model like ssd_mobilenet?
If yes,
should I annotate the normalized images or the originals?
I really need help. Thanks in advance.
Should I normalize my training and test images before using a pretrained model like ssd_mobilenet?
No. Assuming you define your training pipeline correctly (see the examples in the TF Models repository), the Object Detection API will take care of applying the appropriate image transformations (scaling, padding, normalization, etc.) required to make the input compatible with the model.
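If you want to double-check what your pipeline declares, here is a small sketch (assuming the TF Object Detection API is installed and "pipeline.config" is a hypothetical path to your own config file):

from object_detection.utils import config_util

# read the pipeline config; the path below is a placeholder for your own file
configs = config_util.get_configs_from_pipeline_file("pipeline.config")
model_config = configs["model"]

# for an SSD model the resizer lives under model_config.ssd.image_resizer
# (for ssd_mobilenet it is typically a fixed_shape_resizer, e.g. 300x300)
print(model_config.ssd.image_resizer)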

Defining the input_size myself when applying transfer learning on my own dataset

I'm using TensorFlow for deep learning.
I want to try transfer learning on my own dataset, and I downloaded the InceptionV3 model from TensorFlow's website. I also found a demo, but the model's input_size is 299 * 299 * 3. I want to define the input_size myself, because Keras's InceptionV3 model lets me define the input_size, for example 512 * 512 * 3.
I don't use the resize function.
I tried to do the following:
[code screenshot omitted]
but I got the following error:
[error screenshot omitted]
When I change it to 299 * 299 * 3, the code runs normally.
You can't easily change the input size of a trained model. The trained model's weights only know how to process input of that size. If you want to use the pre-trained weights, your best option is to resize your images to the expected size.
As far as InceptionV3 is concerned, you can use any image size and TensorFlow's preprocessing will take care of resizing. TensorFlow's official Inception module includes https://github.com/tensorflow/models/blob/master/research/inception/inception/image_processing.py, in which you can specify the size of the image that you want to use. The model can then be retrained using this new size.
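For the Keras route mentioned in the question, a minimal sketch (assuming tf.keras with ImageNet weights; 512x512x3 and num_classes are just example values):

import tensorflow as tf

# load InceptionV3 without the 1000-class head that is tied to 299x299 inputs
base = tf.keras.applications.InceptionV3(
    weights="imagenet",
    include_top=False,
    input_shape=(512, 512, 3),  # example custom size; must be at least 75x75
)

# attach a new head for your own classes
num_classes = 5  # placeholder
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)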

Retrain last inception or mobilenet layer to work with INPUT_SIZE 64x64 or 32x32

I want to retrain the last Inception or MobileNet layer so it classifies my own objects (about 5-15 classes).
I also want this to work with INPUT_SIZE == 64x64 or 32x32 (not 224 as for the default Inception model).
I found some articles about retraining models:
https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991
https://medium.com/#daj/creating-an-image-classifier-on-android-using-tensorflow-part-3-215d61cb5fcd
For MobileNet they say:
the input image size, either '224', '192', '160', or '128'
so I can't train with 64 or 32 (which is bad): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py#L80
What about Inception models? Can I somehow train models to work with small input image sizes (to get results faster)?
The objects I want to classify from such small images will already be cropped from their parent image (for example from camera frames); they could be traffic/road signs cropped by fast cascade classifiers (LBP/Haar) trained to detect everything that looks like a sign's shape (triangle, rhombus, circle).
So 64x64 images that fully contain only the object of interest should be enough for classification.
No, you still can: use the smallest option, which would be 128. It will just scale your 32 or 64 pixel image up, which is fine.
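A minimal sketch of that upscaling step (assuming tf.image; the 64x64 crop below is a placeholder for a real cropped sign):

import tensorflow as tf

# hypothetical 64x64 RGB crop (e.g. a traffic sign cut out of a camera frame)
crop = tf.random.uniform((64, 64, 3), maxval=255.0, dtype=tf.float32)

# scale up to 128x128, the smallest size the MobileNet retraining script accepts
resized = tf.image.resize(crop, (128, 128), method="bilinear")
print(resized.shape)  # (128, 128, 3)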
It's not possible for the classifiers,
but it becomes possible with the TensorFlow Object Detection API (we can set any input size): https://github.com/tensorflow/models/tree/master/research/object_detection