What does image_resizer mean in the Tensorflow Faster RCNN config file - object-detection

When training a model using Tensorflow Faster RCNN, what will the image_resizer do to the input image?
Suppose the image_resizer in the Faster_RCNN config file is set as:
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 1000
max_dimension: 1000
}
}
I have one input image A.jpg with 1000*1000 pixels. I then do data augmentation by resizing (enlarging) this image by a ratio of 1.2 with a third-party tool, which gives me another image B.jpg with 1200*1200 pixels.
When these two images are fed into the Faster RCNN model, what will the image resizer do to A.jpg and B.jpg? If I understand correctly, A.jpg is kept as it is and B.jpg is resized to 1000*1000, which would mean the resized B.jpg is exactly the same image as A.jpg. So is this kind of image resizing for data augmentation useless?

If I understand correctly, you are enlarging the image to get bigger objects, right?
However, always keep in mind that if the image is bigger than the resizer's target size, it will be resized back down to that size, and you may lose the augmentation effect.
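A rough way to check what happens to A.jpg and B.jpg is to compute the output size of keep_aspect_ratio_resizer. The function below is a simplified sketch of the sizing rule as I understand the TF Object Detection API applies it (scale the short side to min_dimension; if the long side then overshoots max_dimension, scale the long side to max_dimension instead). The function name is hypothetical, and it ignores interpolation and padding.

```python
def keep_aspect_ratio_size(height, width, min_dimension, max_dimension):
    """Return the (height, width) a keep-aspect-ratio resize would produce.

    Simplified sketch: first scale so the shorter side equals min_dimension;
    if the longer side then exceeds max_dimension, scale instead so the
    longer side equals max_dimension.
    """
    scale = min_dimension / min(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    if max(new_h, new_w) > max_dimension:
        scale = max_dimension / max(height, width)
        new_h, new_w = round(height * scale), round(width * scale)
    return new_h, new_w


# A.jpg and B.jpg from the question, with min_dimension = max_dimension = 1000.
print(keep_aspect_ratio_size(1000, 1000, 1000, 1000))  # (1000, 1000)
print(keep_aspect_ratio_size(1200, 1200, 1000, 1000))  # (1000, 1000)
# A non-square image ends up with its longer side at 1000.
print(keep_aspect_ratio_size(1200, 1600, 1000, 1000))  # (750, 1000)
```

So for square inputs, enlarging the whole image beforehand is undone by the resizer: the resized B.jpg differs from A.jpg only by resampling artifacts, not in object scale. Scale augmentation only survives the resizer if it changes the size of objects relative to the frame, for example by cropping after enlarging.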

Related

VGG19 .h5 file modifying

I'm using a pretrained VGG19 in my modified neural style transfer code (Gatys algorithm), but my PC doesn't let me use the input image at its original size (the original height is 2499 px, but with 20 GB of RAM I can only use up to 1000 px).
From what I've read, the solution is to decrease batch_size. So my question is: how can I modify the VGG19 .h5 file to change the batch_size inside it? Or can I override its batch_size in my code?
Assuming the pretrained model was trained on ImageNet, its default input size for a single sample is 224*224.
If you try to pass a larger input, it's possible your deep learning framework will reshape it into many images to be classified at once.
By resizing your input data to 224*224, you will run with a single image (batch size of 1).
You could write a custom implementation of your model that takes larger input sizes. However, sizing down to 224*224 generally gives good results, depending on the task.
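Keras models usually leave the batch dimension unspecified (None), so there is typically no batch size to edit inside the .h5 file; with a single image, memory is driven mainly by the spatial size of the feature maps. The back-of-the-envelope sketch below assumes the standard VGG19 convolutional layout (16 conv layers in five blocks, 'same' padding, 2x2 max-pooling) and counts only forward-pass float32 activations, not gradients, weights, or framework overhead, so real usage in a Gatys-style optimization will be higher. The function names are hypothetical.

```python
# (number of 3x3 conv layers, output channels) for each of VGG19's five
# convolutional blocks; each block ends with a 2x2 max-pool with stride 2.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def vgg19_activation_elements(height, width):
    """Count the values in all VGG19 conv feature maps for one image.

    Assumes 'same' padding for the convolutions (spatial size unchanged)
    and floor division for each 2x2 max-pool.
    """
    total = 0
    h, w = height, width
    for num_convs, channels in VGG19_BLOCKS:
        total += num_convs * h * w * channels
        h, w = h // 2, w // 2  # max-pool halves the spatial size
    return total


def activation_mib(height, width, batch_size=1, bytes_per_value=4):
    """Approximate forward-pass activation memory in MiB (float32 by default)."""
    elements = batch_size * vgg19_activation_elements(height, width)
    return elements * bytes_per_value / 2**20


for side in (224, 1000, 2499):
    print(side, round(activation_mib(side, side)), "MiB")
```

Because the count scales with height times width, going from 1000 px to 2499 px per side multiplies it by about (2499/1000)^2, roughly 6.2, and with a single image there is no batch dimension left to shrink, so input resolution is the remaining lever.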

Some questions about the required 300x300 input of the quantized Mobilenet-SSD V2

I want to retrain the quantized Mobilenet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I succeeded in retraining it once on pictures of a different size and it worked (poorly, but it worked).
Also, the code that uses the retrained model resizes the camera input to 500x500, and it works. So my question is: why does it say the required input is 300x300 if it works with other sizes too? Do I need to resize the whole dataset to 300x300 before I label it? I know it applies convolutions to the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolution continues until it reaches the end of the input.
Thanks for helping!
If I understand correctly, you are using the TF Object Detection API.
A given model, such as mobilenet-v2-ssd, contains 3 main blocks:
[Preprocessing (normalizing and resizing)] --> [Detector (backbone + detection heads)] --> [Postprocessing (bbox decoding + NMS)]
When they talk about the required input, they mean the input to the detector. The checkpoint itself contains the full pipeline, which means the preprocessing unit does the work for you, so there is no need to resize images to 300x300 beforehand.
If for some reason you intend to feed the input directly to the detector yourself, you have to apply the same preprocessing that was done in training.
BTW:
in the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config)
you can see the resizer that was defined:
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
- The normalization is MobileNet normalization (changing the dynamic range of the input from [0,255] to [-1,1]).
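To make the preprocessing block concrete, here is a plain-Python sketch of what you would have to replicate if you fed the detector directly: resize to 300x300, then map [0, 255] to [-1, 1] with v / 127.5 - 1. The nearest-neighbour resize and the function names are my own simplifications; fixed_shape_resizer, as far as I know, defaults to bilinear interpolation, so real outputs will differ slightly.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image given as rows of RGB tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def mobilenet_normalize(image):
    """Map each channel value from [0, 255] to [-1, 1] via v / 127.5 - 1."""
    return [[tuple(v / 127.5 - 1.0 for v in px) for px in row] for row in image]


def preprocess(image, size=300):
    """Resize to size x size, then normalize."""
    return mobilenet_normalize(resize_nearest(image, size, size))


# A 500x500 camera frame of mid-gray pixels becomes a 300x300 detector input.
frame = [[(128, 128, 128)] * 500 for _ in range(500)]
inp = preprocess(frame)
print(len(inp), len(inp[0]))  # 300 300
```

This is why a 500x500 camera frame still works: the pipeline's own preprocessing brings it to the detector's 300x300 input before the backbone ever sees it.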

Tensorflow object detection -- Increasing batch size leads to failure

I have been trying to train an object detection model using the tensorflow object detection API.
The network trains well when batch_size is 1. However, increasing the batch_size leads to the following error after some steps.
Network : Faster RCNN
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 25000
learning_rate: .00002
}
schedule {
step: 50000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
Error:
INFO:tensorflow:Error reported to Coordinator: , ConcatOp : Dimensions of inputs should match: shape[0] = [1,841,600,3] vs. shape[3] = [1,776,600,3]
[[node concat (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/legacy/trainer.py:190) ]]
Errors may have originated from an input operation.
Input Source operations connected to node concat:
Preprocessor_3/sub (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py:100)
Training with an increased batch_size does work with SSD MobileNet, however.
While I have solved the issue for my use case for now, I'm posting this question on SO to understand the reason for this behavior.
The reason you get an error is that you cannot technically train Faster RCNN in batch mode on a single GPU. This is due to its two-stage architecture. SSD is single-stage and hence can be parallelized to give larger batch sizes. If you still want to train F-RCNN with batch size > 1, you can do so with multiple GPUs. There is a --num_clones parameter that you need to set to the number of GPUs available to you. Set num_clones and the batch size to the same value (it should be equal to the number of GPUs you have available).
I have used batch sizes of 4, 8 and 16 in my application.
--num_clones=2 --ps_tasks=1
Check this link for more details:
https://github.com/tensorflow/models/issues/1744
Judging from the error alone, it seems your individual inputs have different sizes. I suppose it tries to concatenate (ConcatOp) 4 single inputs into one tensor to build a minibatch as the input.
While trying to concatenate, it has one input of 841x600x3 and one input of 776x600x3 (ignoring the batch dimension). Obviously 841 and 776 are not equal, but they should be. With a batch size of 1 the concat function is probably not called, since you don't need to concatenate inputs to get a minibatch. There also seems to be no other component that relies on a predefined input size, so the network trains normally, or at least doesn't crash.
I would check the dataset you are using to see whether this is supposed to be this way or whether you have some faulty data samples. If the dataset is fine and this can in fact happen, you need to resize all inputs to some predefined resolution to be able to combine them properly into a minibatch.
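The failure can be reproduced at the shape level. The helper below is hypothetical (not part of TensorFlow); it applies the rule the concat op enforces: tensors concatenated along axis 0 must agree in every other dimension. It uses the two shapes from the error message.

```python
def concat_shape(shapes, axis=0):
    """Return the shape of concatenating tensors with the given shapes.

    Raises ValueError if any non-concatenation dimension differs, which is
    the condition behind "ConcatOp : Dimensions of inputs should match".
    """
    first = shapes[0]
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch: shape[0]={first} vs shape[{i}]={shape}")
        for d, (a, b) in enumerate(zip(first, shape)):
            if d != axis and a != b:
                raise ValueError(f"shape[0]={first} vs shape[{i}]={shape}")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return out


# Same-sized images (e.g. after a fixed-shape resize) batch fine.
print(concat_shape([[1, 600, 600, 3]] * 4))  # [4, 600, 600, 3]

# A single image needs no real concatenation, so batch size 1 never fails.
print(concat_shape([[1, 841, 600, 3]]))  # [1, 841, 600, 3]

# The two shapes from the error message cannot be batched.
try:
    concat_shape([[1, 841, 600, 3], [1, 776, 600, 3]])
except ValueError as err:
    print("ConcatOp-style failure:", err)
```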
You don't need to resize every image in your dataset. TensorFlow can handle it if you specify it in your config file.
The default frcnn and ssd configs are:
## frcnn
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
## ssd
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
If you change the frcnn image resizer to fixed_shape_resizer, as in ssd, you can increase the batch size.
I implemented it and training went well. Unfortunately, my loss didn't decrease as I expected. Then I switched back to batch size 4 with 4 workers (i.e., batch size 1 per worker). The latter is better for my case, but it may be different for yours.
When increasing the batch size, the images loaded into the tensors must all be the same size.
This is how you can make all the images the same size:
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 896
max_dimension: 896
pad_to_max_dimension: true
}
}
Setting pad_to_max_dimension to true pads every image to the maximum dimension, so all images end up the same size. This lets you use a batch size larger than one.
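To see why padding fixes the batching problem, here is a shape-only sketch of keep_aspect_ratio_resizer with pad_to_max_dimension, assuming (as I understand the API's behavior) that the resized image is zero-padded up to max_dimension x max_dimension. The function name is hypothetical; the two input sizes are the ones from the ConcatOp error earlier in the thread.

```python
def resized_and_padded_shape(height, width, min_dimension, max_dimension,
                             pad_to_max_dimension=True):
    """Return (content_shape, output_shape) for a keep-aspect-ratio resize.

    content_shape is the resized image itself; output_shape is the tensor
    shape after optional padding to max_dimension x max_dimension.
    """
    scale = min_dimension / min(height, width)
    if max(height, width) * scale > max_dimension:
        # Scaling the short side would overshoot, so fit the long side instead.
        scale = max_dimension / max(height, width)
    content = (round(height * scale), round(width * scale))
    output = (max_dimension, max_dimension) if pad_to_max_dimension else content
    return content, output


for h, w in [(841, 600), (776, 600)]:
    print(resized_and_padded_shape(h, w, 896, 896))
# The content sizes differ, but the output shape is (896, 896) for both,
# so the images can be stacked into one batch.
```

Without padding the two outputs would be (896, 639) and (896, 693), which is exactly the kind of mismatch the concat op rejects.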

Unable to detect custom object using tensorflow object detection (despite training loss < 1.0)

Unable to detect a custom object after training with the TensorFlow Object Detection API. No bounding boxes are drawn at all; the output image is exactly the same as the input. It can't recognize anything, even in a training image.
I am trying to detect a specific object, such as a car, by doing the following:
190 training samples
40 test samples
Number of classes is 1
My training loss begins around 14.0 and ends at 0.7 after 1000 iterations. I am using ssd_mobilenet_v1_coco, and apart from running for 1000 iterations, the only change I made to the config file is the following, since my images are all "around" this size:
image_resizer {
fixed_shape_resizer {
height: 450
width: 300
}
}
Please help me figure out how to debug this; I don't know where exactly to start.
I have annotated the images (drawn the bounding boxes) twice, without any help. The images vary in size from 425x300 to 450x300.
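One place to start is the annotations themselves. The TF Object Detection API's TFRecords store box coordinates normalized to [0, 1] (the image/object/bbox/xmin, xmax, ymin and ymax features), and boxes left in pixel units or with swapped min/max values are one possible cause of a model that trains but detects nothing. The sketch below assumes you have already parsed each record into a plain dict with those four keys; that dict layout and the function name are hypothetical.

```python
def find_bad_boxes(records):
    """Yield (record_index, box_index, reason) for suspicious boxes.

    Each record is assumed to be a dict mapping 'xmin', 'xmax', 'ymin',
    'ymax' to equal-length lists of normalized floats.
    """
    for r, rec in enumerate(records):
        boxes = zip(rec["xmin"], rec["xmax"], rec["ymin"], rec["ymax"])
        for b, (xmin, xmax, ymin, ymax) in enumerate(boxes):
            if any(c < 0.0 or c > 1.0 for c in (xmin, xmax, ymin, ymax)):
                yield r, b, "coordinate outside [0, 1] (pixel units?)"
            elif xmin >= xmax or ymin >= ymax:
                yield r, b, "min >= max (swapped or empty box)"


records = [
    {"xmin": [0.10], "xmax": [0.60], "ymin": [0.20], "ymax": [0.90]},    # fine
    {"xmin": [45.0], "xmax": [210.0], "ymin": [30.0], "ymax": [280.0]},  # pixels
    {"xmin": [0.70], "xmax": [0.30], "ymin": [0.10], "ymax": [0.50]},    # swapped
]
for problem in find_bad_boxes(records):
    print(problem)
```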

YOLOv2 input image size

I want to train YOLO on custom objects to detect gender from a surveillance camera stream.
I see that the default YOLO input layer is 416x416. Should I stick with this, or would a bigger input size, e.g. 640x480, be better?
(The original image size can be 2 to 4 MPx.)
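Some arithmetic that bears on the choice: YOLOv2 downsamples its input by a factor of 32, so the input width and height should be multiples of 32, and the output grid is input/32 per side (416x416 gives the familiar 13x13 grid). The sketch below, with function names of my own, compares how much of a source frame one grid cell covers at each input size; the 2560x1440 frame is a hypothetical example of a roughly 3.7 MPx camera.

```python
STRIDE = 32  # YOLOv2's total downsampling factor


def yolo_grid(width, height, stride=STRIDE):
    """Return the (columns, rows) of the output grid for a given input size."""
    if width % stride or height % stride:
        raise ValueError(f"{width}x{height} is not a multiple of {stride}")
    return width // stride, height // stride


def cell_size_in_source(src_width, src_height, width, height, stride=STRIDE):
    """Approximate source-image pixels covered by one grid cell, per axis."""
    cols, rows = yolo_grid(width, height, stride)
    return src_width / cols, src_height / rows


print(yolo_grid(416, 416))  # (13, 13)
print(yolo_grid(640, 480))  # (20, 15)
print(cell_size_in_source(2560, 1440, 416, 416))
print(cell_size_in_source(2560, 1440, 640, 480))  # (128.0, 96.0)
```

A larger input gives finer cells, each covering fewer source pixels, which tends to help with small or distant people; the cost is compute, since 640x480 has about 1.8 times the pixels of 416x416. Note also that a fixed square input like 416x416 squashes a 16:9 frame.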