TensorFlow object detection: increasing batch size leads to failure

I have been trying to train an object detection model using the TensorFlow Object Detection API.
The network trains well when batch_size is 1. However, increasing batch_size leads to the following error after some steps.
Network: Faster R-CNN
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 25000
            learning_rate: .00002
          }
          schedule {
            step: 50000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}
Error:
INFO:tensorflow:Error reported to Coordinator: , ConcatOp : Dimensions of inputs should match: shape[0] = [1,841,600,3] vs. shape[3] = [1,776,600,3]
[[node concat (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/legacy/trainer.py:190) ]]
Errors may have originated from an input operation.
Input Source operations connected to node concat:
Preprocessor_3/sub (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py:100)
However, training with an increased batch_size works with SSD MobileNet.
I have solved the issue for my use case for now, but I am posting this question on SO to understand the reason for this behavior.

The reason you get an error is that you cannot technically train Faster R-CNN in batch mode on a single GPU. This is due to its two-stage architecture. SSD is single-stage and hence can be parallelized to give larger batch sizes. If you still want to train Faster R-CNN with batch size > 1, you can do so with multiple GPUs. There is a --num_clones parameter that you need to set to the number of GPUs available to you. Set num_clones and batch_size to the same value (it should equal the number of GPUs you have available).
I have used batch sizes of 4, 8 and 16 in my application.
--num_clones=2 --ps_tasks=1
Check this link for more details
https://github.com/tensorflow/models/issues/1744
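The advice to set batch_size equal to num_clones comes down to simple arithmetic. The sketch below assumes, as I understand the legacy trainer, that train_config.batch_size is divided evenly among the clones, so each GPU receives batch_size // num_clones images. per_clone_batch_size is a hypothetical helper, and its divisibility check is a choice of this sketch, not documented API behavior.

```python
def per_clone_batch_size(batch_size: int, num_clones: int) -> int:
    """Images each clone (GPU) receives per step.

    Assumption: batch_size is split evenly across clones with integer
    division. This helper is illustrative only, not part of the API.
    """
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    if batch_size % num_clones != 0:
        # A design choice of this sketch: reject uneven splits.
        raise ValueError("batch_size should be a multiple of num_clones")
    return batch_size // num_clones


# With batch_size == num_clones, every GPU handles exactly one image, so no
# per-GPU concatenation of differently sized images is required.
for batch_size, num_clones in [(2, 2), (4, 4), (8, 2)]:
    print(batch_size, num_clones, per_clone_batch_size(batch_size, num_clones))
```

Under this assumption, batch_size: 8 with --num_clones=2 still gives each clone 4 images, which would again need same-sized inputs.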

Just from the error, it seems that your individual inputs have different sizes. I suppose it tries to concatenate (ConcatOp) 4 single inputs into one tensor to build the minibatch.
While concatenating, it gets one input of 841x600x3 and one of 776x600x3 (ignoring the batch dimension). 841 and 776 are not equal, but they must be. With a batch size of 1, the concat is probably never needed, since you don't have to concatenate inputs to get a minibatch. There also seems to be no other component that relies on a predefined input size, so the network trains normally, or at least doesn't crash.
I would check the dataset you are using and see whether this is expected or whether you have some faulty data samples. If the dataset is OK and this can in fact happen, you need to resize all inputs to a predefined resolution to be able to combine them into a minibatch.
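The reasoning above can be reproduced without TensorFlow. The pure-Python sketch below applies the rule the error message reports: tensors concatenated along the batch axis must agree in every other dimension. concat_batch_shape is a hypothetical helper that only works on shapes, not on real tensors.

```python
def concat_batch_shape(shapes):
    """Shape produced by concatenating tensors along axis 0 (the batch axis).

    Mimics the check ConcatOp reports: every dimension except the
    concatenation axis must be identical across inputs.
    """
    shapes = [tuple(s) for s in shapes]
    first = shapes[0]
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first) or shape[1:] != first[1:]:
            raise ValueError(
                f"Dimensions of inputs should match: shape[0] = {list(first)} "
                f"vs. shape[{i}] = {list(shape)}"
            )
    return (sum(s[0] for s in shapes),) + first[1:]


# Two images of the same size stack into a batch of 2.
print(concat_batch_shape([(1, 841, 600, 3), (1, 841, 600, 3)]))  # (2, 841, 600, 3)

# The shapes from the error message cannot be stacked.
try:
    concat_batch_shape([(1, 841, 600, 3), (1, 776, 600, 3)])
except ValueError as err:
    print(err)
```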

You don't need to resize every image in your dataset. TensorFlow can handle it if you specify it in your config file.
The default Faster R-CNN and SSD configs are:
## frcnn
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}
## ssd
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
If you change the Faster R-CNN image resizer to fixed_shape_resizer, as in SSD, you can increase the batch size.
I implemented it and training went well. Unfortunately, my loss didn't decrease as I expected. Then I switched back to batch size 4 with 4 workers (i.e. batch size 1 per worker). The latter works better for my case, but it may be different for yours.
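To see why the keep-aspect-ratio resizer can produce the mismatched 841x600 and 776x600 tensors from the question while a fixed-shape resizer cannot, here is a pure-Python sketch of the output sizes. It follows my understanding of the resizing rule (scale the shorter side to min_dimension unless that pushes the longer side past max_dimension, then scale the longer side to max_dimension instead); the exact rounding is an assumption, and the function names and original image sizes are hypothetical.

```python
def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
    """Output (height, width) of a keep-aspect-ratio resize (sketch).

    Scale the shorter side to min_dimension; if the longer side would then
    exceed max_dimension, scale the longer side to max_dimension instead.
    """
    scale = min_dimension / min(height, width)
    large = (round(height * scale), round(width * scale))
    if max(large) <= max_dimension:
        return large
    scale = max_dimension / max(height, width)
    return (round(height * scale), round(width * scale))


def fixed_shape_size(height, width, target_height=300, target_width=300):
    """Output (height, width) of a fixed-shape resize: always the target."""
    return (target_height, target_width)


# Hypothetical originals with slightly different aspect ratios.
originals = [(1402, 1000), (1293, 1000)]
print([keep_aspect_ratio_size(h, w) for h, w in originals])
print([fixed_shape_size(h, w) for h, w in originals])
```

Under these assumptions the two images come out as (841, 600) and (776, 600), matching the shapes in the error, whereas fixed_shape_size returns (300, 300) for both. That uniform shape is what lets the images be concatenated into one batch.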

When increasing the batch size, the images loaded into the tensors must all be the same size.
This is how you can make all images the same size:
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 896
    max_dimension: 896
    pad_to_max_dimension: true
  }
}
Setting pad_to_max_dimension to true pads every image to the maximum dimension, so all images end up the same size. This enables a batch size larger than one.
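A sketch of why padding makes the shapes uniform, assuming that with min_dimension equal to max_dimension the longer side is scaled to 896, and that the padding is added at the bottom and right. resize_then_pad is a hypothetical helper, not an API function, and the original image sizes are made up.

```python
def resize_then_pad(height, width, max_dimension=896):
    """Sketch of keep_aspect_ratio_resizer with pad_to_max_dimension: true.

    Assumes min_dimension == max_dimension, so the longer side ends up at
    max_dimension; the image is then padded at the bottom and right.
    Returns the resized (height, width) and the (bottom, right) padding.
    """
    scale = max_dimension / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (max_dimension - new_h, max_dimension - new_w)


originals = [(1402, 1000), (1293, 1000), (480, 640)]
for h, w in originals:
    (new_h, new_w), (pad_bottom, pad_right) = resize_then_pad(h, w)
    # Every padded image ends up max_dimension x max_dimension.
    print((h, w), "->", (new_h, new_w), "padded to",
          (new_h + pad_bottom, new_w + pad_right))
```

The resized sizes still differ per image, but after padding every tensor is 896x896x3, so they can be stacked.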

Related

Object detection classification / A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights)

I am trying to do classification with object detection in Colab. I am using "ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config". When I start training, I get an error.
Training command:
!python model_main_tf2.py \
--pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
--model_dir=training \
--alsologtostderr
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W1130 13:39:27.991891 140559633127296 util.py:158] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
I was dealing with the same error. I assume that the training stopped when you got the error you cited above. If so, you might want to check your folder paths.
I was able to get rid of the error when I figured out that I was trying to create a new model, but TF was looking at a model_dir folder that contained checkpoints from my previous model. Because my num_steps was not greater than the num_steps used in the previous model, those steps had already been completed, so TF effectively stopped the training right away.
By changing model_dir to a brand-new folder, I was able to overcome this error and begin training a new model. Hopefully this works for you as well.
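A minimal sketch of the reasoning above, assuming training resumes from the global step of the latest checkpoint in model_dir and runs only until num_steps. steps_left is a hypothetical helper.

```python
def steps_left(checkpoint_step: int, num_steps: int) -> int:
    """Training steps that remain when resuming from a checkpoint in model_dir.

    Assumes training resumes at the checkpoint's global step and stops once
    that step reaches num_steps.
    """
    return max(0, num_steps - checkpoint_step)


# model_dir already holds a checkpoint from a finished 25000-step run.
print(steps_left(checkpoint_step=25000, num_steps=25000))  # 0
# A brand-new, empty model_dir starts from step 0.
print(steps_left(checkpoint_step=0, num_steps=25000))      # 25000
```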
If you are trying to continue training, the solution, as @GbG mentioned, is to update the num_steps value (and the matching total_steps of the learning-rate schedule) in pipeline.config:
Original:
  num_steps: 25000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 25000
Updated:
  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 50000
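Updating total_steps together with num_steps matters because of how cosine decay behaves at the end of its schedule. The sketch below is simplified: it ignores the warmup settings a real config usually has, and it assumes, as I understand the API's cosine schedule, that the rate is held at 0 after total_steps. cosine_decay_lr is a hypothetical helper.

```python
import math


def cosine_decay_lr(step, learning_rate_base, total_steps):
    """Cosine-decayed learning rate without warmup (simplified sketch).

    The rate falls from learning_rate_base to 0 over total_steps and is
    assumed to stay at 0 for any step beyond total_steps.
    """
    if step > total_steps:
        return 0.0
    return 0.5 * learning_rate_base * (1 + math.cos(math.pi * step / total_steps))


# Raising num_steps to 50000 while leaving total_steps at 25000 would run
# the extra steps with a learning rate of 0.
print(cosine_decay_lr(30000, 0.04, total_steps=25000))  # 0.0
print(cosine_decay_lr(30000, 0.04, total_steps=50000))
```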
It means your model has already been trained for the num_steps set in your config file.

Some questions about the required 300x300 input of the quantized Mobilenet-SSD V2

I want to retrain the quantized MobileNet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I once succeeded in retraining it on pictures of a different size and it worked (poorly, but it worked).
Also, the code that uses the retrained model resizes the camera input to 500x500, and it works. So my question is: why is the required input stated as 300x300 if it works with other sizes too? Do I need to resize the whole dataset to 300x300 before labeling it? I know it applies convolutions to the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolution proceeds until it reaches the end of the input.
Thanks for helping!
If I understand correctly you are using TF Object Detection API.
A given model, such as mobilenet-v2-ssd, contains 3 main blocks:
[preprocessing (normalizing and resizing)] --> [detector (backbone + detection heads)] --> [postprocessing (bbox decoding + NMS)]
When they talk about the required input, it refers to the detector. The checkpoint itself contains the full pipeline, which means the preprocessing unit does the work for you, so there is no need to resize images to 300x300 beforehand.
If for some reason you intend to feed input directly to the detector yourself, you have to apply the same preprocessing that was done during training.
BTW:
In the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config), you can see the resize that was defined:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
- The normalization is MobileNet normalization (it changes the dynamic range of the input from [0,255] to [-1,1]).
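If you do feed the detector directly, the preprocessing described above amounts to a resize to 300x300 followed by mapping [0,255] to [-1,1], i.e. x * 2/255 - 1. The pure-Python sketch below is only illustrative: preprocess_for_detector is a hypothetical helper, and it uses nearest-neighbour resizing for brevity, whereas the resize method the real pipeline uses is an assumption I have not checked.

```python
def preprocess_for_detector(image, size=300):
    """Resize to size x size and map pixel values from [0, 255] to [-1, 1].

    image: nested lists indexed [row][col][channel] with values in [0, 255].
    Nearest-neighbour resizing is used for brevity; the real pipeline may
    use a different method (e.g. bilinear).
    """
    in_h, in_w = len(image), len(image[0])
    resized = [
        [image[r * in_h // size][c * in_w // size] for c in range(size)]
        for r in range(size)
    ]
    # MobileNet-style normalization: 0 -> -1.0 and 255 -> 1.0.
    return [
        [[(2.0 / 255.0) * v - 1.0 for v in pixel] for pixel in row]
        for row in resized
    ]


# A hypothetical 500x500 camera frame of mid-grey pixels.
frame = [[[128, 128, 128] for _ in range(500)] for _ in range(500)]
out = preprocess_for_detector(frame)
print(len(out), len(out[0]), len(out[0][0]))  # 300 300 3
```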

Does one step in Object Detection API mean processing one picture or one bounding box?

In pipeline.config file in Tensorflow Object Detection API we have parameter NUM_STEPS.
Does one step mean processing one whole picture, or one bounding box?
In the config file, we have:
model {
  faster_rcnn {
    # (...)
  }
}
train_config: {
  batch_size: 1
  optimizer {
    # (...)
  }
  gradient_clipping_by_norm: 10.0
  # (...)
  num_steps: 200000  # <-- HERE IT IS
  # (...)
}
E.g. we have a training TFRecord with 2 pictures, 10 bboxes each. If I set NUM_STEPS to 10, does this mean that I would process the first 10 bboxes, or each photo 5 times?
Full config file can be found here:
https://github.com/tensorflow/models/blob/32dadfc2def4f05faeedacce98e4c4099be4c433/research/object_detection/samples/configs/faster_rcnn_inception_v2_coco.config#L113
One 'step' corresponds to one batch processing.
The input of Faster R-CNN is a full image and your batch size is 1, so each step uses one image. In your example, the first step processes all ten boxes of the first image and the second step all ten boxes of the second one; with 10 steps, each photo is processed 5 times.
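The step arithmetic in the answer can be written out directly. This sketch assumes, as the answer states, that one step processes batch_size whole images (with all their boxes), and additionally that the input pipeline keeps cycling through the dataset. times_each_image_seen is a hypothetical helper.

```python
def times_each_image_seen(num_steps, batch_size, num_images):
    """Average number of times each image is processed during training.

    One step processes one batch of batch_size whole images, and the input
    pipeline is assumed to repeat the dataset indefinitely.
    """
    return num_steps * batch_size / num_images


# The question's example: 2 images with 10 boxes each, batch_size 1, 10 steps.
visits = times_each_image_seen(num_steps=10, batch_size=1, num_images=2)
print(visits)           # 5.0
print(visits * 2 * 10)  # 100.0 box instances seen in total
```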

In-depth understanding of object detection data augmentation

I am trying to use the data augmentation features of the Object Detection API. I have configured the augmentation option in the config file. I am using the GitHub repository below:
https://github.com/tensorflow/models/tree/master/research/object_detection
I have used the configuration below in the config file:
data_augmentation_options {
  random_pixel_value_scale {
    minval: 0.6
  }
}
In trainer.py, the following method is called for augmentation:
tensor_dict = preprocessor.preprocess(
    tensor_dict, data_augmentation_options,
    func_arg_map=preprocessor.get_default_func_arg_map(
        include_instance_masks=include_instance_masks,
        include_keypoints=include_keypoints))
My question is: does the number of images increase after preprocessing? If yes, how can I validate that? The length of tensor_dict is 12 both before and after preprocessing.
The number of images per step that your model sees is defined in the config file by the batch_size parameter:
train_config: {
  batch_size: N
  . . .
None of the data_augmentation options increase or decrease the number of images per step (tensor_dict) that your network receives. They only randomly alter the original batch.
One could say that the data_augmentation options effectively augment the size of your dataset, because for the same image the network receives a slightly different version each time (depending on the actual data_augmentation parameters you use).
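To illustrate that an augmentation like random_pixel_value_scale changes pixel values but not the number or size of images, here is a pure-Python stand-in. scale_pixel_values is hypothetical and only loosely modelled on that option: the maxval of 1.1 and the clipping to [0, 255] are assumptions, since the config in the question sets only minval.

```python
import random


def scale_pixel_values(images, minval=0.6, maxval=1.1, rng=None):
    """Multiply every pixel value by its own random factor in [minval, maxval].

    images: list of images, each a flat list of pixel values in [0, 255].
    Returns a new list with the same number of images and the same sizes;
    results are clipped to [0, 255].
    """
    if rng is None:
        rng = random.Random()
    return [
        [min(255.0, max(0.0, v * rng.uniform(minval, maxval))) for v in image]
        for image in images
    ]


batch = [[10, 200, 250], [0, 128, 255]]  # two tiny hypothetical "images"
augmented = scale_pixel_values(batch, rng=random.Random(0))
print(len(batch), len(augmented))  # 2 2
```

The batch still contains two images of three values each; only the values changed.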

Tensorflow object detection api validation data size

I am running the tutorial from the Object Detection API, using the Oxford dataset with ResNet Faster R-CNN.
When I evaluate my trained model by running eval.py, TensorBoard reports a smoothed precision of about 0.95.
My question is: how many images does it evaluate? Judging from TensorBoard and their tutorial (https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_pets.md), TensorBoard only shows 10 images.
Does it mean that they check precision with only 10 images?
My Oxford validation set should contain about 2,200 JPEG images.
In my configuration, I specified the input path correctly, like this:
eval_input_reader: {
  tf_record_input_reader {
    input_path: "my_path/pet_val.record"
  }
  label_map_path: "my_path/pet_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
Also, does eval.py print mAP at the end?
I ran eval.py about three days ago on my local machine with one GPU, but it does not print anything.
Finally, does this API provide F-value and fps (frames per second)? Does anyone have experience with this?
Edit: it seems that we can set an eval size limit in the configuration, e.g. /object_detection/samples/configs/faster_rcnn_resnet101_pets.config#L131. When I print len(result_lists) from https://github.com/tensorflow/models/blob/master/object_detection/eval_util.py#L404, it prints 2000, which was my eval num_examples.
I was also able to compute fps by comparing timestamps manually.
By default, we only visualize 10 images on Tensorboard (to avoid overwhelming it with images) but this is configurable from the eval_config. You can also change the number of images evaluated (defaults to 5000) in the config too.
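The question's edit mentions computing fps by comparing timestamps manually. One way to do that is sketched below with time.perf_counter. detect_fn and fake_detect are hypothetical placeholders for whatever inference call you actually use; they are not part of the API.

```python
import time


def measure_fps(detect_fn, images, warmup=1):
    """Average frames per second of detect_fn over images.

    detect_fn is any callable that runs inference on one image. The first
    `warmup` calls are excluded because they often include one-off setup cost.
    """
    for image in images[:warmup]:
        detect_fn(image)
    timed = images[warmup:]
    if not timed:
        raise ValueError("need more images than warmup calls")
    start = time.perf_counter()
    for image in timed:
        detect_fn(image)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


def fake_detect(image):
    """Hypothetical stand-in for a real detector: sleeps 10 ms per image."""
    time.sleep(0.01)
    return []


print(f"{measure_fps(fake_detect, list(range(21))):.1f} fps")
```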