I'm a beginner with TensorFlow 1.4.0 and I'm trying to run my first training + evaluation process on an object detection model. I'm seeing something weird in the output of the evaluation steps.
Here are the steps I followed. First, it's worth saying that my goal is to detect two different kinds of shapes in very particular scientific images. They are under a kind of "copyright", so I can only show a simplified version of them (made by hand). Just keep in mind that the original ones are far more detailed.
A rough example of an input image: think of it as a repeated pattern (there is always a grid in the background) with some particular shapes in random positions.
As you can see, I want to train the model to detect 2 classes: "round" shapes (class A) and "irregular" shapes (class B).
I used labelImg to generate labels for both classes in XML format. In total, I've labeled 168 images (960x720 RGB, PNG), ending up with 800 boxes (a single image might contain multiple A/B shapes).
I've also prepared a smaller dataset for evaluation, made of 10 new images and 150 labels. This time the images are bigger than the ones in the training dataset (but they are not "resized"; the viewport is simply larger, so there can be more events in each input). We are talking about 1920x1440 RGB PNG images.
Then I converted the XMLs for both datasets into two .tfrecord files (there are some scripts around GitHub for this).
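For reference, a labelImg file is Pascal VOC style XML with pixel coordinates, while TFRecord conversion scripts (for example the API's own create_pascal_tf_record.py) store box corners normalized by image width and height, plus an integer class id taken from the label map. The minimal standard-library sketch below covers only that parsing/normalization step; the LABEL_IDS mapping and the sample annotation are hypothetical, and writing the actual tf.train.Example is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the class names typed in labelImg to label-map ids.
LABEL_IDS = {"shape_a": 1, "shape_b": 2}

def parse_labelimg_xml(xml_text, label_ids=LABEL_IDS):
    """Turn one labelImg (Pascal VOC style) annotation into parallel lists of
    normalized box corners and integer class ids."""
    root = ET.fromstring(xml_text)
    width = float(root.find("size/width").text)
    height = float(root.find("size/height").text)
    record = {"xmin": [], "ymin": [], "xmax": [], "ymax": [], "label": []}
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Pixel coordinates are divided by the image size so they fall in [0, 1].
        record["xmin"].append(float(box.find("xmin").text) / width)
        record["ymin"].append(float(box.find("ymin").text) / height)
        record["xmax"].append(float(box.find("xmax").text) / width)
        record["ymax"].append(float(box.find("ymax").text) / height)
        # A KeyError here means the XML uses a name that is not in the label map.
        record["label"].append(label_ids[obj.find("name").text])
    return record

# Hypothetical one-box annotation for a 960x720 image.
SAMPLE = """<annotation>
  <size><width>960</width><height>720</height><depth>3</depth></size>
  <object>
    <name>shape_a</name>
    <bndbox><xmin>96</xmin><ymin>72</ymin><xmax>192</xmax><ymax>144</ymax></bndbox>
  </object>
</annotation>"""

print(parse_labelimg_xml(SAMPLE))
```

If a class name in the XML files doesn't match the label map exactly (case included), the KeyError surfaces the mismatch before any record is written.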
Then I prepared all the other input files for Tensorflow:
Label map file:
item {
id: 1
name: 'shape_a'
display_name: 'Shape A'
}
item {
id: 2
name: 'shape_b'
display_name: 'Shape B'
}
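Because the API reserves id 0 for the background class, label map ids are expected to start at 1, and num_classes in the pipeline config should match the number of items. A minimal sketch of checking a label map like the one above; it uses a regular expression instead of the API's own label_map_util, so it only understands simple files like this one, and requiring contiguous ids is a conservative choice of this sketch.

```python
import re

LABEL_MAP = """
item {
  id: 1
  name: 'shape_a'
  display_name: 'Shape A'
}
item {
  id: 2
  name: 'shape_b'
  display_name: 'Shape B'
}
"""

def check_label_map_ids(pbtxt):
    """Extract the item ids and check that they are unique, start at 1 and are
    contiguous. Returns the sorted ids or raises ValueError."""
    ids = sorted(int(value) for value in re.findall(r"\bid:\s*(\d+)", pbtxt))
    if not ids:
        raise ValueError("no 'id:' fields found")
    if ids[0] == 0:
        raise ValueError("id 0 is reserved for the background class")
    if ids != list(range(1, len(ids) + 1)):
        raise ValueError(f"ids are not unique and contiguous from 1: {ids}")
    return ids

ids = check_label_map_ids(LABEL_MAP)
# len(ids) should equal num_classes in the pipeline config (2 in this question).
print(ids)
```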
Config file (adapted from https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs). As you can see, I've chosen faster_rcnn_inception_v2 and tried to train it from scratch (because these images are very different from the ones used to train the pretrained models). Most of the parameters are kept as they are in the repository.
model {
faster_rcnn {
num_classes: 2
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 720
max_dimension: 960
}
}
feature_extractor {
type: 'faster_rcnn_inception_v2'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.5
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.5
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 0
learning_rate: .0002
}
schedule {
step: 900000
learning_rate: .00002
}
schedule {
step: 1200000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
from_detection_checkpoint: false
# fine_tune_checkpoint: "./run/train/modelXXXXXX.ckpt"
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {}
}
data_augmentation_options {
random_vertical_flip {}
}
data_augmentation_options {
random_adjust_brightness { max_delta: 0.15 }
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "./train.tfrecord"
}
label_map_path: "./label_map.pbtxt"
}
eval_config: {
num_examples: 10
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
eval_interval_secs: 300
}
eval_input_reader: {
tf_record_input_reader {
input_path: "./eval.tfrecord"
}
label_map_path: "./label_map.pbtxt"
shuffle: false
num_readers: 1
}
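The keep_aspect_ratio_resizer block decides the size at which every image reaches the network. As I understand its documented behavior, the short side is scaled to min_dimension unless that would push the long side past max_dimension, in which case the long side is scaled to max_dimension instead. A minimal sketch of that rule (an approximation; the API's exact rounding may differ):

```python
def keep_aspect_ratio_size(height, width, min_dimension, max_dimension):
    """Approximate output (height, width) of keep_aspect_ratio_resizer."""
    short_side, long_side = min(height, width), max(height, width)
    # First try to bring the short side to min_dimension...
    scale = min_dimension / short_side
    # ...but fall back to capping the long side at max_dimension.
    if round(long_side * scale) > max_dimension:
        scale = max_dimension / long_side
    return round(height * scale), round(width * scale)

# Training images (960x720) and eval images (1920x1440) with min 720 / max 960.
print(keep_aspect_ratio_size(720, 960, 720, 960))
print(keep_aspect_ratio_size(1440, 1920, 720, 960))
```

With these settings both calls give (720, 960): the training images pass through unchanged, while the eval images are scaled by 0.5, so shapes in the eval images reach the network at half the pixel size of comparable shapes in the training images.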
Finally, I ran TensorFlow by calling the https://github.com/tensorflow/models/blob/master/research/object_detection/train.py script. On a laptop Nvidia Quadro GPU, performance is around 0.600 sec/step. There are no errors in the console, but the first thing I notice is that the loss seems to converge to 0.4 and stay there after relatively few (?) steps:
At around step 500, I also started the evaluation script (https://github.com/tensorflow/models/blob/master/research/object_detection/eval.py) on the CPU. It runs every 5 minutes (eval_interval_secs: 300) and I can see the output in TensorBoard.
Here is the problem. The first evaluation uses the checkpoint at step #0, so the output images show a bunch of randomly placed boxes, which should be normal. One odd thing is that only boxes for the first class (A) are present.
Then, from the second evaluation (around step #1000) onward, the output images have no detections at all! No A/B class boxes are drawn, and nothing shows up until I stop everything (at step #10000).
I was expecting to keep seeing detections, even if wrong ones.
I have many questions, and I've probably made obvious mistakes in my workflow (my knowledge is still very limited):
Is what I'm seeing in the loss and evaluation outputs really strange behavior?
What techniques can I use to check if I did some conceptual mistakes in data preparation?
Can I debug what's happening under the hood during training?
How about the Tensorflow config file? Is there something wrong there?
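For the question about checking data preparation, one simple technique is to read the converted records back and assert a few invariants: coordinates normalized to [0, 1], min corners below max corners, and class ids within 1..num_classes. The sketch below works on a plain dict of parallel lists (a hypothetical layout mirroring the image/object/bbox/* and image/object/class/label features that conversion scripts usually fill); getting that dict out of the actual .tfrecord files in TF 1.x would mean parsing them, e.g. with tf.python_io.tf_record_iterator and tf.train.Example.FromString, which is not shown here.

```python
def find_box_problems(record, num_classes):
    """Return human-readable problems found in one image's annotations."""
    problems = []
    n = len(record["label"])
    for key in ("xmin", "ymin", "xmax", "ymax"):
        if len(record[key]) != n:
            problems.append(f"{key} has {len(record[key])} entries, expected {n}")
    if problems:
        return problems
    for i in range(n):
        xmin, ymin = record["xmin"][i], record["ymin"][i]
        xmax, ymax = record["xmax"][i], record["ymax"][i]
        # Pixel values here usually mean the conversion forgot to normalize.
        if not all(0.0 <= v <= 1.0 for v in (xmin, ymin, xmax, ymax)):
            problems.append(f"box {i}: coordinates outside [0, 1]")
        if xmin >= xmax or ymin >= ymax:
            problems.append(f"box {i}: empty or inverted box")
        # Label ids must match the label map (1-based; 0 is background).
        if not 1 <= record["label"][i] <= num_classes:
            problems.append(f"box {i}: label {record['label'][i]} not in 1..{num_classes}")
    return problems

good = {"xmin": [0.1], "ymin": [0.1], "xmax": [0.2], "ymax": [0.2], "label": [1]}
bad = {"xmin": [96], "ymin": [0.1], "xmax": [0.05], "ymax": [0.2], "label": [0]}
print(find_box_problems(good, num_classes=2))
print(find_box_problems(bad, num_classes=2))
```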
A note: I've also tried the same thing with other models such as ssd_*, but the behavior is the same.
I'm trying to train an object detection model to detect and classify 10 classes. My original dataset is pretty sparse and unbalanced, containing a total of 3k tagged images with the following distribution between classes:
Class 1: 21
Class 2: 22
Class 3: 9
Class 4: 192
Class 5: 2240
Class 6: 319
Class 7: 56
Class 8: 190
Class 9: 44
Class 10: 167
Because of this sparsity, I have performed augmentation on all images, namely adding noise, blur, contrast, brightness, and horizontal flipping. I also augmented the flipped images further with noise, contrast, and brightness. The resulting dataset consists of 37k tagged images with the following distribution:
Class 1: 4235
Class 2: 5365
Class 3: 2385
Class 4: 10755
Class 5: 17185
Class 6: 4035
Class 7: 3150
Class 8: 3820
Class 9: 555
Class 10: 1500
The image below shows the different losses for 4 different sessions. The pink graph is the result from the augmented dataset of 37k images, and the other graphs are from previous runs on the original dataset of ~2.5k images. As you can see from the pink graph, the total loss is not decreasing at all from its initial value (as is the case for the blue and red graphs from previous runs). The RPN loss is decreasing, but the box classifier loss is increasing. What could be the reason for this?
I have also included an image of the average precision for each class. The precision for most classes increases steadily the whole time while the loss is not decreasing; does that mean the model is overfitting? Is it a bad idea to 10x the dataset by augmenting every image like I've done? I've also included the config file I'm using below. Any suggestions on how to improve my training results are appreciated!
model {
faster_rcnn {
num_classes: 10
image_resizer {
fixed_shape_resizer {
height: 300
width: 500
}
}
feature_extractor {
type: 'faster_rcnn_inception_resnet_v2'
first_stage_features_stride: 8
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 0.75, 1.0, 1.5,2,3]
aspect_ratios: [0.5,1,2,3]
height: 32
width: 32
height_stride: 8
width_stride: 8
}
}
first_stage_atrous_rate: 1
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.5
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 17
maxpool_kernel_size: 1
maxpool_stride: 1
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: True
dropout_keep_probability: 0.6
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.3
iou_threshold: 0.5
# soft_nms_sigma: 0.5
# use_class_agnostic_nms: True
# max_classes_per_detection: 1
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
use_multiclass_scores : False
optimizer {
#momentum_optimizer: {
adam_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0001
schedule {
step: 150000
learning_rate: .00001
}
schedule {
step: 250000
learning_rate: .000001
}
}
}
#momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
from_detection_checkpoint: false
data_augmentation_options {
random_horizontal_flip {}
}
data_augmentation_options {
random_crop_image {
min_object_covered : 1.0
min_aspect_ratio: 1
max_aspect_ratio: 1
min_area: 0.5
max_area: 1
random_coef: 0.5
}
}
}
Unfortunately, with deep learning it can often be difficult to tell exactly which parameter is causing an issue. Looking at your question, it seems that even with the data augmentation (which is a great thing to do), the number of images per class still varies very heavily.
For example, after augmenting your data, you end up with these two classes:
Class 5: 17185
Class 9: 555
Class 5 has 17,185 images, while class 9 has only 555. That is a huge imbalance, and it's often preferable to have as close to the same number of images per class as possible.
During training you will have a validation step, where a pool of images from all classes is used to test the model at that point. If one class has far more images than another, the model will do noticeably better on images from the larger class and struggle with images from the smaller class, either because it didn't have many examples to learn from or because it started to weight the larger class more heavily, since that class has more training examples.
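One common way to move toward equal counts (not something the answer spells out) is to repeat examples of the rarer classes. A minimal sketch that computes, from the per-class counts in the question, roughly how many times each class would need to be repeated to match the largest one:

```python
def oversample_factors(counts):
    """How many times each class's examples would need to be repeated to
    roughly match the most frequent class (1 means no repetition)."""
    largest = max(counts.values())
    return {cls: max(1, round(largest / n)) for cls, n in counts.items()}

# Per-class counts after augmentation, taken from the question.
augmented = {1: 4235, 2: 5365, 3: 2385, 4: 10755, 5: 17185,
             6: 4035, 7: 3150, 8: 3820, 9: 555, 10: 1500}
factors = oversample_factors(augmented)
print(factors)
```

Class 5 gets a factor of 1 and class 9 a factor of 31. Treat these as a starting point only: in object detection a single image usually contains several classes, so repeating whole images also inflates the counts of every other class in them.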
I'm encountering a strange problem while training a CNN to detect objects in my own dataset. I am using transfer learning, and at the beginning of training the loss value decreases (as expected). But after some time it gets higher and higher, and I have no idea why.
At the same time, when I look at the Images tab in TensorBoard to check how well the CNN predicts objects, I can see that it does very well; it doesn't look like it is getting worse over time. The Precision and Recall charts also look good; only the Loss charts (especially classification_loss) show an increasing trend over time.
Here are some specific details:
I have 10 different classes of logos (such as DHL, BMW, FedEx, etc.)
Around 600 images per class
I use tensorflow-gpu on Ubuntu 18.04
I tried multiple pre-trained models, the latest being faster_rcnn_resnet101_coco with this pipeline config:
model {
faster_rcnn {
num_classes: 10
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/models2/faster_rcnn_resnet101_coco/model.ckpt"
from_detection_checkpoint: true
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/train.record"
}
label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/test.record"
}
label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
shuffle: false
num_readers: 1
}
Here are the results I got after training for nearly 23 hours and over 120k steps:
Loss and Total Loss
Precision
So, my question is: why is the loss value increasing over time? It should be getting smaller or staying more or less constant, but you can clearly see the increasing trend in the above charts.
I think everything is properly configured and my dataset is pretty decent (the .tfrecord files were also built correctly).
To check whether it's my fault, I tried someone else's dataset and configuration files. I used the files of the raccoon dataset's author (he provided all of the necessary files in his repo). I just downloaded them and started training with no modifications, to see whether I would get results similar to his.
Surprisingly, after 82k steps I got entirely different charts from the ones shown in the linked article (which were captured after 22k steps). Here you can see the comparison of our results:
My losses vs his TotalLoss
My precision vs his mAP
Clearly, something worked differently on my PC. I suspect the same cause may be behind the increasing loss on my own dataset, which is why I mention it.
TotalLoss is the weighted sum of the four other losses (the RPN classification and regression losses, and the BoxClassifier classification and regression losses), and the curves shown are all evaluation losses. In TensorBoard you can check or uncheck runs to see the results for training only or for evaluation only. (For example, the following pic shows both the training summary and the evaluation summary.)
If the evaluation loss is increasing, this may indicate that the model is overfitting; besides, the precision metrics dropped a little.
To get a better fine-tuning result, you could try adjusting the weights of the four losses. For example, you could increase the weight of BoxClassifierLoss/classification_loss so the model focuses more on this term. In your config file, second_stage_classification_loss_weight and first_stage_objectness_loss_weight are both 1 while the other two are both 2, so the model currently focuses a little more on the other two.
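A minimal sketch of how the four loss weights combine into a total, using the weights from the config in the question and made-up loss values. It assumes unweighted per-term losses as input and ignores any regularization term; in the API itself the weights are applied inside the model's loss computation (faster_rcnn_meta_arch.py), so the per-term curves in TensorBoard may already include them.

```python
# Loss weights as set in the faster_rcnn block of the config in the question.
LOSS_WEIGHTS = {
    "rpn_localization": 2.0,    # first_stage_localization_loss_weight
    "rpn_objectness": 1.0,      # first_stage_objectness_loss_weight
    "box_localization": 2.0,    # second_stage_localization_loss_weight
    "box_classification": 1.0,  # second_stage_classification_loss_weight
}

def weighted_total(raw_losses, weights=LOSS_WEIGHTS):
    """Weighted sum of unweighted per-term losses (regularization ignored)."""
    return sum(weights[name] * value for name, value in raw_losses.items())

# Made-up per-term values, for illustration only.
raw = {"rpn_localization": 0.1, "rpn_objectness": 0.2,
       "box_localization": 0.3, "box_classification": 0.4}
print(weighted_total(raw))
# Raising the classification weight to 2.0 gives that term as much pull as localization.
print(weighted_total(raw, dict(LOSS_WEIGHTS, box_classification=2.0)))
```

With these values the totals come out at about 1.4 and 1.8: the only change is the classification term counting twice, which is the kind of shift in emphasis the answer suggests.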
As for the extra question about why loss_1 and loss_2 are the same: this can be explained by looking at the TensorFlow graph.
Here loss_2 is the summary for total_loss (note that this total_loss is not the same as TotalLoss), and the red-circled node is a tf.identity node. This node outputs the same tensor as its input, so loss_1 is the same as loss_2.
I am using TensorFlow (with the Object Detection API) to train on my dataset locally with an 8 GB Nvidia 1080.
I use create_pet_tf_record.py to generate the TFRecord files. I don't train from scratch; I use mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt as the fine_tune_checkpoint.
When I run python object_detection/train.py and /eval.py, I check the training and evaluation process through TensorBoard. Initially, everything looks correct, as in pic1 at step zero.
The training checkpoints take a long time to be saved. After more than 5,000 training steps, the evaluation moved from /model.ckpt-0 to /model.ckpt-3642, and at that point the whole process is no longer okay, as shown in pic2.
This is my mask_rcnn_inception_v2.config file:
model {
faster_rcnn {
num_classes: 1
image_resizer {
fixed_shape_resizer {
height: 375
width: 500
}
}
number_of_stages: 3
feature_extractor {
type: 'faster_rcnn_inception_v2'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
predict_instance_masks: true
mask_height: 15
mask_width: 15
mask_prediction_conv_depth: 0
mask_prediction_num_conv_layers: 2
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
second_stage_mask_prediction_loss_weight: 4.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 900000
learning_rate: .00002
}
schedule {
step: 1200000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "/home/jesse/gpu-py3/models/research/object_detection/models/model/mask_rcnn_inception_v2_coco_train/mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/home/jesse/gpu-py3/models/research/ttt/pet_train.record"
}
label_map_path: "/home/jesse/gpu-py3/models/research/object_detection/data/pet_label_map.pbtxt"
load_instance_masks: true
mask_type: PNG_MASKS
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/home/jesse/gpu-py3/models/research/ttt/pet_val.record"
}
label_map_path: "/home/jesse/gpu-py3/models/research/object_detection/data/pet_label_map.pbtxt"
load_instance_masks: true
mask_type: PNG_MASKS
shuffle: false
num_readers: 1
}
I don't know where I went wrong. I feel like I should run evaluation more often and save a training checkpoint every 2000 steps, for example. Or maybe I need to edit the pipeline file mask_rcnn_inception_v2.config. I don't know why the training result is so disappointing after 3642 steps, as seen in pic2.
Any help is highly appreciated.
My 2 cents on this: assuming you have not modified the important config parameters much, your training data is very diverse, and as more iterations go through, the model is generalising. Try more accurate labelling of images, even if it means fewer images.
I am using Google's TensorFlow Object Detection API to train and run inference on a custom dataset.
I would like to adjust the parameters of the config file to better suit my samples (e.g. no. of region proposals, size of ROI bbox, etc.).
To do so, I need to know what each parameter does.
Unfortunately, the config files (found here) do not have comments or explanations.
Some, such as "num classes", are self-explanatory, but others are tricky.
I found this file with more comments, but wasn't able to 'translate' it to my format.
I would like to know one of the following:
1. explanation of each parameter for google's API config file
or
2. 'translation' from the official faster-rcnn to google's API config
or at least
3. thorough review of faster-rcnn with technical details of the parameters (the official article doesn't provide all the details)
Thank you for your kind help!
Example of a config file:
# Faster R-CNN with Resnet-101 (v1) configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 90
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
num_epochs: 1
}
I found two sources that shed some light on the config file:
1. The protos folder in the TensorFlow models GitHub repository covers all configuration options, with some comments on each option. You should check out faster_rcnn.proto, eval.proto and train.proto for the most common ones.
2. This blog post by Algorithmia thoroughly covers all the steps to download, prepare and train Faster R-CNN on Google's Open Images dataset. About two-thirds of the way through, there is some discussion of the configuration options.
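As an example of reading those protos, here is a minimal sketch of the anchor shapes implied by a grid_anchor_generator block. It assumes a base anchor of 256x256 pixels (what I understand to be the default of the height/width fields in grid_anchor_generator.proto) and that each anchor keeps area proportional to scale squared while aspect_ratio sets width/height. The sizes are in pixels of the already-resized image, and height_stride/width_stride set the spacing between anchor centers.

```python
import math

def grid_anchor_sizes(scales, aspect_ratios, base_height=256, base_width=256):
    """(height, width) in pixels of the anchor shapes placed at every grid cell,
    one per scale/aspect-ratio pair. base_height/base_width stand in for the
    grid_anchor_generator 'height'/'width' fields (256 assumed as default)."""
    sizes = []
    for scale in scales:
        for ratio in aspect_ratios:
            ratio_sqrt = math.sqrt(ratio)
            # aspect_ratio = width / height; the scale multiplies both sides.
            sizes.append((scale * base_height / ratio_sqrt,
                          scale * base_width * ratio_sqrt))
    return sizes

# Square anchors of the sample config: scales [0.25, 0.5, 1.0, 2.0].
print(grid_anchor_sizes([0.25, 0.5, 1.0, 2.0], [1.0]))
# A config that sets height: 32 / width: 32 shrinks every anchor accordingly.
print(grid_anchor_sizes([0.25, 3], [1.0], base_height=32, base_width=32))
```

Under these assumptions the sample config's square anchors span 64 to 512 pixels, which is a quick way to judge whether they fit the object sizes in your own (resized) images.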
I'm using the TensorFlow Object Detection API on my own data with the faster_rcnn_resnet101 model, training from scratch. The training part goes well, but the evaluation part gets stuck from the start and never shows a result. It looks like this:
I tried an older version of the API that I downloaded a few months ago on the same dataset, and everything worked. Is there something wrong with the current version of the API, especially the evaluation part? Thank you for your attention.
My configuration file looks like this:
model {
faster_rcnn {
num_classes: 10
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
#fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
#from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
#num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/PATH/TO/train.record"
}
label_map_path: "/PATH/TO/my_label_map.pbtxt"
}
eval_config: {
num_examples: 2000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
#max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/PATH/TO/test.record"
}
label_map_path: "/PATH/TO/my_label_map.pbtxt"
shuffle: false
num_readers: 1
num_epochs: 1
}
The Faster R-CNN object detector takes longer to evaluate than YOLO or SSD because it trades speed for higher accuracy. I recommend reducing the number of images to 5-10 to see whether the evaluation script produces any output. As an additional check, you can visualize the detected objects in TensorBoard by adding the num_visualizations key to the eval config:
eval_config: {
num_examples: 10
num_visualizations: 10
min_score_threshold: 0.15
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 1
}
With the above config you should be able to see the Images tab in TensorBoard with object detections. Notice that I also lowered the score threshold (min_score_threshold) to 0.15 so that less confident boxes are visualized.
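As far as I can tell from eval.proto, min_score_threshold (default 0.5) only controls which boxes are drawn in the Images tab, not the computed metrics. The sketch below mimics that filtering step on hypothetical detection dicts; it uses a strict "greater than" comparison, which I believe matches the API's visualization utilities but have not checked against every version.

```python
def boxes_to_draw(detections, min_score_threshold=0.5):
    """Keep only detections whose score is above the threshold, the way eval
    visualization skips low-confidence boxes (0.5 assumed as the default)."""
    return [d for d in detections if d["score"] > min_score_threshold]

# Hypothetical detections from an early checkpoint with low confidence scores.
detections = [{"label": "shape_a", "score": 0.9},
              {"label": "shape_b", "score": 0.3},
              {"label": "shape_a", "score": 0.12}]
print(len(boxes_to_draw(detections)))        # default threshold
print(len(boxes_to_draw(detections, 0.15)))  # threshold from the config above
```

Only one box survives the default threshold, while two survive at 0.15: an under-trained model whose scores all sit below 0.5 can look as if it detects nothing in TensorBoard even though it produces boxes.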