I'm attempting to train a Faster RCNN inception v2 object detector to locate relatively small objects (mostly 16x16 pixels, with a few at 100x100) in a set of 512x512 images. I ran with the default configuration and ended up with low accuracy, ~0.4 mAP @ 0.5 IoU. I then changed some of the pipeline configuration to try to account for the object size, but now I'm getting no detections at all (0.00 mAP) and the RPN localization_loss plateaus at 1.0 (the other loss components are significantly lower, ~0.05). I have these questions about my pipeline.config updates:
My assumption is that the network is having problems selecting appropriate region proposals. Is that likely, and if so what configuration changes are likely to help?
Did I make some mistake in the changes I made that wrecked the accuracy?
Configuration Changes Attempted:
first_stage_features_stride: Lowered from 16 to 8
grid_anchor_generator: Set anchor size to 64x64 and lowered width/height stride from 16 to 8
train_config: Increased batch_size from 1 to 16
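To check these anchor settings against the object sizes, the anchor shapes can be worked out by hand. This is only a sketch: it assumes each anchor is sized as height = scale / sqrt(aspect_ratio) * base height and width = scale * sqrt(aspect_ratio) * base width, and that one set of anchors is tiled per stride step over the 512x512 input. The helper name `anchor_shapes` is made up for illustration.

```python
import math


def anchor_shapes(base_height, base_width, scales, aspect_ratios):
    """Return (height, width) in pixels for every scale/aspect-ratio pair.

    Assumed sizing rule (an assumption, not taken from the post):
      height = scale / sqrt(aspect_ratio) * base_height
      width  = scale * sqrt(aspect_ratio) * base_width
    """
    shapes = []
    for scale in scales:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            shapes.append((scale / root * base_height, scale * root * base_width))
    return shapes


# Values copied from the pipeline configuration in this question.
shapes = anchor_shapes(64, 64, [0.25, 0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
for height, width in shapes:
    print(f"{height:6.1f} x {width:6.1f}")

# Assumption: anchors are tiled once per stride step across the 512x512 input.
positions = (512 // 8) ** 2
print("anchor positions:", positions)
print("total anchors:", positions * len(shapes))
```

Under these assumptions the square anchors come out at 16, 32, 64 and 128 pixels, which brackets the 16x16 to 100x100 objects described above.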
Complete Pipeline Configuration Below:
model {
faster_rcnn {
num_classes: 8
image_resizer {
fixed_shape_resizer {
height: 512
width: 512
}
}
feature_extractor {
type: "faster_rcnn_inception_v2"
first_stage_features_stride: 8
}
first_stage_anchor_generator {
grid_anchor_generator {
height: 64
width: 64
height_stride: 8
width_stride: 8
scales: 0.25
scales: 0.5
scales: 1.0
scales: 2.0
aspect_ratios: 0.5
aspect_ratios: 1.0
aspect_ratios: 2.0
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.00999999977648
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.699999988079
first_stage_max_proposals: 100
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
use_dropout: false
dropout_keep_probability: 1.0
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.300000011921
iou_threshold: 0.600000023842
max_detections_per_class: 100
max_total_detections: 200
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config {
batch_size: 16
data_augmentation_options {
random_horizontal_flip {
}
}
optimizer {
momentum_optimizer {
learning_rate {
manual_step_learning_rate {
initial_learning_rate: 0.000199999994948
schedule {
step: 0
learning_rate: 0.000199999994948
}
schedule {
step: 900000
learning_rate: 1.99999994948e-05
}
schedule {
step: 1200000
learning_rate: 1.99999999495e-06
}
}
}
momentum_optimizer_value: 0.899999976158
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
#fine_tune_checkpoint: "XXXXXXXXX"
from_detection_checkpoint: true
#num_steps: 200000
}
train_input_reader {
label_map_path: "XXXXXXXXXX"
tf_record_input_reader {
input_path: "XXXXXXXXXXXX"
}
}
eval_config {
num_examples: 1500
#max_evals: 10
use_moving_averages: false
}
eval_input_reader {
label_map_path: "XXXXXXXXXXXX"
shuffle: false
num_readers: 1
tf_record_input_reader {
input_path: "XXXXXXXXXXXXXba"
}
}
I'm new to TensorFlow and am currently working with the Object Detection API.
I chose ssd_resnet50_fpn to get started and downloaded the pretrained model from the TensorFlow model zoo to do transfer learning on my own dataset, which has only one class (person). The training configuration is defined in pipeline.config, which I adapted from the one shipped with the pretrained model, and I trained the model with the legacy train.py script.
Training went fine and the loss decreased as expected. During evaluation (via the legacy eval.py) I exported the evaluation images with bounding boxes drawn on them, and those images looked as expected.
However, I get different evaluation results from the same model checkpoint. When I run eval.py twice with the same parameters, the predicted bounding boxes differ on the same image.
Here's the evaluation result made by eval.py (coco_detection_metrics)
First time
Second time
I don't fully understand mAP, but the two results are slightly different.
Here is one of the images exported during evaluation; the left one is from the first evaluation and the right one is from the second.
Exported Image during evaluation
It seems as if the model weights change during inference. How can I track down the problem? Is there a configuration option that I missed?
I'm using TensorFlow 1.10.1 with Python 3.5.2, and I cloned the Object Detection API from https://github.com/tensorflow/models without changes.
Here's my pipeline.config:
model {
ssd {
num_classes: 1
image_resizer {
fixed_shape_resizer {
height: 640
width: 640
}
}
feature_extractor {
type: "ssd_resnet50_v1_fpn"
depth_multiplier: 1.0
min_depth: 16
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 0.000399999989895
}
}
initializer {
truncated_normal_initializer {
mean: 0.0
stddev: 0.0299999993294
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019
scale: true
epsilon: 0.0010000000475
}
}
override_base_feature_extractor_hyperparams: true
}
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
use_matmul_gather: true
}
}
similarity_calculator {
iou_similarity {
}
}
box_predictor {
weight_shared_convolutional_box_predictor {
conv_hyperparams {
regularizer {
l2_regularizer {
weight: 0.000399999989895
}
}
initializer {
random_normal_initializer {
mean: 0.0
stddev: 0.00999999977648
}
}
activation: RELU_6
batch_norm {
decay: 0.996999979019
scale: true
epsilon: 0.0010000000475
}
}
use_dropout: true
dropout_keep_probability: 0.7
depth: 256
num_layers_before_predictor: 4
kernel_size: 3
class_prediction_bias_init: -4.59999990463
}
}
anchor_generator {
multiscale_anchor_generator {
min_level: 3
max_level: 7
anchor_scale: 4.0
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
scales_per_octave: 2
}
}
post_processing {
batch_non_max_suppression {
score_threshold: 0.300000011921
iou_threshold: 0.600000023842
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
normalize_loss_by_num_matches: true
loss {
localization_loss {
weighted_smooth_l1 {
}
}
classification_loss {
weighted_sigmoid_focal {
gamma: 2.0
alpha: 0.25
}
}
classification_weight: 1.0
localization_weight: 1.0
}
encode_background_as_zeros: true
normalize_loc_loss_by_codesize: true
inplace_batchnorm_update: true
freeze_batchnorm: false
}
}
train_config {
batch_size: 8
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
sync_replicas: false
optimizer {
adam_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.0001
decay_steps: 5000
decay_factor: 0.9
}
}
}
use_moving_average: false
}
fine_tune_checkpoint: "/tf-object-detection-training/models/ssd_resnet50/saved/model.ckpt-652123"
num_steps: 2000000
from_detection_checkpoint: true
load_all_detection_checkpoint_vars: true
startup_delay_steps: 0.0
replicas_to_aggregate: 8
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader {
label_map_path: "/tf-object-detection-training/dataset_VOC/label.pbtxt"
tf_record_input_reader {
input_path: "/tf-object-detection-training/dataset_VOC/person_train.record-?????-of-00010"
}
}
eval_config {
num_examples: 10000
num_visualizations: 100
eval_interval_secs: 60
metrics_set: "coco_detection_metrics"
use_moving_averages: false
min_score_threshold: 0.5
retain_original_images: false
keep_image_id_for_visualization_export: true
visualization_export_dir: "/tf-object-detection-training/models/ssd_resnet50/eval_detections/"
}
eval_input_reader {
label_map_path: "/tf-object-detection-training/dataset_VOC/label.pbtxt"
shuffle: false
num_readers: 1
tf_record_input_reader {
input_path: "/tf-object-detection-training/dataset_VOC/person_val.record-?????-of-00010"
}
}
Thanks for any advice
After tracing the code for a long time, I found that the cause is the 'use_dropout' flag set in pipeline.config.
It seems that dropout is not disabled during inference, so both eval.py and the frozen_inference_graph apply dropout and produce randomized predictions.
To fix this, simply remove 'use_dropout' from pipeline.config.
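To illustrate why leftover dropout makes two evaluation runs disagree, here is a toy sketch in plain Python. It is not the Object Detection API code: `dropout` and `predict` are made-up helpers, and only the keep probability of 0.7 comes from the config above.

```python
import random


def dropout(values, keep_probability, rng):
    """Inverted dropout: zero each value with probability 1 - keep_probability
    and rescale the survivors by 1 / keep_probability."""
    return [
        value / keep_probability if rng.random() < keep_probability else 0.0
        for value in values
    ]


def predict(features, use_dropout, rng):
    """Toy 'inference' step: return the activations, optionally with dropout."""
    if use_dropout:
        return dropout(features, keep_probability=0.7, rng=rng)
    return list(features)


features = [0.1 * i for i in range(1, 101)]

# Two evaluation runs, each with its own random state.
with_dropout = [predict(features, True, random.Random(seed)) for seed in (1, 2)]
without_dropout = [predict(features, False, random.Random(seed)) for seed in (1, 2)]

print("dropout on, runs identical: ", with_dropout[0] == with_dropout[1])
print("dropout off, runs identical:", without_dropout[0] == without_dropout[1])
```

With dropout active each run zeroes a different random subset of activations, so the downstream box scores and coordinates shift from run to run; with it disabled the same input always gives the same output.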
I use the TensorFlow Object Detection API (https://github.com/tensorflow/models/tree/master/research/object_detection) to train an R-FCN model on the VOC 2007+2012 trainval datasets, and I test on VOC 2007 test. The mAP@0.5 is much lower than that of the Caffe version. The Caffe version was trained for 110000 iterations, and the TensorFlow version was trained for 140000 iterations. A pretrained resnet-v1-50 model is used to initialize the backbone feature extractor. The config file is as follows:
#pascal_voc_resnet50_rfcn.config:
model {
faster_rcnn {
num_classes: 20
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet50'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0005
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 1.0
first_stage_objectness_loss_weight: 1.0
second_stage_box_predictor {
rfcn_box_predictor {
conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0005
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
crop_height: 18
crop_width: 18
num_spatial_bins_height: 3
num_spatial_bins_width: 3
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.7
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 1.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.001
schedule {
step: 0
learning_rate: .001
}
schedule {
step: 900000
learning_rate: .0001
}
schedule {
step: 1200000
learning_rate: .00001
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "resnet_v1_50/resnet_v1_50.ckpt"
from_detection_checkpoint: false
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 1500000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "voc_dataset/trainval.tfrecords"
}
label_map_path: "object_detection/data/pascal_label_map.pbtxt"
}
eval_config: {
# num_examples: 8000
num_examples: 4952
num_visualizations: 4952
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 1
visualization_export_dir: 'outputs_eval_imgs'
metrics_set: 'pascal_voc_metrics'
}
eval_input_reader: {
tf_record_input_reader {
input_path: "voc_dataset/test.tfrecords"
}
label_map_path: "object_detection/data/pascal_label_map.pbtxt"
shuffle: false
num_readers: 1
num_epochs: 1
}
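As a side check on the learning-rate schedule in this config, the sketch below evaluates it at a few steps. It assumes the manual step schedule is piecewise constant, with each rate taking effect at its listed step; `manual_step_learning_rate` is a made-up helper, not the API's implementation.

```python
def manual_step_learning_rate(step, schedule):
    """Return the learning rate in effect at `step`.

    `schedule` is a list of (start_step, learning_rate) pairs sorted by
    start_step; the rate of the last entry with start_step <= step applies.
    """
    rate = schedule[0][1]
    for start_step, learning_rate in schedule:
        if step >= start_step:
            rate = learning_rate
    return rate


# Schedule copied from train_config above.
schedule = [(0, 0.001), (900000, 0.0001), (1200000, 0.00001)]

for step in (0, 140000, 900000, 1200000):
    print(step, manual_step_learning_rate(step, schedule))
```

Under that assumption the learning rate is still 0.001 at step 140000, the point where the TensorFlow run stopped, because the first decay is only scheduled for step 900000.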
The final result is:
PascalBoxes_PerformanceByCategory/AP@0.5IOU/aeroplane: 0.701776
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bicycle: 0.742742
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bird: 0.723409
PascalBoxes_PerformanceByCategory/AP@0.5IOU/boat: 0.513328
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bottle: 0.531051
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bus: 0.769170
PascalBoxes_PerformanceByCategory/AP@0.5IOU/car: 0.811411
PascalBoxes_PerformanceByCategory/AP@0.5IOU/cat: 0.831349
PascalBoxes_PerformanceByCategory/AP@0.5IOU/chair: 0.472102
PascalBoxes_PerformanceByCategory/AP@0.5IOU/cow: 0.790175
PascalBoxes_PerformanceByCategory/AP@0.5IOU/diningtable: 0.483809
PascalBoxes_PerformanceByCategory/AP@0.5IOU/dog: 0.819959
PascalBoxes_PerformanceByCategory/AP@0.5IOU/horse: 0.838640
PascalBoxes_PerformanceByCategory/AP@0.5IOU/motorbike: 0.733901
PascalBoxes_PerformanceByCategory/AP@0.5IOU/person: 0.765344
PascalBoxes_PerformanceByCategory/AP@0.5IOU/pottedplant: 0.379224
PascalBoxes_PerformanceByCategory/AP@0.5IOU/sheep: 0.719418
PascalBoxes_PerformanceByCategory/AP@0.5IOU/sofa: 0.576437
PascalBoxes_PerformanceByCategory/AP@0.5IOU/train: 0.726485
PascalBoxes_PerformanceByCategory/AP@0.5IOU/tvmonitor: 0.683094
PascalBoxes_Precision/mAP@0.5IOU: 0.680641
However, when I use the original version (based on Caffe), the mAP is 0.746, with the following per-class results:
PascalBoxes_PerformanceByCategory/AP@0.5IOU/aeroplane: 0.781
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bicycle: 0.793
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bird: 0.756
PascalBoxes_PerformanceByCategory/AP@0.5IOU/boat: 0.652
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bottle: 0.578
PascalBoxes_PerformanceByCategory/AP@0.5IOU/bus: 0.843
PascalBoxes_PerformanceByCategory/AP@0.5IOU/car: 0.846
PascalBoxes_PerformanceByCategory/AP@0.5IOU/cat: 0.889
PascalBoxes_PerformanceByCategory/AP@0.5IOU/chair: 0.565
PascalBoxes_PerformanceByCategory/AP@0.5IOU/cow: 0.835
PascalBoxes_PerformanceByCategory/AP@0.5IOU/diningtable: 0.658
PascalBoxes_PerformanceByCategory/AP@0.5IOU/dog: 0.867
PascalBoxes_PerformanceByCategory/AP@0.5IOU/horse: 0.857
PascalBoxes_PerformanceByCategory/AP@0.5IOU/motorbike: 0.792
PascalBoxes_PerformanceByCategory/AP@0.5IOU/person: 0.778
PascalBoxes_PerformanceByCategory/AP@0.5IOU/pottedplant: 0.412
PascalBoxes_PerformanceByCategory/AP@0.5IOU/sheep: 0.757
PascalBoxes_PerformanceByCategory/AP@0.5IOU/sofa: 0.723
PascalBoxes_PerformanceByCategory/AP@0.5IOU/train: 0.846
PascalBoxes_PerformanceByCategory/AP@0.5IOU/tvmonitor: 0.684
PascalBoxes_Precision/mAP@0.5IOU: 0.746
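To see where the gap comes from, the two listings can be compared class by class. The sketch below only re-uses the numbers reported above; the variable and function names are made up for illustration.

```python
# Per-class AP at 0.5 IoU, copied from the two result listings above.
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
TF_AP = [
    0.701776, 0.742742, 0.723409, 0.513328, 0.531051, 0.769170, 0.811411,
    0.831349, 0.472102, 0.790175, 0.483809, 0.819959, 0.838640, 0.733901,
    0.765344, 0.379224, 0.719418, 0.576437, 0.726485, 0.683094,
]
CAFFE_AP = [
    0.781, 0.793, 0.756, 0.652, 0.578, 0.843, 0.846, 0.889, 0.565, 0.835,
    0.658, 0.867, 0.857, 0.792, 0.778, 0.412, 0.757, 0.723, 0.846, 0.684,
]


def mean(values):
    """Arithmetic mean; mAP is the mean of the per-class APs."""
    return sum(values) / len(values)


# Sort classes by how far the TensorFlow result trails the Caffe result.
gaps = sorted(
    ((caffe - tf, name) for name, tf, caffe in zip(CLASSES, TF_AP, CAFFE_AP)),
    reverse=True,
)

print(f"TensorFlow mAP: {mean(TF_AP):.4f}")
print(f"Caffe mAP:      {mean(CAFFE_AP):.4f}")
for gap, name in gaps[:5]:
    print(f"{name:12s} {gap:+.3f}")
```

The TensorFlow model trails on every class; the largest gaps are on diningtable, sofa, boat and train, while tvmonitor and person are nearly level.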
I am using TensorFlow (with the Object Detection API) to train on my dataset locally with an Nvidia 1080 8GB GPU.
I use create_pet_tf_record.py to generate the TFRecord files. I don't train from scratch; I use mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt as the fine_tune_checkpoint.
When I run python object_detection/train.py and /eval.py, I monitor the training and evaluation process through TensorBoard. Initially, at step zero, everything looks correct, as in pic1.
Training checkpoints take a long time to be saved. After more than 5,000 training steps, the evaluation moved from /model.ckpt-0 to /model.ckpt-3642, and at that point the results are no longer okay, as shown in pic2.
This is my file mask_rcnn_inception_v2.config
model {
faster_rcnn {
num_classes: 1
image_resizer {
fixed_shape_resizer {
height: 375
width: 500
}
}
number_of_stages: 3
feature_extractor {
type: 'faster_rcnn_inception_v2'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
predict_instance_masks: true
mask_height: 15
mask_width: 15
mask_prediction_conv_depth: 0
mask_prediction_num_conv_layers: 2
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
second_stage_mask_prediction_loss_weight: 4.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 900000
learning_rate: .00002
}
schedule {
step: 1200000
learning_rate: .000002
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "/home/jesse/gpu-py3/models/research/object_detection/models/model/mask_rcnn_inception_v2_coco_train/mask_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/home/jesse/gpu-py3/models/research/ttt/pet_train.record"
}
label_map_path: "/home/jesse/gpu-py3/models/research/object_detection/data/pet_label_map.pbtxt"
load_instance_masks: true
mask_type: PNG_MASKS
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/home/jesse/gpu-py3/models/research/ttt/pet_val.record"
}
label_map_path: "/home/jesse/gpu-py3/models/research/object_detection/data/pet_label_map.pbtxt"
load_instance_masks: true
mask_type: PNG_MASKS
shuffle: false
num_readers: 1
}
I don't know where I went wrong. I feel that evaluation should run more often and that a training checkpoint should be saved, for example, every 2000 steps. Or I may need to edit the pipeline file mask_rcnn_inception_v2.config. I don't know why the result is so disappointing after 3642 steps, as seen in pic2.
Any help is highly appreciated.
My 2 cents on this: assuming you have not changed the important config parameters much, your training data is probably very diverse, and the model generalises as more iterations go through. Try labelling the images more accurately, even if it means using fewer images.
I am using Google's TensorFlow Object Detection API to train and run inference on a custom dataset.
I would like to adjust the parameters of the config file to better suit my samples (e.g. no. of region proposals, size of ROI bbox, etc.).
To do so, I need to know what each parameter does.
Unfortunately, the config files (found here) do not have comments or explanations.
Some, such as "num classes", are self-explanatory, but others are tricky.
I found this file with more comments, but wasn't able to 'translate' it to my format.
I would like to know one of the following:
1. explanation of each parameter for google's API config file
or
2. 'translation' from the official faster-rcnn to google's API config
or at least
3. thorough review of faster-rcnn with technical details of the parameters (the official article doesn't provide all the details)
Thank you for your kind help!
Example of a config file:
# Faster R-CNN with Resnet-101 (v1) configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 90
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config: {
num_examples: 8000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
shuffle: false
num_readers: 1
num_epochs: 1
}
I found two sources that shed some light on the config file:
1. The protos folder inside the TensorFlow GitHub repository covers all configuration options, with some comments on each option. You should check out faster_rcnn.proto, eval.proto and train.proto for the most common options.
2. This blog post by Algorithmia thoroughly covers all the steps to download, prepare and train Faster R-CNN on Google's Open Images dataset. About two-thirds of the way through, there is some discussion of the configuration options.
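When looking fields up in the proto files, it can help to flatten a config into dotted field paths first, since each nested block name generally corresponds to a message in one of those files (for example `faster_rcnn` in faster_rcnn.proto). The following is only a rough sketch: `list_field_paths` is a made-up helper, and it assumes a line-oriented config with one field or brace per line, like the example above. It is not a full protobuf text-format parser.

```python
def list_field_paths(config_text):
    """Yield dotted paths such as 'model.faster_rcnn.num_classes' for every
    scalar field in a line-oriented protobuf text-format config."""
    stack = []
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line == "}":
            stack.pop()  # leave the current block
        elif line.endswith("{"):
            # Block openers appear as 'name {' or 'name: {'.
            stack.append(line[:-1].strip().rstrip(":").strip())
        else:
            key = line.split(":", 1)[0].strip()
            yield ".".join(stack + [key])


sample = """
model {
  faster_rcnn {
    num_classes: 90
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        height_stride: 16
      }
    }
  }
}
train_config: {
  batch_size: 1
}
"""

paths = list(list_field_paths(sample))
for path in paths:
    print(path)
```

Each printed path can then be searched for in the protos folder, starting from the outermost block name and working inwards.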
I'm using the TensorFlow Object Detection API on my own data with the faster_rcnn_resnet101 model. I'm training from scratch. Training goes well, but evaluation gets stuck at the start and never shows a result. It looks like:
I tried an older version of the API that I downloaded a few months ago, on the same dataset, and everything worked. Is there something wrong with the current version of the API, especially in the evaluation part? Thank you for your attention.
My configuration file looks like this:
model {
faster_rcnn {
num_classes: 10
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
#fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
#from_detection_checkpoint: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
#num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/PATH/TO/train.record"
}
label_map_path: "/PATH/TO/my_label_map.pbtxt"
}
eval_config: {
num_examples: 2000
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
#max_evals: 10
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/PATH/TO/test.record"
}
label_map_path: "/PATH/TO/my_label_map.pbtxt"
shuffle: false
num_readers: 1
num_epochs: 1
}
The Faster R-CNN object detector takes longer to evaluate than YOLO or SSD because it trades speed for higher accuracy. I recommend reducing the number of images to 5-10 to see whether the evaluation script produces any output. As an additional check, you can visualize the detected objects in TensorBoard by adding the num_visualizations key to the eval config:
eval_config: {
num_examples: 10
num_visualizations: 10
min_score_threshold: 0.15
# Note: The below line limits the evaluation process to a single evaluation.
# Remove the below line to evaluate indefinitely.
max_evals: 1
}
With the above config you should be able to see an images tab in TensorBoard showing the object detections. Note that I also reduced the minimum score threshold to 0.15 so that less confident boxes are shown.
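As a rough illustration of what `min_score_threshold` does in the config above, the sketch below filters a list of detections by confidence score before they would be drawn. The detections are hypothetical and `boxes_to_draw` is a made-up helper; the exact comparison used by the API is an assumption here.

```python
def boxes_to_draw(detections, min_score_threshold):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d["score"] >= min_score_threshold]


# Hypothetical detections for one image.
detections = [
    {"label": "cat", "score": 0.92},
    {"label": "dog", "score": 0.40},
    {"label": "cat", "score": 0.18},
    {"label": "dog", "score": 0.05},
]

print(len(boxes_to_draw(detections, 0.5)))
print(len(boxes_to_draw(detections, 0.15)))
```

Lowering the threshold only changes which boxes appear in the visualizations, which makes it easier to tell a model that predicts nothing from one that predicts with low confidence.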