Can anyone tell me how to set an exponentially decaying learning rate instead of a constant learning rate in the config file?
Constant learning rate in the config file:
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.003
schedule {
step: 6000
learning_rate: .0003
}
schedule {
step: 12000
learning_rate: .00003
}
}
}
Refer to the optimizer definitions in optimizer.proto:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/optimizer.proto
Example:
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
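To see what this example config does to the learning rate over time, here is a minimal Python sketch of the usual exponential decay formula. It is an illustration, not the API's own code: the function name is made up, and the `staircase` flag is an assumption (I believe optimizer.proto defaults it to true, but check your version).

```python
def exponential_decay_lr(step, initial_lr=0.004, decay_steps=800720,
                         decay_factor=0.95, staircase=True):
    """lr = initial_lr * decay_factor ** (step / decay_steps).

    With staircase=True the exponent is floored, so the rate only
    changes once every decay_steps steps (assumed default).
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_factor ** exponent


# Compare the stepped and the smooth variant at a few training steps.
for step in (0, 400360, 800720, 1601440):
    stepped = exponential_decay_lr(step)
    smooth = exponential_decay_lr(step, staircase=False)
    print(f"step {step:>8}: staircase={stepped:.6f} smooth={smooth:.6f}")
```

Note that with decay_steps: 800720, a run shorter than 800720 steps would see no decay at all under the staircase assumption, so decay_steps usually needs to be chosen relative to the total number of training steps.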
I am training SSD EfficientNet-B0 on the Pascal VOC 2012 dataset from scratch (it needs to be from scratch) with the TensorFlow Object Detection API. I have been training for more than 60k steps with a batch size of 5 (anything larger gives an OOM error). However, the loss fluctuates a lot and struggles to drop even a bit, as can be seen in the training loss graph. Has my training gone wrong, or is the optimizer I set not suitable? How would I solve it?
The optimizer configuration is as below:
optimizer {
adam_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: .01
schedule {
step: 20000
learning_rate: .001
}
schedule {
step: 40000
learning_rate: .0001
}
schedule {
step: 60000
learning_rate: .00001
}
}
}
}
}
Please help, thanks.
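For reference, the manual_step_learning_rate schedule above can be sketched in plain Python to check which rate is in force at a given step. This is only an illustration: the function name is made up, and the assumption that a new rate applies from the boundary step onwards (step >= boundary) should be verified against your version of the API.

```python
import bisect

# (boundary_step, learning_rate) pairs copied from the config above.
SCHEDULE = [(20000, 1e-3), (40000, 1e-4), (60000, 1e-5)]


def manual_step_lr(step, initial_lr, schedule):
    """Return the learning rate in force at `step`.

    `schedule` must be sorted by boundary step. A boundary's rate is
    assumed to apply from that step onwards.
    """
    boundaries = [boundary for boundary, _ in schedule]
    rates = [initial_lr] + [rate for _, rate in schedule]
    return rates[bisect.bisect_right(boundaries, step)]


for step in (0, 19999, 20000, 45000, 60000):
    print(step, manual_step_lr(step, 0.01, SCHEDULE))
```

Under these assumptions the first 20k steps run at 0.01, and by 60k steps the rate has already dropped to 0.00001.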
I'm training a custom model that detects only people.
I used the TensorFlow Object Detection API and followed this GitHub document.
I got the images from the COCO dataset.
I prepared 1600 training images and 400 test images.
Here is my train config.
train_config: {
batch_size: 6
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.0001
decay_steps: 250
decay_factor: 0.9
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
}
And my environment:
TensorFlow : 1.13.1 (GPU)
GPU : GTX 1070
CUDA : 10.0
cuDNN : 7.4.2
According to the GitHub document above, the loss should be under 2.
But my model's loss always converges to 4.
Is there a problem?
I can't figure out what is wrong.
Thanks for any help.
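One thing that may be worth checking in the train config above (an observation, not a diagnosis) is how quickly decay_steps: 250 with decay_factor: 0.9 shrinks the learning rate. The Python sketch below uses hypothetical helper names and assumes staircase decay; it computes the rate at a given step and the first step at which it falls below a threshold.

```python
def lr_at(step, initial_lr=1e-4, decay_steps=250, decay_factor=0.9):
    """Staircase exponential decay (assumed): one decay every decay_steps."""
    return initial_lr * decay_factor ** (step // decay_steps)


def first_step_below(threshold, initial_lr=1e-4, decay_steps=250,
                     decay_factor=0.9):
    """Smallest decay boundary at which the rate is below `threshold`."""
    if not 0.0 < decay_factor < 1.0:
        raise ValueError("decay_factor must be in (0, 1)")
    n_decays = 0
    while initial_lr * decay_factor ** n_decays >= threshold:
        n_decays += 1
    return n_decays * decay_steps


print(lr_at(25000))             # rate after 100 decays
print(first_step_below(1e-6))   # step where the rate drops below 1e-6
```

With these values the rate falls by two orders of magnitude within roughly the first 11k steps, so later training steps may change the weights very little.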
I am training a TensorFlow Object Detection API model on a custom dataset, i.e. a license plate dataset. My goal is to deploy this model to an edge device using TensorFlow Lite, so I can't use any model from the R-CNN family, because I can't convert an R-CNN family object detection model to a TensorFlow Lite model (this is a limitation of the TensorFlow Object Detection API). I am using the ssd_mobilenet_v2_coco model to train on the custom dataset. The following is my config file:
model {
ssd {
num_classes: 1
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
}
}
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 1
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
}
feature_extractor {
type: 'ssd_mobilenet_v2'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid {
}
}
localization_loss {
weighted_smooth_l1 {
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 3
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 24
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.004
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "/home/sach/DL/Pycharm_Workspace/TF1.14/License_Plate_F-RCNN/dataset/experiments/training_SSD/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"
fine_tune_checkpoint_type: "detection"
num_steps: 150000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/home/sach/DL/Pycharm_Workspace/TF1.14/License_Plate_F-RCNN/dataset/records/training.record"
}
label_map_path: "/home/sach/DL/Pycharm_Workspace/TF1.14/License_Plate_F-RCNN/dataset/records/classes.pbtxt"
}
eval_config: {
num_examples: 488
num_visualizations : 488
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/home/sach/DL/Pycharm_Workspace/TF1.14/License_Plate_F-RCNN/dataset/records/testing.record"
}
label_map_path: "/home/sach/DL/Pycharm_Workspace/TF1.14/License_Plate_F-RCNN/dataset/records/classes.pbtxt"
shuffle: false
num_readers: 1
}
I have 1932 images in total (train images: 1444, val images: 488). I have trained the model for 150000 steps. The following is the output from TensorBoard:
DetectionBoxes Precision mAP@0.5 IOU: After 150K steps, the object detection model accuracy (mAP@0.5 IOU) is ~0.97, i.e. 97%, which seems fine at the moment.
Training Loss: After 150K steps, the training loss is ~1.3. This seems to be okay.
Evaluation/Validation Loss: After 150K steps, the evaluation/validation loss is ~3.90, which is pretty high, so there is a huge difference between the training and evaluation loss. Is the model overfitting? How can I overcome this problem? In my view, the training and evaluation loss should be close to each other.
How can I reduce the validation/evaluation loss?
I am using the default config file, so by default use_dropout: false. Should I change it to use_dropout: true in case the model is overfitting?
What is an acceptable range of training and validation loss for an object detection model?
Please share your views. Thank you!
There are several possible reasons for overfitting in neural networks. Looking at your config file, I would suggest a few things to try:
Set use_dropout: true, which makes the neurons less sensitive to minor changes in the weights.
Try increasing iou_threshold in batch_non_max_suppression.
Use an l1 regularizer or a combination of l1 and l2 regularizers.
Change the optimizer to Nadam or Adam.
Include more augmentation techniques.
You can also use early stopping based on your validation accuracy.
Alternatively, you can watch the TensorBoard visualization and take the weights from just before the step where the validation loss starts increasing.
I hope these steps resolve the overfitting issue in your model.
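The last suggestion above (keep the weights from before the validation loss starts rising) can be sketched as a simple lookup over logged evaluation results. The function name and the numbers below are made up for illustration; in practice the (step, loss) pairs would come from your own evaluation logs.

```python
def best_checkpoint_step(eval_history):
    """Return the step with the lowest validation loss.

    eval_history: list of (step, validation_loss) pairs, one per
    evaluation run. On ties the earliest step wins.
    """
    step, _ = min(eval_history, key=lambda item: item[1])
    return step


# Hypothetical evaluation log: the loss falls, then rises again.
history = [(10000, 4.2), (20000, 3.1), (30000, 2.8),
           (40000, 3.0), (50000, 3.9)]
print(best_checkpoint_step(history))
```

The checkpoint saved closest to the returned step is then the one to export, provided checkpoints were kept for that step.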
I am training an object detector on a custom dataset designed to detect the head of a plant. I am using the "Faster R-CNN with Resnet-101 (v1)" config that was originally designed for the pet dataset.
I modified the config file to match my dataset (1875 training/375 eval) of images that are 275x550 in size. I converted all record files. The pipeline file is shown below.
I trained on a GPU overnight for 100k steps, and the actual evaluation results look really good. It detects all the plant heads and the data is really useful.
The issue is the actual metrics. When checking the TensorBoard logs for the eval, all the metrics increase until 30k steps and then drop again, making a nice hump in the middle. This goes for the loss, mAP, and precision results.
Why is this happening? I assumed that if you keep training, the metrics should flatten out to a line rather than decrease again.
mAP Evaluation: https://imgur.com/a/hjobr6c
Loss Evaluation: https://imgur.com/a/EY8Afqc
# Faster R-CNN with Resnet-101 (v1) originally for Oxford-IIIT Pets Dataset. Modified for wheat head detection
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 1
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 275
max_dimension: 550
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true
load_all_detection_checkpoint_vars: true
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "object_detection/data_wheat/train.record-?????-of-00010"
}
label_map_path: "object_detection/data_wheat/wheat_label_map.pbtxt"
}
eval_config: {
metrics_set: "coco_detection_metrics"
num_examples: 375
}
eval_input_reader: {
tf_record_input_reader {
input_path: "object_detection/data_wheat/val.record-?????-of-00010"
}
label_map_path: "object_detection/data_wheat/wheat_label_map.pbtxt"
shuffle: false
num_readers: 1
}
This is a standard case of overfitting: your model is memorizing the training data and has lost its ability to generalize to unseen data.
For cases like this one you have two options:
early stopping: monitor the validation metrics and stop the training as soon as they become constant and/or start decreasing
add regularization to the model (and also do early stopping anyway)
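The early stopping option above can be sketched as a small patience-based monitor in Python. This is a generic illustration, not part of the Object Detection API: the class, the `patience` and `min_delta` parameters, and the mAP values are all hypothetical.

```python
class EarlyStopping:
    """Stop when a metric (higher is better) has not improved by more
    than min_delta for `patience` consecutive evaluations."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, metric):
        """Record one evaluation; return True if training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best, self.best_step, self.bad_evals = metric, step, 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def run(history, patience=3):
    """Feed (step, metric) pairs in order; return (stop_step, best_step)."""
    stopper = EarlyStopping(patience=patience)
    for step, metric in history:
        if stopper.update(step, metric):
            return step, stopper.best_step
    return None, stopper.best_step


# Made-up mAP curve with a hump around 30k steps.
map_history = [(10000, 0.40), (20000, 0.55), (30000, 0.60),
               (40000, 0.59), (50000, 0.57), (60000, 0.52)]
print(run(map_history))
```

A larger patience tolerates noisy evaluations at the cost of training longer past the peak.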
I am fine-tuning SSD MobileNet (COCO) on the Pascal VOC dataset. I have around 17K images in the training set and num_steps is 100000. The config details are:
train_config: {
batch_size: 1
optimizer {
rms_prop_optimizer: {
learning_rate: {
exponential_decay_learning_rate {
initial_learning_rate: 0.0001
decay_steps: 800720
decay_factor: 0.95
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
}
However, the training loss fluctuates a lot, as shown in the training loss plot.
How can I avoid this?
Thanks.
It seems likely that your learning rate, though you're decaying it, is still too large in later steps.
Things that I would recommend at that point:
Increase your decay, i.e. make the learning rate fall faster (a smaller decay_factor or a smaller decay_steps).
Try another optimizer (e.g. Adam, which worked well for me in such cases).
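To make "increase your decay" concrete, one option is to pick decay_steps so the learning rate reaches a chosen value by the end of training. The Python sketch below inverts the exponential decay formula; the function name and the target rate of 0.00001 are assumptions for illustration, and it models smooth (non-staircase) decay.

```python
import math


def decay_steps_for_target(initial_lr, target_lr, total_steps,
                           decay_factor=0.95):
    """decay_steps such that
    initial_lr * decay_factor ** (total_steps / decay_steps) ~= target_lr.
    """
    if not 0.0 < decay_factor < 1.0 or not 0.0 < target_lr < initial_lr:
        raise ValueError("need 0 < decay_factor < 1 and 0 < target_lr < initial_lr")
    n_decays = math.log(target_lr / initial_lr) / math.log(decay_factor)
    return max(1, round(total_steps / n_decays))


# Values from the question: initial rate 0.0001, 100000 training steps.
steps = decay_steps_for_target(1e-4, 1e-5, 100000)
final_lr = 1e-4 * 0.95 ** (100000 / steps)
print(steps, final_lr)
```

By comparison, the decay_steps: 800720 in the question is larger than num_steps, so the rate barely changes over the whole run.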