I am using the TF2 research object detection API with an EfficientDet D3 model for my training. The optimizer is defined in my pipeline.config file like this:
optimizer {
  adam_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.08
        total_steps: 300000
        warmup_learning_rate: 0.001
        warmup_steps: 250
      }
    }
  }
  use_moving_average: false
}
So I would assume the learning rate goes up to 0.08 until step 250, and afterwards it slowly goes down again until the end of training at step 30,000 - is that assumption correct?
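For reference, here is a rough sketch of what I understand that schedule to mean (my own approximation of cosine decay with linear warmup, using the values from the config above; this is not the library code itself):

import math

def cosine_decay_with_warmup(step,
                             learning_rate_base=0.08,
                             total_steps=300000,
                             warmup_learning_rate=0.001,
                             warmup_steps=250):
    if step < warmup_steps:
        # linear ramp from warmup_learning_rate up to learning_rate_base
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step >= total_steps:
        return 0.0
    # cosine decay from learning_rate_base down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))

# expected output: 0.001, 0.08, ~0.078, ~0.040, 0.0
for step in [0, 250, 30000, 150000, 300000]:
    print(step, round(cosine_decay_with_warmup(step), 5))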
However, the learning rate chart in TensorBoard looks like this:
So the learning rate is stuck at 0.08 after step 250 has been reached.
I let that run for hours - the learning rate won't go down at all.
What am I missing here?
PS: The whole pipeline.config file can be found here.
I am trying to train an ssd_mobilenet_v2_keras model for object detection on a dataset of roughly 6000 images. The problem is that images are rotated randomly during training (or at least, that is what it looks like in TensorBoard). This is the configuration I am using in the pipeline.config file:
train_config {
  batch_size: 32
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_rgb_to_gray {
      probability: 0.25
    }
  }
  data_augmentation_options {
    random_jpeg_quality {
      random_coef: 0.8
      min_jpeg_quality: 50
      max_jpeg_quality: 100
    }
  }
  sync_replicas: true
  optimizer {
    adam_optimizer: {
      epsilon: 1e-7
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-3
          total_steps: 50000
          warmup_learning_rate: 2.5e-4
          warmup_steps: 5000
        }
      }
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "pre-trained-models/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
I have also tried removing the random horizontal flip (I knew that would probably not solve anything, I just gave it a try...), but nothing changes: I still see some training images rotated in TensorBoard, and sometimes the images are also rotated when I run the evaluation. Of course the XML with the bounding box coordinates is not "rotated", so the ground truth in TensorBoard appears completely wrong: the object is in one position and the ground truth box is in a completely different position (the right position if the image weren't rotated...).
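One sanity check that is independent of the training pipeline: decode a few examples straight from the record file and draw the ground-truth boxes on the raw images; if the image and the boxes already disagree there, the problem is in the data itself rather than in the augmentation options. A minimal sketch (the record path is a placeholder; it assumes the standard OD API feature keys):

import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.patches as patches

features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
}

for raw in tf.data.TFRecordDataset('annotations/train.record').take(3):
    ex = tf.io.parse_single_example(raw, features)
    img = tf.io.decode_image(ex['image/encoded'], channels=3).numpy()
    h, w = img.shape[:2]
    fig, ax = plt.subplots()
    ax.imshow(img)
    # box coordinates are stored normalized to [0, 1]
    xmin = tf.sparse.to_dense(ex['image/object/bbox/xmin']).numpy()
    xmax = tf.sparse.to_dense(ex['image/object/bbox/xmax']).numpy()
    ymin = tf.sparse.to_dense(ex['image/object/bbox/ymin']).numpy()
    ymax = tf.sparse.to_dense(ex['image/object/bbox/ymax']).numpy()
    for x0, x1, y0, y1 in zip(xmin, xmax, ymin, ymax):
        ax.add_patch(patches.Rectangle((x0 * w, y0 * h), (x1 - x0) * w, (y1 - y0) * h,
                                       fill=False, edgecolor='red'))
    plt.show()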
I have successfully trained (fine-tuned) and validated an object detection model from the TensorFlow 2 Model Zoo, with this config:
...
train_input_reader: {
  label_map_path: "/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/train.record"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics" #coco_detection_metrics
  use_moving_averages: false
  batch_size: 1;
}
eval_input_reader: {
  label_map_path: "/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/validation.record"
  }
}
...
Then, by analyzing the performance in TensorBoard, I noticed that the best model based on the eval loss is at step 13k, i.e. ckpt-14.
However, I also have /test.record, on which I want to test the model based on ckpt-14. What could I do? I tried to create a separate folder containing ckpt-14.index and ckpt-14.data-..., plus a file named "checkpoint" containing only ckpt-14 and its timestamp, and then launched the evaluation process after replacing validation.record with test.record in tf_record_input_reader.
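For reference, a quick way to double-check which checkpoint the evaluation will actually pick up from that folder (just a sketch; the directory name is a placeholder):

import tensorflow as tf

# The "checkpoint" file is a small text proto whose model_checkpoint_path field
# decides which ckpt-* files get loaded.
state = tf.train.get_checkpoint_state('eval_ckpt_dir')
print(state.model_checkpoint_path)                   # should end in "ckpt-14"
print(tf.train.latest_checkpoint('eval_ckpt_dir'))   # same information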
Is that approach correct? Is there a proper way to test a model from a specific checkpoint with the TensorFlow 2 Object Detection API?
You can train and test on the same model simultaneously. But if you have a single GPU and are training with a large dataset, it may not be possible to run testing on the same GPU, as it would result in memory errors. One good way is to use the same code with a work-around that does the testing on the CPU. The testing cycle takes place once every 1000 steps, and in TensorBoard you can see both the train and eval summaries, including the predicted bounding boxes side by side with the ground truth.
I will try to share the code for concurrent training and testing. For training it will use the GPU, and for testing it will use the CPU. It has been working for me and it should work for you too.
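A rough sketch of that work-around, assuming the standard TF2 OD API layout (model_main_tf2.py delegates to model_lib_v2.eval_continuously when it is given a checkpoint directory); the paths and intervals below are placeholders:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide the GPU so this process evaluates on the CPU

from object_detection import model_lib_v2

# eval_input_reader in pipeline.config should point at the record you want to
# evaluate (e.g. test.record). The training job keeps writing ckpt-* files into
# checkpoint_dir, and this process evaluates each new checkpoint as it appears.
model_lib_v2.eval_continuously(
    pipeline_config_path='models/my_model/pipeline.config',
    model_dir='models/my_model',       # where the eval summaries are written
    checkpoint_dir='models/my_model',  # where the training job saves checkpoints
    wait_interval=300,                 # seconds between checks for a new checkpoint
    timeout=3600)                      # stop after this long without a new checkpoint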
I was reading this blog about focal loss. In the section "Focal Loss Trick" it says:
The trick Facebook AI Research used is to initialize the bias term of the last layer to some non-zero value such that the pt of positive samples is small and the pt of negative samples is large. Concretely, they set the bias term b = -log((1 - π)/π). Here π is simply a variable, not the ordinary constant π. In their case, they set π = 0.01, therefore b ≫ wx.
I want to do the same using the TensorFlow Object Detection API. There, the focal loss is specified by the following lines in the config file:
loss {
  classification_loss {
    weighted_sigmoid_focal {
      alpha: 0.25
      gamma: 2.0
    }
  }
}
But I don't know how to set the bias term of the last layer to a non-zero value. How can I achieve this in TensorFlow?
It's given by class_prediction_bias_init in the box_predictor. So, the config file will look something like this:
box_predictor {
  weight_shared_convolutional_box_predictor {
    class_prediction_bias_init: -1.99
  }
}
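For reference, the bias value follows directly from the prior π in the quoted formula, b = -log((1 - π)/π); a quick sketch with purely illustrative numbers:

import math

def bias_for_prior(pi):
    # b = -log((1 - pi) / pi); pi is the prior probability assigned to the rare class
    return -math.log((1 - pi) / pi)

print(bias_for_prior(0.01))  # ~ -4.60, the value implied by the paper's pi = 0.01
print(bias_for_prior(0.12))  # ~ -1.99, roughly the value used in the config above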
I encountered a strange problem while training a CNN to detect objects in my own dataset. I am using transfer learning, and at the beginning of training the loss value decreases (as expected). But after some time it gets higher and higher, and I have no idea why this happens.
At the same time, when I look at the Images tab in TensorBoard to check how well the CNN predicts objects, I can see that it does it very well; it doesn't look like it is getting worse over time. Also, the precision and recall charts look good; only the loss charts (especially classification_loss) show an increasing trend over time.
Here are some specific details:
I have 10 different classes of logos (such as DHL, BMW, FedEx, etc.)
Around 600 images per class
I use tensorflow-gpu on Ubuntu 18.04
I tried multiple pre-trained models, the latest being faster_rcnn_resnet101_coco with this config pipeline:
model {
  faster_rcnn {
    num_classes: 10
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/models2/faster_rcnn_resnet101_coco/model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/train.record"
  }
  label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
}
eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/test.record"
  }
  label_map_path: "/home/franciszek/Pobrane/models-master/research/object_detection/logo_detection/data2/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
Here you can see the results I got after training for nearly 23 hours and reaching over 120k steps:
Loss and Total Loss
Precision
So, my question is: why is the loss value increasing over time? It should be getting smaller or staying more or less constant, but you can clearly see the increasing trend in the charts above.
I think everything is properly configured and my dataset is pretty decent (also .tfrecord files were correctly "built").
To check whether it was my fault, I tried to use somebody else's dataset and configuration files. So I used the raccoon dataset author's files (he provided all of the necessary files in his repo). I just downloaded them and started training with no modifications, to check whether I would get similar results to his.
Surprisingly, after 82k steps, I got entirely different charts than the ones shown in the linked article (that were captured after 22k steps). Here you can see the comparison of our results:
My losses vs his TotalLoss
My precision vs his mAP
Clearly, something worked differently on my PC. I suspect it may be the same reason why I get an increasing loss on my own dataset, which is why I mentioned it.
The totalLoss is the weighted sum of the four other losses (the RPN classification and regression losses, and the box classifier classification and regression losses), and they are all evaluation losses. In TensorBoard you can check or uncheck runs to see the results for training only or for evaluation only (for example, the following picture has a train summary and an evaluation summary).
If the evaluation loss is increasing, this might suggest an overfitting model; besides, the precision metrics dropped a little bit.
To try for a better fine-tuning result, you may adjust the weights of the four losses. For example, you may increase the weight of BoxClassifierLoss/classification_loss to make the model focus more on that metric. In your config file, the loss weights second_stage_classification_loss_weight and first_stage_objectness_loss_weight are both 1, while the other two are both 2, so the model currently focuses a little more on the other two.
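If you would rather not edit the file by hand, one way to experiment with these weights is to load the pipeline config as a proto, change the fields, and write it back; a minimal sketch (file names are placeholders, the new values are purely illustrative):

from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open('pipeline.config', 'r') as f:
    text_format.Merge(f.read(), pipeline)

# Field names match the config shown in the question; values here are only examples.
pipeline.model.faster_rcnn.second_stage_classification_loss_weight = 2.0
pipeline.model.faster_rcnn.first_stage_objectness_loss_weight = 2.0

with open('pipeline_reweighted.config', 'w') as f:
    f.write(text_format.MessageToString(pipeline))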
As for the extra question about why loss_1 and loss_2 are the same: this can be explained by looking at the TensorFlow graph.
Here loss_2 is the summary for total_loss (note that this total_loss is not the same as totalLoss above), and the red-circled node is a tf.identity node. This node outputs the same tensor as its input, so loss_1 is the same as loss_2.
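A tiny eager-mode illustration of that pass-through behaviour:

import tensorflow as tf

total_loss = tf.constant(1.234)
loss_copy = tf.identity(total_loss)           # returns exactly the same tensor value
print(total_loss.numpy(), loss_copy.numpy())  # 1.234 1.234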
I am trying to run TF object detection with mask rcnn, but it keeps dying on a node with 500GB of memory.
I updated the models/research/object_detection/trainer.py ConfigProto to
session_config = tf.ConfigProto(allow_soft_placement=True,
                                intra_op_parallelism_threads=1,
                                inter_op_parallelism_threads=1,
                                device_count={'CPU': 1},
                                log_device_placement=False)
I updated the mask_rcnn_inception_resnet_v2_atrous_coco.config to
train_config: {
  batch_queue_capacity: 500
  num_batch_queue_threads: 8
  prefetch_queue_capacity: 10
Updating the ConfigProto has had the best effect so far. I got it all the way to 30 steps before it died instead of 1. I'm reducing the values in the train_config by half for this run. I have also reduced the number of images and objects significantly.
Any other ideas?
500GB is a good amount of memory. I have had issues with running out of GPU memory, which is a separate constraint.
For TensorFlow v2, I have found the following useful:
1. Reduce batch_size to a small value
In the config file, set:
train_config: {
  batch_size: 4
  ...
}
batch_size can be as low as 1.
2. Reduce the dimensions of resized images
In the config file, set the resizer height and width to a value lower than the default of 1024x1024.
model {
  faster_rcnn {
    number_of_stages: 3
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 256
        width: 256
      }
    }
3. Don't train the Feature Detector
This only applies to Mask R-CNN, and is the most difficult change to implement. In the file research/object_detection/model_lib_v2.py, change the following code:
Current:
def eager_train_step(detection_model,
                     ...
  trainable_variables = detection_model.trainable_variables
  gradients = tape.gradient(total_loss, trainable_variables)
  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, trainable_variables))
New:
def eager_train_step(detection_model,
                     ...
  # Mask R-CNN variables to train -- not feature detector
  trainable_variables = detection_model.trainable_variables
  to_fine_tune = []
  prefixes_to_train = ['FirstStageBoxPredictor',
                       'mask_rcnn_keras_box_predictor',
                       'RPNConv']
  for var in trainable_variables:
    if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
      to_fine_tune.append(var)
  gradients = tape.gradient(total_loss, to_fine_tune)
  if clip_gradients_value:
    gradients, _ = tf.clip_by_global_norm(gradients, clip_gradients_value)
  optimizer.apply_gradients(zip(gradients, to_fine_tune))
There are implications to each of these changes. However, they may allow for a "good enough" result using scarce resources.
I had a similar issue. I managed to reduce memory consumption by another factor of 2.5x by setting the following values:
prefetch_size: 4
num_readers: 4
min_after_dequeue: 1
I am not sure which of them (maybe all?) are responsible for reducing the memory (I did not test that), or how much their exact values influence memory consumption, but you can easily try that out.
Some of the options that previously worked to reduce memory usage have been deprecated. From object_detection/protos/input_reader.proto:
optional uint32 queue_capacity = 3 [default=2000, deprecated=true];
optional uint32 min_after_dequeue = 4 [default=1000, deprecated=true];
optional uint32 prefetch_size = 13 [default = 512, deprecated=true];
optional uint32 num_parallel_map_calls = 14 [default = 64, deprecated=true];
As of today, num_parallel_batches appears to be the largest memory hog.
The *_input_reader messages in my config file now look like this:
train_input_reader: {
  tf_record_input_reader {
    input_path: "<DATASET_DIR>/tfrecords/train*.tfrecord"
  }
  label_map_path: "<DATASET_DIR>/label_map.pbtxt"
  load_instance_masks: true
  mask_type: PNG_MASKS
  num_parallel_batches: 1
}
Mask RCNN training now uses ~50% less CPU memory than before (training on 775 x 522 images).