I am fine-tuning SSD MobileNet (COCO) on the Pascal VOC dataset. I have around 17K images in the training set and num_steps is 100000. Details of the config are:
train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
However, the training loss fluctuates a lot, as shown in the training loss graph.
How can I avoid this?
Thanks.
It seems likely that your learning rate, even though you are decaying it, is still too large in the later steps.
Things I would recommend at that point:
Increase your decay
Try another optimizer (e.g. Adam, which worked well for me in such cases)
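One possible shape for that change in the config, combining both suggestions (the numbers here are illustrative placeholders, not tuned values):

```
optimizer {
  adam_optimizer: {
    learning_rate: {
      exponential_decay_learning_rate {
        initial_learning_rate: 0.0001
        decay_steps: 10000
        decay_factor: 0.8
      }
    }
  }
}
```

A smaller decay_steps with a smaller decay_factor makes the rate shrink much faster than the original 800720/0.95 schedule.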
I am training SSD EfficientNet B0 on the Pascal VOC 2012 dataset from scratch (it needs to be from scratch) with the TensorFlow Object Detection API. I have been training for more than 60k steps with a batch size of 5 (any more than that and I get an OOM error). However, the loss fluctuates a lot and really struggles to drop even a bit, as can be seen in the training loss graph. Has my training gone wrong, or is the optimizer I set not suitable? How would I solve it?
The optimizer configuration is below:
optimizer {
  adam_optimizer: {
    learning_rate: {
      manual_step_learning_rate {
        initial_learning_rate: .01
        schedule {
          step: 20000
          learning_rate: .001
        }
        schedule {
          step: 40000
          learning_rate: .0001
        }
        schedule {
          step: 60000
          learning_rate: .00001
        }
      }
    }
  }
}
Please help, thanks.
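For reference, the manual_step schedule above maps the global step to a learning rate as piecewise-constant values. A plain-Python sketch of those semantics (not the API itself, just the same step-to-rate mapping):

```python
def manual_step_lr(step):
    """Learning rate for the manual_step schedule above:
    0.01 until step 20000, then 0.001, then 0.0001, then 0.00001."""
    # Boundaries checked from highest to lowest so the latest
    # schedule entry that has been reached wins.
    schedule = [(60000, 0.00001), (40000, 0.0001), (20000, 0.001)]
    for boundary, lr in schedule:
        if step >= boundary:
            return lr
    return 0.01  # initial_learning_rate

print(manual_step_lr(0))      # 0.01
print(manual_step_lr(25000))  # 0.001
```

Note that the starting rate of 0.01 is quite high for Adam; the fluctuation in the first 20k steps may come from that alone.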
I'm training a custom object detection model that detects only person.
I followed this GitHub document.
Here is my environment.
TensorFlow version: 1.13.1 GPU
TensorBoard version: 1.13.1
Pre-trained model: SSD MobileNet v2 quantized 300x300 COCO
According to the above document, my model's loss should drop below 2.
So I opened TensorBoard and found the classification loss, total loss, and clone loss scalars.
Here is my TensorBoard snapshot.
Which loss scalars should I look at?
I also have a problem with convergence.
My model's loss converged at 4, not 2.
How can I fix it?
I reduced the learning rate from 0.004 (the provided value) to 0.0000095.
But it still converged at 4.
Here is my train_config.
train_config: {
  batch_size: 6
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0000095
          decay_steps: 500
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
Is there anything to change?
Thanks for all help.
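One thing worth checking in that config: with decay_steps: 500 and such a small initial rate, the effective learning rate collapses very quickly. A rough calculation, assuming the standard continuous exponential-decay formula initial_lr * decay_factor ** (step / decay_steps):

```python
def exp_decay_lr(step, initial_lr=0.0000095, decay_factor=0.95, decay_steps=500):
    # Continuous exponential decay: the rate is multiplied by 0.95
    # every 500 steps.
    return initial_lr * decay_factor ** (step / decay_steps)

for step in (0, 10000, 50000):
    print(step, exp_decay_lr(step))
```

By step 50000 the rate is below 1e-7, which is effectively zero; a loss that plateaus at 4 may simply mean the optimizer can no longer move, rather than that the model has converged.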
I'm training a custom model that detects only person.
I used the TensorFlow Object Detection API and followed this GitHub document.
I got images from the COCO dataset.
400 test images and 1600 train images are prepared.
Here is my train config.
train_config: {
  batch_size: 6
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 250
          decay_factor: 0.9
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
And the environment:
TensorFlow: 1.13.1 GPU
GPU: GTX 1070
CUDA: 10.0
cuDNN: 7.4.2
According to the above GitHub document, the loss should be under 2.
But my model's loss always converges to 4.
Are there any problems?
I can't figure out what is wrong...
Thanks for all help.
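As a quick sanity check on that schedule (plain arithmetic, assuming each image is seen once per epoch): with 1600 training images and batch_size: 6, one epoch is about 267 steps, so decay_steps: 250 multiplies the learning rate by 0.9 roughly every epoch, which is aggressive.

```python
import math

train_images = 1600
batch_size = 6

# Steps needed to see every training image once.
steps_per_epoch = math.ceil(train_images / batch_size)
print(steps_per_epoch)  # 267
```

After ~20 epochs (about 5000 steps) the rate has already shrunk by 0.9**20, roughly a factor of 8, which may explain the early plateau.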
I'm trying to fine-tune the SSD MobileNet v2 (from the model zoo) with my own data. In pipeline.config I see that use_dropout is set to false. Why is that? I thought dropout should be used to prevent overfitting.
box_predictor {
  convolutional_box_predictor {
    conv_hyperparams {
      regularizer {
        l2_regularizer {
          weight: 3.99999989895e-05
        }
      }
      initializer {
        truncated_normal_initializer {
          mean: 0.0
          stddev: 0.0299999993294
        }
      }
      activation: RELU_6
      batch_norm {
        decay: 0.999700009823
        center: true
        scale: true
        epsilon: 0.0010000000475
        train: true
      }
    }
    min_depth: 0
    max_depth: 0
    num_layers_before_predictor: 0
    use_dropout: false
    dropout_keep_probability: 0.800000011921
    kernel_size: 3
    box_code_size: 4
    apply_sigmoid_to_scores: false
  }
}
Is it because of batch normalization? In this paper, it says:

3.4 Batch Normalization regularizes the model

When training with Batch Normalization, a training example is seen in conjunction with other examples in the mini-batch, and the training network no longer produces deterministic values for a given training example. In our experiments, we found this effect to be advantageous to the generalization of the network. Whereas Dropout (Srivastava et al., 2014) is typically used to reduce overfitting, in a batch-normalized network we found that it can be either removed or reduced in strength.
Can anyone tell me how to set an exponential learning rate instead of a constant learning rate in the config file?
Constant learning rate in the config file:
learning_rate: {
  manual_step_learning_rate {
    initial_learning_rate: 0.003
    schedule {
      step: 6000
      learning_rate: .0003
    }
    schedule {
      step: 12000
      learning_rate: .00003
    }
  }
}
Refer to https://github.com/tensorflow/models/blob/master/research/object_detection/protos/optimizer.proto
Example:
learning_rate: {
  exponential_decay_learning_rate {
    initial_learning_rate: 0.004
    decay_steps: 800720
    decay_factor: 0.95
  }
}
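To see what such a schedule actually does, the decayed rate follows initial_learning_rate * decay_factor ** (step / decay_steps) (the continuous, non-staircase form used by tf.train.exponential_decay). A quick sketch shows that with decay_steps: 800720 the rate barely moves over a typical 100k-step run:

```python
def exp_decay_lr(step, initial_lr=0.004, decay_factor=0.95, decay_steps=800720):
    # Continuous exponential decay, as in tf.train.exponential_decay
    # with staircase=False.
    return initial_lr * decay_factor ** (step / decay_steps)

print(exp_decay_lr(0))       # 0.004
print(exp_decay_lr(100000))  # ~0.00397: almost no decay yet
```

So if you want the rate to drop noticeably within your run, pick a decay_steps that is a small fraction of your total num_steps.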