Training loss fluctuates and does not go down effectively - tensorflow

I am training SSD EfficientNet B0 on the Pascal VOC 2012 dataset from scratch (it needs to be from scratch) with the TensorFlow Object Detection API. I have been training for more than 60k steps with a batch size of 5 (any larger and I get an OOM error), but the loss fluctuates a lot and barely drops at all, as can be seen in the training loss graph. Has my training gone wrong, or is the optimizer I configured unsuitable? How would I solve this?
The optimizer configuration is as follows:
optimizer {
  adam_optimizer: {
    learning_rate: {
      manual_step_learning_rate {
        initial_learning_rate: .01
        schedule {
          step: 20000
          learning_rate: .001
        }
        schedule {
          step: 40000
          learning_rate: .0001
        }
        schedule {
          step: 60000
          learning_rate: .00001
        }
      }
    }
  }
}
Please help, thanks.

Related

Which loss scalars should I watch in TensorBoard?

I'm training my custom object detection model that detects only persons.
I followed this GitHub document.
Here is my environment.
TensorFlow version: 1.13.1 GPU
TensorBoard version: 1.13.1
Pre-trained model: SSD MobileNet v2 quantized 300x300 COCO
According to the above document, my model's loss should drop below 2.
So I opened TensorBoard and found the classification loss, total loss, and clone loss scalars.
Here is my TensorBoard snapshot.
Which loss scalars should I look at?
I also have a problem with convergence.
My model's loss converged at 4, not 2.
How can I fix it?
I reduced the learning rate from 0.004 (the provided value) to 0.0000095.
But it still converged at 4.
Here is my train_config.
train_config: {
  batch_size: 6
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0000095
          decay_steps: 500
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
Is there anything to change?
Thanks for any help.

My custom model's loss converged to a high value (TensorFlow)

I'm training my custom model that detects only persons.
I used the TensorFlow Object Detection API and followed this GitHub document.
I got images from the COCO dataset.
I prepared 400 test images and 1600 training images.
Here is my train config.
train_config: {
  batch_size: 6
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 250
          decay_factor: 0.9
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
And my environment:
TensorFlow: 1.13.1 GPU
GPU: GTX 1070
CUDA: 10.0
cuDNN: 7.4.2
According to the above GitHub document, the loss should be under 2.
But my model's loss always converges to 4.
Are there any problems?
I can't figure out what is wrong...
Thanks for any help.

How to use an exponential learning rate in TensorFlow object detection?

Can anyone tell me how to set an exponential learning_rate instead of a constant learning rate in the config file?
Constant learning rate in the config file:
learning_rate: {
  manual_step_learning_rate {
    initial_learning_rate: 0.003
    schedule {
      step: 6000
      learning_rate: .0003
    }
    schedule {
      step: 12000
      learning_rate: .00003
    }
  }
}
Refer to:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/optimizer.proto
Example:
learning_rate: {
  exponential_decay_learning_rate {
    initial_learning_rate: 0.004
    decay_steps: 800720
    decay_factor: 0.95
  }
}
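For context, here is a sketch of where that block sits inside the full optimizer section of a pipeline config; the rms_prop_optimizer values shown are illustrative (they follow the common SSD sample configs), not something the question specifies:
optimizer {
  rms_prop_optimizer: {
    learning_rate: {
      exponential_decay_learning_rate {
        initial_learning_rate: 0.004
        decay_steps: 800720
        decay_factor: 0.95
      }
    }
    # illustrative RMSProp settings taken from the sample configs
    momentum_optimizer_value: 0.9
    decay: 0.9
    epsilon: 1.0
  }
}
With this schedule the learning rate at a given step is roughly initial_learning_rate * decay_factor^(step / decay_steps), so decay_steps controls how quickly the rate falls.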

Imbalanced dataset Object Detection

Is there a good way to fine-tune a model for object detection (in particular, I am trying to use the TensorFlow Object Detection API) on a dataset with highly skewed data? I am trying to take some categories from COCO and combine them with my own custom data, but there are only about 50 images of my data.
I have tried just combining the COCO data and my own data, but it predicts the COCO categories every time.
You could try using Focal Loss.
See: https://arxiv.org/pdf/1708.02002.pdf
In the TensorFlow Object Detection config file this would appear as follows:
loss {
  localization_loss {
    weighted_smooth_l1 {
    }
  }
  classification_loss {
    weighted_sigmoid_focal {
      gamma: 2.0
      alpha: 0.25
    }
  }
  classification_weight: 1.0
  localization_weight: 1.0
}
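For reference, gamma controls how strongly easy, well-classified examples are down-weighted so training focuses on hard ones, and alpha rebalances the weight given to the positive class; gamma: 2.0 and alpha: 0.25 are the defaults suggested in the focal loss paper.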

Training loss fluctuates a lot

I am fine-tuning SSD MobileNet (COCO) on the Pascal VOC dataset. I have around 17K images in the training set and num_steps is 100000. The config details are:
train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
}
However, the training loss fluctuates a lot, as shown in the training loss graph.
How can I avoid this?
Thanks.
It seems likely that your learning rate, though you're decaying it, is still too large in later steps.
Things that I would recommend at that point are:
Increase your decay
Try another optimizer (e.g. Adam, which worked well for me in such cases); a sketch of such a config is shown below.
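As a rough sketch (not from the original question or answer), an adam_optimizer block with a faster-decaying schedule could look like this; all values are illustrative assumptions that would need tuning:
optimizer {
  adam_optimizer: {
    learning_rate: {
      exponential_decay_learning_rate {
        # illustrative values only: keep the starting rate low and let it decay well within num_steps
        initial_learning_rate: 0.0001
        decay_steps: 10000
        decay_factor: 0.8
      }
    }
  }
}
With decay_steps: 10000 and decay_factor: 0.8, the rate drops by roughly an order of magnitude over the 100000 training steps, which addresses the "still too large in later steps" point; by contrast, decay_steps: 800720 barely decays the rate at all within 100k steps.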