YOLO-Darknet average loss is not decreasing

I am trying to train YOLO Darknet on a custom object with classes=2 and filters=21. I collected around 5000 images of size 1106x620 and set the learning rate to 0.01,
with batch=64 and subdivisions=16.
Observation:
After 500 to 3000 iterations, the average loss stays between 7.2 and 7.4.
How can I improve (lower) my average loss?
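The filters=21 value in the question matches the rule usually given for YOLOv3/YOLOv4 cfg files: the [convolutional] layer just before each [yolo] layer needs (classes + 5) * 3 filters, assuming 3 anchor masks per [yolo] layer. A minimal sketch of that arithmetic (yolo_filters is a hypothetical helper, not part of Darknet):

```python
def yolo_filters(num_classes: int, anchors_per_layer: int = 3) -> int:
    """Filters for the [convolutional] layer feeding a [yolo] layer.

    Assumes YOLOv3/v4-style heads: each anchor predicts 4 box coordinates,
    1 objectness score and one score per class.
    """
    return (num_classes + 5) * anchors_per_layer


print(yolo_filters(2))   # 21, matches the cfg in the question
print(yolo_filters(26))  # 93
```

If filters does not match this value for your class count, training usually fails or behaves oddly, so it is worth checking before tuning anything else.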

The loss may plateau, but for object detection the loss is not the right accuracy criterion.
Try calculating the mean average precision (mAP) with this command (using a Darknet build that supports detector map):
./darknet detector map data/obj.data yolo-obj.cfg backup/yolo-obj_7000.weights
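For intuition about what the map command reports: for each class, detections are sorted by confidence, each one is marked as a true positive if it matches a not-yet-matched ground-truth box (IoU >= 0.5 is the usual threshold), and the area under the resulting precision-recall curve is that class's average precision (AP). mAP is the mean AP over all classes. The sketch below uses all-point interpolation; it is a generic illustration, not Darknet's exact implementation, and the function names are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """AP for one class from detections sorted by descending confidence.

    is_true_positive[i] is True if detection i was matched to a previously
    unmatched ground-truth box (e.g. IoU >= 0.5), False otherwise.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # All-point interpolation: precision at each recall level becomes the
    # maximum precision reached at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepped PR curve: sum of precision * recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


print(round(average_precision([True, False, True], num_ground_truth=2), 4))  # 0.8333
```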

Try decreasing the learning rate to 0.001 or 0.0001. Also, check whether the pretrained weights you are using are suitable for your custom use case.
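If you generate or edit your cfg files with a script, a minimal sketch of changing learning_rate only inside the [net] section could look like this. set_net_option is a hypothetical helper, and it assumes the usual key=value, one-per-line layout of Darknet .cfg files:

```python
def set_net_option(cfg_text: str, key: str, value) -> str:
    """Return cfg_text with `key` set to `value` inside the [net] section only."""
    out_lines = []
    in_net = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track the current section; keys in other sections are left untouched.
            in_net = stripped == "[net]"
        elif in_net and "=" in stripped and not stripped.startswith("#"):
            if stripped.split("=", 1)[0].strip() == key:
                line = f"{key}={value}"
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


cfg = "[net]\nbatch=64\nsubdivisions=16\nlearning_rate=0.01\n\n[convolutional]\nfilters=21\n"
print(set_net_option(cfg, "learning_rate", 0.001))
```

Restricting the change to [net] matters for keys such as batch or filters that can also appear in other sections.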

Try decreasing the learning rate and the batch size. Check that each label's class, x, y, width and height values are correct. If this does not help, there may be a problem with your labels and/or data.
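Darknet label files contain one line per object in the form <class_id> <x_center> <y_center> <width> <height>, where class_id runs from 0 to classes-1 and the four coordinates are normalized to [0, 1] by the image width and height. A minimal sketch of a checker for those rules (the function names are hypothetical):

```python
from pathlib import Path


def check_yolo_label_line(line, num_classes):
    """Return a list of problems found in one line of a YOLO-format label file."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = map(float, parts[1:])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range [0, {num_classes})")
    # Pixel coordinates instead of normalized ones are a common mistake.
    for name, value in (("x_center", x), ("y_center", y), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} not in [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("width and height must be positive")
    return problems


def check_label_dir(label_dir, num_classes):
    """Print every problem found in the .txt label files of a directory."""
    for path in sorted(Path(label_dir).glob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if line.strip():
                for problem in check_yolo_label_line(line, num_classes):
                    print(f"{path}:{lineno}: {problem}")
```

For example, check_label_dir("data/obj", num_classes=2) would list every offending line (the directory name is only an example).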

Related

YOLOv4 loss too high

I am using YOLOv4-tiny on a custom dataset of 26 classes that I collected from the Open Images Dataset. The dataset is almost balanced (850 images per class, but different numbers of bounding boxes). When I used YOLOv4-tiny to train on just 3 classes, the loss was near 0.5 and the model was fairly accurate. But with 26 classes, as soon as the loss goes below 2 the model starts to overfit, and the predictions are very inaccurate.
I have tried changing parameters such as the learning rate, the momentum and the size, but whatever I do, the model becomes worse than before. Using the regular YOLOv4 model rather than YOLOv4-tiny does not help either. How can I bring the loss further down?
Have you tried training with mAP? You can take a subset of your training set and use it as the validation set, in the same way you made your training and test sets. Then run darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map. This periodically calculates the mAP on your validation set. When the validation mAP stops improving or starts to drop, it is time to stop training to prevent overfitting (this is called early stopping).
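A minimal sketch of carving a validation set out of the training images, assuming your image paths are listed one per line as in Darknet's train.txt and that the valid= entry in obj.data points at the file written for validation (the file names below are hypothetical). The split is disjoint, so validation images are not also used for training:

```python
import random


def split_train_valid(image_paths, valid_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, valid) lists."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_valid = int(len(paths) * valid_fraction)
    return paths[n_valid:], paths[:n_valid]


def write_list(paths, filename):
    """Write one image path per line."""
    with open(filename, "w") as f:
        for path in paths:
            f.write(path + "\n")


# Usage (hypothetical file names):
# with open("data/train_all.txt") as f:
#     all_paths = [line.strip() for line in f if line.strip()]
# train, valid = split_train_valid(all_paths, valid_fraction=0.1)
# write_list(train, "data/train.txt")
# write_list(valid, "data/valid.txt")
```

A fixed seed keeps the split reproducible, so mAP values from different runs are measured on the same images.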
You should train for classes*2000 iterations, but for the best results train for at least 6000 iterations (this is the max_batches setting). Also, if you are using black-and-white images, change channels=3 to channels=1. You can stop training once the avg loss reaches something like 0.XXXX.
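The iteration rule above as a small calculation. The steps values (learning-rate drops at 80% and 90% of max_batches) follow the guideline in the AlexeyAB Darknet README rather than this answer, and darknet_schedule is a hypothetical helper:

```python
def darknet_schedule(num_classes: int, min_batches: int = 6000):
    """Return (max_batches, steps) for a custom Darknet training run."""
    max_batches = max(num_classes * 2000, min_batches)
    # Integer arithmetic avoids float rounding in the 80% / 90% marks.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return max_batches, steps


print(darknet_schedule(26))  # (52000, (41600, 46800))
print(darknet_schedule(2))   # (6000, (4800, 5400))
```

For the 26-class dataset in the question, this suggests a much longer run than 6000 iterations.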
Here's my mAP graph for 6000 iterations that ran for 6.2 hours:
[Image: avg loss chart with 6000 max_batches]
You can also follow the Darknet/YOLO FAQ by Stéphane Charette.

Why does Tensorflow LearningRateScheduler *increase* (not decrease) the learning rate with each epoch?

In the Coursera class "TensorFlow in Practice -- Sequences, Time Series and Prediction", the 9th video of the second week uses a callback to dynamically increase (not decrease) the learning rate. I understand why we need to adjust the rate dynamically, but this callback increases the learning rate with each epoch. Don't we want to do the opposite and gradually decrease the learning rate as the neural net learns more? I'm sure the video is correct (it was created by Andrew Ng and Google, who obviously know a lot about TensorFlow), but why are we increasing (instead of decreasing) the learning rate? Is Keras actually using the inverse of this number as the learning rate, or something like that?
import tensorflow as tf

# Doesn't the next line *increase* the learning rate with each epoch?
# But shouldn't we be gradually decreasing it?
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))
# learning_rate is the documented argument name in TF 2.x (older code used lr).
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
# model and dataset are defined earlier in the course notebook.
model.compile(loss="mse", optimizer=optimizer)
history = model.fit(dataset, epochs=100, callbacks=[lr_schedule], verbose=0)
And here's a full code example from the sample notebook that they provide with this example:
https://colab.research.google.com/github/lmoroney/dlaicourse/blob/master/TensorFlow%20In%20Practice/Course%204%20-%20S%2BP/S%2BP%20Week%202%20Lesson%203.ipynb
Is it correct to increase the learning rate with each epoch? Won't that result in the optimizer "over-shooting" the answer on each epoch and never converging to a solution?
You are right: it makes no sense to do this when the goal is to train a network. They might be doing it to demonstrate that the learning rate can become too high; the graph right after it may be showing exactly that lesson.
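One common reading of this pattern (an interpretation, not something stated in the question) is a learning-rate range test: sweep the learning rate exponentially over a short run, record the loss at each epoch, then pick a fixed learning rate from the region where the loss was lowest and still stable, and retrain with it. A minimal sketch of the bookkeeping; lr_at_epoch reproduces the lambda in the question, and pick_learning_rate is a hypothetical helper:

```python
def lr_at_epoch(epoch):
    """Same schedule as the lambda above: the rate grows 10x every 20 epochs."""
    return 1e-8 * 10 ** (epoch / 20)


def pick_learning_rate(lrs, losses):
    """Return the learning rate at which the recorded loss was lowest."""
    best_index = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best_index]


lrs = [lr_at_epoch(epoch) for epoch in range(100)]
print(f"{lrs[0]:.0e} -> {lrs[-1]:.1e}")  # 1e-08 -> 8.9e-04
```

In practice you would pass the per-epoch learning rates and losses recorded during the sweep. A common heuristic is to choose a value somewhat below the minimum-loss point, since the loss is often already close to diverging there.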

Resnet training - L2 loss decreases while cross-entropy stays around 0.69

I am using the official TensorFlow implementation of ResNet (https://github.com/tensorflow/models/tree/master/official/resnet) to train a binary classifier on my own dataset. I slightly modified input_fn in imagenet_main.py to do my own image loading and preprocessing. But even after a lot of parameter tuning, I can't get my model to train properly. The only parameter sets I found let the training accuracy increase to 100%, while the validation accuracy stays around 50% forever. The implementation uses a piecewise learning-rate schedule. I tried initial learning rates from 0.1 to 1e-5 and weight decay from 1e-2 to 1e-5, and saw no convergence on the validation set.
A suspicious observation is that during training, the L2 loss decreases slowly and steadily, while the cross-entropy is very reluctant to decrease and stays around 0.69.
Any ideas about what else I can try?
Regarding my dataset and image preprocessing: the training set has around 100K images and the validation set around 10K. I just resize each image to 224x224 while keeping the aspect ratio, subtract 127 from each channel, and divide by 255.
@Hua, ResNet has a very large number of trainable parameters and was trained on ImageNet, which has 1,000 classes, while your dataset has only two. The number of parameters is directly related to the risk of overfitting, so ResNet as-is may not suit your data. Try modifying ResNet to reduce the number of parameters; that may help.
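The cross-entropy plateau at 0.69 mentioned in the question has a simple interpretation: 0.69 is approximately ln 2, the binary cross-entropy of a model that predicts probability 0.5 for every example, whatever the label. That is chance-level guessing on a two-class problem. A minimal check of the arithmetic:

```python
import math


def binary_cross_entropy(p_pred, y_true):
    """Loss for one example: -[y*log(p) + (1 - y)*log(1 - p)]."""
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))


# A classifier that always outputs 0.5 gets the same loss for either label:
print(round(binary_cross_entropy(0.5, 1), 4))  # 0.6931
print(round(binary_cross_entropy(0.5, 0), 4))  # 0.6931
```

So a cross-entropy that never moves from about 0.69 suggests the classification output is not becoming more confident than a coin flip.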

when to stop training object detection tensorflow

I am training a Faster R-CNN model on a fruit dataset, starting from a pretrained model provided in the TensorFlow Object Detection API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration (num_classes: 12, fine_tune_checkpoint: the path to the pretrained checkpoint, and from_detection_checkpoint: true). In total I have around 12000 annotated images.
After training for 9000 steps, the accuracy I get is below 1%, though I was expecting at least 50% (in evaluation nothing gets detected, so accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train for? I read an article that says to run around 800k steps, but is that the number of steps when training from scratch?
The FC layers of the model change because of the different number of classes, but that should not affect classes that are already present in the pretrained model, such as 'apple', right?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.
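A minimal sketch of that stopping rule, assuming you record the evaluation mAP after each checkpoint is evaluated; best_checkpoint, should_stop and the patience parameter are hypothetical names, not part of the Object Detection API:

```python
def best_checkpoint(map_history):
    """Index of the evaluation with the highest mAP (the first one on ties)."""
    return max(range(len(map_history)), key=map_history.__getitem__)


def should_stop(map_history, patience=3):
    """Stop once `patience` evaluations in a row have not beaten the best mAP."""
    if not map_history:
        return False
    evaluations_since_best = len(map_history) - 1 - best_checkpoint(map_history)
    return evaluations_since_best >= patience


history = [0.12, 0.31, 0.45, 0.44, 0.46, 0.45, 0.43, 0.44]  # made-up mAP values
print(best_checkpoint(history), should_stop(history))  # 4 True
```

A patience of a few evaluations avoids stopping on a single noisy dip; you would then keep the checkpoint at the best index rather than the last one.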

Does Stochastic Gradient Descent even work with TensorFlow?

I designed a fully connected MLP with two hidden layers and one output layer.
I get a nice learning curve if I use batch or mini-batch gradient descent.
But I get a straight line when performing stochastic gradient descent (the violet curve).
What did I get wrong?
As I understand it, I am doing stochastic gradient descent in TensorFlow if I provide just one training example per training step, like this:
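import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, sess.run)

X = tf.placeholder("float", [None, amountInput], name="Input")
Y = tf.placeholder("float", [None, amountOutput], name="TeachingInput")
...
# Feed a batch containing a single example, i.e. pure SGD.
m, i = sess.run([merged, train_op], feed_dict={X: [input], Y: [label]})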
Here, input is a 10-component vector and label is a 20-component vector.
For testing I run 1000 iterations; each iteration uses one of 50 prepared training examples.
I expected the network to overfit, but as you can see, it doesn't learn :(
Because the network will run in an online-learning environment, mini-batch or batch gradient descent isn't an option.
Thanks for any hints.
The batch size influences the effective learning rate.
If you think about the update formula for a single parameter, you'll see that it is updated with the average of the values computed for that parameter over every element in the input batch.
This means that with a batch of size n, each example contributes to the update with a "real" learning rate of about learning_rate/n.
Thus, if the model you trained with batches of size n trained without issues, it is because the learning rate was right for that batch size.
If you use pure stochastic gradient descent, you have to lower the learning rate (usually by a factor of some power of 10).
So, for example, if your learning rate was 1e-4 with a batch size of 128, try a learning rate of 1e-4 / 128.0 and see if the network learns (it should).
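A minimal pure-Python check of the averaging argument, using a hypothetical one-parameter model y = w * x with squared error. When the loss is averaged over a batch of n examples, one update with learning rate lr equals the sum of per-example steps each scaled by lr / n, so each example effectively sees lr / n:

```python
def grad(w, x, y):
    """Gradient of the per-example squared error (w * x - y) ** 2 w.r.t. w."""
    return 2 * x * (w * x - y)


def minibatch_update(w, batch, lr):
    """Change in w from one step on the batch-averaged (mean) loss."""
    mean_grad = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return -lr * mean_grad


def per_example_updates(w, batch, lr):
    """Each example's share of that same step: its own gradient scaled by lr / n."""
    n = len(batch)
    return [-(lr / n) * grad(w, x, y) for x, y in batch]


batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0)]  # made-up (x, y) pairs
w, lr = 0.0, 0.01
step = minibatch_update(w, batch, lr)
shares = per_example_updates(w, batch, lr)
print(abs(step - sum(shares)) < 1e-12)  # True

# Pure SGD (n = 1) with the same per-example step size as batch size 128:
sgd_lr = 1e-4 / 128.0
```

This only shows how the averaging scales each example's contribution; whether lr / n is the best SGD rate for a given network still has to be checked empirically, as the answer suggests.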