XGBoost: why does test error increase when train error decreases?

When I train a model with xgboost, I find that "eval-merror" keeps increasing while "train-merror" keeps decreasing, as shown below. Is something wrong?
[training log plot: train-merror decreasing, eval-merror increasing]

You are likely overfitting. Have you tried setting early_stopping_rounds? This terminates the training once xgboost sees that the validation error has stopped improving for the given number of rounds.
If this behavior occurs from the very first training step on, you might want to try a smaller learning rate (called eta).
You can find more information on these parameters in the API reference: http://xgboost.readthedocs.io/en/latest/python/python_api.html
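For illustration, here is a minimal sketch of both suggestions using synthetic data; the arrays, the class count, the eta value and the round limits below are placeholders rather than tuned settings:

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in data; replace with your own features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 3, size=1000)      # 3 classes, purely for illustration
X_train, X_val = X[:800], X[800:]
y_train, y_val = y[:800], y[800:]

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

params = {
    "objective": "multi:softmax",      # merror implies a multi-class setup
    "num_class": 3,
    "eta": 0.05,                       # smaller than the 0.3 default
    "eval_metric": "merror",
}

# Training stops once eval-merror has failed to improve for 20 consecutive
# rounds; bst.best_iteration then points at the best round on the validation set.
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dtrain, "train"), (dval, "eval")],
    early_stopping_rounds=20,
)
print("best iteration:", bst.best_iteration)
```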

Related

YOLOv4 loss too high

I am using YOLOv4-tiny on a custom dataset of 26 classes that I collected from Open Images Dataset. The dataset is almost balanced (850 images per class, but a different number of bounding boxes). When I used YOLOv4-tiny to train on just 3 classes, the loss got down to around 0.5 and the model was fairly accurate. But with 26 classes, as soon as the loss goes below 2 the model starts to overfit, and the predictions are also very inaccurate.
I have tried changing parameters like the learning rate, the momentum and the input size, but whatever I do the model becomes worse than before. Using the regular YOLOv4 model rather than YOLOv4-tiny does not help either. How can I bring the loss further down?
Have you tried training with mAP? You can take a subset of your training set and make it the validation set, in the same way you made your training and test sets. Then you can run darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map. This will keep track of the error on your validation set; when that error starts going up, it is time to stop training and prevent overfitting (this is called early stopping).
You need to run the training for (classes * 2000) iterations; for the best scores, train your model for at least 6000 iterations (this limit is known as max_batches). Also remember that if you are using black-and-white images, you should change channels=3 to channels=1. You can stop your training once the avg loss drops to something like 0.XXXX.
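For reference, these are roughly the lines that usually get adjusted in yolo-obj.cfg for a 26-class, black-and-white setup. The numbers below follow the rules of thumb from the darknet documentation and are assumptions, not values taken from your config, so double-check them against your own file:

```
[net]
# 1 for black-and-white input, 3 for colour
channels=1
# classes*2000 = 26*2000 (and never less than 6000)
max_batches=52000
# 80% and 90% of max_batches
steps=41600,46800

# in every [yolo] layer:
classes=26
# and in the [convolutional] layer directly before each [yolo] layer,
# filters=(classes+5)*3:
filters=93
```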
Here's my mAP graph for 6000 iterations that ran for 6.2 hours:
[chart: avg loss and mAP over 6000 max_batches]
Moreover, you can follow the FAQ documentation by Stéphane Charette.

Validation loss curve is flat and training loss curve is higher than validation error curve

I'm building an LSTM model for a prediction scenario. My dataset has around 248,000 samples; I use 24,000 of them (around 10%) as the validation set and the rest as the training set. My model's learning curve is the following:
[learning-curve plot: validation loss flat near zero, training loss higher and decreasing]
The validation error is 0.00002 right from the start, while the training error decreases to 0.013533 by epoch 20.
I've read this carefully: https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
Is my validation set unrepresentative? Is the solution to use a larger validation set?
It might be that, first, your underlying concept is very simple, which leads to an extremely low validation error early on; and second, your data augmentation makes the training set harder to fit, which yields a higher training error.
Still, I would run a couple of experiments in your case. First, divide the data 10/90 instead of 90/10 and see how your validation error changes then; hopefully you would see some sort of a curve across the (now shorter and harder) epochs. Second, I would run validation before any training (or after an epoch of a single batch) to get a random-baseline result, as sketched below.
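As a minimal sketch of that second experiment, assuming a Keras-style LSTM (the model, shapes and data below are all invented placeholders, not your setup):

```python
import numpy as np
from tensorflow import keras

# Toy stand-in data; the shapes (timesteps=10, features=8) are made up.
x_train = np.random.rand(1000, 10, 8).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")
x_val = np.random.rand(100, 10, 8).astype("float32")
y_val = np.random.rand(100, 1).astype("float32")

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(10, 8)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Evaluate on the validation set *before* training: an untrained model should
# give a roughly random loss. If it already comes out near 0.00002, the
# validation set is suspiciously easy or unrepresentative.
print("val loss before training:", model.evaluate(x_val, y_val, verbose=0))

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, verbose=2)
```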

Avoiding overfitting while training a neural network with Tensorflow

I am training a neural network using Tensorflow's object detection API to detect cars. I used the following YouTube video to learn and execute the process.
https://www.youtube.com/watch?v=srPndLNMMpk&t=65s
Parts 1 to 6 of his series.
Now in his video, he mentions stopping the training when the loss value averages around 1 or below, and says it would take roughly 10,000 steps.
In my case, I am at 7,500 steps right now and the loss keeps fluctuating between 0.6 and 1.3.
A lot of people complained in the comment section about false positives in this series, but I think this happened because of unnecessarily prolonged training (perhaps because they didn't know when to stop?), which caused overfitting!
I would like to avoid this problem. I don't need the most optimal weights, just fairly good ones, while avoiding false detections and overfitting. I am also watching the 'Total Loss' section of Tensorboard; it fluctuates between 0.8 and 1.2. When do I stop the training process?
I would also like to know, in general, which factors 'when to stop training' depends on. Is it always about an average loss of 1 or less?
Additional information:
My training data has ~300 images
Test data ~ 20 images
Since I am using transfer learning, I chose the ssd_mobilenet_v1 model.
Tensorflow version 1.9 (on CPU)
Python version 3.6
Thank you!
You should use a validation set, different from the training set and the test set.
At each epoch, compute the loss on both the training and validation sets.
If the validation loss begins to increase, stop your training. You can then test your model on your test set.
The validation set size is usually the same as the test set's. For example, the training set is 70% of the data, and the validation and test sets are 15% each.
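For example, a 70/15/15 split of a list of annotated image paths could be sketched like this (the file names and the use of scikit-learn are assumptions for illustration, not something from your setup):

```python
from sklearn.model_selection import train_test_split

# Hypothetical list of annotated image paths; replace with your own files.
image_paths = [f"images/car_{i:03d}.jpg" for i in range(300)]

# 70% train, then split the remaining 30% in half: 15% validation, 15% test.
train_paths, rest = train_test_split(image_paths, test_size=0.30, random_state=42)
val_paths, test_paths = train_test_split(rest, test_size=0.50, random_state=42)

print(len(train_paths), len(val_paths), len(test_paths))  # 210 45 45
```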
Also, please note that 300 images does not seem like enough training data; you should increase that number.
For your other question :
The loss is the sum of your errors and thus depends on the problem and your data; a loss of 1 does not mean much on its own. Never rely on a fixed loss value to decide when to stop training.

Expected accuracy of the pet example using SSD model in TensorFlow object detection API?

I'm using default pipeline configuration (ssd_inception_v2_pets.config) and pretrained inception v2 COCO model. In TensorBoard, the loss continues decreasing, but the average precision isn't getting any better. Has anyone done similar experiment using inception v2 for SSD? What's your experience?
The reason for the low mAP is the extremely low score threshold at the non-maximum suppression step. The result of such a low threshold is that almost every image produces more than 70 detections, while there is only one ground-truth box per image. Changing this threshold to a more reasonable value, 0.1, produces a much better mAP plot.
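For reference, that threshold lives in the post_processing block of the SSD pipeline config; the surrounding values below are illustrative defaults rather than settings taken from the question:

```
post_processing {
  batch_non_max_suppression {
    # raised from the near-zero default so the flood of low-confidence boxes is dropped
    score_threshold: 0.1
    iou_threshold: 0.6
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}
```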
You may have been bitten by this issue, as I was: https://github.com/tensorflow/models/issues/2749
Try updating and grabbing one of the new pretrained starting checkpoints, and see if your problem is resolved.

Tensorflow: When to stop training due to the best (minimum) cost?

I am using TensorFlow to classify images with the LeNet network, and I use AdamOptimizer to minimize the cost function. When I start training the model, I can observe that the training accuracy, the validation accuracy and the cost keep changing, sometimes decreasing and sometimes increasing.
My questions: When should we stop the training? How can we know that the optimizer has found the minimum cost? For how many iterations should we train? Can we set a variable or condition that stops at the minimum cost?
My solution is to define a global variable (min_cost), check in each iteration whether the cost has decreased, and if so save the session and replace min_cost with the new cost. At the end, I will have the saved session for the minimum cost.
Is this a correct approach?
Thanks in advance.
When training neural networks, usually a target error is defined alongside a maximum number of iterations. For example, the target error could be 0.001 MSE. Once this error has been reached, the training stops; if it has not been reached after the maximum number of iterations, the training also stops.
But it seems like you want to train until you know the network can't do any better. Saving the 'best' parameters like you're doing is a fine approach, but do realise that once some kind of minimum cost has been reached, the error won't fluctuate that much anymore. It won't be like the error suddenly goes up significantly, so it is not completely necessary to save the network.
There is no such thing as a guaranteed 'minimal cost': the network is always moving toward some local minimum, and it will keep doing so. There is not really a way for you (or an algorithm) to figure out that no better error can be reached anymore.
tl;dr: just set a reasonable target error alongside a maximum number of iterations.
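A rough sketch of that recipe, combined with your min_cost idea, could look like this in TF 1.x (the tiny linear graph, the toy batches, and both thresholds are placeholders; swap in your LeNet model and real input pipeline):

```python
import numpy as np
import tensorflow as tf  # written for TF 1.x sessions, matching the question

TARGET_COST = 0.001    # hypothetical target error
MAX_ITERS = 10000      # hypothetical iteration cap

# Tiny stand-in graph; replace with your LeNet model and real data.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

saver = tf.train.Saver()
best_cost = float("inf")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(MAX_ITERS):
        xb = np.random.rand(32, 4).astype("float32")
        yb = xb.sum(axis=1, keepdims=True)           # toy target
        _, cost = sess.run([train_op, loss], feed_dict={x: xb, y: yb})
        if cost < best_cost:
            best_cost = cost
            saver.save(sess, "./best_model")          # keep the best weights so far
        if cost <= TARGET_COST:
            break                                     # target error reached: stop early
```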