Possible to get or resume loss and mAP graph for YOLO/darknet during training? - yolo

If I stop and resume training in the middle, the chart will only show the data from the iteration I am resuming from and onward. Is there a way to get the full loss chart or, when resuming, also plot the loss/mAP data before killing the training session? I need the chart for reporting/documenting purposes.

Related

How to overwrite checkpoint of same global step for Tensorboard?

If you resume training from a specific epoch (same global step) and run tensorboard, tensorboard (add_scalar) will end up plotting 2 points for that particular global step.
Example, I want trying to test whether changing the learning rate halfway through training will improve/worsen the accuracy:
Plot example
The plots for same time step are plotted twice (I resumed from +15 epochs behind the latest epoch).
Search up the web, cannot find any commands that can ask Tensorboard to just overwrite the previous saved checkpoint for the new one. My expectation is that Tensorboard would know to overwrite the same global step point but it is plotting it together.
You can delete the log file running the script (before creating model).
And then it will be created empty.
Or you can add time in the name of the log:
import time
NAME = 'my_cnn-{}'.format(int(time.time()))
tensorboard = TensorBoard(log_dir='logs/{}'.format(NAME))

YOLOv4 loss too high

I am using YOLOv4-tiny for a custom dataset of 26 classes that I collected from Open Images Dataset. The dataset is almost balanced(850 images per class but different number of bounding boxes). When I used YOLOv4-tiny to train on just 3 classes the loss was near 0.5, it was fairly accurate. But for 26 classes as soon as the loss goes below 2 the model starts to overfit. The prediction are also very inaccurate.
I have tried to change the parameters like the learning rate, the momentum and the size but whatever I do the models becomes worse then before. Using regular YOLOv4 model rather then YOLO-tiny does not help either. How can I bring the loss further down?
Have you tried training with mAP? You can take a subset of your training set and make it the validation set. This can be done in the same way you made your training and test set. Then, you can run darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map. This will keep track of the loss in your validation set. When the error in the validation say goes up, this is the time to stop training and prevent overfitting (this is called: early stopping).
You need to run the training for (classes*2000)iterations. However, for the best scores, you need to train your model for at least 6000 iterations (also known as max_batches). Also please remember if you are using a b&w image, change the channels=3 to channels=1. You can stop your training once the avg loss becomes something like this: 0.XXXX.
Here's my mAP graph for 6000 iterations that ran for 6.2 hours:
avg loss with 6000 max_batches.
Moreover, you can follow this FAQ documentation here by Stéphane Charette.

How to control how frequent data is being recorded in Tensorboard?

So, I created a TensoBoard callback, but, I'm training for 1000's of epochs, and when I view TensorBoard, it is too sluggish because of the enormity of the data to be loaded and plotted, basically millions of datapoints, that's because it is writing everything happening at a batch level. I want only results at the end of epoch, not batch. Can I get to control that?
Additionally, it is by default recording: loss, validation loss, and plenty other things that I didn't ask for. How can I control what is being recorded?
Try update_freq=10000.
This will make it update every 10000 sample.
Or maybe update_freq='epoch', this will update after the epoch end.
https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L997

time-series prediction for price forecasting (problems with predictions)

I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time-step I am using an LSTM to predict the next 10 time-steps. The input is the sequence of the last 45-60 observations. I tested several different ideas, but they all seems to give similar results. The model is trained to minimize MSE.
For each idea I tried a model predicting 1 step at a time where each prediction is fed back as an input for the next prediction, and a model directly predicting the next 10 steps(multiple outputs). For each idea I also tried using as input just the moving average of the previous prices, and extending the input to input the order book at those time-steps.
Each time-step corresponds to a second.
These are the results so far:
1- The first attempt was using as input the moving average of the last N steps, and predict the moving average of the next 10.
At time t, I use the ground truth value of the price and use the model to predict t+1....t+10
This is the result
Predicting moving average
On closer inspection we can see what's going wrong:
Prediction seems to be a flat line. Does not care much about the input data.
2) The second attempt was trying to predict differences, instead of simply the price movement. The input this time instead of simply being X[t] (where X is my input matrix) would be X[t]-X[t-1].
This did not really help.
The plot this time looks like this:
Predicting differences
But on close inspection, when plotting the differences, the predictions are always basically 0.
Plot of differences
At this point, I am stuck here and running our of ideas to try. I was hoping someone with more experience in this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw? (They do incur in low error, but they become meaningless at that point).
At least just a hint on where to dig for further info would be highly appreciated.
Thanks!
Am I using the right objective to train the model?
Yes, but LSTM are always very tricky for forecasting time series. And are very prone to overfitting compared to other time series models.
Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw?
I haven't seen your code, or the details of the LSTM you are using. Make sure you are using a very small network, and you are avoiding overfitting. Make sure that after you differenced the data - you then reintegrate it before evaluating the final forecast.
On trick to try to build a model that forecasts 10 steps ahead directly instead of building a one-step ahead model and then forecasting recursively.

Why and when do I need to use the global step in tensorflow

I am using tensorflow, but I am not sure why I even need the global_step variable or if it is even necessary for training. I have sth like this:
gradients_and_vars = optimizer.compute_gradients(value)
train_op = optimizer.apply_gradients(gradients_and_vars)
and then in my loop inside a session I do this:
_ = sess.run([train_op])
I am using a Queue to feed my data the the graph. Do I even have to instantiate a global_step variable?
My loop looks like this:
while not coord.should_stop():
So this loop stops, when it should stop. So why do I need the global_step at all?
You don't need the global step in all cases. But sometimes people want to stop training, tweak some code and then continue training with the saved and restored model. Then often it is nice to know how long (=for how many time steps) this model had been trained so far. Thus the global step.
Also sometimes your learning rate regime might depend on the time the model already had been trained. Say you want to decay your learning rate every 100.000 steps. If you don't keep track of the number of steps already taken this might be difficult if you interrupted training in between and didn't keep track of the number of steps already taken.
And furthermore if you are using tensorboard the global step is the central parameter for your x-axis of the charts.