I have trained a face recognition model with tensorflow (4301 classes). The training process goes like follows(I have grab the chart of the training process):
training accuracy
training loss
The training accuracy steadily increases, However, for the training loss, it firstly decreases, then after a certain number of iterations, it weirdly increases.
I simply use softmax loss with weights regularizer. And I use AdamOptimizer to minimize the loss. For learning rate setting, the initial lr is set to 0.0001, the learning rate would decrease by half by every 7 epocs(380000 training images total, batch size is 16). And I have test on a validation set (consist 8300 face images),and get a validation accuracy about 55.0% which is far below the training accuracy.
Is it overfitting ? can overfitting leads to a final increase for the training loss?
Overfitting is when you start having a divergence in the performance on training and test data — this is not the case here since you are reporting training performance only.
Training is running a minimization algorithm on your loss. When your loss starts increasing, it means that training fails at what it is supposed to be doing. You probably want to change your minimization settings to get your training loss to eventually converge.
As for why your accuracy continues to increase long after your loss starts diverging, it is hard to tell without knowing more. An explanation could be that your loss is a sum of different terms, for example a cross-entropy term and a regularization term, and that only the later diverges.
Related
So I have been training EfficientNet for a classification task. I used EfficientNet-B2 model with a batch size of 64 and a learning rate 0.0001.
I was able to get good accuracy and loss while gradually increasing batch size and decreasing the learning rate. But when I just used lr 0.0001 and let the model run, I found the accuracy dropping significantly after 26th epoch while the loss curve was following the usual curve.
I have found a good model but just wanted to know what might be the reason for the accuracy behaving like that in the graph.
I'm training a CNN-LSTM concat model and after 20 epochs I got an accuracy of 69% and a loss of 0.04?
I know that the combination of very high training accuracy and relatively low validation accuracy indicates overfitting but I was wondering if a low accuracy and a very low loss would also indicate overfitting.
Overall, the accuracy increased linearly and the loss decreased exponentially.
Some more details regarding task, loss and so on would be appreciated.
For the training set the loss should have a inverse relationship with the metric e.g. accuracy. If the loss goes down the accuracy should go up. If this is not the case you might want to have a look at the loss function and metric.
So a very low loss and very low accuracy on the same dataset would indicate for me, that something is wrong with the model I want to train and that the model is not learning to solve the task I want it to solve.
I am training a model, and using the original learning rate of the author (I use their github too), I get a validation loss that keeps oscillating a lot, it will decrease but then suddenly jump to a large value and then decrease again, but never really converges as the lowest it gets is 2 (while training loss converges to 0.0 something - much below 1)
At each epoch I get the training accuracy and at the end, the validation accuracy. Validation accuracy is always greater than the training accuracy.
When I test on real test data, I get good results, but I wonder if my model is overfitting. I expect a good model's val loss to converge in a similar fashion with training loss, but this doesn't happen and the fact that the val loss oscillates to very large values at times worries me.
Adjusting the learning rate and scheduler etc etc, I got the val loss and training loss to a downward fashion with less oscilliation, but this time my test accuracy remains low (as well as training and validation accuracies)
I did try a couple of optimizers (adam, sgd, adagrad) with step scheduler and also the pleateu one of pytorch, I played with step sizes etc. but it didn't really help, neither did clipping gradients.
Is my model overfitting?
If so, how can I reduce the overfitting besides data augmentation?
If not (I read some people on quora said it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I would get similar results for a k-fold experiment, would it be good enough? I don't feel it would justify the oscilliating. How should I proceed?
The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on one minibatch of the validation set, so it is normal for it to be more noisey.
Solution: You can report the Exponential Moving Average of the validation loss across different epochs to have less fluctuations.
It is not overfitting since your validation accuracy is not less than the training accuracy. In fact, it sounds like your model is underfitting since your validation accuracy > training accuracy.
My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2, 3 and classification threshold 0.5. timesteps 1 and 2 will produce a decrease in loss but no increase in accuracy.
Ensure that your model has enough capacity by overfitting the training data. If the model is overfitting the training data, avoid overfitting by using regularization techniques such as dropout, L1 and L2 regularization and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions, one of the possible problems is that your network start to memorize data, yes you should increase regularization,
update:
Here I want to mention one more problem that may cause this:
The balance ratio in the validation set is much far away from what you have in the training set. I would recommend, at first step try to understand what is your test data (real-world data, the one your model will face in inference time) descriptive look like, what is its balance ratio, and other similar characteristics. Then try to build such a train/validation set almost with the same descriptive you achieve for real data.
Well, I faced the similar situation when I used Softmax function in the last layer instead of Sigmoid for binary classification.
My validation loss and training loss were decreasing but accuracy of both remained constant. So this gave me lesson why sigmoid is used for binary classification.
I have run deep learning models(CNN's) using tensorflow. Many times during the epoch, i have observed that both loss and accuracy have increased, or both have decreased. My understanding was that both are always inversely related. What could be scenario where both increase or decrease simultaneously.
The loss decreases as the training process goes on, except for some fluctuation introduced by the mini-batch gradient descent and/or regularization techniques like dropout (that introduces random noise).
If the loss decreases, the training process is going well.
The (validation I suppose) accuracy, instead, it's a measure of how good the predictions of your model are.
If the model is learning, the accuracy increases. If the model is overfitting, instead, the accuracy stops to increase and can even start to decrease.
If the loss decreases and the accuracy decreases, your model is overfitting.
If the loss increases and the accuracy increase too is because your regularization techniques are working well and you're fighting the overfitting problem. This is true only if the loss, then, starts to decrease whilst the accuracy continues to increase.
Otherwise, if the loss keep growing your model is diverging and you should look for the cause (usually you're using a too high learning rate value).
I think the top-rated answer is incorrect.
I will assume you are talking about cross-entropy loss, which can be thought of as a measure of 'surprise'.
Loss and accuracy increasing/decreasing simultaneously on the training data tells you nothing about whether your model is overfitting. This can only be determined by comparing loss/accuracy on the validation vs. training data.
If loss and accuracy are both decreasing, it means your model is becoming more confident on its correct predictions, or less confident on its incorrect predictions, or both, hence decreased loss. However, it is also making more incorrect predictions overall, hence the drop in accuracy. Vice versa if both are increasing. That is all we can say.
I'd like to add a possible option here for all those who struggle with a model training right now.
If your validation data is a bit dirty, you might experience that in the beginning of the training the validation loss is low as well as the accuracy, and the more you train your network, the accuracy increases with the loss side by side. The reason why it happens, because it finds the possible outliers of your dirty data and gets a super high loss there. Therefore, your accuracy will grow as it guesses more data right, but the loss grows with it.
This is just what I think based on the math behind the loss and the accuracy,
Note :-
I expect your data is categorical
Your models output :-
[0.1,0.9,0.9009,0.8] (used to calculate loss)
Maxed output :-
[0,0,1,0] (used to calculate acc )
Expected output :-
[0,1,0,0]
Lets clarify what loss and acc calculates :
Loss :- The overall error of y and ypred
Acc :- Just if y and maxed(ypred) is equal
So in a overall our model almost nailed it , resulting in a low loss
But in maxed output no overall is seen its just that they should completely match ,
If they completely match :-
1
else:
0
Thus resulting in a low accuracy too
Try to check mae of the model
remove regularization
check if your are using correct loss
You should check your class index (both train and valid) in training process. It might be sorted in different ways. I have this problem in colab.