Accuracy dropped suddenly after a certain epoch (classification using EfficientNet) - tensorflow

So I have been training EfficientNet for a classification task. I used the EfficientNet-B2 model with a batch size of 64 and a learning rate of 0.0001.
I was able to get good accuracy and loss while gradually increasing the batch size and decreasing the learning rate. But when I just used lr 0.0001 and let the model run, I found the accuracy dropping significantly after the 26th epoch, while the loss curve kept following its usual shape (the setup is sketched below).
I have found a good model, but I just wanted to know what might cause the accuracy to behave like that in the graph.
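For reference, a minimal sketch of the setup described above in tf.keras; the number of classes, the optimizer choice (Adam), and the data variables are assumptions, not details from the post.

    import tensorflow as tf

    num_classes = 10  # assumption: replace with the real number of classes

    base = tf.keras.applications.EfficientNetB2(
        include_top=False, weights="imagenet",
        input_shape=(260, 260, 3), pooling="avg",
    )
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lr 0.0001 as in the post
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    # model.fit(x_train, y_train, batch_size=64, epochs=50,
    #           validation_data=(x_val, y_val))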

Related

What to do when accuracy increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should choose.
Here's my training progress bar:
I am using the ModelCheckpoint callback in tf.keras and monitoring val_loss as the metric for saving the best model weights (see the sketch below this question).
As you can see in the image,
at the 8th epoch I got val_acc = 0.9845 but val_loss = 0.629, and precision and recall are also high there,
but at the 3rd epoch I got val_acc = 0.9840 with val_loss = 0.590.
I understand the difference is not huge, but in such cases, which metric should I trust on an imbalanced dataset?
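A minimal sketch of the checkpoint setup described in the question; the file name and the AUC alternative are illustrative, not taken from the post.

    import tensorflow as tf

    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras",        # illustrative file name
        monitor="val_loss",        # the metric monitored in the question
        mode="min",                # lower val_loss is better
        save_best_only=True,
    )
    # On an imbalanced problem you could instead monitor e.g. "val_auc"
    # (after adding tf.keras.metrics.AUC() to metrics=[...] in compile())
    # with mode="max".
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=20, callbacks=[checkpoint])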
The most important factors are the validation and training error.
If the validation loss (error) starts to increase, that signals overfitting. Set the number of epochs high, terminate training based on the error rates, and avoid overfitting that way: as long as the validation loss keeps dropping, training should continue, until the model converges at some epoch. Ideally it converges to a low val_loss.
Just bear in mind that an epoch is one learning cycle in which the learner sees the whole training data set; if you have two batches, the learner needs to go through two iterations for one epoch.
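As a concrete illustration of terminating training based on the error rates, here is a hedged sketch using tf.keras EarlyStopping; the patience value is an arbitrary choice.

    import tensorflow as tf

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",         # stop when validation loss stops improving
        patience=5,                 # tolerate 5 epochs without improvement
        restore_best_weights=True,  # roll back to the best epoch's weights
    )
    # With 64 training samples and batch_size=32, one epoch = 2 iterations,
    # matching the two-batch example above.
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=1000, batch_size=32, callbacks=[early_stop])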
You can divide the data into 3 sets: training, validation, and evaluation. Train each network for enough epochs that the training Mean Squared Error settles into a minimum.
The training process uses the training data set and is executed epoch by epoch; at each epoch, also compute the network's Mean Squared Error on the validation set. The network from the epoch with the minimum validation MSE is then selected for the evaluation process.
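A tiny sketch of that selection rule, with made-up per-epoch validation MSE values:

    import numpy as np

    # Per-epoch validation MSE, e.g. history.history["val_mse"] from a Keras fit() call
    val_mse = [0.91, 0.54, 0.38, 0.35, 0.37, 0.41]   # illustrative values
    best_epoch = int(np.argmin(val_mse))             # epoch with the minimum validation MSE
    print(f"Select epoch {best_epoch} (val MSE = {val_mse[best_epoch]:.2f}) for evaluation")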
This can happen for several reasons. Assuming you have used a proper split into training, test, and validation sets, and preprocessed the data (e.g. min-max scaling, handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss curve.
If the loss first decreases and then, after a certain point, starts increasing (i.e. the curve is U-shaped), you can apply early stopping.
In the other scenario, where the loss increases steadily, early stopping won't help. In this case, add dropout layers of 0.2-0.3 between the major layers (see the sketch after this answer). This introduces randomness and stops the model from memorising the training data.
Once you add dropout, your model may suddenly start to behave strangely. Tweak the activation functions and the number of units in the output or Dense layers, and it will eventually come right.
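A minimal sketch of the dropout suggestion, assuming a simple Dense network; the layer sizes and input shape are illustrative.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128,)),            # illustrative input dimension
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),                   # randomly zero 30% of activations
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])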

Validation loss oscillates a lot, validation accuracy > learning accuracy, but test accuracy is high. Is my model overfitting?

I am training a model using the author's original learning rate (I use their GitHub code too), and I get a validation loss that keeps oscillating a lot: it decreases, then suddenly jumps to a large value, then decreases again, but it never really converges, as the lowest it gets is about 2 (while the training loss converges to roughly 0.0 something, well below 1).
At each epoch I get the training accuracy and at the end, the validation accuracy. Validation accuracy is always greater than the training accuracy.
When I test on real test data, I get good results, but I wonder if my model is overfitting. I expect a good model's val loss to converge in a similar fashion to the training loss, but this doesn't happen, and the fact that the val loss oscillates to very large values at times worries me.
By adjusting the learning rate, the scheduler, etc., I got the validation and training loss trending downward with less oscillation, but then my test accuracy stays low (as do the training and validation accuracies).
I did try a couple of optimizers (Adam, SGD, Adagrad) with a step scheduler and also PyTorch's plateau scheduler, and I played with step sizes etc., but it didn't really help, nor did clipping gradients.
Is my model overfitting?
If so, how can I reduce the overfitting besides data augmentation?
If not (I read some people on Quora saying it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I got similar results in a k-fold experiment, would that be good enough? I don't feel it would justify the oscillating. How should I proceed?
The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on one minibatch of the validation set, so it is normal for it to be noisier.
Solution: you can report the exponential moving average of the validation loss across epochs to smooth out the fluctuations.
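A small sketch of that smoothing, with made-up validation-loss values; the decay factor of 0.9 is an arbitrary choice.

    def ema(values, decay=0.9):
        """Exponential moving average of a sequence of per-epoch losses."""
        smoothed, running = [], values[0]
        for v in values:
            running = decay * running + (1.0 - decay) * v
            smoothed.append(running)
        return smoothed

    val_losses = [2.9, 2.1, 2.6, 1.8, 3.0, 1.7, 1.9]  # illustrative noisy values
    print(ema(val_losses))                             # a much less jumpy sequence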
It is not overfitting, since your validation accuracy is not lower than your training accuracy. In fact, it sounds like your model is underfitting, given that your validation accuracy is greater than your training accuracy.

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I expect it to take no less than a month or so to converge.
Here's my implementation of the Inception model. The input is 224x224x3 images, with values in the range [0, 1].
The learning rate is set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
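Not the poster's actual code, but a minimal tf.keras sketch of the configuration described: inputs resized to 224x224 and scaled to [0, 1], plain SGD with a fixed learning rate of 0.01.

    import tensorflow as tf

    def preprocess(image, label):
        image = tf.image.resize(image, (224, 224))
        image = tf.cast(image, tf.float32) / 255.0    # scale pixel values to [0, 1]
        return image, label

    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)  # static learning rate
    # model.compile(optimizer=optimizer,
    #               loss="sparse_categorical_crossentropy", metrics=["accuracy"])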
My question
After 48 hours of training, the training loss seems to indicate that the model is learning from the training data, but the validation loss is beginning to get worse. Ordinarily this would look like overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, given that I've only trained for 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Training accuracy steadily increases, but training loss decreases and then increases

I have trained a face recognition model with TensorFlow (4301 classes). The training process went as follows (I have grabbed charts of the training process):
(charts: training accuracy and training loss)
The training accuracy steadily increases; however, the training loss first decreases and then, after a certain number of iterations, weirdly increases.
I simply use a softmax loss with a weight regularizer, and I use AdamOptimizer to minimize the loss. For the learning rate, the initial lr is set to 0.0001 and is halved every 7 epochs (380,000 training images in total, batch size 16; roughly the schedule sketched below). I have tested on a validation set (consisting of 8,300 face images) and get a validation accuracy of about 55.0%, which is far below the training accuracy.
Is it overfitting? Can overfitting lead to a final increase in the training loss?
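A hedged sketch of the learning-rate schedule described in the question (initial lr 1e-4 with Adam, halved every 7 epochs); the callback-based implementation is one possible way to express it, not the poster's actual code.

    import tensorflow as tf

    def halve_every_7_epochs(epoch, lr):
        # lr is the current learning rate; halve it at epochs 7, 14, 21, ...
        return lr * 0.5 if epoch > 0 and epoch % 7 == 0 else lr

    lr_callback = tf.keras.callbacks.LearningRateScheduler(halve_every_7_epochs)
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # initial lr from the question
    # model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
    # model.fit(train_ds, epochs=50, callbacks=[lr_callback])  # batch size 16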
Overfitting is when the performance on training and test data starts to diverge; that is not what you are showing here, since you are reporting training performance only.
Training runs a minimization algorithm on your loss. When your loss starts increasing, it means that training is failing at what it is supposed to do. You probably want to change your minimization settings to get the training loss to eventually converge.
As for why your accuracy continues to increase long after your loss starts diverging, it is hard to tell without knowing more. One explanation could be that your loss is a sum of different terms, for example a cross-entropy term and a regularization term, and that only the latter diverges.
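To make that concrete, here is a hedged tf.keras sketch (toy model, made-up data) that logs the cross-entropy term and the regularization term separately, so you can see which one is actually growing.

    import tensorflow as tf

    # Toy model with an L2 weight regularizer so that model.losses is non-empty.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(4, kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    ])

    x = tf.random.normal((16, 8))                          # made-up batch
    y = tf.random.uniform((16,), maxval=4, dtype=tf.int32)

    logits = model(x, training=True)
    ce_loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(y, logits, from_logits=True))
    reg_loss = tf.add_n(model.losses)                      # sum of the L2 penalties
    tf.print("cross-entropy:", ce_loss, "regularization:", reg_loss)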

How to interpret the strange training curve for RNN?

I use TensorFlow to train a simple two-layer RNN on my data set. The training curve is shown as follows:
The x-axis is the number of steps (in one step, a batch_size number of samples is used to update the network parameters) and the y-axis is the accuracy. The red, green, and blue lines are the accuracy on the training, validation, and test sets, respectively. The training curve is not smooth and shows some abrupt changes. Is that reasonable?
Have you tried gradient clipping, the Adam optimizer, and learning rate decay?
From my experience, gradient clipping prevents exploding gradients, the Adam optimizer converges faster, and learning rate decay improves generalization (see the sketch at the end of this answer).
Have you shuffled the training data?
In addition, visualizing the distribution of the weights also helps with debugging the model.
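A hedged tf.keras sketch combining these suggestions (gradient clipping, Adam, learning rate decay, shuffled training data); all of the numeric values are illustrative.

    import tensorflow as tf

    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96)

    optimizer = tf.keras.optimizers.Adam(
        learning_rate=lr_schedule,
        clipnorm=1.0,   # clip each gradient tensor so its norm is at most 1.0
    )
    # dataset = dataset.shuffle(buffer_size=10_000).batch(batch_size)  # shuffled training data
    # model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")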
It's absolutely OK since you are using SGD. The general trend is that accuracy increases as the number of minibatches processed increases; however, some minibatches can differ significantly from most of the others, so accuracy can be poor on them.
The fact that your test and validation accuracy drop sharply at points 13 and 21 is suspicious; e.g. at 13 the test score drops below its epoch-1 value.
This implies your learning rate is probably too large: a single mini-batch shouldn't cause that much weight change.