Fluctuating training loss but stable validation loss - tensorflow

I am training a binary classification model on the SIIM-ISIC Melanoma Classification dataset.
I am using EfficientNetV2-M as the base model.
I use a cosine decay schedule with 2 warm-up epochs and Adam as the optimizer.
However, my training loss fluctuates while my validation loss is stable.
Is there a particular reason why this would happen?
Thanks in advance.
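For reference, the schedule described in the question (linear warm-up followed by cosine decay) can be sketched in plain Python as below. This is only an illustration: the peak learning rate, the total number of epochs and the per-epoch granularity are assumptions, since the question does not state them, and the function name `lr_at_epoch` is hypothetical.

```python
import math

def lr_at_epoch(epoch, total_epochs, peak_lr, warmup_epochs=2, min_lr=0.0):
    """Linear warm-up followed by cosine decay (all values are illustrative)."""
    if epoch < warmup_epochs:
        # Ramp linearly from 0 up to peak_lr over the warm-up epochs.
        return peak_lr * epoch / warmup_epochs
    # Fraction of the post-warm-up schedule completed, clamped to [0, 1].
    decay_epochs = max(1, total_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    # Cosine curve from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Assumed values: 10 epochs in total, peak learning rate 1e-3.
schedule = [lr_at_epoch(e, total_epochs=10, peak_lr=1e-3) for e in range(11)]
```

Plotting such a schedule next to the training loss may help check whether the fluctuations line up with the high-learning-rate part of the run.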

Related

Higher train set accuracy, Lower test set accuracy

I'm using a CNN to classify wireless signals.
I have run into a strange problem: when the training set accuracy is 80%, I get 79% test set accuracy, but when the training set accuracy is 93%, the test set accuracy falls to 71%. Has anyone had the same problem before?
My net is based on Keras + TensorFlow.
The details of the net are:
CNN(512,(2,2),tanh)
Batch_normalization
flatten()
DNN(512,elu)
DNN(256,elu)
DNN(128,softmax)
opt=adam
loss = mse
THANKS
This would appear to be a case of overfitting. How did you get the training accuracy to go from 80% to 93%? Was it just by running more epochs?
If overfitting is what is happening, add dropout layers to the model. This should improve the validation accuracy, but it may take more epochs to reach the desired training accuracy. Another alternative is to use regularizers in the dense layers.
The more complex your model is, the more prone it is to overfitting, so you might try running the model with just two dense layers, or alternatively reduce the number of nodes in the hidden layers.
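To make the dropout suggestion concrete, here is a minimal plain-Python sketch of what a dropout layer does (the "inverted dropout" variant). It only illustrates the mechanism; the function name `dropout` and the rate are assumptions, and in a Keras model you would normally use `tf.keras.layers.Dropout` (and `kernel_regularizer` on the dense layers) rather than writing this yourself.

```python
import random

def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of activations; requires 0 <= rate < 1."""
    if not training or rate == 0.0:
        # At inference time the activations pass through unchanged.
        return list(values)
    keep = 1.0 - rate
    # Zero each unit with probability `rate`; scale the survivors by 1/keep
    # so the expected value of each activation stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, training=True)   # random zeros, survivors doubled
eval_out = dropout(activations, rate=0.5, training=False)   # identical to the input
```

Because units are randomly removed only during training, the network cannot rely on any single unit, which is why training accuracy tends to rise more slowly while validation accuracy may improve.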

SSD Resnet 50 FPN Loss function clarification

I am using the TensorFlow Object Detection API on my dataset, with the ssd-resnet50-fpn model. While training, I see that the classification loss and the localization loss have converged, but the total loss is still decreasing. Also, the total loss does not come out as the sum of the classification loss and the localization loss. Any ideas on why this is happening? I am using train.py in the object_detection/legacy/ folder to train on my dataset. An image of the loss curves is attached.
The total loss is the sum of the classification loss, the localization loss and the L2 regularization loss applied to the trainable variables, with the L2 term weighted by "weight_decay".
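The arithmetic can be sketched as follows. This is a simplified plain-Python illustration, not the Object Detection API's actual code: the function names are hypothetical, the variables are plain lists of floats, and halving the sum of squares is an assumed convention.

```python
def l2_loss(weights):
    """Half the sum of squares of one variable (assumed convention)."""
    return 0.5 * sum(w * w for w in weights)

def total_loss(classification_loss, localization_loss, variables, weight_decay):
    """Classification + localization + weight_decay-scaled L2 penalty."""
    regularization_loss = weight_decay * sum(l2_loss(v) for v in variables)
    return classification_loss + localization_loss + regularization_loss

# Same classification and localization losses, but smaller weights later on.
early = total_loss(0.4, 0.2, variables=[[3.0, 4.0], [1.0]], weight_decay=0.01)
later = total_loss(0.4, 0.2, variables=[[1.5, 2.0], [0.5]], weight_decay=0.01)
```

Here `later` is smaller than `early` even though the first two terms are unchanged, which is consistent with a total loss that keeps decreasing after the classification and localization losses have flattened out.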

Which loss function will converge well in multi-label image classification task?

I've trained a multi-label, multi-class image classifier using sigmoid as the output activation function and binary_crossentropy as the loss function.
The validation accuracy curve fluctuates up and down, while the loss curve shows weird (very high) values at a few epochs.
The following are the accuracy and loss curves for a fine-tuned (last block) VGG19 model with Dropout and BatchNormalization.
Accuracy curve
loss curve
Accuracy and loss curves for the fine-tuned (last block) VGG19 model with Dropout, BatchNormalization and data augmentation.
accuracy curve with data augmentation
loss curve with data augmentation
I've trained the classifier on 1800 training images (5 labels) with 100 validation images. The optimizer I used is SGD(lr=0.001, momentum=0.99).
Can anyone explain why the loss curve shows such weird or high values at some epochs?
Should I use a different loss function? If yes, which one?
Don't worry, all is well. The loss curve by itself doesn't say much, and occasional spikes in it are perfectly allowed; your model is still training. Look at your accuracy curve instead, which I think rises fairly normally.
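A small plain-Python sketch shows why binary cross-entropy can spike while accuracy barely moves: a single confidently wrong prediction dominates the mean loss but still counts as just one error. The helper names, the clipping constant `eps` and the example predictions are assumptions made up for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if (p >= threshold) == (t == 1))
    return hits / len(y_true)

labels = [1, 0, 1, 0, 1]
mildly_wrong = [0.9, 0.1, 0.8, 0.2, 0.4]        # last prediction is slightly wrong
confidently_wrong = [0.9, 0.1, 0.8, 0.2, 1e-6]  # last prediction is very wrong

same_accuracy = binary_accuracy(labels, mildly_wrong) == binary_accuracy(labels, confidently_wrong)
loss_ratio = binary_crossentropy(labels, confidently_wrong) / binary_crossentropy(labels, mildly_wrong)
```

Both prediction sets have the same accuracy, yet the loss for the second is several times larger. With only 100 validation images, a handful of such predictions could plausibly be enough to produce a visible spike.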

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to converge in no less than a month or so.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Training accuracy steadily increases, but training loss decreases and then increases

I have trained a face recognition model with TensorFlow (4301 classes). The training process went as follows (I have captured charts of it):
training accuracy
training loss
The training accuracy steadily increases. The training loss, however, first decreases and then, after a certain number of iterations, strangely starts to increase.
I simply use softmax loss with a weight regularizer, and I use AdamOptimizer to minimize the loss. The initial learning rate is set to 0.0001 and is halved every 7 epochs (380000 training images in total, batch size 16). I have also tested on a validation set (consisting of 8300 face images) and get a validation accuracy of about 55.0%, which is far below the training accuracy.
Is it overfitting? Can overfitting lead to a final increase in the training loss?
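For reference, the learning-rate setting described above works out as in the sketch below. The numbers come from the question; the function names are hypothetical, and decaying in discrete steps at epoch boundaries is an assumption about how the halving is applied.

```python
def steps_per_epoch(num_images, batch_size):
    """Number of full batches in one pass over the training set."""
    return num_images // batch_size

def lr_at_step(step, initial_lr=1e-4, num_images=380_000,
               batch_size=16, halve_every_epochs=7):
    """Learning rate that is halved once every `halve_every_epochs` epochs."""
    epoch = step // steps_per_epoch(num_images, batch_size)
    return initial_lr * 0.5 ** (epoch // halve_every_epochs)

iterations_per_epoch = steps_per_epoch(380_000, 16)        # 380000 / 16
lr_after_7_epochs = lr_at_step(7 * iterations_per_epoch)   # first halving
lr_after_70_epochs = lr_at_step(70 * iterations_per_epoch) # ten halvings
```

Comparing the iteration at which the loss starts to rise with the iterations at which the learning rate changes can help tell whether the schedule is involved.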
Overfitting is when the performance on training and test data starts to diverge. That is not what you are showing here, since your charts report training performance only.
Training means running a minimization algorithm on your loss. When your loss starts increasing, training is failing at what it is supposed to do. You probably want to change your minimization settings so that your training loss eventually converges.
As for why your accuracy continues to increase long after your loss starts diverging, it is hard to tell without knowing more. One explanation could be that your loss is a sum of different terms, for example a cross-entropy term and a regularization term, and that only the latter diverges.