SSD Resnet 50 FPN Loss function clarification - tensorflow

I am using the TensorFlow Object Detection API on my dataset with the ssd-resnet50-fpn model. While training, I see that the classification loss and localization loss have converged, but the total loss is still decreasing. Also, the total loss does not come out to be the sum of the classification loss and the localization loss. Any ideas on why this is happening? I am training on my dataset from the object_detection/legacy/ folder. Attached image for the same.

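Total loss is the sum of the classification loss, the localization loss, and an L2 regularization loss applied to the trainable variables and weighted by "weight_decay". That third term is why the total can keep decreasing after the other two have flattened out.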
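As a rough illustration of that sum, here is a minimal sketch in plain Python. The function name, the toy weights and the loss values are all hypothetical, and the exact scaling of the L2 term (for example an extra factor of 1/2) depends on the regularizer the model config uses, so treat this as the shape of the computation rather than the API's actual code.

```python
def total_loss(classification_loss, localization_loss, trainable_variables, weight_decay):
    """Return (total, regularization) for two detection losses plus an L2 penalty."""
    # L2 penalty: sum of squared weights over every trainable variable.
    l2_penalty = sum(w * w for variable in trainable_variables for w in variable)
    regularization_loss = weight_decay * l2_penalty
    total = classification_loss + localization_loss + regularization_loss
    return total, regularization_loss


# Hypothetical numbers, for illustration only.
toy_variables = [[1.0, 1.0, 1.0], [2.0, 2.0]]
total, regularization = total_loss(0.30, 0.20, toy_variables, weight_decay=0.0004)
print(total, regularization)
```

If the weights keep shrinking during training, the regularization term shrinks with them, so the total can drift down even when the first two terms look flat.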


Fluctuating training loss but stable validation loss

I am training a binary classification model on the SIIM-ISIC Melanoma Classification dataset.
I am using EfficientNetV2M as the base model.
I used a cosine decay schedule with 2 warm-up epochs and Adam as the optimizer.
However, my training loss is fluctuating while my validation loss is stable.
Is there a particular reason why this would happen?
Thanks in advance.
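For readers unfamiliar with the schedule mentioned in the question, a cosine decay with linear warm-up can be sketched as below. This is a generic version, not the asker's code: the function name, the step counts and the base learning rate are assumptions, and it counts in steps rather than epochs.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# Hypothetical setup: 100 steps in total, 10 of them warm-up.
schedule = [warmup_cosine_lr(s, 100, 10, base_lr=1e-3) for s in range(101)]
```

Plotting `schedule` next to the training loss is one way to check whether the fluctuations line up with the high-learning-rate part of the schedule.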

Different results for Categorical crossentropy as loss and as accuracy metric in keras model training

I am training and optimizing my multi-class classification CNN with the following Keras compile call.
metrics=['accuracy', 'categorical_crossentropy'])
I used categorical_crossentropy as loss as well as metric to watch. After training the model for 10 epochs, I get the following values.
Even though I have chosen categorical_crossentropy as both the loss and a metric, what could be the possible reasons for their values to be different?

Which loss function will converge well in multi-label image classification task?

I've trained a multi-label, multi-class image classifier using sigmoid as the output activation function and binary_crossentropy as the loss function.
The validation accuracy curve fluctuates up and down, while the loss curve shows weird (very high) values at a few epochs.
Following is the Accuracy and loss-curve for fine-tuned(last block) VGG19 model with Dropout and BatchNormalization.
Accuracy curve
loss curve
Accuracy and loss-curve for fine-tuned(last block) VGG19 model with Dropout, BatchNormalization and Data Augmentation.
accuracy curve with data augmentation
loss curve with data augmentation
I've trained the classifier with 1800 training images (5 labels) and 100 validation images. The optimizer I used is SGD(lr=0.001, momentum=0.99).
Can anyone explain why the loss curve takes such weird or high values at some epochs?
Should I use a different loss function? If yes, which one?
Don't worry - all is well. Your loss curve doesn't say much, especially 'spikes in the loss curve'. They're totally allowed, your model is still training. You should look at your accuracy curve, and that one goes up pretty normal I think.

Expected validation accuracy for Keras Mobile Net V1 for CIFAR-10 (training from scratch)

Has anybody trained Mobile Net V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% validation accuracy after 110 epochs, even though my training accuracy is above 99%. Here is how I am creating the model.
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

# Create the MobileNet base without its classification head, randomly initialized
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Define the input shape in the first layer of the neural network
x = Input(shape=(32, 32, 3), name='input')
# Build the custom model on top of the MobileNet base
model = MobileNet_model(x)
model = Flatten(name='flatten')(model)
model = Dense(1024, activation='relu', name='dense_1')(model)
output = Dense(10, activation=tf.nn.softmax, name='output')(model)
model_regular = Model(x, output, name='model_regular')
I used Adam optimizer with a LR= 0.001, amsgrad = True and batch size = 64. Also normalized pixel data by dividing by 255.0. I am not using any Data Augmentation.
optimizer1 = tf.keras.optimizers.Adam(learning_rate=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history = model_regular.fit(x_train, y_train_one_hot, validation_data=(x_test, y_test_one_hot), batch_size=64, epochs=100)
I think I am supposed to get at least 75% according to
Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.
MobileNet was designed for training on ImageNet, which is much larger, so training it on CIFAR-10 will inevitably result in overfitting. I would suggest you plot the loss (not accuracy) for both training and validation/evaluation, train it hard until it reaches 99% training accuracy, and then observe the validation loss. If it is overfitting, you will see the validation loss increase again after reaching a minimum.
A few things to try to reduce overfitting:
add dropout before fully connected layer
data augmentation - random shift, crop and rotation should be enough
use a smaller width multiplier, e.g. 0.75 or 0.5, to make the layers thinner (read the original paper; it basically just reduces the number of filters per layer)
use L2 weight regularization and weight decay
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
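The learning rate decay suggested above could look roughly like this. It is a minimal sketch in plain Python, not tied to any framework: the function names, the three-stage split and the epoch counts are assumptions, and only the 1e-2 and 1e-4 endpoints come from the answer.

```python
def exponential_decay_lr(epoch, total_epochs, start_lr=1e-2, end_lr=1e-4):
    """Decay geometrically from start_lr at epoch 0 to end_lr at the last epoch."""
    if total_epochs <= 1:
        return start_lr
    fraction = min(epoch, total_epochs - 1) / (total_epochs - 1)
    return start_lr * (end_lr / start_lr) ** fraction


def stepwise_decay_lr(epoch, total_epochs, lrs=(1e-2, 1e-3, 1e-4)):
    """Hold each learning rate in lrs for an equal share of the epochs."""
    stage = min(epoch * len(lrs) // total_epochs, len(lrs) - 1)
    return lrs[stage]


# Hypothetical 90-epoch run: compare the two schedules at a few epochs.
for epoch in (0, 30, 60, 89):
    print(epoch, exponential_decay_lr(epoch, 90), stepwise_decay_lr(epoch, 90))
```

Either function can be wrapped in whatever per-epoch scheduler hook the training framework provides.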
With some hyperparameter search, I got an evaluation loss of 0.85. I didn't use Keras; I wrote the MobileNet myself in TensorFlow.
The OP asked about MobileNetv1. Since MobileNetv2 has been published, here is an update on training MobileNetv2 on CIFAR-10 -
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR10, I changed the stride in the first three of these layers to 1. Thus the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with 0.25 on the width parameter (which scales the number of channels in each layer). I trained with mixup in mxnet. This gets me a validation accuracy of 93.27.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43. This implementation changes the stride to 1 in the first two of the original downsampling layers, and it uses the full channel width used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only altering the stride in the first conv layer from 2 to 1, using the complete width (width parameter==1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31.
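The feature map sizes quoted in points 1 to 4 follow from simple halving arithmetic, which can be checked with a small sketch. It assumes each stride-2 layer halves the spatial size (rounding up, as with "same" padding) and that stride-1 layers leave it unchanged. The function name is made up for illustration.

```python
def final_feature_map_size(input_size, num_stride2_layers):
    """Spatial size left after the given number of stride-2 layers."""
    size = input_size
    for _ in range(num_stride2_layers):
        size = (size + 1) // 2  # halve, rounding up
    return size


# Point 1: ImageNet input, all 5 stride-2 layers kept.
print(final_feature_map_size(224, 5))  # 7
# Point 2: CIFAR-10 input, 3 of the 5 strides set to 1, so 2 remain.
print(final_feature_map_size(32, 2))   # 8
# Point 4: CIFAR-10 input, only the first stride set to 1, so 4 remain.
print(final_feature_map_size(32, 4))   # 2
```

By the same arithmetic, the variant in point 3 (two strides changed, three remaining) would end at 4x4 on a 32x32 input.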

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to get this model to converge on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to take at least a month or so to converge.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.