Relation between mean average precision (mAP) and validation loss - TensorFlow

I am training two models for COVID-19 detection from chest X-rays: SSD MobileNetV2 320x320 and SSD MobileNetV2 640x640 from the TensorFlow Object Detection API. The training loss for SSD MobileNetV2 320x320 is 0.22 and its validation loss is 0.36. Similarly, the training loss for SSD MobileNetV2 640x640 is 0.25 and its validation loss is 0.29. I am confused about why SSD MobileNetV2 320x320 is overfitting.
My second question is why the precision and recall of SSD MobileNetV2 320x320 are better than those of SSD MobileNetV2 640x640, even though SSD MobileNetV2 320x320 is overfitting.
Precision and recall of SSD MobileNetV2 320x320
Precision and recall of SSD MobileNetV2 640x640
I changed the score_threshold value from 9.99999993922529e-09 to 0.2 in the pipeline config file. As a result, the mAP value for SSD MobileNetV2 320x320 increased from 0.77 to 0.81, but at the same time the validation loss also increased from 0.36 to 0.42. Can somebody explain the reason for this? I am really confused by this behavior of my model. The original block from the config is shown below, followed by a sketch of the modified version.
post_processing {
  batch_non_max_suppression {
    score_threshold: 9.99999993922529e-09
  }
}
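After the change described above, the same block looks roughly like this (only score_threshold is edited; everything else in the generated pipeline config is left untouched):

post_processing {
  batch_non_max_suppression {
    score_threshold: 0.2
  }
}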

Related

Weird results for different models of the TensorFlow Object Detection API

I am training 3 different models for COVID-19 detection: SSD MobileNetV2 FPNLite 320x320, SSD MobileNetV2 FPNLite 640x640, and SSD MobileNetV2 320x320. I have trained each model for 4000 steps and have gotten good but somewhat puzzling results.
Model                            | Batch size | Training loss | Validation loss | mAP
SSD MobileNetV2 FPNLite 320x320  | 16         | 0.22          | 0.27            | 0.85
SSD MobileNetV2 FPNLite 640x640  | 4          | 0.19          | 0.25            | 0.78
SSD MobileNetV2 320x320          | 16         | 0.17          | 0.34            | 0.83
My second model has the lowest loss values, yet I am confused about why it gets a lower mAP than the other two models. The learning rate for all three models is the same as in the pipeline config.
I was expecting the highest mAP from SSD MobileNetV2 FPNLite 640x640.

Fluctuating training loss but stable validation loss

I am training a binary classification model on the SIIM-ISIC Melanoma Classification dataset.
I am using EfficientNetV2-M as the base model.
I used a cosine decay schedule with 2 warm-up epochs and Adam as the optimizer (roughly as sketched below).
However, my training loss fluctuates while my validation loss is stable.
Is there a particular reason why this would happen?
Thanks in advance.
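A minimal sketch of the schedule and optimizer described above, written against tf.keras. The base learning rate, steps per epoch, and total epochs here are hypothetical placeholders (the question does not state them); only the two warm-up epochs, the cosine decay, and the Adam optimizer come from the question.

import math
import tensorflow as tf

class WarmupCosineDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warm-up for a few epochs, then cosine decay towards zero."""

    def __init__(self, base_lr, warmup_steps, total_steps):
        self.base_lr = base_lr
        self.warmup_steps = float(warmup_steps)
        self.total_steps = float(total_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Ramp up linearly during warm-up, then follow a cosine curve down to zero.
        warmup_lr = self.base_lr * step / tf.maximum(1.0, self.warmup_steps)
        progress = (step - self.warmup_steps) / tf.maximum(
            1.0, self.total_steps - self.warmup_steps)
        cosine_lr = 0.5 * self.base_lr * (1.0 + tf.cos(math.pi * progress))
        return tf.where(step < self.warmup_steps, warmup_lr, cosine_lr)

# Hypothetical numbers for illustration only.
steps_per_epoch = 200
schedule = WarmupCosineDecay(base_lr=1e-3,
                             warmup_steps=2 * steps_per_epoch,   # 2 warm-up epochs
                             total_steps=30 * steps_per_epoch)   # 30 epochs in total
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)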

SSD ResNet50 FPN loss function clarification

I am using the TensorFlow Object Detection API on my dataset with the ssd-resnet50-fpn model. While training, I see that the classification loss and localization loss have converged, but the total loss is still decreasing. Also, the total loss does not come out to be the sum of the classification loss and the localization loss. Any ideas on why this is happening? I am using train.py in the object_detection/legacy/ folder to train on my dataset. Image attached for reference.
The total loss is the sum of the classification loss, the localization loss, and an L2 loss applied to the trainable variables, weighted by "weight_decay", roughly as in the sketch below.
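A rough, self-contained sketch of that relationship, using TF1-style names since the question uses the legacy train.py; the loss values and the weight_decay value are dummy placeholders, not what the pipeline actually computes.

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Dummy stand-ins for the losses the detection pipeline computes.
classification_loss = tf.constant(0.8)
localization_loss = tf.constant(0.3)
weight_decay = 4e-4  # placeholder; use the value from your own pipeline config

# One dummy trainable variable to illustrate the regularization term.
w = tf.get_variable("w", shape=[3, 3], initializer=tf.ones_initializer())
l2_loss = weight_decay * tf.add_n(
    [tf.nn.l2_loss(v) for v in tf.trainable_variables()])

# The reported total loss includes the weighted L2 term, so it is larger than
# classification_loss + localization_loss and keeps changing while the weights move.
total_loss = classification_loss + localization_loss + l2_loss

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([classification_loss, localization_loss, l2_loss, total_loss]))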

Which loss function will converge well in a multi-label image classification task?

I've trained a multi-label, multi-class image classifier using sigmoid as the output activation function and binary_crossentropy as the loss function.
The validation accuracy curve fluctuates up and down, while the loss curve shows weird (very high) values at a few epochs.
Below are the accuracy and loss curves for the fine-tuned (last block) VGG19 model with Dropout and BatchNormalization.
Accuracy curve
Loss curve
Accuracy and loss curves for the fine-tuned (last block) VGG19 model with Dropout, BatchNormalization, and data augmentation.
Accuracy curve with data augmentation
Loss curve with data augmentation
I've trained the classifier on 1800 training images (5 labels) with 100 validation images. The optimizer I used is SGD(lr=0.001, momentum=0.99).
Can anyone explain why the loss curve shows such weird, high values at some epochs?
Should I use a different loss function? If yes, which one?
Don't worry - all is well. Your loss curve doesn't say much, especially the spikes in it; they're perfectly normal, and your model is still training. You should look at your accuracy curve instead, and that one goes up pretty normally, I think.
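For reference, here is a minimal sketch of the setup the question describes. The sigmoid output layer, binary_crossentropy loss, SGD(lr=0.001, momentum=0.99), fine-tuning of the last VGG19 block, Dropout, and BatchNormalization come from the question; the input size, 256-unit hidden layer, and 0.5 dropout rate are hypothetical placeholders.

import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

NUM_LABELS = 5  # the question mentions 5 labels

# VGG19 base with only the last convolutional block left trainable.
base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3), pooling="avg")
for layer in base.layers:
    layer.trainable = layer.name.startswith("block5")

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(NUM_LABELS, activation="sigmoid"),  # one independent sigmoid per label
])

model.compile(
    optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.99),
    loss="binary_crossentropy",  # the standard pairing with a sigmoid multi-label head
    metrics=["binary_accuracy"],
)

Sticking with sigmoid plus binary cross-entropy is the usual choice for multi-label problems, since each label is treated as an independent yes/no decision.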

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch on the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model on the same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to take no less than a month or so to converge.
Here's my implementation of the Inception model. The input is 224x224x3 images, with values in the range [0, 1].
The learning rate is set to a static 0.01 and I'm using the stochastic gradient descent optimizer (see the sketch below).
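A minimal sketch of that optimizer setup, assuming tf.keras; the compile call is illustrative only, since the question's actual training code is not shown here.

import tensorflow as tf

# Plain SGD with a fixed learning rate of 0.01, as described in the question.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Hypothetical compile call for an Inception-style classifier named `model`.
# model.compile(optimizer=optimizer,
#               loss="categorical_crossentropy",
#               metrics=["accuracy"])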
My question
After 48 hours of training, the training loss seems to indicate that the model is learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would look like overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained for 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.