Weird results for different models of the TensorFlow object detection API

I am training 3 different models for COVID-19 detection: SSD MobileNetV2 FPNLite 320x320, SSD MobileNetV2 FPNLite 640x640, and SSD MobileNetV2 320x320. I have trained each model for 4000 steps and have got good but somewhat weird results.
Models                             Batch size   Training loss   Validation loss   mAP
SSD MobileNetV2 FPNLite 320x320    16           0.22            0.27              0.85
SSD MobileNetV2 FPNLite 640x640    4            0.19            0.25              0.78
SSD MobileNetV2 320x320            16           0.17            0.34              0.83
My second model has a lower loss value, but I am confused about why it has a lower mAP than the other two models. The learning rate for all three models is the same as in the pipeline config.
I was expecting a higher mAP from SSD MobileNetV2 FPNLite 640x640.
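To compare the three runs on equal footing, one option is to read the batch size and learning-rate schedule straight out of each pipeline.config. Below is a minimal sketch using the object detection API's config utilities; the file paths and dictionary keys are placeholders, not the asker's actual setup.

# Hypothetical sketch: print batch size and optimizer/learning-rate settings
# for each pipeline.config (paths are placeholders).
from object_detection.utils import config_util

CONFIGS = {
    "ssd_mnv2_fpnlite_320": "fpnlite_320/pipeline.config",
    "ssd_mnv2_fpnlite_640": "fpnlite_640/pipeline.config",
    "ssd_mnv2_320": "ssd_320/pipeline.config",
}

for name, path in CONFIGS.items():
    train_config = config_util.get_configs_from_pipeline_file(path)["train_config"]
    print(name, "batch_size =", train_config.batch_size)
    print(train_config.optimizer)  # contains the learning-rate schedule actually used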

Related

Relation between mean average precision (mAP) and validation loss

I am training two models for COVID-19 detection using chest X-rays: SSD mobilenetV2 320x320 and SSD mobilenetV2 640x640 from the TensorFlow object detection API. The training loss for SSD mobilenetV2 320x320 is 0.22 and the validation loss is 0.36. Similarly, the training loss for SSD mobilenetV2 640x640 is 0.25 and the validation loss is 0.29. I am confused about why SSD mobilenetV2 320x320 is overfitting.
My second question is why the precision and recall of SSD mobilenetV2 320x320 are better than those of SSD mobilenetV2 640x640, even though SSD mobilenetV2 320x320 is overfitting.
Precision and Recall of SSD mobilenetV2 320x320
Precision and Recall of SSD mobilenetV2 640x640
I changed the score_threshold value from 9.99999993922529e-09 to 0.2 in the pipeline config file. As a result, the mAP value for SSD mobilenetV2 320x320 increased from 0.77 to 0.81, but at the same time the validation loss also increased from 0.36 to 0.42. Can somebody please explain the reason for this? I am really confused by this behavior of my model.
post_processing {
  batch_non_max_suppression {
    score_threshold: 9.99999993922529e-09
  }
}
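A minimal sketch of making the score_threshold change described above programmatically, assuming the object detection API's config utilities; the paths are placeholders for the asker's own files.

# Hypothetical sketch: raise the NMS score_threshold and write the config back.
from object_detection.utils import config_util

configs = config_util.get_configs_from_pipeline_file("ssd_mnv2_320/pipeline.config")
nms = configs["model"].ssd.post_processing.batch_non_max_suppression
nms.score_threshold = 0.2  # was 9.99999993922529e-09 (effectively zero)
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "ssd_mnv2_320/")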

Fluctuating training loss but stable validation loss

I am training a binary classification model using the SIIM-ISIC Melanoma Classification dataset.
I am using EfficientNetV2-M as the base model.
I used a cosine decay schedule with 2 warm-up epochs and Adam as the optimizer.
However, my training loss is fluctuating while my validation loss is stable.
Is there a particular reason why this would happen?
Thanks in advance.
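For reference, a minimal sketch of a cosine decay schedule with a linear warm-up used with Adam in Keras; the base learning rate, steps per epoch, and total epochs below are placeholders, not the asker's actual values.

# Assumed setup: linear warm-up for 2 epochs, then cosine decay, with Adam.
import tensorflow as tf

class WarmupCosineDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, base_lr, warmup_steps, total_steps):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.cosine = tf.keras.optimizers.schedules.CosineDecay(
            base_lr, total_steps - warmup_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup_lr = self.base_lr * step / self.warmup_steps
        return tf.where(step < self.warmup_steps,
                        warmup_lr,
                        self.cosine(step - self.warmup_steps))

steps_per_epoch = 1000  # placeholder: depends on dataset size and batch size
schedule = WarmupCosineDecay(base_lr=1e-4,
                             warmup_steps=2 * steps_per_epoch,
                             total_steps=30 * steps_per_epoch)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)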

SSD Resnet 50 FPN Loss function clarification

I am using the TensorFlow object detection API on my dataset with the ssd-resnet50-fpn model. While training, I see that the classification loss and localization loss have converged, but the total loss is still decreasing. Also, the total loss is not coming out to be the sum of the classification loss and the localization loss. Any ideas on why this is happening? I am using train.py in the object_detection/legacy/ folder to train on my dataset. Attached image for the same.
The total loss is the sum of the classification loss, the localization loss, and an L2 loss applied to the trainable variables, weighted by "weight_decay".
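Schematically, that decomposition looks like the sketch below; the variable names and the weight_decay value are illustrative placeholders, not the API's actual internals.

# Illustrative sketch of how the reported total loss is assembled.
import tensorflow as tf

def total_loss(classification_loss, localization_loss, trainable_variables,
               weight_decay=0.0004):  # weight_decay value is a placeholder
    # L2 regularization over every trainable weight, scaled by weight_decay.
    l2_loss = weight_decay * tf.add_n(
        [tf.nn.l2_loss(v) for v in trainable_variables])
    return classification_loss + localization_loss + l2_loss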

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to take no less than a month or so to converge.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
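A minimal sketch of that optimizer setup in Keras; the model here is a stand-in, since the asker uses their own Inception v2 implementation.

# Assumed setup: plain SGD with a fixed learning rate of 0.01, no schedule.
import tensorflow as tf

model = tf.keras.applications.InceptionV3(  # placeholder for the custom Inception v2
    weights=None, input_shape=(224, 224, 3), classes=1000)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])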
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained for 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Why does quantized graph inference take much more time than using the original graph?

I followed this tutorial in order to quantize my graph to 8 bits. I can't share the exact graph here, but I can say it's a simple convolutional neural network.
When I run the benchmark tool over the original and quantized networks, it's clear that the quantized network is much slower (100 ms vs. 4.5 ms).
Slowest nodes in original network :
time average [ms] [%] [cdf%] [Op] [Name]
1.198 26.54% 26.54% MatMul fc10/fc10/MatMul
0.337 7.47% 34.02% Conv2D conv2/Conv2D
0.332 7.36% 41.37% Conv2D conv4/Conv2D
0.323 7.15% 48.53% Conv2D conv3/Conv2D
0.322 7.14% 55.66% Conv2D conv5/Conv2D
0.310 6.86% 62.53% Conv2D conv1/Conv2D
0.118 2.61% 65.13% Conv2D conv2_1/Conv2D
0.105 2.32% 67.45% MaxPool pool1
Slowest nodes in quantized network :
time average [ms] [%] [cdf%] [Op] [Name]
8.289 47.67% 47.67% QuantizedMatMul fc10/fc10/MatMul_eightbit_quantized_bias_add
5.398 5.33% 53.00% QuantizedConv2D conv5/Conv2D_eightbit_quantized_conv
5.248 5.18% 58.18% QuantizedConv2D conv4/Conv2D_eightbit_quantized_conv
4.981 4.92% 63.10% QuantizedConv2D conv2/Conv2D_eightbit_quantized_conv
4.908 4.85% 67.95% QuantizedConv2D conv3/Conv2D_eightbit_quantized_conv
3.167 3.13% 71.07% QuantizedConv2D conv5_1/Conv2D_eightbit_quantized_conv
3.049 3.01% 74.08% QuantizedConv2D conv4_1/Conv2D_eightbit_quantized_conv
2.973 2.94% 77.02% QuantizedMatMul fc11/MatMul_eightbit_quantized_bias_add
What is the reason for that?
I'm using a TensorFlow version compiled from source, without GPU support.
https://github.com/tensorflow/tensorflow/issues/2807
Check the comments here. It seems that quantization isn't yet optimized for x86. My quantized inception resnet v2 runs slower than the original too.
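For comparison only, a minimal sketch of post-training 8-bit quantization via the TFLite converter; this is a different tool than the graph-transform quantization used in the question, and the paths are placeholders.

# Assumed alternative: TFLite post-training quantization of a SavedModel.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("export/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)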