Losses in TensorFlow

Can anyone kindly explain what classification loss and localization loss mean in TensorFlow?
I am getting these losses while training SSD with the TensorFlow API, but I do not understand either of them.
Here I read that localization loss is the loss of the bounding box regressor, which raises a new question: what is a bounding box regressor?
Can anyone briefly explain it, please?

Hope this helps; I have tried to give a brief explanation as I understand it.
What do classification loss and localization loss basically mean in TensorFlow?
Classification and localisation loss values are the outputs of loss functions and represent the "price paid for inaccuracy of predictions" in the classification and localisation problems, respectively.
The loss value reported is the sum of the classification loss and the localisation loss.
The optimisation algorithm tries to reduce these loss values until the summed loss reaches a point where you are happy with the results and consider your network 'trained'.
You can generally think of loss as a score where 'lower score equals better model'.
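As a rough illustration of how the two losses combine, here is a minimal sketch for a single matched box. It assumes softmax cross-entropy for the classification part and smooth L1 for the localisation part, as in the original SSD paper; the loss functions and weights actually used depend on your training configuration, and all numbers below are made up.

```python
import math

def cross_entropy(probs, true_class):
    """Classification loss: negative log-probability of the correct class."""
    return -math.log(probs[true_class])

def smooth_l1(pred_box, true_box):
    """Localisation loss: smooth L1 summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

# Hypothetical prediction for one matched box (values are illustrative).
class_probs = [0.1, 0.7, 0.2]          # predicted class probabilities
true_class = 1                         # index of the correct class
pred_box = [0.48, 0.52, 0.30, 0.41]    # predicted (cx, cy, w, h)
true_box = [0.50, 0.50, 0.32, 0.40]    # ground-truth (cx, cy, w, h)

classification_loss = cross_entropy(class_probs, true_class)
localization_loss = smooth_l1(pred_box, true_box)
total_loss = classification_loss + localization_loss

print("classification:", classification_loss)
print("localization:  ", localization_loss)
print("total:         ", total_loss)
```

A confident, correct class prediction drives the first term towards zero, and a box that lines up with the ground truth drives the second term towards zero.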
What is a bounding box regressor?
The bounding box regressor is, I believe, a trained model that refines a region of interest (ROI) or default box into a more accurate bounding box in object detection problems.
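To make "a more accurate bounding box" concrete, here is a minimal sketch of the offset parameterisation commonly used for box regression: the regressor predicts four offsets relative to a reference box instead of predicting absolute coordinates. The function names `encode` and `decode` and the box values are hypothetical, and real implementations usually also apply scale factors that are left out here.

```python
import math

def encode(anchor, gt):
    """Regression targets: offsets that turn the anchor into the ground-truth box."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw,        # centre shift in x, in units of anchor width
            (gy - ay) / ah,        # centre shift in y, in units of anchor height
            math.log(gw / aw),     # log width ratio
            math.log(gh / ah))     # log height ratio

def decode(anchor, offsets):
    """Apply (predicted) offsets to an anchor to obtain the refined box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = offsets
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))

anchor = (0.50, 0.50, 0.20, 0.20)   # hypothetical reference box (cx, cy, w, h)
gt_box = (0.55, 0.48, 0.30, 0.25)   # hypothetical ground-truth box

targets = encode(anchor, gt_box)    # what the regressor is trained to output
refined = decode(anchor, targets)   # a perfect prediction recovers gt_box

print("targets:", targets)
print("refined:", refined)
```

The localisation loss then measures how far the predicted offsets are from `targets`.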

Related

Regression Loss Function Working Perfectly on My Classification Model

I have built a model that detects what type of shot a table tennis player is performing using TensorFlow. After I built my Neural Network, the model I am dealing with seems to be a multi-label classification model. The binary cross-entropy and categorical cross-entropy gave bad loss and accuracy while using MSE and MAE gave 98% accuracy and 0.004 loss in both cases.
Why is this happening, although I have Supervised Learning data with 3 output labels as shown in the figure below:
[Figure: the dataset I have collected, showing 3 output labels]
If your learner has an R squared of .98 (if I understand you correctly), it is likely that you're overfitting and will hence have poor test predictions. Prediction errors that low are typically symptomatic of overfitting... but honestly, this is likely a better question for Cross Validated.

What's the complete loss function used by yolov4?

I am unable to find the explanation for the loss function of yolov4.
First, to understand the YOLOv4 loss, I think you should read about the original YOLO loss from the first YOLO paper (https://arxiv.org/abs/1506.02640).
In YOLOv4, you will have the exact same ideas, but with:
Binary cross entropy for the objectness and classification scores,
Box-per-cell level prediction instead of cell level prediction for the class probabilities, so a slightly different penalization for the classification terms,
CIoU loss instead of MSE for the regression terms (x, y, w, h). CIoU stands for Complete Intersection over Union, and is not so far from the MSE loss. It compares width and height in a more interesting way (consistency between aspect ratios), and for the bounding box centers it keeps a squared-distance term, normalized by the squared diagonal of the smallest box enclosing both boxes. You can find more details in this paper.
Finally, YOLOv4 loss can be written this way. With the complete CIoU loss terms, it looks like this.
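For reference, the CIoU regression term on its own can be sketched as follows. This is a minimal pure-Python version of the published CIoU formula for a single pair of boxes in (cx, cy, w, h) format, not the YOLOv4 code itself; the function name `ciou_loss` and the example boxes are made up, and widths and heights are assumed to be positive.

```python
import math

def ciou_loss(box_a, box_b):
    """CIoU loss between two boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corner coordinates of both boxes.
    a_x1, a_y1, a_x2, a_y2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    b_x1, b_y1, b_x2, b_y2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2

    # Intersection over union.
    inter_w = max(0.0, min(a_x2, b_x2) - max(a_x1, b_x1))
    inter_h = max(0.0, min(a_y2, b_y2) - max(a_y1, b_y1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / union

    # Squared centre distance, normalised by the squared diagonal
    # of the smallest box that encloses both boxes.
    center_dist2 = (ax - bx) ** 2 + (ay - by) ** 2
    enc_w = max(a_x2, b_x2) - min(a_x1, b_x1)
    enc_h = max(a_y2, b_y2) - min(a_y1, b_y1)
    diag2 = enc_w ** 2 + enc_h ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(bw / bh) - math.atan(aw / ah)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0

    return 1 - iou + center_dist2 / diag2 + alpha * v

same_loss = ciou_loss((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2))
far_loss = ciou_loss((0.2, 0.2, 0.2, 0.2), (0.8, 0.8, 0.2, 0.2))
print("identical boxes:", same_loss)
print("disjoint boxes: ", far_loss)
```

Unlike plain IoU, the loss still changes for boxes that do not overlap, because the centre-distance term keeps pulling them together.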

optimizing binary focal loss and dice loss

I am building a U-Net image segmentation model with only one foreground class and a background (binary segmentation).
For the loss function, I sum the dice loss and the binary focal loss.
I am wondering whether it is important to ensure that the dice loss and the focal loss are of a similar order of magnitude.
As you can see in the extract below, the binary focal loss is ~0.0x and the dice loss is ~0.x. Will the loss optimization focus on the dice loss more than the focal loss in this case? Should I be adding a multiplier to the binary focal loss?
I am a newbie to the deep learning paradigm as well. However, according to this paper: https://ieeexplore.ieee.org/abstract/document/9180275/ usually a multiplier must be added for a combo loss. In the paper the combo loss of focal loss and dice loss is calculated using the following equation:
combo loss = β * focal loss - log(dice loss)
Kindly report your results if you wish to use any other combination of these losses.
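A minimal sketch of such a weighted combination, written in plain Python over flattened masks, might look like the following. Two assumptions are made here that are not stated above: the log term is applied to the Dice coefficient (the overlap score), so that minimising -log raises the overlap, and β = 10 is only a placeholder value. The helper names and the focal parameters `gamma` and `alpha` are illustrative defaults, not values taken from the paper.

```python
import math

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over all pixels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)             # avoid log(0)
        p_t = p if y == 1 else 1 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1 - alpha  # class weighting
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice overlap between the mask and the predicted probabilities."""
    inter = sum(y * p for y, p in zip(y_true, y_pred))
    return (2 * inter + smooth) / (sum(y_true) + sum(y_pred) + smooth)

def combo_loss(y_true, y_pred, beta=10.0):
    """beta * focal - log(dice); beta rescales the much smaller focal term."""
    focal = binary_focal_loss(y_true, y_pred)
    dice = dice_coefficient(y_true, y_pred)
    return beta * focal - math.log(dice)

mask = [1, 0, 1, 0]                 # hypothetical flattened ground-truth mask
good = [0.9, 0.1, 0.8, 0.2]         # hypothetical good prediction
bad = [0.4, 0.6, 0.3, 0.7]          # hypothetical poor prediction

print("focal (good):", binary_focal_loss(mask, good))
print("dice  (good):", dice_coefficient(mask, good))
print("combo (good):", combo_loss(mask, good))
print("combo (bad): ", combo_loss(mask, bad))
```

Printing the two terms separately, as above, is a simple way to see whether one of them dominates the sum for your data.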

What loss function to use in Keras when metric is SparseTopKCategoricalAccuracy/TopKCategoricalAccuracy?

For multiclass classification problems, Keras and tf.keras have metrics like SparseTopKCategoricalAccuracy and TopKCategoricalAccuracy. However, if one uses loss functions like SparseCategoricalCrossentropy or CategoricalCrossentropy, they cannot achieve the max values for these two metrics.
What is a good loss function to use when one wants to maximize SparseTopKCategoricalAccuracy or TopKCategoricalAccuracy?
I understand that SparseTopKCategoricalAccuracy is not differentiable, just like Accuracy. I am trying to find a function that can approximate the smooth loss function and yield a higher number for SparseTopKCategoricalAccuracy.
CrossEntropy is not the best loss function when you deal with Top-k accuracy because cross-entropy may be prone to overfitting on small datasets or noisy labels.
As you have already pointed out, "smooth loss" functions were developed for top-k classification with SVMs. To my knowledge, there is no "off-the-shelf" loss function in Keras/TF that is best suited for top-k. However, I suggest you try the Smooth Surrogate Loss (SSL) presented in the article and implemented in PyTorch for use with deep neural networks (see GitHub). It derives from multi-class SVMs, as SSL creates a margin between the correct top-k predictions and the incorrect ones. The training time of SSL is comparable to that of cross-entropy thanks to a divide-and-conquer approach and the use of polynomials (see implementation).
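To make the metric under discussion concrete, here is a minimal pure-Python sketch of top-k accuracy (the helper name `top_k_accuracy` and the scores are made up; this is not the Keras implementation). It shows why the metric cannot be optimised directly: it depends only on the ranking of the scores, so small changes to the scores usually leave it unchanged.

```python
def top_k_accuracy(y_true, scores, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for label, row in zip(y_true, scores):
        # Class indices sorted by score, highest first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(y_true)

y_true = [2, 0, 1]                  # integer class labels
scores = [[0.5, 0.2, 0.3],          # true class 2 is ranked second
          [0.1, 0.6, 0.3],          # true class 0 is ranked last
          [0.2, 0.7, 0.1]]          # true class 1 is ranked first

top1 = top_k_accuracy(y_true, scores, k=1)
top2 = top_k_accuracy(y_true, scores, k=2)
print("top-1:", top1)
print("top-2:", top2)
```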

Tensorflow: loss decreasing, but accuracy stable

My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2 and 3, and a classification threshold of 0.5. Going from timestep 1 to timestep 2 decreases the loss but does not increase accuracy, because both predictions are still below the threshold.
Ensure that your model has enough capacity by checking that it can overfit the training data. If the model is overfitting the training data, reduce the overfitting with regularization techniques such as dropout, L1 and L2 regularization, and data augmentation.
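The example with label 1 and predictions 0.2, 0.4 and 0.6 can be checked numerically with a few lines of plain Python (a sketch for a single sample, using the standard binary cross-entropy formula):

```python
import math

label = 1
threshold = 0.5
predictions = [0.2, 0.4, 0.6]   # model output at timesteps 1, 2 and 3

losses = []
correct = []
for p in predictions:
    # Binary cross-entropy for a single sample.
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    predicted_class = 1 if p >= threshold else 0
    losses.append(loss)
    correct.append(predicted_class == label)

for step, (loss, ok) in enumerate(zip(losses, correct), start=1):
    print(f"timestep {step}: loss={loss:.3f} correct={ok}")
```

The loss falls at every timestep, but the prediction only becomes correct once it crosses the threshold at timestep 3.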
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions. One possible problem is that your network has started to memorize the data, so yes, you should increase regularization.
Update:
Here I want to mention one more problem that may cause this:
The class balance in the validation set is very different from the one in the training set. As a first step, I would recommend trying to understand what your test data (real-world data, the data your model will face at inference time) looks like descriptively: its class balance and other similar characteristics. Then try to build a train/validation set with almost the same characteristics as the real data.
Well, I faced a similar situation when I used a softmax function in the last layer instead of sigmoid for binary classification.
My validation loss and training loss were decreasing, but the accuracy of both remained constant. This taught me why sigmoid is used for binary classification.
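One way a softmax output can get stuck like this, assuming the last layer has a single output unit (the answer does not say so), is that softmax over one logit always returns 1.0, so the predicted class never changes. A minimal sketch in plain Python:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1 / (1 + math.exp(-z))

for z in (-3.0, 0.0, 3.0):
    # Softmax normalises over a single value, so it cannot express uncertainty.
    print(f"logit={z:+.1f}  softmax={softmax([z])[0]:.3f}  sigmoid={sigmoid(z):.3f}")
```

With one output unit, sigmoid maps the logit to a probability between 0 and 1, while softmax would need two output units to do the same job.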