In TensorFlow, is min_score_thresh related to IoU?

I am currently investigating object detection using TensorFlow. The minimum score threshold during object detection can be changed, and I would like to know whether this value is related to the IoU value, since it defaults to 0.5 (the traditional value used for IoU).

They are different. IoU stands for Intersection over Union. Setting a value of 0.5 (which is usually the default) means that the model will only consider those detections that have an IoU of at least 0.5 with the ground-truth box (this is applied during training and validation).
The detection score, or min score threshold, is the confidence with which the model predicts that a particular box contains an object of a certain class. It is mainly used as a filter at test time to keep only those detections with scores greater than the threshold (0.5 is a good choice for most cases).
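To make the distinction concrete, here is a minimal sketch (with made-up boxes and scores) of what the score threshold does as an inference-time filter:
import numpy as np

def filter_detections(boxes, scores, min_score_thresh=0.5):
    # Keep only detections whose confidence score is at least the threshold.
    # This inference-time filtering is separate from the IoU threshold used
    # to match predicted boxes to ground-truth boxes during training/eval.
    keep = scores >= min_score_thresh
    return boxes[keep], scores[keep]

boxes = np.array([[10, 10, 50, 50], [20, 20, 60, 60]])
scores = np.array([0.9, 0.3])
kept_boxes, kept_scores = filter_detections(boxes, scores)
# Only the first box survives (0.9 >= 0.5); the second (0.3) is dropped.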

Related

Tensorflow & Keras prediction threshold

What is the threshold value that is used by TF by default to classify an input image as being a certain class?
For example, say I have 3 classes 0, 1, 2, and the labels for images are one-hot encoded like so: [1, 0, 0], meaning this image has label of class 0.
Now when a model outputs a prediction after softmax like this one: [0.39, 0.56, 0.05], does TF use 0.5 as the threshold, so that the class it predicts is class 1?
What if all the predictions were below 0.5, like [0.33, 0.33, 0.33]? What would TF say the result is?
And is there any way to specify a new threshold, for example 0.7, and ensure TF says that a prediction is wrong if no class prediction is above that threshold?
Also, would this logic carry over to the inference stage too, where the network will refuse to give a classification for the image if it is uncertain of the class?
when a model outputs a prediction after softmax like this one: [0.39, 0.56, 0.05], does TF use 0.5 as the threshold, so that the class it predicts is class 1?
No. There is not any threshold involved here. Tensorflow (and any other framework, for that matter) will just pick up the maximum one (argmax); the result here (class 1) would be the same even if the probabilistic output was [0.33, 0.34, 0.33].
You seem to erroneously believe that a probability value of 0.5 has some special significance in a 3-class classification problem; it has not: a probability value of 0.5 is "special" only in a binary classification setting (and a balanced one, for that matter). In an n-class setting, the respective "special" value is 1/n (here 0.33), and by definition, there will always be some entry in the probability vector greater than or equal to this value.
What if all the predictions were below 0.5, like [0.33, 0.33, 0.33]? What would TF say the result is?
As already implied, there is nothing strange or unexpected with all probabilities being below 0.5 in an n-class problem with n>2.
Now, if all the probabilities happen to be equal, as in the example you show (highly improbable in practice, but the question is valid, at least in theory), ideally such ties should be resolved randomly (i.e. by picking a class at random); in practice, since this stage is usually handled by NumPy's argmax method, the prediction will be the first class (i.e. class 0), which is not difficult to demonstrate:
import numpy as np
x = np.array([0.33, 0.33, 0.33])
np.argmax(x)
# 0
due to how such cases are handled by Numpy - from the argmax docs:
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
To your next question:
is there any way to specify a new threshold, for example 0.7, and ensure TF says that a prediction is wrong if no class prediction is above that threshold?
Not in Tensorflow (or any other framework) itself, but this is always something that can be done in a post-processing stage during inference: irrespective of what is actually returned by your classifier, it is always possible to add some extra logic such that whenever the max probability value is less than a threshold, your system (i.e. your model plus the post-processing logic) returns something like "I don't know / I am not sure / I can't answer". But again, this is external to Tensorflow (or any other framework used) and the model itself, and it can be used only during inference and not during training (in any case, it doesn't make sense during training, because during training only predicted class probabilities are used, not hard classes).
In fact, we had implemented such a post-processing module in a toy project some years ago, an online service to classify dog breeds from images: when the max probability returned by the model was less than a threshold (which was the case, say, when the model was presented with an image of a cat instead of a dog), the system was programmed to respond with the question "Are you sure this is a dog?", instead of being forced to make a prediction among the predefined dog breeds...
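A minimal sketch of such a rejection step (the 0.7 threshold and the returned message are arbitrary choices here):
import numpy as np

def predict_with_rejection(probs, threshold=0.7):
    # Post-processing on top of the model output: commit to a class only
    # if the top probability clears the (arbitrary) threshold.
    if np.max(probs) < threshold:
        return "I don't know"
    return int(np.argmax(probs))

predict_with_rejection(np.array([0.39, 0.56, 0.05]))  # "I don't know"
predict_with_rejection(np.array([0.10, 0.85, 0.05]))  # 1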
The threshold is used in the case of binary classification or multi-label classification. In the case of multi-class classification you use argmax: basically, the class with the highest activation is your output class. All classes rarely have equal activations; if the model is trained well there should be one dominant class.
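To illustrate the difference with made-up numbers:
import numpy as np

# Multi-class (softmax): no threshold, just take the largest activation.
softmax_probs = np.array([0.39, 0.56, 0.05])
predicted_class = np.argmax(softmax_probs)   # 1

# Binary / multi-label (independent sigmoids): a threshold decides each label.
sigmoid_probs = np.array([0.80, 0.30, 0.65])
predicted_labels = sigmoid_probs >= 0.5      # [True, False, True]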

Mean Average Precision (what happens if there is no prediction at all)

Mean Average Precision confusion (what happens if the model makes no predictions at all)
So, I am trying to understand mean Average Precision. Here is what I understand so far. First, it checks whether a predicted box and a ground-truth box overlap by at least a certain IoU threshold. Then it takes the predicted boxes' confidence scores and sorts them in order of confidence. Then it calculates the AP at different recall values, as in the precision-recall curve. So basically, what I understand is that of all the predictions the model made, it sorts the predicted boxes by confidence score and calculates the PR curve.
I am using this as a reference to understand mAP (https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52)
But what will happen if the model makes no predictions at all? Or if the model makes only one prediction and that prediction is correct, is the mAP one in that case? Because we sort the model's predicted boxes, and since the model got the predicted box right, the mAP would be one.
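For what it's worth, here is a toy sketch of the procedure described above (sort detections by confidence, accumulate TP/FP and integrate the precision-recall curve); it is not the exact PASCAL/COCO interpolation:
import numpy as np

def average_precision(scores, is_tp, num_ground_truth):
    # Sort detections by confidence, accumulate TP/FP counts, and take the
    # (unsmoothed) area under the resulting precision-recall curve.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.cumsum(np.asarray(is_tp, dtype=bool)[order])
    fp = np.cumsum(~np.asarray(is_tp, dtype=bool)[order])
    precision = tp / (tp + fp)
    recall = tp / num_ground_truth
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

# A single correct prediction with a single ground-truth object:
average_precision([0.9], [True], num_ground_truth=1)  # 1.0
# With more ground-truth objects the recall never reaches 1, so one
# correct prediction alone does not give an AP of 1:
average_precision([0.9], [True], num_ground_truth=2)  # 0.5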

Understanding and tracking of metrics in object detection

I have some questions about metrics when I do training or evaluation on my own dataset. I am still new to this topic and have just experimented with TensorFlow, Google's Object Detection API and TensorBoard...
So I did all this stuff to get things up and running with the Object Detection API, trained on some images and ran evaluation on other images.
I decided to use the weighted PASCAL metrics set for evaluation:
In TensorBoard I get some IoU for every class and also mAP, and that's fine to see, and now come the questions.
The IoU tells me how well the predicted boxes overlap the ground truth, and it measures the accuracy of my object detector.
First question: Is the IoU influenced if an object with a ground-truth box is not detected?
Second question: Is the IoU influenced if a ground-truth object is predicted as a false negative?
Third question: What about false positives where there are no ground-truth objects?
Coding questions:
Fourth question: Has anyone modified the evaluation workflow of the Object Detection API to bring in more metrics like accuracy or TP/FP/TN/FN? If so, can you provide me some code with an explanation or a tutorial you used - that would be awesome!
Fifth question: If I want to monitor overfitting and take 30% of my 70% training data to do some evaluation, which parameter shows me that there is overfitting on my dataset?
Maybe these questions are newbie questions or I just have to read and understand more - I don't know - so your help in understanding more is appreciated!!
Thanks
Let's start with defining precision with respect to a particular object class: it's the proportion of good predictions to all predictions of that class, i.e., TP / (TP + FP). E.g., if you have a dog, cat and bird detector, the dog precision would be the number of correctly marked dogs over all predictions marked as dog (i.e., including false detections).
To calculate the precision, you need to decide whether each detected box is a TP or a FP. To do this you may use the IoU measure, i.e., if there is significant (e.g., 50% *) overlap of the detected box with some ground-truth box, it's a TP if both boxes are of the same class, otherwise it's a FP (if the detection is not matched to any box, it's also a FP).
* that's where the 0.5IOU shortcut comes from; you may have spotted it in TensorBoard in the titles of the graphs with PASCAL metrics.
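For concreteness, a small sketch of this IoU-based TP/FP decision (box format assumed to be [x1, y1, x2, y2]; the coordinates are made up):
def iou(box_a, box_b):
    # Intersection over Union of two axis-aligned boxes [x1, y1, x2, y2].
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

detected = [10, 10, 50, 50]
ground_truth = [12, 12, 48, 52]
# TP if the overlap with a same-class ground-truth box is at least 0.5.
is_tp = iou(detected, ground_truth) >= 0.5  # IoU is about 0.82 here, so TP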
If the estimator outputs some quality measure (or even a probability), you may decide to drop all detections with quality below some threshold. Usually, the estimators are trained to output a value between 0 and 1. By changing the threshold you can tune the recall of your estimator (the proportion of correctly discovered objects). Lowering the threshold increases the recall (but decreases precision) and vice versa. The average precision (AP) is the average of precision values calculated over different thresholds; in PASCAL metrics the thresholds come from the range [0, 0.1, ..., 1], i.e., it's the average of precision values at different recall levels. It's an attempt to capture the characteristics of the detector in a single number.
The mean average precision is the mean of the average precisions over all classes. E.g., for our dog, cat, bird detector it would be (dog_AP + cat_AP + bird_AP) / 3.
More rigorous definitions could be found in the PASCAL challenge paper, section 4.2.
Regarding your question about overfitting, there could be several indicators of it; one could be that the AP/mAP metrics calculated on an independent test/validation set begin to drop while the loss still decreases.

Choosing initial values for variables and parameters for optimizers in tensorflow

How do people typically choose initial values for their variables and parameters? Do we just tinker till it works?
I was following the Getting Started tutorial for tensorflow, and was able to train the linear model in it. However, I noticed that the starting values for the variables W, b were reasonably close to the ground truth.
When I change the data to make the ground truth values much further away, the gradient descent optimizer gives me NaN values for W, b.
However, in general, I don't think it is reasonable to expect to guess the initial values of the variables in the model. It seems like I should be able to choose any arbitrary starting point and still get to where I want.
I was thinking my choice of parameters might be bad. However, I am not sure in what way to adjust this. The default was 0.01; I've tried values from 0.001 to 100.
Would there be a discussion of optimization parameter choices and initial values for model variables in a general machine learning book? Really I am just looking for resources.
Thanks!
Some of the famous initializers for Convolutional Neural Networks:
Glorot Normal: Also called Xavier. Normal distribution centered on 0 with stddev = sqrt(2 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
Lecun Uniform: Uniform distribution within [-limit, limit] where limit is sqrt(3 / fan_in) where fan_in is the number of input units in the weight tensor.
http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
He Normal:
Truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in) where fan_in is the number of input units in the weight tensor.
http://arxiv.org/abs/1502.01852
Along with these initializers, one has to search for the learning rate, momentum and other hyperparameters.
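For reference, a rough sketch of picking these initializers in Keras (assuming TF 2.x; the layer sizes here are arbitrary):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_initializer=tf.keras.initializers.GlorotNormal()),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer=tf.keras.initializers.HeNormal()),
    tf.keras.layers.Dense(10, activation="softmax",
                          kernel_initializer=tf.keras.initializers.LecunUniform()),
])

# The learning rate, momentum and other hyperparameters still have to be tuned.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy")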

Tensorflow Loss for Non-Independent Classes

I am using a Tensorflow network for classification between classes that are similar to their neighboring classes, i.e. not independent. For example, let's say we want to predict among 10 classes, but the predictions are not merely "correct" or "incorrect." Instead, if the correct class is 7 and the network predicts 6, the loss should be less than if the network had predicted 5, because 6 is closer to the correct answer than 5. My understanding is that cross entropy with 1-hot vectors provides an "all or nothing" loss rather than a "continuous" loss that reflects the magnitude of the error. If that is correct, how does one implement such a continuous loss in Tensorflow?
-- Update June 13 2016 ----
An example application might be color recognition. If the network predicts "green" but the true color is yellow-green, then the loss should be less than if the network predicted blue because green is a better prediction than blue.
You can choose to represent a continuous quantity (e.g. hue from HSV) as a single output, and construct your own loss calculation that reflects what you want to optimize. In that case you'd just have a single output value ranging between 0.0 and 1.0, and the loss would be evaluated based on the distance from the labeled value.
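A minimal sketch of that idea, assuming the label has been mapped to a single continuous value in [0, 1] (e.g. hue); the architecture and the loss here are illustrative only, written with the current Keras API:
import tensorflow as tf

# Single continuous output in [0, 1] instead of a 10-way softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def distance_loss(y_true, y_pred):
    # The loss grows with the distance between prediction and label, so a
    # near miss is penalized less than a far miss.
    return tf.reduce_mean(tf.square(y_true - y_pred))

model.compile(optimizer="adam", loss=distance_loss)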