Can the object detection true-positive definition (IoU) be modified? - object-detection

[image1]
I have run into a problem with object detection metrics (see image1).
The red boxes are ground truth, the black box is the predicted box, and the predicted class is correct.
If I use the IoU criterion, every predicted box is counted as a false positive, but in my case I would like these predictions to count as true positives.
Is there any method that can solve this problem? (Please don't suggest modifying the ground-truth boxes.)

Related

Can someone give me an explanation of the MultiBox loss function?

I have found the following expression for the SSD MultiBox loss function:
multibox_loss = confidence_loss + alpha * location_loss
Can someone explain what those terms mean?
SSD MultiBox (short for Single Shot MultiBox Detector) is a neural network that can detect and locate objects in an image in a single forward pass. The network is trained in a supervised manner on a dataset of images where a bounding box and a class label are given for each object of interest. The loss term
multibox_loss = confidence_loss + alpha * location_loss
is made up of two parts:
Confidence loss is a categorical cross-entropy loss for classifying the detected objects. The purpose of this term is to make sure that the correct label is assigned to each detected object.
Location loss is a regression loss (either the smooth L1 or the L2 loss) on the parameters (width, height, and corner offsets) of the detected bounding box. The purpose of this term is to make sure that the correct region of the image is identified for each detected object. The alpha term is a hyperparameter used to scale the location loss.
The precise formulation of the loss is given in Equation 1 of the SSD: Single Shot MultiBox Detector paper.
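As a minimal sketch of how the two terms combine (using TF 1.x-style ops; this ignores SSD details such as default-box matching and hard negative mining, and the tensor names are placeholders):

import tensorflow as tf

def multibox_loss(conf_logits, conf_labels, loc_preds, loc_targets, alpha=1.0):
    # Confidence loss: categorical cross-entropy over the class
    # predictions for the (matched) default boxes.
    confidence_loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=conf_labels, logits=conf_logits))
    # Location loss: smooth L1 (Huber) regression on the predicted
    # box parameters against the ground-truth targets.
    location_loss = tf.losses.huber_loss(loc_targets, loc_preds)
    return confidence_loss + alpha * location_loss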

How to drop pixels with a specific label in semantic segmentation

I am new to semantic segmentation. I used an FCN to train on my dataset. In the dataset there are some pixels labeled as an unknown class. I would like to exclude this class from my loss, so I defined weights based on the class distribution of the whole dataset and set the weight for the unknown class to zero, as follows. But I am still getting predictions for this class. Do you have any idea how to properly exclude one specific class?
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits,
        labels=tf.squeeze(annotation, squeeze_dims=[3]),
        name="entropy"))
weighted_losses = loss * weights
train_op = optimizer.minimize(
    weighted_losses,
    var_list=tf.trainable_variables(),
    global_step=tf.train.get_global_step())
I do not know PyTorch, but I have heard that its loss functions have an ignore_index argument for exactly this purpose, which lets you ignore a specific class. If this is the right approach to my problem, do you know whether there is something equivalent in TensorFlow?
For semantic segmentation you have two "special" labels: one is "background" (usually 0), and the other is "ignore" (usually 255 or -1).
"Background" is like all other semantic labels meaning "I know this pixel does not belong to any of the semantic categories I am working with". It is important for your model to correctly output "background" whenever applicable.
"Ignore" label is not a label that your model can predict - it is "outside" its range. This label only exists in the training annotation meaning "we were unsure how this pixel should be labeled, so just ignore it".
When there are "ignore" pixels in your target labels, your model cannot (and should not) output "ignore" labels. Nevertheless, your model should output something. The fact that a pixel is labeled "ignore" means that whatever your model outputs for that pixel will be ignored by the loss function (assuming you told the loss to ignore "ignore" pixels). Moreover, if your test/validation sets have "ignore" labels, whatever your model outputs for those pixels will simply be ignored by the scoring mechanism and won't be counted as either a correct or an incorrect prediction.
To summarize: even when the ground truth has "ignore" labels, the model cannot and should not output "ignore". It simply outputs whatever valid label it feels like, and that is perfectly okay.
For TensorFlow, you can check out this thread.
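One way to get the "ignore" behavior in TensorFlow is to mask out those pixels before computing the loss. A minimal sketch (assuming 255 is the "ignore" value; logits and labels are placeholder names for your tensors):

import tensorflow as tf

IGNORE_LABEL = 255  # hypothetical value used for the "ignore" class

def masked_loss(logits, labels):
    # Flatten logits to [num_pixels, num_classes] and labels to [num_pixels].
    num_classes = tf.shape(logits)[-1]
    logits = tf.reshape(logits, [-1, num_classes])
    labels = tf.reshape(labels, [-1])
    # Keep only the pixels whose label is a real class.
    valid = tf.not_equal(labels, IGNORE_LABEL)
    logits = tf.boolean_mask(logits, valid)
    labels = tf.boolean_mask(labels, valid)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))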

Improving training accuracy on Resnet

I am training an object detector using mxnet/resnet50.
After the last training run the mAP was 78% and the loss was 0.37.
When I run the detector on my test set (independent of the train/val data), I get false positives, some with rather high confidence levels (30-60%). I think I need to add some train/val images that do not contain ANY of the objects I'm training the detector for.
I'm planning on adding about 20% more images that have a label of -1, which I read somewhere is how you designate an image with no label in mxnet.
Does this seem reasonable? Is -1 the right way to designate it? Any downsides?
Thanks,
john
One method for an unbalanced object detection task is to place a classifier before the object detection stage, which determines whether the image contains any object at all. You can weight the loss for each class in this classifier by its inverse frequency (i.e., a higher weight for classes that appear less frequently), as sketched below. You should test on data with a class balance similar to the real world. You might find this post useful.
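A minimal sketch of such inverse-frequency weighting (the class counts below are made up; in practice you would compute them from your own training set):

import tensorflow as tf

# Hypothetical per-class example counts from the training set.
counts = tf.constant([900.0, 50.0, 50.0])
num_classes = tf.cast(tf.size(counts), tf.float32)
class_weights = tf.reduce_sum(counts) / (num_classes * counts)

def weighted_ce(logits, labels):
    # labels: integer class ids of shape [batch].
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # Scale each example's loss by the weight of its true class.
    return tf.reduce_mean(ce * tf.gather(class_weights, labels))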

Understanding and tracking of metrics in object detection

I have some questions about metrics when I train or evaluate on my own dataset. I am still new to this topic and have just experimented with TensorFlow, Google's Object Detection API, and TensorBoard...
So I did all the work to get things up and running with the Object Detection API, trained on some images, and ran the evaluation on other images.
I decided to use the weighted PASCAL metrics set for evaluation.
In TensorBoard I get an IoU for every class and also the mAP, and that's fine to see, but now come the questions.
The IoU gives me a value for how well ground-truth and predicted boxes overlap, and it measures the accuracy of my object detector.
First Question: Is the IoU affected if an object with a ground-truth box is not detected?
Second Question: Is the IoU affected if a ground-truth object is predicted as a false negative?
Third Question: What about false positives where there are no ground-truth objects?
Coding Questions:
Fourth Question: Has anyone modified the evaluation workflow of the Object Detection API to bring in more metrics such as accuracy or TP/FP/TN/FN? If so, could you provide some code with an explanation, or a tutorial you used - that would be awesome!
Fifth Question: If I want to monitor for overfitting and take 30% out of my 70% training data to run some evaluation, which parameter shows me that there is overfitting on my dataset?
Maybe these are newbie questions, or I just have to read and understand more - I don't know - so your help in understanding more is appreciated!!
Thanks
Let's start by defining precision with respect to a particular object class: it's the proportion of good predictions among all predictions of that class, i.e., TP / (TP + FP). E.g., if you have a dog, cat, and bird detector, the dog precision is the number of correctly marked dogs over all predictions marked as dog (i.e., including false detections).
To calculate the precision, you need to decide whether each detected box is a TP or an FP. To do this you may use the IoU measure: if there is significant overlap (e.g., 50% *) between the detected box and some ground-truth box, it's a TP if both boxes are of the same class; otherwise it's an FP (a detection that is not matched to any box is also an FP).
* That's where the @0.5IOU shortcut comes from; you may have spotted it in TensorBoard in the titles of the graphs with the PASCAL metrics.
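As a minimal illustration of the IoU computation itself (plain Python, boxes given as (x1, y1, x2, y2) corner coordinates; this is a sketch, not the API's implementation):

def iou(box_a, box_b):
    # Intersection rectangle of the two boxes.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

A detection whose best match to a same-class ground-truth box has iou(...) >= 0.5 would be counted as a TP; otherwise it is an FP.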
If the estimator outputs some quality measure (or even a probability), you may decide to drop all detections with quality below some threshold. Usually, estimators are trained to output values between 0 and 1. By changing the threshold you can tune the recall of your estimator (the proportion of correctly discovered objects): lowering the threshold increases recall (but decreases precision) and vice versa. The average precision (AP) is the average of the class's precision values calculated over different thresholds; in the PASCAL metrics the recall levels are taken from the range [0, 0.1, ..., 1], i.e., it's the average of the precision values at different recall levels. It's an attempt to capture the characteristics of the detector in a single number.
The mean average precision (mAP) is the mean of the average precisions over all classes. E.g., for our dog, cat, bird detector it would be (dog_AP + cat_AP + bird_AP)/3.
More rigorous definitions could be found in the PASCAL challenge paper, section 4.2.
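As a minimal sketch of the 11-point interpolated AP described above (assuming recall and precision have already been evaluated at a series of score thresholds):

import numpy as np

def eleven_point_ap(recall, precision):
    # recall/precision: arrays of values measured at successive thresholds.
    ap = 0.0
    for r in np.arange(0.0, 1.1, 0.1):
        mask = recall >= r
        # Interpolated precision: the best precision achievable at
        # recall >= r (0 if that recall level is never reached).
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap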
Regarding your question about overfitting: there can be several indicators of it. One is that the AP/mAP metrics calculated on the independent test/validation set begin to drop while the training loss still decreases.

Multilabel image classification with sparse labels in TensorFlow?

I want to perform a multilabel image classification task for n classes.
I've got a sparse label vector for each image, and each dimension of each label vector is currently encoded in this way:
1.0 -> label true / image belongs to this class
-1.0 -> label false / image does not belong to this class
0.0 -> missing value/label
E.g.: V = {1.0, -1.0, 1.0, 0.0}
For this example V, the model should learn that the corresponding image belongs to the first and third classes.
My current problem is how to handle the missing values/labels. I've searched through the issues and found this one:
tensorflow/skflow#113 found here
So I could do multilabel image classification with:
tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)
but TensorFlow has this error function for sparse softmax, which is used for exclusive classification:
tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels, name=None)
So is there something like a sparse sigmoid cross entropy? (I couldn't find one.) Or do you have any suggestions for how I can handle my multilabel classification problem with sparse labels?
I used weighted_cross_entropy_with_logits as the loss function, with positive weights for the 1s.
In my case, all the labels are equally important, but 0 was ten times more likely to appear as the value of any label than 1.
So I weighted all the 1s via the pos_weight parameter of the aforementioned loss function, using a pos_weight (= weight on positive values) of 10. By the way, I cannot recommend any particular strategy for calculating the pos_weight; I think it depends entirely on the data at hand.
If the real label is 1:
weighted_cross_entropy = pos_weight * sigmoid_cross_entropy
Weighted cross entropy with logits is the same as sigmoid cross entropy with logits, except that an extra weight value multiplies the loss for all targets with a positive real value, i.e., 1.
Theoretically, it should do the job. I am still tuning other parameters to optimize the performance. Will update with performance statistics later.
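A minimal usage sketch (targets and logits are placeholder tensors of shape [batch, n_classes]; the first argument is called targets in older TF 1.x releases and labels later, so it is passed positionally here):

import tensorflow as tf

# targets: 0/1 labels, logits: raw model outputs, both [batch, n_classes].
per_label_loss = tf.nn.weighted_cross_entropy_with_logits(
    targets, logits, pos_weight=10.0)
loss = tf.reduce_mean(per_label_loss)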
First, I would like to know what you mean by missing data. What is the difference between missing and false in your case?
Next, I think it is wrong to represent your data like this: you are trying to represent unrelated information along the same dimension. (If it were just false or true, it would work.)
It seems better to me to represent, for each of your classes, whether it is true, missing, or false.
In your case V = [(1,0,0),(0,0,1),(1,0,0),(0,1,0)]
Ok!
So I think your problem is more about how to handle the missing data.
In that case, you should definitely use tf.nn.sigmoid_cross_entropy_with_logits().
Just change the target for the missing data to 0.5 (0 for false and 1 for true).
I have never tried this approach, but it should let your network learn without biasing it too much. A sketch follows.
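As a minimal sketch of that idea (raw_labels and logits are placeholder names; note that mapping the original {1.0, -1.0, 0.0} encoding through (v + 1) / 2 gives exactly the suggested targets 1, 0, and 0.5):

import tensorflow as tf

# raw_labels uses the original encoding: 1.0 (true), -1.0 (false), 0.0 (missing).
# (v + 1) / 2 maps 1.0 -> 1.0, -1.0 -> 0.0, and 0.0 -> 0.5.
targets = (raw_labels + 1.0) / 2.0
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits))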