Mean Average Precision Confusion (What happens if the model makes no predictions at all?)
So, I am trying to understand mean Average Precision. What I understand so far: first, it checks whether a predicted box and a ground-truth box overlap by at least a certain IoU threshold. Then it takes the confidence scores of the predicted boxes and sorts the predictions in order of confidence. Then it calculates the AP at different recall values, i.e., the precision-recall curve. So basically, what I understand is that, of all the predictions the model made, it sorts the predicted boxes by confidence score and calculates the PR curve.
I am using this as a reference to understand mAP (https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52)
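To check that I am picturing this right, here is a small NumPy sketch of my current understanding (the numbers and the matching against ground truth are made up, purely for illustration):

```python
import numpy as np

# Detections for one class, already matched against the ground truth at some
# IoU threshold and sorted by confidence (highest first). Made-up example.
is_tp = np.array([True, True, False, True, False])   # True = matched a GT box
n_ground_truth = 4                                    # total ground-truth boxes

tp_cum = np.cumsum(is_tp)             # true positives seen so far
fp_cum = np.cumsum(~is_tp)            # false positives seen so far
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / n_ground_truth      # recall is over ALL ground-truth boxes

print(precision)   # approx. [1.  1.  0.67  0.75  0.6]
print(recall)      # [0.25  0.5  0.5  0.75  0.75]
# AP would then be some average/area over these precision-recall points.
```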
But what will happen if the model makes no predictions at all? Or if the model makes only one prediction and that prediction is correct, is the mAP one in this case? Because we sort the predictions based on the model's predicted boxes, and since the model got that one box right, the mAP would be one.
I'm having trouble understanding the added value of calculating the AUC on a training set in general, but for this question I'm using an example with PLS-DA.
Let's say you've built a PLS-DA model to see whether it can distinguish between patients with diabetes and patients without. Afterwards, the plot and visualisation of the model show that there is some discriminatory power. Mind you, this PLS-DA model is built ONLY on training data (a training set).
In this situation, what is the added value of using a ROC curve to calculate the AUC?
And let's say you plot the ROC curve and calculate an AUC of 0.9. What does this explicitly mean? I'm tempted to say this means the model is able to, or has the potential to, distinguish between people with diabetes and people without with an accuracy of 90%. But something tells me this isn't right, because, after all, the performance of my model can ONLY be assessed by plotting the ROC curve and calculating the AUC on a validation set and test set, right? Or am I looking at this the wrong way?
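To make the question concrete, this is roughly the comparison I am wondering about (a sketch with scikit-learn, using PLSRegression on 0/1 labels as a stand-in for PLS-DA; all data and settings are made up):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))        # made-up features/measurements
y = rng.integers(0, 2, size=100)      # 0 = no diabetes, 1 = diabetes

pls = PLSRegression(n_components=2)   # PLS-DA here = PLS regression on 0/1 labels
pls.fit(X, y)

# AUC on the training data itself (what I currently have)
auc_train = roc_auc_score(y, pls.predict(X).ravel())

# AUC from cross-validated predictions (closer to performance on unseen data)
auc_cv = roc_auc_score(y, cross_val_predict(pls, X, y, cv=5).ravel())

print(auc_train, auc_cv)   # the training AUC is typically the more optimistic one
```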
I am currently investigating object detection using TensorFlow. The minimum score threshold during object detection can be changed, and I would like to know whether this value is related to the IoU value, since it defaults to 0.5 (the traditional value used for IoU).
They are different. IoU means Intersection over Union. Setting a value of 0.5 (which is usually the default in most cases) means that the model will only consider those detections that have an IoU of at least 0.5 with a ground-truth box (applied during training and validation).
The detection score, or min score threshold, is the confidence with which the model predicts that a particular box contains an object belonging to a certain class. This is mainly used as a filter at test time to keep only those detections with scores greater than the threshold (0.5 is a good choice in most cases).
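To make the distinction concrete, here is a minimal sketch in plain Python (not the TensorFlow API; the boxes and scores are made up):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# IoU threshold: decides whether a detection counts as matching a ground-truth box
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))          # ~0.14, i.e. below 0.5

# Min score threshold: filters detections by confidence at test time
detections = [("dog", 0.92), ("dog", 0.41), ("cat", 0.73)]
kept = [d for d in detections if d[1] >= 0.5]
print(kept)                                          # [('dog', 0.92), ('cat', 0.73)]
```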
I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time-step I am using an LSTM to predict the next 10 time-steps. The input is the sequence of the last 45-60 observations. I have tested several different ideas, but they all seem to give similar results. The model is trained to minimize MSE.
For each idea I tried a model predicting 1 step at a time, where each prediction is fed back as an input for the next prediction, and a model directly predicting the next 10 steps (multiple outputs). For each idea I also tried using as input just the moving average of the previous prices, and extending the input to include the order book at those time-steps.
Each time-step corresponds to a second.
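For reference, the way I build the training windows is roughly like this (simplified sketch; the window length of 50 and the variable names are just placeholders):

```python
import numpy as np

def make_windows(prices, n_in=50, n_out=10):
    """Slice a 1-D price series into (past n_in steps, next n_out steps) pairs."""
    X, y = [], []
    for t in range(n_in, len(prices) - n_out):
        X.append(prices[t - n_in:t])       # the last n_in observations
        y.append(prices[t:t + n_out])      # the next n_out steps to predict
    return np.array(X)[..., None], np.array(y)   # add a feature axis for the LSTM

prices = np.cumsum(np.random.randn(1000))   # made-up 1-second price series
X, y = make_windows(prices)
print(X.shape, y.shape)                      # (940, 50, 1) (940, 10)
```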
These are the results so far:
1) The first attempt was to use as input the moving average of the last N steps and to predict the moving average of the next 10.
At time t, I use the ground-truth value of the price and use the model to predict t+1, ..., t+10.
This is the result:
[Plot: predicting the moving average]
On closer inspection we can see what's going wrong: the prediction is essentially a flat line and does not pay much attention to the input data.
2) The second attempt was to predict differences instead of the price itself. The input this time, instead of simply being X[t] (where X is my input matrix), is X[t]-X[t-1].
This did not really help.
The plot this time looks like this:
[Plot: predicting differences]
But on closer inspection, when plotting the differences, the predictions are basically always 0.
[Plot: the predicted differences]
At this point I am stuck and running out of ideas to try. I was hoping someone with more experience with this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw? (Such predictions do yield a low error, but they become meaningless at that point.)
At least a hint on where to dig for further info would be highly appreciated.
Thanks!
Am I using the right objective to train the model?
Yes, but LSTMs are always tricky for forecasting time series, and they are very prone to overfitting compared to other time-series models.
Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw?
I haven't seen your code or the details of the LSTM you are using. Make sure you are using a very small network and that you are avoiding overfitting. Also make sure that after you difference the data, you reintegrate it before evaluating the final forecast.
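A minimal sketch of what I mean by reintegrating the differences before evaluation (np.cumsum on made-up numbers, just to illustrate):

```python
import numpy as np

prices = np.array([100.0, 100.5, 100.2, 100.9])   # made-up price history
diffs = np.diff(prices)                           # what the model is trained on

pred_diffs = np.array([0.10, -0.20, 0.05])        # hypothetical model output
last_price = prices[-1]

# Reintegrate: turn the predicted differences back into a price path
pred_prices = last_price + np.cumsum(pred_diffs)
print(pred_prices)                                # [101.   100.8  100.85]
```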
One trick to try is to build a model that forecasts 10 steps ahead directly, instead of building a one-step-ahead model and then forecasting recursively.
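A minimal sketch of such a direct multi-step model in Keras (the layer sizes and window lengths are placeholders; keep the network small):

```python
import tensorflow as tf

n_in, n_out = 50, 10                       # placeholder window sizes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_in, 1)),
    tf.keras.layers.LSTM(16),              # deliberately small to limit overfitting
    tf.keras.layers.Dense(n_out),          # all 10 future steps predicted at once
])
model.compile(optimizer="adam", loss="mse")

# model.fit(X, y, ...) with X of shape (N, n_in, 1) and y of shape (N, n_out);
# no recursive feedback of predictions is needed at forecast time.
```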
Can I extract confidence values or the variance of the prediction error from a TensorFlow regressor? E.g., if the model gives a prediction x, can I know the confidence band, i.e., whether x is within ±25% of the actual value?
I'm afraid it's not as easy as when using softmax in the output layer. As said here, you can use the MSE of the NN on the validation set as an estimate of the variance, and then apply your desired confidence level. Be aware that this approach assumes a lot of things (e.g. that the distribution of the errors is always the same, which may not be true), so if you really need those confidence intervals, a regression NN is not the best fit for you.
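A rough sketch of that idea (it assumes, as said, that the errors are roughly Gaussian with a constant variance; all numbers are made up):

```python
import numpy as np

# True values and model predictions on the validation set (made-up numbers)
y_val = np.array([10.0, 12.5, 9.8, 11.2])
y_val_pred = np.array([10.4, 12.0, 10.1, 11.0])

sigma = np.sqrt(np.mean((y_val - y_val_pred) ** 2))   # RMSE as error std estimate

x_pred = 11.7        # some new prediction from the regressor
z = 1.96             # ~95% coverage under the Gaussian assumption
print(x_pred - z * sigma, x_pred + z * sigma)          # crude confidence band
```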
I have some questions about metrics when I do training or evaluation on my own dataset. I am still new to this topic and have just experimented with TensorFlow, Google's Object Detection API and TensorBoard...
So I did all the work to get things up and running with the Object Detection API, trained on some images and ran evaluation on other images.
So I decided to use the weighted PASCAL metrics set for evaluation.
In TensorBoard I get an IoU value for every class and also the mAP, which is nice to see, and now come the questions.
The IoU gives me a value for how well the ground-truth and predicted boxes overlap, and it measures the accuracy of my object detector.
First question: Is the IoU affected if an object with a ground-truth box is not detected at all?
Second question: Is the IoU affected if a ground-truth object is predicted as a false negative?
Third question: What about false positives where there are no ground-truth objects?
Coding Questions:
Fourth question: Has anyone modified the evaluation workflow of the Object Detection API to bring in more metrics such as accuracy or TP/FP/TN/FN? If so, could you provide some code with an explanation, or point me to a tutorial you used? That would be awesome!
Fifth question: If I want to monitor overfitting and take 30% of my 70% training data for evaluation, which parameter shows me that there is overfitting on my dataset?
Maybe these are newbie questions or I just have to read and understand more, I don't know, so any help in understanding more is appreciated!!
Thanks
Let's start with defining precision with respect to a particular object class: it's the proportion of good predictions to all predictions of that class, i.e., TP / (TP + FP). E.g., if you have a dog, cat and bird detector, the dog-precision is the number of correctly marked dogs over all predictions marked as dog (i.e., including false detections).
To calculate the precision, you need to decide whether each detected box is a TP or an FP. To do this you may use the IoU measure: if there is a significant (e.g., 50% *) overlap of the detected box with some ground-truth box, it's a TP if both boxes are of the same class, otherwise it's an FP (if the detection is not matched to any box, it's also an FP).
* that's where the @0.5IOU shortcut comes from; you may have spotted it in TensorBoard in the titles of the graphs with the PASCAL metrics.
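A minimal sketch of that TP/FP assignment (greedy matching at a 0.5 IoU threshold; the boxes are made up and the detections are assumed to be sorted by score already):

```python
def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

# Ground truth and detections for one image and one class, detections sorted by score
gt_boxes   = [[0, 0, 10, 10], [20, 20, 30, 30]]
detections = [[1, 1, 11, 11], [50, 50, 60, 60]]

matched, labels = set(), []
for det in detections:
    hit = next((i for i, gt in enumerate(gt_boxes)
                if i not in matched and iou(det, gt) >= 0.5), None)
    if hit is None:
        labels.append("FP")       # no sufficiently overlapping ground-truth box
    else:
        matched.add(hit)          # each ground-truth box can be matched only once
        labels.append("TP")

print(labels)                     # ['TP', 'FP'] -> precision = 1 / 2
# The second ground-truth box was never matched, so it counts as a miss (FN).
```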
If the estimator outputs some quality measure (or even a probability), you may decide to drop all detections with quality below some threshold. Usually, estimators are trained to output a value between 0 and 1. By changing the threshold you can tune the recall of your estimator (the proportion of correctly discovered objects): lowering the threshold increases the recall (but decreases the precision) and vice versa. The average precision (AP) is the average of the class's precision calculated over different thresholds; in the PASCAL metrics these correspond to the recall levels [0, 0.1, ..., 1], i.e., it is the average of the precision values at different recall levels. It is an attempt to capture the characteristics of the detector in a single number.
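And a small sketch of that averaging, in the 11-point PASCAL style (the precision/recall values are made up):

```python
import numpy as np

# Precision and recall after each detection (sorted by confidence), made-up values
recall    = np.array([0.2, 0.4, 0.4, 0.6, 0.8])
precision = np.array([1.0, 1.0, 0.67, 0.75, 0.71])

# 11-point interpolation: at each recall level r, take the best precision
# achieved at any recall >= r (0 if that recall level is never reached)
levels = np.linspace(0.0, 1.0, 11)         # [0, 0.1, ..., 1]
interp = [precision[recall >= r].max() if np.any(recall >= r) else 0.0
          for r in levels]
ap = float(np.mean(interp))
print(ap)                                   # roughly 0.72 for these made-up values
```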
The mean average precision (mAP) is the mean of the average precisions over all classes. E.g., for our dog, cat and bird detector it would be (dog_AP + cat_AP + bird_AP)/3.
More rigorous definitions can be found in the PASCAL challenge paper, section 4.2.
Regarding your question about overfitting, there can be several indicators of it; one is that the AP/mAP metrics calculated on an independent test/validation set begin to drop while the loss still decreases.