threshold to determine Haar Feature - viola-jones

I try now to understand the Viola-Jones algorithm and I get confused about the threshold to determine if a block a Haar feature or not. The intensity of a pixel is in the range of 0->255.
When we have an ideal Haar feature then the delta value is 255 (like the image on the left)
but in a real image, for example on the right, delta is 146, is that a Haar feature?
my question is: which value is the threshold value to determine if a block a Haar feature or not? Or it is simply 255/2=127,5? When delta > 127.5 -> Haar feature, when delta < 127.5 -> no Haar feature?

I would like to point on some details that might help you understand Viola-Jone's algorithm.
According to the original article of Paul Viola and Michael Jones, the algorithm contains 4 main steps:
Haar Feature Selection
Creating an Integral Image
Adaboost Training
Cascading Classifiers
What you describe relates to the third stage. There is no one uniform threshold that determines if a block is a Haar feature or not. Instead, the algorithm uses Adaboost (machine learning algorithm) on training data to determine the desired value of the threshold.
"Stages in the cascade are constructed by training classifiers
using AdaBoost and then adjusting the threshold to
minimize false negatives."(End of page 4 in the above article)
After that in stage 4, the algorithm uses the threshold from stage 3 to classify the image.

Related

The acceptance rate jumps between 0 and 1 drastically in high-dimensional MH algorithm. How can I tune it?

One part of my MCMC algorithm is using MH algorithm to update (n\times 1) vector of parameters $\boldsymbol{\delta}$. I think it is less computational intensive to propose a new sample from a $n\times 1$ multivariate proposal distribution (n is large). However, it seems impossible to tune the acceptance rate towards some ideal interval, such as 0.2 to 0.5.
I have tried random walk update based on multivariate normal and multivariate uniform distribution. No matter how I adjust the step size, the pattern of acceptance rate looks similar to the following figure.
enter image description here
Is there anyone have such experience? Any good suggest is welcome!

How to interpret a confusion matrix in yolov5 for single class?

I used the yolov5 network for object detection on my database which has only one class. But I do not understand the confusion matrix. Why FP is one and TN is zero?
You can take a look at this issue. In short, confusion matrix isn't the best metric for object detection because it all depends on confidence threshold. Even for one class detection try to use mean average precision as the main metric.
We usually use a confusion matrix when working with text data. FP (False Positive) means the model predicted YES while it was actually NO. FN (False Negative) means the model predicted NO while it was actually YES and TN means the model predicted NO while it was actually a NO. In object detection, normally mAP is used (mean Average Precision). Also, you should upload a picture of the confusion matrix, only then the community would be able to guide you on why your FP is one and TN is zero.
The matrix indicates that 100% of the background FPs are caused by a single class, which means detections that do not match with any ground truth label. So that it shows 1. Background FN is for the ground truth objects that can not be detected by the mode, which shows empty or null.

TensorFlow Object Detection API: evaluation mAP behaves weirdly?

I am training an object detector for my own data using Tensorflow Object Detection API. I am following the (great) tutorial by Dat Tran https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9. I am using the provided ssd_mobilenet_v1_coco-model pre-trained model checkpoint as the starting point for the training. I have only one object class.
I exported the trained model, ran it on the evaluation data and looked at the resulted bounding boxes. The trained model worked nicely; I would say that if there was 20 objects, typically there were 13 objects with spot on predicted bounding boxes ("true positives"); 7 where the objects were not detected ("false negatives"); 2 cases where problems occur were two or more objects are close to each other: the bounding boxes get drawn between the objects in some of these cases ("false positives"<-of course, calling these "false positives" etc. is inaccurate, but this is just for me to understand the concept of precision here). There are almost no other "false positives". This seems much better result than what I was hoping to get, and while this kind of visual inspection does not give the actual mAP (which is calculated based on overlap of the predicted and tagged bounding boxes?), I would roughly estimate the mAP as something like 13/(13+2) >80%.
However, when I run the evaluation (eval.py) (on two different evaluation sets), I get the following mAP graph (0.7 smoothed):
mAP during training
This would indicate a huge variation in mAP, and level of about 0.3 at the end of the training, which is way worse than what I would assume based on how well the boundary boxes are drawn when I use the exported output_inference_graph.pb on the evaluation set.
Here is the total loss graph for the training:
total loss during training
My training data consist of 200 images with about 20 labeled objects each (I labeled them using the labelImg app); the images are extracted from a video and the objects are small and kind of blurry. The original image size is 1200x900, so I reduced it to 600x450 for the training data. Evaluation data (which I used both as the evaluation data set for eval.pyand to visually check what the predictions look like) is similar, consists of 50 images with 20 object each, but is still in the original size (the training data is extracted from the first 30 min of the video and evaluation data from the last 30 min).
Question 1: Why is the mAP so low in evaluation when the model appears to work so well? Is it normal for the mAP graph fluctuate so much? I did not touch the default values for how many images the tensorboard uses to draw the graph (I read this question: Tensorflow object detection api validation data size and have some vague idea that there is some default value that can be changed?)
Question 2: Can this be related to different size of the training data and the evaluation data (1200x700 vs 600x450)? If so, should I resize the evaluation data, too? (I did not want to do this as my application uses the original image size, and I want to evaluate how well the model does on that data).
Question 3: Is it a problem to form the training and evaluation data from images where there are multiple tagged objects per image (i.e. surely the evaluation routine compares all the predicted bounding boxes in one image to all the tagged bounding boxes in one image, and not all the predicted boxes in one image to one tagged box which would preduce many "false false positives"?)
(Question 4: it seems to me the model training could have been stopped after around 10000 timesteps were the mAP kind of leveled out, is it now overtrained? it's kind of hard to tell when it fluctuates so much.)
I am a newbie with object detection so I very much appreciate any insight anyone can offer! :)
Question 1: This is the tough one... First, I think you don't understand correctly what mAP is, since your rough calculation is false. Here is, briefly, how it is computed:
For each class of object, using the overlap between the real objects and the detected ones, the detections are tagged as "True positive" or "False positive"; all the real objects with no "True positive" associated to them are labelled "False Negative".
Then, iterate through all your detections (on all images of the dataset) in decreasing order of confidence. Compute the accuracy (TP/(TP+FP)) and recall (TP/(TP+FN)), only counting the detections that you've already seen ( with confidence bigger than the current one) for TP and FP. This gives you a point (acc, recc), that you can put on a precision-recall graph.
Once you've added all possible points to your graph, you compute the area under the curve: this is the Average Precision for this category
if you have multiple categories, the mAP is the standard mean of all APs.
Applying that to your case: in the best case your true positive are the detections with the best confidence. In that case your acc/rec curve will look like a rectangle: you'd have 100% accuracy up to (13/20) recall, and then points with 13/20 recall and <100% accuracy; this gives you mAP=AP(category 1)=13/20=0.65. And this is the best case, you can expect less in practice due to false positives which higher confidence.
Other reasons why yours could be lower:
maybe among the bounding boxes that appear to be good, some are still rejected in the calculations because the overlap between the detection and the real object is not quite big enough. The criterion is that Intersection over Union (IoU) of the two bounding boxes (real one and detection) should be over 0.5. While it seems like a gentle threshold, it's not really; you should probably try and write a script to display the detected bounding boxes with a different color depending on whether they're accepted or not (if not, you'll get both a FP and a FN).
maybe you're only visualizing the first 10 images of the evaluation. If so, change that, for 2 reasons: 1. maybe you're just very lucky on these images, and they're not representative of what follows, just by luck. 2. Actually, more than luck, if these images are the first from the evaluation set, they come right after the end of the training set in your video, so they are probably quite similar to some images in the training set, so they are easier to predict, so they're not representative of your evaluation set.
Question 2: if you have not changed that part in the config file mobilenet_v1_coco-model, all your images (both for training and testing) are rescaled to 300x300 pixels at the start of the network, so your preprocessings don't matter.
Question 3: no it's not a problem at all, all these algorithms were designed to detect multiple objects in images.
Question 4: Given the fluctuations, I'd actually keep training it until you can see improvement or clear overtraining. 10k steps is actually quite small, maybe it's enough because your task is relatively easy, maybe it's not enough and you need to wait ten times that to have significant improvement...

What is meaning of "parameter optimization of SVM by PSO"?

I can change parameters C and epsilon manually to obtain an optimised result, but I found that there is parameter optimization of SVM by PSO (or any other optimization algorithm). There is no algorithm. What does it mean: how can PSO automatically optimize the SVM parameters? I read several papers on this topic, but I'm still not sure.
Particle Swarm Optimization is a technique that uses the ML parameters (SVM parameters, in your case) as its features.
Each "particle" in the swarm is characterized by those parameter values. For instance, you might have initial coordinates of
degree epsilon gamma C
p1 3 0.001 0.25 1.0
p2 3 0.003 0.20 0.9
p3 2 0.0003 0.30 1.2
p4 4 0.010 0.25 0.5
...
pn ...........................
The "fitness" of each particle (p1-p4 shown here out of a population of n particles) is measured by the accuracy of the resulting model: the PSO algorithm trains and tests a model for each particle, returning that model's error rate as the value analogous to that from the training loss function (which it how the value is computed).
On each iteration, particles move toward the fittest neighbours. The process repeats until a maximum (hopefully the global one) appears as a convergence point. This process is simply one from the familiar gradient descent family.
There are two basic PSO variants. In gbest (global best), every particle affects every other particle, sort of a universal gravitation principle. It converges quickly, but may well miss a global max in favor of a local max that happened to be nearer to the swarm's original center. In lbest (local best), a particle responds to only its k closest neighbors. This can form localized clusters; it converges more slowly, but is more likely to find the global max in a non-convex space.
I'll try to briefly explain enough to answer your clarification questions. If that doesn't work, I'm afraid you'll probably have to find someone to discuss this in front of a white board.
To use PSO, you have to decide which SVM parameters you'll try to optimize, and how many particles you want to use. PSO is a meta-algorithm, so its features are the SVM parameters. The PSO parameters are population (how many particles you want to use, update neighbourhood (lbest size and a distance function; gbest is the all-inclusive case), and velocity (learning rate for the SVM parameters).
For a bit of illustration, let's assume the particle table above, extended to a population of 20 particles. We'll use lbest with a neighbourhood of 4, and a velocity of 0.1. We choose (randomly, in a grid, or however we think might give us nice results) the initial values of degree, epsilon, gamma, and C for each of the 20 particles.
Each iteration of PSO works like this:
# Train the model described by each particle's "position"
For each of the 20 particles:
Train an SVM with the SVM input and the given parameters.
Test the SVM; return the error rate as the PSO loss function value.
# Update the particle positions
for each of the 20 particles:
find the nearest 4 neighbours (using the PSO distance function)
identify the neighbour with the lowest loss (SVM's error rate).
adjust this particle's features (degree, epsilon, gamma, C) 0.1 of the way toward that neighbour's features. 0.1 is our learning rate / velocity. (Yes, I realize that changing degree is not likely to happen (it's a discrete value) without a special case in the update routine.
Continue iterating through PSO until the particles have converged to your liking.
gbest is simply lbest with an infinite neighbourhood; in that case, you don't need a distance function on the particle space.

How to decide the step size when using Metropolis–Hastings algorithm

I have a simple question regarding to Metropolis–Hastings algorithm.
Suppose the distribution only has one variable x and the value range of x is s=[-2^31,2^31].
In the sampling process, I need to propose a new value of x and then decide whether to accept it.
x_{t+1} =x_t+\epsilon
If I want to implement it by myself, how to decide the value of \epsilon.
The basic solution is to pick a value from Uniform[-2^31,2^31] and set it to \epsilon. What if the value range is unbounded like [-inf, inf]?
How does the current MCMC library (e.g. pymc) solve that problem?
Suppose you have $d$ dimensional parameters, the optimal scale is approximate $2.4d^(−1/2)$ times the scale of the target distribution, which implies optimal acceptance rates of 0.44 for $d = 1$ and 0.23 for $d$ goes to \infinity.
reference: Automatic Step Size Selection in Random Walk Metropolis Algorithms,
Todd L. Graves, 2011.
The best approach is to code a self-tuning algorithm that starts with an arbitrary variance for the step size variance, and tune this variance as the algorithm progresses. You are shooting for an acceptance rate of 25-50% for the Metropolis algorithm.