I want to know the size of bounding box in object-detection api - tensorflow

I have used the TensorFlow Object Detection API, running the tutorial IPython notebook from GitHub in real time.
How can I find the length of a bounding box? I don't know which command to use to calculate the size of the boxes.

Just to extend Beta's answer:
You can get the predicted bounding boxes from the detection graph. An example of this is given in the tutorial IPython notebook on GitHub, which is where Beta's code snippet comes from. Access the detection_graph and extract the coordinates of the predicted bounding boxes from the tensor:
By calling np.squeeze(boxes) you reshape them to (m, 4), where m denotes the number of predicted boxes. You can now access the boxes and compute the length, area, or whatever you want.
But remember that the predicted box coordinates are normalized! They are in the following order:
[ymin, xmin, ymax, xmax]
So computing the length in pixels would be something like:
def length_of_bounding_box(bbox):
    # bbox is normalized [ymin, xmin, ymax, xmax]; IMG_WIDTH is the image width in pixels
    return bbox[3]*IMG_WIDTH - bbox[1]*IMG_WIDTH
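A minimal sketch of the same idea that returns both the width and the height in pixels. The function name `box_size_in_pixels` and the 640x480 image size are illustrative assumptions, not part of the Object Detection API.

```python
def box_size_in_pixels(bbox, img_width, img_height):
    """Return (width, height) in pixels for a normalized [ymin, xmin, ymax, xmax] box."""
    ymin, xmin, ymax, xmax = bbox
    width = (xmax - xmin) * img_width
    height = (ymax - ymin) * img_height
    return width, height


# Example: a box covering the central half of a 640x480 image.
w, h = box_size_in_pixels([0.25, 0.25, 0.75, 0.75], img_width=640, img_height=480)
print(w, h, w * h)  # width, height and area in pixels
```

Passing the image size as arguments avoids relying on a global such as IMG_WIDTH.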

I wrote a full answer on how to find the bounding box coordinates here and thought it might be useful to someone on this thread too.
Google Object Detection API returns bounding boxes in the format [ymin, xmin, ymax, xmax] and in normalised form (full explanation here). To find the (x,y) pixel coordinates we need to multiply the results by width and height of the image. First get the width and height of your image:
width, height = image.size
Then, extract ymin,xmin,ymax,xmax from the boxes object and multiply to get the (x,y) coordinates:
ymin = boxes[0][i][0]*height
xmin = boxes[0][i][1]*width
ymax = boxes[0][i][2]*height
xmax = boxes[0][i][3]*width
Finally print the coordinates of the box corners:
print('Top left')
print((xmin, ymin))
print('Bottom right')
print((xmax, ymax))

You can get the boxes tensor like this:
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
and similarly for scores and classes.
Then just evaluate them in a session run:
(boxes, scores, classes) = sess.run(
    [boxes, scores, classes],
    feed_dict={image_tensor: imageFile})

Basically, you can get all those from the graph
image_tensor = graph.get_tensor_by_name('image_tensor:0')
boxes = graph.get_tensor_by_name('detection_boxes:0')
scores = graph.get_tensor_by_name('detection_scores:0')
classes = graph.get_tensor_by_name('detection_classes:0')
num_detections = graph.get_tensor_by_name('num_detections:0')
After running these in a session, boxes[0] contains all predicted bounding box coordinates in the format [ymin, xmin, ymax, xmax], normalized to the image size, which is what you are looking for.
Check out this repo and you may find more details:

The following code runs the detector and returns the locations and confidence scores:
(boxes, scores, classes, num_detections) = sess.run(
    [boxes, scores, classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
To iterate through the boxes and get the width and height of each one (in normalized units):
for i, b in enumerate(boxes[0]):
    width = boxes[0][i][3] - boxes[0][i][1]   # xmax - xmin
    height = boxes[0][i][2] - boxes[0][i][0]  # ymax - ymin
You can find more details: [https://pythonprogramming.net/detecting-distances-self-driving-car/]


Anchor boxes and offsets in SSD object detection

How do you calculate anchor box offsets for object detection in SSD? As far as I understood anchor boxes are the boxes in 8x8 feature map, 4x4 feature map, or any other feature map in the output layer.
So what are the offsets?
Is it the distance between the centre of the bounding box and the centre of a particular box in say a 4x4 feature map?
If I am using a 4x4 feature map as my output, then my output should be of the dimension:
(4x4, n_classes + 4)
where 4 is for my anchor box co-ordinates.
These 4 co-ordinates can be something like:
(xmin, xmax, ymin, ymax)
This will correspond to the top-left and bottom-right corners of the bounding box.
So why do we need offsets and if so how do we calculate them?
Any help would be really appreciated!
We need offsets because they are what the network actually predicts, relative to the default anchor boxes. In SSD, every feature map cell has a predefined number of anchor boxes with different scales and aspect ratios; I think in the paper this number is 6.
Because this is a detection problem, we also have ground truth (GT) bounding boxes. Roughly, we compare the IoU of each anchor box with a GT box, and if it is greater than a threshold, say 0.5, the network is trained to predict the offsets from that anchor box to the GT box.
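To make "offsets" concrete, here is a minimal sketch of the centre/size encoding commonly used for SSD-style detectors: the centre shift is divided by the anchor size and the size change is expressed as a log ratio. The function names and the example boxes are assumptions for illustration, boxes are taken as (xmin, ymin, xmax, ymax), and the extra "variance" scaling that many implementations apply is left out.

```python
import math


def corners_to_center(box):
    """Convert (xmin, ymin, xmax, ymax) to (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0, xmax - xmin, ymax - ymin)


def encode_offsets(gt_box, anchor_box):
    """Offsets that turn anchor_box into gt_box (the regression targets)."""
    gcx, gcy, gw, gh = corners_to_center(gt_box)
    acx, acy, aw, ah = corners_to_center(anchor_box)
    return ((gcx - acx) / aw, (gcy - acy) / ah, math.log(gw / aw), math.log(gh / ah))


def decode_offsets(offsets, anchor_box):
    """Inverse of encode_offsets: apply predicted offsets to an anchor."""
    tx, ty, tw, th = offsets
    acx, acy, aw, ah = corners_to_center(anchor_box)
    cx, cy = acx + tx * aw, acy + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


anchor = (0.25, 0.25, 0.75, 0.75)  # a default box, normalized coordinates
gt = (0.30, 0.20, 0.90, 0.70)      # a ground truth box matched to it
offsets = encode_offsets(gt, anchor)
print(offsets)
print(decode_offsets(offsets, anchor))  # recovers gt up to rounding
```

So the 4 regression outputs per anchor are not absolute corners: at inference time they are decoded against the fixed anchor to obtain the final box.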

How can I train Mask-RCNN for a training data having no bounding boxes?

I want to train Mask-RCNN on the NYU RGBD dataset, which doesn't contain bounding boxes. I could create bounding boxes for the training data myself, but that task is too tedious, and I only want to test instance segmentation on the input data. So how can I train on training data that has no bounding boxes?
Simply put, you cannot. Object detection algorithms require bounding boxes for training.
Yes, you can, but not directly: you have to create the bounding boxes yourself. A binary mask with values in [0, 1] is enough. Calculate the maximum length and width of the mask (easy with NumPy), find the centre coordinates (i, j) of the mask, and place the box using length / 2 and width / 2 around that centre, deducing the position from the pixels of the mask.
You can extract the bounding boxes from the instance masks. The official MaskRCNN code repository provides code for that here.
In short, for a given mask m of shape [h, w]:
box = np.zeros([4], dtype=np.int32)
# Bounding box.
horizontal_indicies = np.where(np.any(m, axis=0))[0]
vertical_indicies = np.where(np.any(m, axis=1))[0]
x1, x2 = horizontal_indicies[[0, -1]]
y1, y2 = vertical_indicies[[0, -1]]
# x2 and y2 should not be part of the box. Increment by 1.
x2 += 1
y2 += 1
box = np.array([y1, x1, y2, x2])
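A hedged sketch that wraps the snippet above in a function and runs it on a tiny synthetic mask. The name `mask_to_box` and the choice to return all zeros for an empty mask are assumptions made for this example.

```python
import numpy as np


def mask_to_box(m):
    """Return [y1, x1, y2, x2] for a 2-D binary mask; zeros if the mask is empty."""
    cols = np.where(np.any(m, axis=0))[0]
    rows = np.where(np.any(m, axis=1))[0]
    if cols.size == 0:
        return np.zeros(4, dtype=np.int32)
    x1, x2 = cols[[0, -1]]
    y1, y2 = rows[[0, -1]]
    # x2 and y2 are made exclusive by adding 1, as in the snippet above.
    return np.array([y1, x1, y2 + 1, x2 + 1], dtype=np.int32)


mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True  # object occupies rows 2..4 and columns 3..6
print(mask_to_box(mask))
```

Running this over every instance mask of an image gives one box per instance, which can then be fed to training alongside the masks.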

How to print bounding box location in Object Detection API

I'm using TensorFlow and the Object Detection API. I want to print the bounding box location of an object in a test image. The boxes are drawn by the call to vis_util.visualize_boxes_and_labels_on_image_array.
I want to look inside 'boxes' because I guess that is where the bounding box locations are stored. So I converted 'boxes' to a list, but when I print it in cmd the output is too complex to read.
Any ideas?
I found the solution.
1) After the line of code you quoted, write print(boxes).
2) It will return an [N,4] array, where N is the number of objects that were detected, so every row is a detected object with its own detection score.
3) Each row has 4 columns, which hold the normalized [ymin, xmin, ymax, xmax]. The rows are in descending order of detection score: the first row gives the bounding box co-ordinates of the object detected with the highest score, the second row those of the object with the second highest score, and so on.
4) To get the exact co-ordinates in pixels, multiply xmin and xmax by the width of your image and multiply ymin and ymax by its height.
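The steps above can be sketched with NumPy as follows. The function name `boxes_to_pixels`, the image size, and the example detections are made up for illustration; in practice the array would come from the session run.

```python
import numpy as np


def boxes_to_pixels(boxes, img_width, img_height):
    """Scale normalized [ymin, xmin, ymax, xmax] rows to integer pixel coordinates."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scale = np.array([img_height, img_width, img_height, img_width], dtype=np.float64)
    return np.rint(boxes * scale).astype(int)


# Hypothetical detections for a 640x480 image, already sorted by score.
detections = [[0.10, 0.20, 0.50, 0.60],
              [0.25, 0.25, 0.75, 0.75]]
pixel_boxes = boxes_to_pixels(detections, img_width=640, img_height=480)
for ymin, xmin, ymax, xmax in pixel_boxes.tolist():
    print("top left:", (xmin, ymin), "bottom right:", (xmax, ymax))
```

The reshape to (-1, 4) means the function also accepts the raw [1, N, 4] output without a separate np.squeeze call.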

How to choose coordinates of bounding boxes for object detection from tensorflow

I'm trying to use object_detection from the tensorflow library to detect colored squares. For every image in the train/eval dataset, I should have the bounding box coordinates (with the origin in the top left corner) defined by 4 floating point numbers [ymin, xmin, ymax, xmax]. Now, let's suppose background_image is a completely white 300 x 300 px image. The code of my image generator looks like this (pseudocode):
new_image = background_image.copy()
rand_x, rand_y = random_coordinates(new_image)
for (i = rand_x; i < rand_x + 100; ++i)
    for (j = rand_y; j < rand_y + 100; ++j)
        new_image[i][j] = color(red)
...so now we have 300 x 300px image of red square 100 x 100px on white background. The question is - should my bounding box contain only red colored pixels [rand_x, rand_y, rand_x + 100, rand_y + 100] or should it contain "white frame" like [rand_x - 5, rand_y - 5, rand_x + 105, rand_y + 105]? And maybe it does not matter? After 15h of training and evaluating (with bounding box coordinates = [rand_x, rand_y, rand_x + 100, rand_y + 100]) tensorboard shows me something like this:
TensorBoard reports that precision is about 0.1.
I understand well that after only 1100 steps the results should not be breathtaking. I just want to rule out any inaccuracies that are my own fault.
Ideally, you want your predicted boxes to overlap the ground truth boxes perfectly.
This means that if A = [y_min, x_min, y_max, x_max] is the ground truth box, you want B (the predicted box) to be equal to A, i.e. A = B.
During the training phase it is perfectly normal for your predictions to be "around" the ground truth without matching it exactly.
In reality, even during the test phase (at the end of training) A = B is difficult to achieve, because no classifier or regressor is perfect.
In short: your predictions look fine. With more training epochs you'll probably get better results.
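One common way to quantify how closely B matches A is intersection over union (IoU). The sketch below is a plain-Python illustration with made-up pixel boxes; it also shows how much a 5 px "white frame" around a 100 x 100 square changes the overlap.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = [100, 100, 200, 200]  # tight box around a 100x100 square
with_margin = [95, 95, 205, 205]     # the same square plus a 5 px frame
print(iou(ground_truth, ground_truth))  # identical boxes give 1.0
print(iou(ground_truth, with_margin))   # roughly 0.83
```

Whichever convention is chosen for the labels, using it consistently for every training and evaluation image matters more than the few pixels of margin.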

OpenCV detect blobs on the image

I need to find the blobs in the image (and draw a rectangle around each one), or get the blobs with the maximum and minimum radius (samples below).
The problem is finding the right filters for the image so that a Canny or threshold transformation highlights the blobs. Then I am going to use findContours to find the rectangles.
I've tried:
Threshold, with different levels
changing the image tone with a variety of "lines"
and so on. The best result was detecting only a piece (20-30%) of a blob, which is not enough to draw a rectangle around the blob. Also, because of shadows, dots unrelated to the blobs were detected, which also prevents detecting the area.
As I understand it, I need to find contours that have hard contrast (not smooth, as in a shadow). Is there any way to do that with OpenCV?
cases separately: image 1, image 2, image 3, image 4, image 5, image 6, image 7, image 8, image 9, image 10, image 11, image 12
One more Update
I believe that the blobs have a high-contrast area at the edge. So I've tried to make the edges stronger: I created 2 grayscale Mats, A and B, applied a Gaussian blur to the second one, B (to reduce noise a bit), and then did some calculations: for every pixel, find the max difference between Xi,Yi of 'A' and the nearby dots from 'B':
and apply the max difference to Xi,Yi. So I get something like this:
Am I on the right track? By the way, can I achieve something like this with OpenCV methods?
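For reference, a minimal NumPy sketch of the per-pixel computation described here, under the assumption that "nearby dots" means the 3x3 window around the same position in B. The function name is hypothetical and 8-bit grayscale input is assumed. The same quantity can probably also be expressed with OpenCV by eroding and dilating B (neighbourhood minimum and maximum) and taking the larger of A minus the minimum and the maximum minus A.

```python
import numpy as np


def max_neighbour_difference(a, b, radius=1):
    """For each pixel of a, the largest |a - b| over the (2r+1)x(2r+1) window of b."""
    a = np.asarray(a, dtype=np.int32)
    b = np.asarray(b, dtype=np.int32)
    h, w = a.shape
    # Repeat the border pixels so that windows at the edges stay inside the array.
    padded = np.pad(b, radius, mode="edge")
    result = np.zeros_like(a)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            result = np.maximum(result, np.abs(a - shifted))
    return result.astype(np.uint8)


a = np.full((3, 3), 10, dtype=np.uint8)
b = a.copy()
b[1, 1] = 50  # a single bright pixel in the second image
print(max_neighbour_difference(a, b))
```

In this tiny example every pixel of A has the bright pixel of B inside its window, so the whole output is 40.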
Update: image denoising helps to reduce noise, and Sobel highlights the contours; then threshold + findContours and a custom convexHull give something similar to what I'm looking for, but it is not good for some blobs.
Since there are big differences between the input images, the algorithm should be able to adapt to the situation. Since Canny is based on detecting high frequencies, my algorithm treats the sharpness of the image as the parameter used for preprocessing adaptation. I didn't want to spend a week figuring out the functions for all the data, so I applied a simple, linear function based on 2 images and then tested with a third one. Here are my results:
Keep in mind that this is a very basic approach and only proves a point. It will need experiments, tests, and refining. The idea is to use Sobel and sum over all the resulting pixels. That sum, divided by the size of the image, should give you a basic estimate of the high-frequency response of the image. Experimentally, I found values of clipLimit for the CLAHE filter that work in 2 test cases, and from them a linear function connecting the high-frequency response of the input with the CLAHE clip limit, yielding good results.
sobel = get_sobel(img)
clip_limit = (-2.556) * np.sum(sobel)/(img.shape[0] * img.shape[1]) + 26.557
That's the adaptive part. Now for the contours. It took me a while to figure out a correct way of filtering out the noise. I settled for a simple trick: using contours finding twice. First I use it to filter out the unnecessary, noisy contours. Then I continue with some morphological magic to end up with correct blobs for the objects being detected (more details in the code). The final step is to filter bounding rectangles based on the calculated mean, since, on all of the samples, the blobs are of relatively similar size.
import cv2
import numpy as np


def unsharp_mask(img, blur_size=(5, 5), imgWeight=1.5, gaussianWeight=-0.5):
    # sharpen by subtracting a blurred copy from the weighted original
    gaussian = cv2.GaussianBlur(img, blur_size, 0)
    return cv2.addWeighted(img, imgWeight, gaussian, gaussianWeight, 0)


def smoother_edges(img, first_blur_size, second_blur_size=(5, 5), imgWeight=1.5, gaussianWeight=-0.5):
    # blur first, then sharpen, to get smooth but well-defined edges
    img = cv2.GaussianBlur(img, first_blur_size, 0)
    return unsharp_mask(img, second_blur_size, imgWeight, gaussianWeight)


def close_image(img, size=(5, 5)):
    kernel = np.ones(size, np.uint8)
    return cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)


def open_image(img, size=(5, 5)):
    kernel = np.ones(size, np.uint8)
    return cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)


def shrink_rect(rect, scale=0.8):
    # rect is a rotated rectangle: (center, (width, height), angle)
    center, (width, height), angle = rect
    width = width * scale
    height = height * scale
    rect = center, (width, height), angle
    return rect


def clahe(img, clip_limit=2.0):
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(5, 5))
    return clahe.apply(img)


def get_sobel(img, size=3):
    # ksize is passed by keyword: the fifth positional argument of cv2.Sobel is dst,
    # so the original positional call effectively ran with the default ksize of 3
    sobelx64f = cv2.Sobel(img, cv2.CV_64F, 2, 0, ksize=size)
    abs_sobel64f = np.absolute(sobelx64f)
    return np.uint8(abs_sobel64f)
img = cv2.imread("blobs4.jpg")
# save color copy for visualizing
imgc = img.copy()
# resize image to make the analytics easier (a form of filtering)
resize_times = 5
img = cv2.resize(img, None, fx=1.0 / resize_times, fy=1.0 / resize_times)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# use sobel operator to evaluate high frequencies
sobel = get_sobel(img)
# experimentally calculated function - needs refining
clip_limit = (-2.556) * np.sum(sobel) / (img.shape[0] * img.shape[1]) + 26.557
# don't apply clahe if there is enough high freq to find blobs
if clip_limit < 1.0:
    clip_limit = 0.1
# limit clahe if there's not enough details - needs more tests
if clip_limit > 8.0:
    clip_limit = 8
# apply clahe and unsharp mask to improve high frequencies as much as possible
img = clahe(img, clip_limit)
img = unsharp_mask(img)
# filter the image to ensure edge continuity and perform Canny
# (values selected experimentally, using trackbars)
img_blurred = cv2.GaussianBlur(img.copy(), (2*2+1, 2*2+1), 0)
canny = cv2.Canny(img_blurred, 35, 95)
# find first contours ([-2] selects the contour list in both OpenCV 3 and OpenCV 4)
cnts = cv2.findContours(canny.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
# prepare black image to draw contours
canvas = np.zeros(img.shape, np.uint8)
for c in cnts:
    l = cv2.arcLength(c, False)
    x, y, w, h = cv2.boundingRect(c)
    aspect_ratio = float(w) / h
    # filter "bad" contours (values selected experimentally)
    if l > 500:
        continue
    if l < 20:
        continue
    if aspect_ratio < 0.2:
        continue
    if aspect_ratio > 5:
        continue
    if l > 150 and (aspect_ratio > 10 or aspect_ratio < 0.1):
        continue
    # draw all the other contours
    cv2.drawContours(canvas, [c], -1, (255, 255, 255), 2)
# perform closing and blurring, to close the gaps
canvas = close_image(canvas, (7, 7))
img_blurred = cv2.GaussianBlur(canvas, (8*2+1, 8*2+1), 0)
# smooth the edges a bit to make sure canny will find continuous edges
img_blurred = smoother_edges(img_blurred, (9, 9))
kernel = np.ones((3, 3), np.uint8)
# erode to make sure separate blobs are not touching each other
eroded = cv2.erode(img_blurred, kernel)
# perform necessary thresholding before Canny
_, im_th = cv2.threshold(eroded, 50, 255, cv2.THRESH_BINARY)
canny = cv2.Canny(im_th, 11, 33)
# find contours again. this time mostly the right ones
cnts = cv2.findContours(canny.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
# calculate the mean area of the contours' bounding rectangles
sum_area = 0
rect_list = []
for i, c in enumerate(cnts):
    rect = cv2.minAreaRect(c)
    _, (width, height), _ = rect
    area = width * height
    sum_area += area
    rect_list.append(rect)
mean_area = sum_area / len(cnts)
# choose only rectangles that fulfill requirement:
# area > mean_area*0.6
for rect in rect_list:
    _, (width, height), _ = rect
    area = width * height
    if area > mean_area * 0.6:
        # shrink the rectangles, since the shadows and reflections
        # make the resulting rectangle a bit bigger
        # the value was guessed - might need refining
        rect = shrink_rect(rect, 0.8)
        box = cv2.boxPoints(rect)
        # scale back to the original image size; drawContours needs integer points
        box = np.int32(box * resize_times)
        cv2.drawContours(imgc, [box], 0, (0, 255, 0), 1)
# resize for visualizing purposes
imgc = cv2.resize(imgc, None, fx=0.5, fy=0.5)
cv2.imshow("imgc", imgc)
cv2.imwrite("result3.png", imgc)
cv2.waitKey(0)  # keep the window open until a key is pressed
Overall I think that's a very interesting problem, a little bit too big to be answered here. The approach I presented should be treated as a signpost, not a complete solution. The basic idea is:
Adaptive preprocessing.
Finding contours twice: for filtering and then for the actual classification.
Filtering the blobs based on their mean size.
Thanks for the fun and good luck!
Here is the code I used:
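import cv2
import numpy as np

# x1 is the path to the input image; it is not defined in the original snippet
image = cv2.imread(x1, 0)   # grayscale copy, used for detection
image1 = cv2.imread(x1, 1)  # color copy, used for drawing
median = cv2.GaussianBlur(image, (9, 9), 0)
median1 = cv2.GaussianBlur(image, (21, 21), 0)
# ASSUMPTION: the original never defines the thresholded image `c` or `kernel`;
# the difference of the two blurs and a 3x3 kernel are used here as stand-ins
c = cv2.absdiff(median, median1)
kernel = np.ones((3, 3), np.uint8)
ret, thresh1 = cv2.threshold(c, 12, 255, cv2.THRESH_BINARY)
dilation = cv2.dilate(thresh1, kernel, iterations=1)
opening = cv2.morphologyEx(dilation, cv2.MORPH_OPEN, kernel)
# [-2] selects the contour list in both OpenCV 3 and OpenCV 4
contours = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
for u in range(len(contours)):
    if np.size(contours[u]) > 200:
        ellipse = cv2.fitEllipse(contours[u])
        (center, axes, orientation) = ellipse
        majoraxis_length = max(axes)
        minoraxis_length = min(axes)
        # ASSUMPTION: the eccentricity line is missing from the original;
        # the standard ellipse formula is used here
        eccentricity = np.sqrt(1 - (minoraxis_length / majoraxis_length) ** 2)
        if eccentricity < 0.8:
            cv2.drawContours(image1, contours, u, (255, 1, 255), 3)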
The problem here is to find near-circular objects. This simple solution is based on computing the eccentricity of each and every contour. The objects being detected are drops of water.
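For reference, a minimal sketch of the eccentricity test, assuming the standard ellipse definition e = sqrt(1 - (b/a)^2), where a and b are the major and minor axis lengths. The function name is hypothetical; the axes tuple has the same shape as the one returned by cv2.fitEllipse.

```python
import math


def eccentricity(axes):
    """Eccentricity of an ellipse given its two axis lengths (in any order)."""
    major, minor = max(axes), min(axes)
    if major == 0:
        return 0.0
    return math.sqrt(1.0 - (minor / major) ** 2)


print(eccentricity((10.0, 10.0)))  # a circle has eccentricity 0
print(eccentricity((10.0, 6.0)))   # an elongated ellipse, close to 0.8
```

Only the ratio of the axes matters, so it makes no difference whether full axis lengths or semi-axes are passed in.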
I have a partial solution in place.
I initially converted the image to the HSV color space and tinkered with the value channel. On doing so I came across something unique. In almost every image, the droplets have a tiny reflection of light. This was highlighted distinctly in the value channel.
Upon inverting this I was able to obtain the following:
Sample 1:
Sample 2:
Sample 3:
Now we have to extract the location of those points. To do so I performed anomaly detection on the inverted value channel. By anomaly I mean the black dot present in it.
To do this I calculated the median of the inverted value channel. Pixel values within 70% above and below the median are treated as normal pixels, and every pixel value lying outside this range is treated as an anomaly. The black dots fit perfectly there.
Sample 1:
Sample 2:
Sample 3:
It did not turn out well for a few images.
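A minimal sketch of the median-based anomaly step, reading "within 70% above and below the median" as the range from 0.3 x median to 1.7 x median. That reading, the function name, and the synthetic 5x5 channel are assumptions for illustration only.

```python
import numpy as np


def anomaly_mask(channel, tolerance=0.7):
    """True where a pixel lies outside median * (1 +/- tolerance)."""
    channel = np.asarray(channel, dtype=np.float64)
    median = np.median(channel)
    lower = median * (1.0 - tolerance)
    upper = median * (1.0 + tolerance)
    return (channel < lower) | (channel > upper)


# Synthetic stand-in for an inverted value channel: bright background, one dark dot.
inverted_value = np.full((5, 5), 200, dtype=np.uint8)
inverted_value[2, 2] = 10
ys, xs = np.nonzero(anomaly_mask(inverted_value))
print(list(zip(ys.tolist(), xs.tolist())))  # [(2, 2)]
```

On a real image the resulting mask would still need some cleanup, since this threshold did not work for every sample.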
As you can see the black dot is due to the reflection of light which is unique to the droplets of water. Other circular edges might be present in the image but the reflection distinguishes the droplet from those edges.
Now that we have the location of these black dots, we can perform Difference of Gaussians (DoG) (also mentioned in the update of the question) and obtain the relevant edge information. If the location of a black dot lies within the edges discovered, it is taken to be a water droplet.
Disclaimer: This method does not work for all the images. You can add your suggestions to this.
Good day, I am working on this subject and my advice to you is: first denoise the image with filters such as a Gaussian filter, and only then process it further.
You can detect these circles with blob detection rather than with contours.