Return coordinates that pass the threshold value for bounding boxes in Google's Object Detection API - TensorFlow

Does anyone know how to get the bounding box coordinates that pass the threshold value?
I found this answer (here's a link), so I tried using it and did the following:
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=1,
    min_score_thresh=0.80)
for i, b in enumerate(boxes[0]):
    ymin = boxes[0][i][0]*height
    xmin = boxes[0][i][1]*width
    ymax = boxes[0][i][2]*height
    xmax = boxes[0][i][3]*width
    print("Top left")
    print(xmin, ymin)
    print("Bottom right")
    print(xmax, ymax)
But I noticed that the approach from the linked answer returns the values for all of the bounding boxes detected by the classifier, which I do not want. min_score_thresh only controls which boxes are drawn on the image; it does not filter the returned arrays. What I want is only the values from the bounding boxes that pass min_score_thresh.
I feel like this should be very simple, but I lack knowledge in this area.
If I find the answer, I'll be sure to post it right here, but if anyone else knows the answer and could save me some time, I would be grateful.

Update:
The boxes and scores returned by the previous functions are both NumPy arrays, so you can use boolean indexing to filter out the boxes below the threshold.
This should give you the boxes that pass the threshold:
true_boxes = boxes[0][scores[0] > min_score_thresh]
And then you can do
for i in range(true_boxes.shape[0]):
    ymin = true_boxes[i,0]*height
    xmin = true_boxes[i,1]*width
    ymax = true_boxes[i,2]*height
    xmax = true_boxes[i,3]*width
    print("Top left")
    print(xmin, ymin)
    print("Bottom right")
    print(xmax, ymax)
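If you also need the matching scores and class labels, the same boolean mask can be reused; a minimal sketch, assuming boxes, scores, and classes all come from the same detection call:
mask = scores[0] > min_score_thresh
true_boxes = boxes[0][mask]      # (N, 4) boxes above the threshold
true_scores = scores[0][mask]    # matching confidence scores
true_classes = classes[0][mask]  # matching class ids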

Related

Calculating the size of an object using opencv and numpy poly1d

I'm looking to use a small NumPy array to generate a curve that I can use to predict the height measurement at non-known points. I have several points that I am using to create a poly1d. I know it's possible; we use software at work that does it just fine, and when I used a different image as a tester, plugging the values into Excel and fitting the polynomial there worked fine. But on a different calibratable image I get drastically different measurements.
Here is the image that I'm trying to measure.
The stick on the front of the pole contains known measurements. From bottom to top, they are 3'6" (42"), 6'6" (78"), 9'8" (116"), and 13' (156").
The picture has been through opencv undistort with a calibrated camera.
This is the function that actually performs the logic. x and y are gathered by the cv2 EVENT_LBUTTONUP mouse event and sent to this function.
Checking the lengths of the array is just to help me figure out why this isn't working while trying to generate a line to show the curve fit.
dist = self.firstClick - y
self.yData.append(dist)
if len(self.yData) > 4:
    print(self.poly(dist))
if len(self.yData) == 4:
    array = np.array(self.xData)
    array = np.expand_dims(array, axis=0)
    print(self.xData)
    print(self.yData)
    array = np.append(array, [self.yData], axis=0)
    print(array)
    x = array[:,0]
    y = array[:,1]
    self.poly = np.poly1d(np.polyfit(x, y, 2))
    poly1d = np.poly1d(self.poly)
    xp = np.linspace(-2, 20, 1)
    _ = plt.plot(x, y, '.', xp, self.poly(xp), '-', xp, self.poly(xp), '--')
    plt.ylim(0, 200)
    plt.show()
When I run this code, my values quickly go into the tens of thousands when I attempt to collect the measurement at 18'11" (the lowest wire).
Any help would be appreciated; I've been up all night trying to fit this curve.
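For reference, the intended fit can be reproduced in isolation with NumPy alone. A minimal sketch, where the pixel offsets are hypothetical stand-ins for the real click measurements:
import numpy as np

# Known heights on the stick, in inches
heights = np.array([42.0, 78.0, 116.0, 156.0])
# Hypothetical pixel offsets from the pole base for those four marks
pixel_offsets = np.array([120.0, 260.0, 410.0, 570.0])

# Fit a quadratic mapping pixel offset -> height, then predict a new point
poly = np.poly1d(np.polyfit(pixel_offsets, heights, 2))
print(poly(650.0))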
Edit:
Sorry, I should have included the code used to display and scale the image.
self.img = cv2.imread(imagePath, cv2.IMREAD_ANYCOLOR)
self.scale_percent = 30
self.width = int(self.img.shape[1] * self.scale_percent/100)
self.height = int(self.img.shape[0] * self.scale_percent/100)
dsize = (self.width, self.height)
self.output = cv2.resize(self.img, dsize)
img = self.output
cv2.imshow('image', img)
cv2.setMouseCallback('image', self.click_event)
cv2.waitKey()
I just called this function to display the image, and the code below to calibrate the values.
if self.firstClick == 0:
    self.firstClick = y
    cv2.putText(self.output, "Pole Base", (x, y), font, 1, (255, 255, 0), 2)
    cv2.imshow('image', self.output)
elif self.firstClick != 0 and self.secondClick == 0:
    self.secondClick = y
    print("The difference in first and second clicks is", self.firstClick - self.secondClick)
    first = self.firstClick - self.secondClick
    inch = first / 42   # pixels per inch (42" between base and first mark)
    foot = inch * 12    # pixels per foot
    self.foot = foot
    print("One foot is currently: ", foot)
    self.firstLine = 3.5*12
    self.secondLine = 6.5*12
    self.thirdLine = 9.67*12
    self.fourthLine = 13*12
    self.xData = np.array([self.firstLine, self.secondLine, self.thirdLine, self.fourthLine])
    self.yData.append(self.firstLine)
    print(self.firstLine)
    print(self.secondLine)
    print(self.thirdLine)
    print(self.fourthLine)

Is focal length in pixel units a linear measurement

I have a pan-tilt-zoom camera (changing focal length over time). I have no idea of its base focal length (e.g. the focal length at time point 0). However, it is possible to track the change in focal length from frame to frame based on some known constraints and assumptions (doing a SLAM).
If I assume a random focal length (in pixel units), for example 1000 pixels, and then track the new focal lengths frame by frame, would I get relatively correct results? Would the results (focal lengths) in each frame be correct up to scale with respect to the ground-truth focal length?
For pan and tilt, assuming 0 at the start would be valid. Although it is not correct, the estimated values of the new tilt-pan will be correct up to an offset. However, I suspect the estimated focal length will not even be correct up to scale or offset. Is that correct or not?
For a quick, short answer: if the pan-tilt-zoom camera is approximated as a thin lens, then the relation between the object distance (z), the image distance (z'), and the focal length (f) is the thin-lens equation 1/f = 1/z + 1/z'.
This is just an approximation, not fully correct. For more precise calculations, see the camera matrix. Focal length is an intrinsic parameter in the camera matrix. Even if it is not known, it can be estimated using a camera calibration method such as DLT, Zhang's method, or RANSAC. Once you have the camera matrix, the focal length is just a small part of it, and you get many more useful things along with it.
OpenCV has a built-in implementation of Zhang's method. (Look at this documentation for explanations, but the code there is old and unusable; new, up-to-date code is below.) You need to take some pictures of a chessboard with your camera. Here is some helper code:
import cv2
from matplotlib import pyplot as plt
import numpy as np
from glob import glob

# 3D coordinates of the inner chessboard corners (all on the z = 0 plane)
x, y = np.meshgrid(range(6), range(8))
world_points = np.hstack((x.reshape(48,1), y.reshape(48,1), np.zeros((48,1)))).astype(np.float32)

_3d_points = []
_2d_points = []

img_paths = glob('./*.JPG')  # get paths of all checkerboard images

for path in img_paths:
    im = cv2.imread(path)
    ret, corners = cv2.findChessboardCorners(im, (6,8))
    if ret:  # add points only if the checkerboard was correctly detected
        _2d_points.append(corners)       # append current 2D points
        _3d_points.append(world_points)  # 3D points are always the same

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(_3d_points, _2d_points, (im.shape[1], im.shape[0]), None, None)

print("Ret:\n", ret)
print("Mtx:\n", mtx)
print("Dist:\n", dist)
You might also want undistortion, i.e. correcting for radial distortion:
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ..., (5,7,0)
objp = np.zeros((6*8,3), np.float32)
objp[:,:2] = np.mgrid[0:6,0:8].T.reshape(-1,2)

# Arrays to store object points and image points from all the images.
objpoints = []  # 3d points in real world space
imgpoints = []  # 2d points in image plane

for fname in img_paths:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (6,8), None)
    # If found, add object points and image points (after refining them)
    if ret == True:
        objpoints.append(objp)
        cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners)
        if 'IMG_5456.JPG' in fname:
            plt.figure(figsize=(20,10))
            img_vis = img.copy()
            cv2.drawChessboardCorners(img_vis, (6,8), corners, ret)
            plt.imshow(img_vis)
            plt.show()

# Calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# Reprojection error
tot_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(imgpoints[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
    tot_error += error
print("Mean reprojection error: ", tot_error / len(objpoints))

# Undistort: refine the camera matrix and build the remap tables
h, w = img.shape[:2]
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
dst = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)

# Crop the image to the valid region of interest
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
plt.figure(figsize=(20,10))
plt.imshow(dst)
plt.show()

Calculating IOU for bounding box predictions

I have these two bounding boxes, as shown in the image. The box coordinates are given below:
box 1 = [0.23072851 0.44545859 0.56389928 0.67707491]
box 2 = [0.22677664 0.38237819 0.85152483 0.75449795]
The coordinates are ordered as: ymin, xmin, ymax, xmax
I am calculating IOU as follows :
def get_iou(box1, box2):
    """
    Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, numpy array with coordinates (ymin, xmin, ymax, xmax)
    box2 -- second box, numpy array with coordinates (ymin, xmin, ymax, xmax)
    """
    y11, x11, y21, x21 = box1
    y12, x12, y22, x22 = box2

    # Coordinates of the intersection rectangle
    yi1 = max(y11, y12)
    xi1 = max(x11, x12)
    yi2 = min(y21, y22)
    xi2 = min(x21, x22)
    # Clamp width and height separately so disjoint boxes give zero area
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    # Calculate the union area using the formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (x21 - x11) * (y21 - y11)
    box2_area = (x22 - x12) * (y22 - y12)
    union_area = box1_area + box2_area - inter_area
    # compute the IoU
    iou = inter_area / union_area
    return iou
Based on my understanding, these two boxes completely overlap each other, so the IOU should be 1. However, I get an IOU of 0.33193138665968164. Is there something I am doing wrong, or am I interpreting it in an incorrect way? Any suggestions in this regard would be helpful.
You are interpreting the IoU in an incorrect way.
If you pay attention to your example, you will notice that the union of the areas of the two bounding boxes is much bigger than the intersection of the areas. So it makes sense that the IoU, which is indeed intersection / union, is much smaller than one.
When you say
Based on my understanding these 2 boxes completely overlap each other so IOU should be 1.
that is not true. In your situation the two bounding boxes overlap only in the sense that one is completely contained in the other. But if this situation weren't penalized, IoU could always be maximized by predicting a bounding box as big as the whole image, which clearly doesn't make sense.
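You can verify this with the numbers from the question: box 1 lies entirely inside box 2, so the intersection is box 1 itself and the union is box 2. A quick check:
# box 1 is fully contained in box 2, so IoU = area(box1) / area(box2)
area1 = (0.67707491 - 0.44545859) * (0.56389928 - 0.23072851)  # ~0.0772
area2 = (0.75449795 - 0.38237819) * (0.85152483 - 0.22677664)  # ~0.2325
print(area1 / area2)  # ~0.3319, matching the reported value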

After detecting objects in a video stream, I want to crop and save these objects

I have detected objects in a video and I want to crop these objects. I tried the TensorFlow APIs, but none of them worked for me. When trying
tf.image.crop_to_bounding_box(
    image,
    offset_height,
    offset_width,
    target_height,
    target_width
)
it tells me that offset_height is not defined.
So, I need a guide on how to crop an object out of an image using TensorFlow.
Try to use:
tf.image.crop_and_resize(image, boxes, box_ind, crop_size, method='bilinear', extrapolation_value=0, name=None)
For example:
# bounding box coordinates
ymin = boxes[0][0][0]
xmin = boxes[0][0][1]
ymax = boxes[0][0][2]
xmax = boxes[0][0][3]
test = tf.image.crop_and_resize(image=frame_expanded/255,
                                boxes=[[ymin, xmin, ymax, xmax]],
                                box_ind=[0],
                                crop_size=[100, 100])
Worked for me!
Or, if you want to use tf.image.crop_to_bounding_box(...), try this:
# bounding box coordinates
ymin = boxes[0][0][0]
xmin = boxes[0][0][1]
ymax = boxes[0][0][2]
xmax = boxes[0][0][3]
# image size
(im_width, im_height) = image.size
(xminn, xmaxx, yminn, ymaxx) = (xmin * im_width, xmax * im_width, ymin * im_height, ymax * im_height)
test = tf.image.crop_to_bounding_box(frame, int(yminn), int(xminn), int(ymaxx - yminn), int(xmaxx - xminn))
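To actually save the crop to disk, you don't strictly need TensorFlow at all; a minimal sketch using plain NumPy slicing and OpenCV instead, assuming frame is a NumPy image array and xminn/yminn/xmaxx/ymaxx are the pixel coordinates computed above:
import cv2

# Slice the detected region out of the frame and write it to a file
crop = frame[int(yminn):int(ymaxx), int(xminn):int(xmaxx)]
cv2.imwrite('crop.jpg', crop)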

Mirror an image in JES

I'm trying to mirror an image. That is, if, e.g., a person is facing to the left, when the program terminates I want that person to be facing to the right instead.
I understand how mirroring works in JES, but I'm unsure how to proceed here.
Below is what I'm trying; be aware that image is a global variable declared in another function.
def flipPic(image):
    width = getWidth(image)
    height = getHeight(image)
    for y in range(0, height):
        for x in range(0, width):
            left = getPixel(image, x, y)
            right = getPixel(image, width-x-1, y)
            color = getColor(left)
            setColor(right, color)
    show(image)
    return image
Try this. The key is to loop over only half the width and swap the two pixels; if you sweep the full width and only copy left onto right, the second half of the loop copies the already-mirrored pixels back and undoes the work:
width = getWidth(pic)
height = getHeight(pic)
for y in range(0, height):
    for x in range(0, width/2):
        left = getPixel(pic, x, y)
        right = getPixel(pic, width-x-1, y)
        color1 = getColor(left)
        color2 = getColor(right)
        setColor(right, color1)
        setColor(left, color2)
repaint(pic)
I personally find that repaint is confusing for newbies (like me!).
I'd suggest something like this:
def mirrorImage(image):
    width = getWidth(image)
    height = getHeight(image)
    for y in range(0, height):
        for x in range(0, width/2):
            left = getPixel(image, x, y)
            right = getPixel(image, width-x-1, y)
            color1 = getColor(left)
            color2 = getColor(right)
            setColor(right, color1)
            setColor(left, color2)
    show(image)
    return image

mirrorImage(image)
This seems to work well. I put some comments in so you can rewrite it in your own style.
Feel free to ask questions, but I think your question may already be answered ^^
# This function takes the pixel values from a selected picture and
# pastes them onto a new canvas, but flipped over!
def flipPic(pict):
    # take the height and width of the original picture
    width = getWidth(pict)
    height = getHeight(pict)
    # make an empty canvas of the same size
    newPict = makeEmptyPicture(width, height)
    # the y loop sets the range to work in for the y axis, then starts the x loop
    for y in range(0, height):
        # the x loop sets the range to work in for the x axis
        for x in range(0, width):
            # collect the colour information for the original pixel at (x, y)
            colour = getColor(getPixel(pict, x, y))
            # set the colour information at its new, mirrored position on the blank canvas
            setColor(getPixel(newPict, width-x-1, y), colour)
            # setColor(getPixel(newPict, width-x-1, height-y-1), colour)  # upside down
    show(newPict)

# driver code
pict = makePicture(pickAFile())
show(pict)
flipPic(pict)
Might be easier to read if you copy it over to JES first :D
BTW I got full marks for this one in my intro to programming class ;)