After detecting objects in a video stream, I want to crop and save these objects - tensorflow

I have detected objects in a video and I want to crop them. I tried the TensorFlow APIs, but none of them worked for me. When I try
tf.image.crop_to_bounding_box(
    image,
    offset_height,
    offset_width,
    target_height,
    target_width
)
it tells me that offset_height is not defined.
So I need some guidance on how to crop a detected object out of an image using TensorFlow.

Try using:
tf.image.crop_and_resize(image, boxes, box_ind, crop_size, method='bilinear', extrapolation_value=0, name=None)
For example:
# bounding box coordinates (normalized detection output)
ymin = boxes[0][0][0]
xmin = boxes[0][0][1]
ymax = boxes[0][0][2]
xmax = boxes[0][0][3]
test = tf.image.crop_and_resize(image=frame_expanded / 255,
                                boxes=[[ymin, xmin, ymax, xmax]],
                                box_ind=[0],
                                crop_size=[100, 100])
Worked for me!
Or, if you want to use tf.image.crop_to_bounding_box(...), try this:
# bounding box coordinates
ymin = boxes[0][0][0]
xmin = boxes[0][0][1]
ymax = boxes[0][0][2]
xmax = boxes[0][0][3]
# image size
(im_width, im_height) = image.size
# convert normalized coordinates to pixel coordinates
(xminn, xmaxx, yminn, ymaxx) = (xmin * im_width, xmax * im_width, ymin * im_height, ymax * im_height)
test = tf.image.crop_to_bounding_box(frame, int(yminn), int(xminn), int(ymaxx - yminn), int(xmaxx - xminn))
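Since the question also asks about saving the crops, here is a minimal sketch of how the crop could be evaluated and written to disk, assuming a TF1-style session and OpenCV for file output (frame_expanded, ymin, xmin, ymax, xmax are the variables from the snippet above; the file name is just an example):
import cv2
import numpy as np
import tensorflow as tf

# crop_and_resize returns a float batch in [0, 1] because the image was divided by 255
crops = tf.image.crop_and_resize(image=frame_expanded / 255,
                                 boxes=[[ymin, xmin, ymax, xmax]],
                                 box_ind=[0],
                                 crop_size=[100, 100])

with tf.Session() as sess:
    crop_imgs = sess.run(crops)

# scale back to 0-255 and save each crop to disk
for i, crop in enumerate(crop_imgs):
    out = (crop * 255).astype(np.uint8)
    cv2.imwrite('crop_{}.png'.format(i), out)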

Related

convert a .csv file to yolo darknet format

I have a few annotations that are originally in .csv format. I need to convert them to the YOLO darknet format in order to train my model with YOLOv4.
My .csv file:
The YOLO format is: object-class x y width height
where object_class, width, and height are known from my .csv file, but finding x and y is confusing. Note that x and y are the center of the rectangle (not the top-left corner).
Any help would be appreciated :)
You can use this function to convert bounding boxes to the YOLO format. Of course you will need to write some code to read the CSV; just use this function as a template for your needs.
This function was extracted from the labelImg app:
https://github.com/tzutalin/labelImg/blob/master/libs/yolo_io.py
def BndBox2YoloLine(self, box, classList=[]):
    xmin = box['xmin']
    xmax = box['xmax']
    ymin = box['ymin']
    ymax = box['ymax']

    # imgSize is (height, width, ...); normalize the box center and size by the image size
    xcen = float((xmin + xmax)) / 2 / self.imgSize[1]
    ycen = float((ymin + ymax)) / 2 / self.imgSize[0]

    w = float((xmax - xmin)) / self.imgSize[1]
    h = float((ymax - ymin)) / self.imgSize[0]

    # PR387
    boxName = box['name']
    if boxName not in classList:
        classList.append(boxName)

    classIndex = classList.index(boxName)

    return classIndex, xcen, ycen, w, h
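For instance, here is a rough sketch of how you might read a CSV with pixel-coordinate boxes and apply the same math outside the labelImg class. The file name and column names (xmin, ymin, xmax, ymax, class) are assumptions about your CSV, so adjust them to your own layout:
import csv

def csv_row_to_yolo(row, img_width, img_height, class_list):
    # row is expected to contain pixel coordinates and a class name
    xmin, ymin = float(row['xmin']), float(row['ymin'])
    xmax, ymax = float(row['xmax']), float(row['ymax'])

    # YOLO wants the box center and size, normalized to [0, 1]
    xcen = (xmin + xmax) / 2.0 / img_width
    ycen = (ymin + ymax) / 2.0 / img_height
    w = (xmax - xmin) / img_width
    h = (ymax - ymin) / img_height

    if row['class'] not in class_list:
        class_list.append(row['class'])
    class_index = class_list.index(row['class'])

    return class_index, xcen, ycen, w, h

# example usage (file name, image size and columns are hypothetical)
class_list = []
with open('annotations.csv') as f:
    for row in csv.DictReader(f):
        print(csv_row_to_yolo(row, img_width=1920, img_height=1080, class_list=class_list))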

Calculating IOU for bounding box predictions

I have these two bounding boxes as given in the image. The box coordinates are as below:
box 1 = [0.23072851 0.44545859 0.56389928 0.67707491]
box 2 = [0.22677664 0.38237819 0.85152483 0.75449795]
The coordinates are ordered as: ymin, xmin, ymax, xmax
I am calculating the IoU as follows:
def get_iou(box1, box2):
    """
    Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, numpy array with coordinates (ymin, xmin, ymax, xmax)
    box2 -- second box, numpy array with coordinates (ymin, xmin, ymax, xmax)
    """
    # ymin, xmin, ymax, xmax = box
    y11, x11, y21, x21 = box1
    y12, x12, y22, x22 = box2

    yi1 = max(y11, y12)
    xi1 = max(x11, x12)
    yi2 = min(y21, y22)
    xi2 = min(x21, x22)
    inter_area = max(((xi2 - xi1) * (yi2 - yi1)), 0)
    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (x21 - x11) * (y21 - y11)
    box2_area = (x22 - x12) * (y22 - y12)
    union_area = box1_area + box2_area - inter_area
    # compute the IoU
    iou = inter_area / union_area
    return iou
Based on my understanding these two boxes completely overlap each other, so the IoU should be 1. However, I get an IoU of 0.33193138665968164. Is there something I am doing wrong, or am I interpreting it in an incorrect way? Any suggestions in this regard would be helpful.
You are interpreting the IoU in an incorrect way.
If you pay attention to your example, you will notice that the union of the areas of the two bounding boxes is much bigger than their intersection. So it makes sense that the IoU - which is indeed intersection / union - is much smaller than one.
When you say
Based on my understanding these 2 boxes completely overlap each other so IOU should be 1.
that is not true. In your situation the two bounding boxes overlap only in the sense that one is completely contained in the other. If this situation weren't penalized, the IoU could always be maximized by predicting a bounding box as big as the whole image - which clearly doesn't make sense.
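To make the numbers concrete: box 1 lies entirely inside box 2, so the intersection equals the area of box 1 and the union equals the area of box 2, and the ratio of those two areas reproduces exactly the value you got. A quick check, reusing your own get_iou function:
box1 = [0.23072851, 0.44545859, 0.56389928, 0.67707491]  # ymin, xmin, ymax, xmax
box2 = [0.22677664, 0.38237819, 0.85152483, 0.75449795]

area1 = (box1[3] - box1[1]) * (box1[2] - box1[0])  # ~0.0772
area2 = (box2[3] - box2[1]) * (box2[2] - box2[0])  # ~0.2325

# box1 is fully contained in box2, so IoU = area1 / area2
print(area1 / area2)         # ~0.3319
print(get_iou(box1, box2))   # same value, ~0.33193138665968164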

Return coordinates that passes threshold value for bounding boxes Google's Object Detection API

Does anyone know how to get only the bounding box coordinates that pass the threshold value?
I found this answer (here's a link), so I tried using it and did the following:
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=1,
    min_score_thresh=0.80)
for i, b in enumerate(boxes[0]):
    ymin = boxes[0][i][0] * height
    xmin = boxes[0][i][1] * width
    ymax = boxes[0][i][2] * height
    xmax = boxes[0][i][3] * width
    print("Top left")
    print(xmin, ymin)
    print("Bottom right")
    print(xmax, ymax)
But I noticed that the approach from the linked answer returns the values for all bounding boxes detected by the classifier, which I do not want. What I want is only the values from bounding boxes that pass "min_score_thresh".
I feel like this should be very simple, but I lack knowledge in this area.
If I find the answer, I'll be sure to post it right here, but if anyone else knows it and could save me some time, I would be grateful.
Update:
The boxes and scores returned by the previous functions are both numpy arrays, so you can use boolean indexing to filter out boxes below the threshold.
This should give you only the boxes that pass the threshold:
true_boxes = boxes[0][scores[0] > min_score_thresh]
And then you can do
for i in range(true_boxes.shape[0]):
    ymin = true_boxes[i, 0] * height
    xmin = true_boxes[i, 1] * width
    ymax = true_boxes[i, 2] * height
    xmax = true_boxes[i, 3] * width
    print("Top left")
    print(xmin, ymin)
    print("Bottom right")
    print(xmax, ymax)
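If you also need the matching scores and classes for those boxes, the same boolean mask can be reused so the arrays stay aligned. A small sketch building on the arrays above (boxes, scores, classes, width, height as already defined):
mask = scores[0] > min_score_thresh

true_boxes = boxes[0][mask]
true_scores = scores[0][mask]
true_classes = classes[0][mask].astype(np.int32)

for box, score, cls in zip(true_boxes, true_scores, true_classes):
    ymin, xmin, ymax, xmax = box
    print(cls, score, (xmin * width, ymin * height), (xmax * width, ymax * height))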

KITTI dataset crop labelled point cloud

I am trying to train my model to recognize cars, pedestrians and cyclists, which requires car, pedestrian and cyclist point clouds as training data. I downloaded the dataset from KITTI (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d), both the labels and the velodyne points (http://www.cvlibs.net/download.php?file=data_object_label_2.zip)(http://www.cvlibs.net/download.php?file=data_object_velodyne.zip). However, the object labels don't seem to match this set of data. I attempted to crop the point cloud to extract the object point clouds, but I only obtain blank 3D space. This is my cropping function in MATLAB. Is there any mistake in my code? Is there any training and testing dataset for pedestrian, cyclist and car point clouds available elsewhere?
function pc = crop_pc3d (pt_cloud, x, y, z, height, width, length)
    % keep points only between (x,y,z) and (x+length, y+width, z+height)

    % Initialization
    y_min = y; y_max = y + width;
    x_min = x; x_max = x + length;
    z_min = z; z_max = z + height;

    % Get ROI
    x_ind = find( pt_cloud.Location(:,1) < x_max & pt_cloud.Location(:,1) > x_min );
    y_ind = find( pt_cloud.Location(:,2) < y_max & pt_cloud.Location(:,2) > y_min );
    z_ind = find( pt_cloud.Location(:,3) < z_max & pt_cloud.Location(:,3) > z_min );
    crop_ind_xy = intersect(x_ind, y_ind);
    crop_ind = intersect(crop_ind_xy, z_ind);

    % Update point cloud
    pt_cloud = pt_cloud.Location(crop_ind, :);
    pc = pointCloud(pt_cloud);
end
The labels are given in the camera (image) coordinate frame. So, in order to use them with the point cloud, they need to be transformed into the velodyne coordinate frame.
For this transformation, use the camera calibration matrices provided with the dataset.
The calibration data is provided on the KITTI site:
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
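As a rough sketch of that transformation in Python (assuming you have already parsed Tr_velo_to_cam and R0_rect from the calibration file into numpy arrays, and that the label positions are given in the rectified camera frame, as in the KITTI object labels):
import numpy as np

def camera_to_velodyne(xyz_cam, R0_rect, Tr_velo_to_cam):
    """Map a 3D point from the rectified camera frame back to the velodyne frame.

    xyz_cam        -- (3,) point, e.g. the object location from a KITTI label
    R0_rect        -- (3, 3) rectification matrix from the calib file
    Tr_velo_to_cam -- (3, 4) velodyne-to-camera transform from the calib file
    """
    # undo the rectification
    xyz_ref = np.linalg.inv(R0_rect) @ xyz_cam

    # build a 4x4 homogeneous version of Tr_velo_to_cam and invert it
    T = np.vstack([Tr_velo_to_cam, [0, 0, 0, 1]])
    xyz_velo_h = np.linalg.inv(T) @ np.append(xyz_ref, 1.0)
    return xyz_velo_h[:3]

# once the label's box center is in velodyne coordinates, you can pass it
# (together with the label's height/width/length) to a crop like crop_pc3d above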

tensorflow object detection API: training is very slow

I am currently studying the Google TensorFlow Object Detection API. When I try to retrain the model with the Oxford-IIIT Pet dataset, the training process is very slow.
Here is what I have found so far:
Most of the time only 2% of the GPU is utilized.
But CPU utilization is 60%, so it seems the GPU is not starved by the input pipeline; otherwise the CPU should be near 100% utilization.
I am trying to profile it with the TensorFlow profiler, but I am in a bit of a hurry now; any idea or suggestion would be helpful.
I found the problem. It is an issue with the input: my tfrecord file is corrupted somehow, so the input thread hangs up sometimes.
There are many reasons for this to happen, the most common being that there is some problem with your record file. Some checks need to be done before adding an image and its contour to the record file. Some of them are:
First, check the image before sending it to the record:
def checkJPG(fn):
    with tf.Graph().as_default():
        try:
            image_contents = tf.read_file(fn)
            image = tf.image.decode_jpeg(image_contents, channels=3)
            init_op = tf.initialize_all_tables()
            with tf.Session() as sess:
                sess.run(init_op)
                tmp = sess.run(image)
        except:
            print("Corrupted file: ", fn)
            return False
    return True
Also, check the height and width of the contour, and that no contour crosses the image borders:
boxW = xmax - xmin
boxH = ymax - ymin
if boxW == 0 or boxH == 0:
    print("...ONE CONTOUR SKIPPED... (boxW | boxH) = 0")
    continue
if boxW * boxH < 100:
    print("...ONE CONTOUR SKIPPED... (boxW*boxH) < 100")
    continue
if xmin / width <= 0 or xmax / width <= 0 or ymin / height <= 0 or ymax / height <= 0:
    print("...ONE CONTOUR SKIPPED... (x | y) <= 0")
    continue
if xmin / width >= 1 or xmax / width >= 1 or ymin / height >= 1 or ymax / height >= 1:
    print("...ONE CONTOUR SKIPPED... (x | y) >= 1")
    continue
Another reason is that there is too much data in the evaluation record file. It's better to add only 10 images to your evaluation record file and change the evaluation config like this:
eval_config {
  num_visualizations: 10
  num_examples: 10
  eval_interval_secs: 3000
  max_evals: 1
  use_moving_averages: false
}
As I can see, it is not utilizing the GPU right now.
Have you tried optimizing GPU usage with the parameters given in the TensorFlow performance guide?
https://www.tensorflow.org/performance/performance_guide#optimizing_for_gpu
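For example, one commonly tweaked session setting in TF1 is the GPU memory option (a minimal sketch; this is not necessarily the fix for your particular slowdown):
import tensorflow as tf

# let TensorFlow grow GPU memory as needed instead of grabbing it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    # ... run your training loop here ...
    pass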