Computer vision - two frames, overlapping bounding boxes - YOLO

Situation:
There is a bed in a room at some x, y position, and there's an apple at the center of the bed.
My code should output that there's an apple at the center of the bed, or that there's an apple in the top-left corner of the bed.
Can someone please help me understand how I could solve this problem using YOLO/OpenCV/TensorFlow/Torch, etc.?

I will answer for YOLO. What you are looking for is nested or overlapping object detection, right? It can be done with YOLO. One important thing: when annotating your dataset, you have to include the nested examples as well. If you only label bed and apple as separate objects in your dataset, YOLO won't detect the nested object. You also have to include images where there is an apple on top of the bed, and draw two bounding boxes, one for each object. See: https://github.com/AlexeyAB/darknet/issues/2519
Example credit to pkhigh on GitHub: https://github.com/AlexeyAB/darknet/issues/2965
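Once YOLO returns both boxes, a little post-processing produces the phrase the question asks for. A minimal sketch, assuming boxes in (x_min, y_min, x_max, y_max) pixel format (convert from YOLO's normalized cx, cy, w, h first); the one-third thresholds are arbitrary:

```python
def describe_position(inner, outer):
    """Describe where the `inner` box sits relative to the `outer` box."""
    icx = (inner[0] + inner[2]) / 2.0
    icy = (inner[1] + inner[3]) / 2.0
    # Normalize the inner box's center into the outer box's coordinates (0..1).
    u = (icx - outer[0]) / (outer[2] - outer[0])
    v = (icy - outer[1]) / (outer[3] - outer[1])
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return "outside"
    horiz = "left" if u < 1/3 else "right" if u > 2/3 else "center"
    vert = "top" if v < 1/3 else "bottom" if v > 2/3 else "center"
    parts = [p for p in (vert, horiz) if p != "center"]
    return " ".join(parts) if parts else "center"

bed = (100, 200, 700, 600)      # made-up detections
apple = (380, 380, 420, 420)
print(f"apple at the {describe_position(apple, bed)} of the bed")  # "center"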

Related

Need help regarding object detection in Azure Custom Vision

I need to detect a chair, but only when it's in the center.
So I captured a video such that the chair covers all parts of the image in every frame.
I need to classify between two classes: chair is in the center AND chair is not in the center.
So I am not sure how to tag each image.
As seen in the image below, should the tagged region cover the entire frame?
You might want to rethink the formulation of your problem. Since you want to classify the entire image frame as to whether there is a chair in the center or not, you could cast it as an image classification problem rather than an object detection problem: a binary, two-class classification of the whole image.
This would be simpler to train, because you would not have to supply bounding boxes, and it would result in a simpler and more portable model.
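For illustration, here is a minimal sketch of that two-class formulation using Keras transfer learning; the directory layout, backbone choice, and hyperparameters are assumptions for the sketch, not part of the Watson/Azure tooling discussed below:

```python
import tensorflow as tf

# Two folders of frames: chairs/center/ and chairs/not_center/ (names are
# illustrative). label_mode="binary" yields 0/1 labels for the two classes.
train = tf.keras.utils.image_dataset_from_directory(
    "chairs", image_size=(224, 224), batch_size=32, label_mode="binary")

# Frozen ImageNet backbone; only the 1-unit sigmoid head gets trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),     # P(chair is in center)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train, epochs=5)
```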
To build classification models easily in Watson Studio, you could check out https://cloud.ibm.com/docs/visual-recognition?topic=visual-recognition-tutorial-custom-classifier (programmatically) or https://dzone.com/articles/build-custom-visual-recognition-model-using-watson (with Watson Studio GUI)
If you would like to continue with object detection, check out https://medium.com/@vincent.perrin/watson-visual-recognition-object-detection-in-action-in-5-minutes-8f97c4b613c3
Once you know where the chair is using object detection, you can do simple math to tell whether it is in the center or not.
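That "simple math" could look like this minimal sketch (the box format and tolerance are assumptions):

```python
def chair_in_center(box, img_w, img_h, tol=0.15):
    """True if the box center lies within tol * width/height of the image center.

    box is (x_min, y_min, x_max, y_max) in pixels, e.g. from your detector.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    return (abs(cx - img_w / 2.0) <= tol * img_w and
            abs(cy - img_h / 2.0) <= tol * img_h)

print(chair_in_center((280, 160, 360, 320), img_w=640, img_h=480))  # True
```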

How to solve object detection problem containing some static objects with no variation?

I am working on an object detection problem where I am using a region-based convolutional neural network to detect and segment objects in an image.
I have 8 objects to be segmented (Android app icons); 6 of them have variations in background, and the remaining 2 will be on a white background (static).
I have already taken 200 variations of each object and trained Mask R-CNN; my model picks up the patterns very well for the 6 objects with variation. But on the remaining 2 objects it's struggling, even on the training set and even though they are an exact match.
Q. If I have n objects with variations and m objects with no variation (static), do I need to oversample them? Should I use any other technique in this case?
Here in the image, the icons in black bounding boxes are prone to change (based on the background and their position w.r.t. the background), but the icon in the green bounding box will not have any variations (it will always be on a white background).
I have tried adding more images containing the static objects, but with no luck. Can anyone suggest how to approach such a problem? I don't want to use a sliding-window (static image processing) approach here.

How to detect an image between shapes from camera

I've been searching around the web about how to do this, and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples I find are for separate shape detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I've searched, it's not too difficult to find the circles with the camera using contours, but how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this, but from what I've already seen, I believe I need to go native, and any Objective-C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.
The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation, and scaling in a single go.
Talking about rotation and scaling seems to indicate that you want to extract a rectangle of a given aspect ratio, i.e. perform a similarity transform. To define such a transform, three points are too many; two suffice. The construction of the affine matrix is a little tricky.
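A minimal sketch in Python/OpenCV (the question is about Xamarin.iOS/Objective-C, and the circle centers and output size below are made-up values). Rather than constructing the matrix by hand, cv2.estimateAffinePartial2D fits the similarity transform from the point correspondences:

```python
import cv2
import numpy as np

img = cv2.imread("camera_frame.png")  # the captured frame (hypothetical path)

# Centers of the three detected circles in the camera frame; in practice
# these come from your contour/HoughCircles step.
src_pts = np.float32([[120, 80], [520, 95], [115, 430]])

# Where those centers should land in a 400x350 output crop.
dst_pts = np.float32([[0, 0], [400, 0], [0, 350]])

# Fit a 4-DOF similarity transform (rotation, uniform scale, translation)
# in a least-squares sense; three correspondences overdetermine it, which
# also makes the result robust to small detection noise.
M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)

# One warpAffine call crops, rotates, and scales simultaneously.
result = cv2.warpAffine(img, M, (400, 350))
cv2.imwrite("extracted.png", result)
```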

Intersect detection on a transparent image

Here's my issue. I am making a small car game in Xcode. I am using around 20 separate .png images as my road. Each image is around 600 x 1200, but the road itself is in the centre of the image, and it's narrow and winding. I have no idea how I can get the car to detect where it has crossed the edge of the road. I really hope this is making sense to someone.
I don't have any code to share, as I am completely stuck on how to do it, and I was searching everywhere before I came here to ask.
I would really appreciate some help.
Thanks
EDIT: I have attached an image trying to illustrate what I am trying to achieve. As you can see, all the white area is transparent; basically, if my car crosses the bounds of the road, you die. Hopefully that makes a little more sense.
Instead of using an image for your road, try using a single UIBezierPath, which you can construct from lines and arcs. You can then call containsPoint: with each corner of your "car"; if it returns NO for any corner, the car has left the road.
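The same idea sketched in Python with matplotlib.path, purely to illustrate the geometry; on iOS you would build a UIBezierPath from the same line/arc segments and call containsPoint: on each corner. The road outline below is a made-up polygon:

```python
from matplotlib.path import Path

# Hypothetical road outline as a closed polygon; in the game this would be
# built from the segments that define the road edges.
road = Path([(0, 0), (100, 20), (180, 90), (60, 120), (0, 0)])

def car_on_road(corners):
    """Return True only if every corner of the car is inside the road path."""
    return all(road.contains_point(c) for c in corners)

print(car_on_road([(50, 40), (55, 40), (50, 48), (55, 48)]))    # True
print(car_on_road([(5, 100), (10, 100), (5, 108), (10, 108)]))  # False: off road
```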

Calculating the area and position of dynamically formed polygons

Hi stackoverflow community,
This is a continuation of a question I asked 6 months ago regarding calculating the area and position of dynamically formed rectangles. The solution provided for that worked a treat, but now I want to take this a step further.
Some background - I'm working on a puzzle game using Cocos2D/Box2D where the player draws lines on the screen. Depending on where the player draws, I want to work out the area and position of the polygons that appear as a result of the drawn lines.
In the following image, the black border represents a playing area, this will always be the same shape. The grey lines are player drawn and will always be straight. The green square is an obstacle. The obstacle objects will be convex shapes. The formed polygons (3 in this case) are the blue areas and are the shapes I'm trying to get the coordinates and area for.
I think I'll be fine with working out the area of a polygon using determinants, but before that I need to work out the coordinates of the blue polygons, and I'm not sure how to do this.
I've got the lines' (x, y) coordinates for both ends, the coordinates for the obstacle, and the corner coordinates for the black border. Using those, is it possible to work out the coordinates of the blue polygons, or am I approaching this the wrong way?
UPDATE - response to duffymo
Thanks for your answer. To explain further, each object mentioned is defined and encapsulated in a class, i.e. I've got Line/Obstacle/PlayingArea objects. My polygon object is encapsulated in a 'Rectangle' object. Each of these objects has its own properties associated with it, such as its coordinates/area/ID/state, etc.
In order to keep track of all the objects, I've got an overseeing singleton object which holds all of the Line objects / Obstacle objects etc. in their own respective arrays. This way, I can loop through, say, all Lines and know where each one has been drawn by the player.
The game is a bit like classic JezzBall, so I need to be able to create these polygon shapes when a user draws a line, because the polygon shape will be used as my way of detecting whether that particular area contains a ball. If not, the area needs to be filled.
Since you already have the nodes and edges for your polygons, I'd recommend that you calculate the centroids, perimeters, and areas using contour integration. You can express the centroids and areas as contour integrals using Green's theorem.
You can use Gaussian quadrature to do piecewise integration along each edge.
It'll be fast and accurate; it'll work on polygons of arbitrary complexity.
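For polygons whose edges are straight lines (as with the player-drawn lines here), the contour integrals reduce to closed-form sums, so no numerical quadrature is needed in that case. A minimal sketch with illustrative vertices:

```python
def polygon_area_centroid(pts):
    """pts: list of (x, y) vertices in order. Returns (area, (cx, cy))."""
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # edge i's contribution to the contour integral
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                        # signed area (negative if vertices are clockwise)
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))

area, centroid = polygon_area_centroid([(0, 0), (4, 0), (4, 3), (0, 3)])
print(area, centroid)   # 12.0 (2.0, 1.5)
```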
UPDATE: Objective-C is an object-oriented language. I don't know it myself, but I believe it's based on ideas from C and C++. Since that's the case, I'd recommend that you start writing more in terms of objects. Arrays of coordinates? They need to be encapsulated together. I'd suggest a Point abstraction that encapsulates a point (id, x, y). Make a Grid that has a List of Points.
It sounds like users supply the relationships between Points to form Polygons. That's not clear from your description, so it's no surprise that you're having trouble implementing it.
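A minimal sketch of that encapsulation in Python (the original context is Objective-C; the names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Point:
    id: int
    x: float
    y: float

@dataclass
class Grid:
    points: list = field(default_factory=list)

    def add_point(self, x: float, y: float) -> Point:
        # Assign ids sequentially so Polygons can reference Points by id.
        p = Point(id=len(self.points), x=x, y=y)
        self.points.append(p)
        return p

grid = Grid()
grid.add_point(0.0, 0.0)
grid.add_point(4.0, 3.0)
print(grid.points)
```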