My example is that I have an image with 5 other images on it. What's the best way to have TensorFlow find/calculate the bounding boxes for each of those? It also needs to take into account that other source images might contain only 3 separate images.
I've found that if I run cv2.Laplacian on the source image it nicely outlines the 5 individual images, but I'm not sure how best to use TensorFlow to detect each of those bounding boxes.
UPDATE: My ONE issue is how do I use TensorFlow to find each image's boundaries? Obviously I can find the 4 corners of the whole image, but that doesn't help me - I need it to first know how many images there are and then find each of their boundaries.
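Not a TensorFlow answer, but here is a minimal sketch of getting the boxes straight from the Laplacian output with OpenCV contours instead; the filename, threshold and minimum-area values are assumptions.

```python
# Contour-based sketch: turn the Laplacian outlines into one bounding box per sub-image.
import cv2

src = cv2.imread("collage.jpg")                      # placeholder filename
gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
edges = cv2.Laplacian(gray, cv2.CV_8U)               # the step already mentioned above
_, mask = cv2.threshold(edges, 20, 255, cv2.THRESH_BINARY)
mask = cv2.dilate(mask, None, iterations=3)          # close small gaps in the outlines

# OpenCV 4.x returns (contours, hierarchy).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 1000]
print(len(boxes), "sub-images found:", boxes)        # works whether there are 5 or 3
```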
This is a more generic question about training an ML-Model to detect cards.
The cards are from a kids' game: 4 different colors, numbers and symbols. I don't need to detect the color, just the value (i.e. the symbol) of the cards.
I took pictures of every card with my iPhone and used RectLabel to draw rectangles around the symbols in the upper left corner (the cards also have an upside-down symbol in the lower right corner; I didn't mark those, as they'll be hidden during detection).
I cropped the images so only the card is visible, no surroundings.
Then I uploaded my images to app.roboflow.ai and let them do their magic (using Auto-Orient, Resize to 416x416, Grayscale, Auto-Adjust Contrast, Rotation, Shear, Blur and Noise).
That gave me another set of images which I used to train my model with CreateML from Apple.
However, when I use that model in my app (I'm using the Breakfast Finder demo from Apple), the cards' values aren't detected. Well, sometimes it works, but only at a certain distance from the phone, and the labels are either upside down or sideways.
My guess is this is because my images aren't taken the way they should be?
Any hints on how I'd have to set this whole thing up so my model gets trained well?
My bet would be on this being the problem:
I cropped the images so only the card is visible, no surroundings
You want your training images to be as similar as possible to the images your model will see in the wild. If it's trained only on images of cards with no surroundings, and you then show it images of cards with things around them, it won't know what to do.
This UNO scoring example is extremely similar to your problem and might provide some ideas and guidance.
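One possible way to close that gap without retaking every photo is to composite the cropped card photos onto assorted background images before running the Roboflow augmentations. A hedged sketch; the folder paths, sizes and scale range below are made-up assumptions, and the RectLabel boxes would need the same offset applied.

```python
# Hypothetical sketch: paste cropped card photos onto random background photos
# so training images include surroundings, like the images seen at inference time.
import glob
import random
from PIL import Image

cards = glob.glob("cards/*.jpg")                # the cropped card photos
backgrounds = glob.glob("backgrounds/*.jpg")    # assorted table/room photos

for i, card_path in enumerate(cards):
    card = Image.open(card_path).convert("RGB")
    bg = Image.open(random.choice(backgrounds)).convert("RGB").resize((1024, 1024))

    # Make the card a random fraction of the background width and drop it at a
    # random position (bounding-box labels must be shifted by the same x, y offset).
    new_w = int(bg.width * random.uniform(0.25, 0.5))
    new_h = int(card.height * new_w / card.width)
    card = card.resize((new_w, new_h))
    x = random.randint(0, bg.width - card.width)
    y = random.randint(0, bg.height - card.height)
    bg.paste(card, (x, y))
    bg.save(f"composited/card_{i}.jpg")
```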
I am working on an object detection problem where I'm using a region-based convolutional neural network to detect and segment objects in an image.
I have 8 objects to be segmented (Android app icons); 6 of them have variations in background and the remaining 2 are always on a white background (static).
I have already taken 200 variations of each object and trained Mask R-CNN on them. My model picks up the patterns very well for the 6 objects with variation, but it struggles with the remaining 2 objects, even on the training set and even though they are an exact match.
Q. If I have n objects with variations and m objects with no variation (static), do I need to oversample them? Should I use any other technique in this case?
In the image, the icons in black bounding boxes are prone to change (based on the background and their position w.r.t. the background), but the icon in the green bounding box will not have any variations (it is always on a white background).
I have tried adding more images containing the static objects, but no luck. Can anyone suggest how to approach a problem like this? I don't want to use a sliding-window (static image processing) approach here.
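On the oversampling question, one common first experiment is simply to repeat the static-object samples in the training list so the sampler sees them about as often as the varied ones. A hedged sketch; the file layout and the computed ratio are assumptions, not part of any Mask R-CNN API.

```python
# Hypothetical oversampling sketch: duplicate annotations of the two static icons
# until their count is roughly comparable to the six varied icons.
import glob
import random

varied = glob.glob("train/varied/*.json")     # annotations of the 6 varied icons
static = glob.glob("train/static/*.json")     # annotations of the 2 static icons

oversample_factor = max(1, len(varied) // max(1, len(static)))
train_list = varied + static * oversample_factor
random.shuffle(train_list)
print(f"{len(varied)} varied + {len(static)} x {oversample_factor} static samples")
```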
I found that a deep-learning-based method (e.g., 1) is much more robust than a non-deep-learning-based method (e.g., 2, using OpenCV).
https://www.remove.bg
How do I remove the background from this kind of image?
In the OpenCV example, Canny is used to detect the edges. But this step can be very sensitive to the image: the contour detection may end up with wrong contours, and it is also difficult to determine which contours should be kept.
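For context, this is roughly the fragile step: different Canny thresholds produce different edge maps and therefore different contours. A small sketch; the filename and threshold pairs are arbitrary assumptions.

```python
# Sketch of the threshold-sensitive step in the non-deep-learning pipeline.
import cv2

img = cv2.imread("product_photo.jpg")            # placeholder input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for lo, hi in [(30, 90), (50, 150), (100, 200)]:
    edges = cv2.Canny(gray, lo, hi)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    print(f"thresholds ({lo}, {hi}): {len(contours)} contours")   # count varies widely
```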
How is a robust deep-learning method implemented? Is there any good example code? Thanks.
For that to work you need to use a U-Net. You can search for that on GitHub.
The U-Net transform is I -> I: it maps an image to an image (of the same or similar size).
You need, say, 10,000 images with the background removed: people (including people with long hair), cats, cars, shoes, T-shirts, etc.
You then composite different backgrounds onto all of these images to use as the source, and the prediction target should be the images with the background removed.
You can also train a segmentation model; once you've found the foreground, you can remove the background.
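A minimal U-Net-style sketch in Keras for the segmentation route; the depth, filter counts, input size and binary-mask target are illustrative assumptions, not a reference implementation.

```python
# Minimal U-Net-style model: encoder-decoder with skip connections,
# outputting a one-channel foreground mask (1 = keep, 0 = background).
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, as in the original U-Net blocks.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: downsample while keeping feature maps for skip connections.
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck.
    b = conv_block(p2, 128)

    # Decoder: upsample and concatenate the matching encoder features.
    u2 = layers.UpSampling2D()(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.UpSampling2D()(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 32)

    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(composited_images, foreground_masks, epochs=..., batch_size=...)
```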
I'm trying to collect my own training data set for image detection (well, recognition, for now). Right now I have 4 classes and 750 images for each. The images are just regular images of each class; however, some of them are blurry or contain outside objects such as different backgrounds or other factors (but nothing particularly distinctive). Using that training data set, image recognition is really bad.
My questions are:
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's just say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way I can find the location using image recognition alone, so if I use bounding boxes, how/where in the code can I see the location of the bounding box?
Thank you in advance!
It is difficult to know in advance what features your program will learn for each class. But then again, if your unseen images have the same background, the background will play no role. I would suggest data augmentation in training: random color distortion, random flipping, random cropping.
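A minimal sketch of those augmentations with tf.image; the distortion ranges, padded size and crop size are arbitrary choices, and the input is assumed to be a 3-channel image.

```python
# Random flip, color distortion and crop applied per image in a tf.data pipeline.
import tensorflow as tf

def augment(image):
    image = tf.image.convert_image_dtype(image, tf.float32)       # scale to [0, 1]
    image = tf.image.random_flip_left_right(image)                # random flipping
    image = tf.image.random_brightness(image, max_delta=0.2)      # color distortion
    image = tf.image.random_saturation(image, 0.8, 1.2)
    padded = tf.image.resize_with_crop_or_pad(image, 260, 260)
    image = tf.image.random_crop(padded, size=[224, 224, 3])      # random cropping
    return image

# dataset = dataset.map(lambda img, label: (augment(img), label))
```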
You can't see the bounding box in the code. You have to label/annotate the boxes yourself first in your collected data, using a tool such as LabelMe. Then comes training the object detector.
I've been searching around the web about how to do this and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples I find are for separate shape detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I've read, it's not too difficult to find the circles with the camera using contours, but how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this, but from what I've already seen, I believe I need to go native for this, and any Objective-C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.
The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation and scaling in a single go.
Talking about rotation and scaling seems to indicate that you want to extract a rectangle of a given aspect ratio, hence perform a similarity transform. To define such a transform, three points are too many; two suffice. The construction of the affine matrix is a little tricky.
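A hedged sketch of that construction: build the similarity matrix from two reference points (e.g. two of the detected circle centres) and pass it to cv2.warpAffine. The point coordinates, filename and output size below are made-up assumptions.

```python
# Build a 2x3 similarity matrix (rotation + uniform scale + translation, no shear)
# from two point correspondences, then crop/rotate/scale in a single warpAffine call.
import cv2
import numpy as np

def similarity_from_two_points(p0, p1, q0, q1):
    """Return the 2x3 matrix mapping p0 -> q0 and p1 -> q1."""
    p0, p1, q0, q1 = (np.asarray(v, dtype=np.float64) for v in (p0, p1, q0, q1))
    dp, dq = p1 - p0, q1 - q0
    denom = dp @ dp
    # a + i*b is the complex ratio dq / dp: it encodes scale and rotation together.
    a = (dq[0] * dp[0] + dq[1] * dp[1]) / denom
    b = (dq[1] * dp[0] - dq[0] * dp[1]) / denom
    rot_scale = np.array([[a, -b], [b, a]])
    t = q0 - rot_scale @ p0
    return np.hstack([rot_scale, t.reshape(2, 1)])

# Two circle centres found in the camera frame (hypothetical values)...
src_pts = ((132.0, 410.0), (518.0, 395.0))
# ...and where those centres should land in the extracted image.
out_w, out_h = 400, 300
dst_pts = ((20.0, 280.0), (380.0, 280.0))

frame = cv2.imread("camera_frame.jpg")                 # placeholder input
M = similarity_from_two_points(src_pts[0], src_pts[1], dst_pts[0], dst_pts[1])
content = cv2.warpAffine(frame, M, (out_w, out_h))     # crop + rotate + scale in one go
```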