TensorFlow: Collecting my own training data set & using that training data set to find the location of an object

I'm trying to collect my own training data set for image detection (well, recognition for now). Right now I have 4 classes and 750 images for each. Each image is just a regular photo of its class; however, some images are blurry or contain extraneous elements such as different backgrounds or other factors (but nothing that makes the class hard to distinguish). Using that training data set, image recognition is really bad.
My questions are:
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's just say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way I can find the location using image recognition alone, so if I use bounding boxes, how/where in the code can I see the location of the bounding box?
Thank you in advance!

It is difficult to know in advance what features your program will learn for each class. Then again, if your unseen images will have the same background, the background will play no role. I would suggest data augmentation during training: random color distortion, random flipping, and random cropping.
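For example, a minimal sketch of those augmentations with tf.image (the distortion ranges and crop sizes below are illustrative assumptions, not tuned values):

    import tensorflow as tf

    def augment(image):
        # Illustrative augmentation: flip, color distortion, and random crop.
        # All magnitudes here are assumptions; tune them for your data.
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
        image = tf.image.resize(image, [256, 256])
        image = tf.image.random_crop(image, size=[224, 224, 3])
        return image

    # e.g. dataset = dataset.map(lambda img, label: (augment(img), label))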
You can't see the bounding box anywhere in the code. You have to label/annotate the boxes yourself first in your collected data, using a tool such as LabelMe, for example. Then comes training the object detector.
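As a sketch of what the annotation side looks like: if you use LabelMe, each image gets a JSON file, and you can read the rectangle coordinates back out roughly like this (the field names follow LabelMe's usual JSON layout; double-check them against your own files):

    import json

    def load_labelme_boxes(annotation_path):
        # Read rectangle annotations from a LabelMe JSON file (assumed layout).
        with open(annotation_path) as f:
            data = json.load(f)
        boxes = []
        for shape in data.get("shapes", []):
            if shape.get("shape_type") == "rectangle":
                (x1, y1), (x2, y2) = shape["points"]
                boxes.append((shape["label"],
                              min(x1, x2), min(y1, y2),
                              max(x1, x2), max(y1, y2)))
        return boxes

    # boxes = load_labelme_boxes("image_0001.json")  # hypothetical file name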

Related

Images labeling for object detection when object is larger than the image

How should I label objects for detection when the object is larger than the image? E.g., I want to label a building, but only part of the building is visible in the picture (windows and doors, without the roof). Or should I remove these pictures from my dataset?
Thank you!
In every object detection dataset I've seen, such objects will just have the label cover whatever is visible, so the bounding box will go up to the border of the image.
It really depends what you want your model to do if it sees an image like this. If you want it to be able to recognise partial buildings, then you should keep them in your dataset and label whatever is visible.
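If you keep them, the only mechanical detail is that the box gets clipped to the image frame; a trivial sketch with made-up coordinates:

    def clip_box(x_min, y_min, x_max, y_max, img_w, img_h):
        # Clip a bounding box so it never extends past the image borders.
        return (max(0, x_min), max(0, y_min), min(img_w, x_max), min(img_h, y_max))

    # A building whose footprint extends beyond a 640x480 photo (made-up numbers):
    # clip_box(-120, 40, 900, 520, 640, 480) -> (0, 40, 640, 480)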
Don't label them. Discard them from your training set. The model needs to learn the difference between the negative class (background) and the positive classes (windows, doors). If a positive class takes up the whole image, the model will have a massive false-positive problem.

Creating an ML model that detects card values

This is a more generic question about training an ML model to detect cards.
The cards are from a kids' game: 4 different colors, numbers and symbols. I don't need to detect the color, just the value (a.k.a. the symbol) of the cards.
I took pictures with my iPhone of every card and used RectLabel to draw rectangles around the symbols in the upper left corner (the cards also have an upside-down symbol in the lower right corner; I didn't mark those as they'll be hidden during detection).
I cropped the images so only the card is visible, no surroundings.
Then I uploaded my images to app.roboflow.ai and let them do their magic (using Auto-Orient, Resize to 416x416, Grayscale, Auto-Adjust Contrast, Rotation, Shear, Blur and Noise).
That gave me another set of images which I used to train my model with CreateML from Apple.
However, when I use that model in my app (I'm using the Breakfast Finder demo from Apple), the card values aren't detected - well, sometimes it works, but only at a certain distance from the phone, and the labels are either upside down or sideways.
My guess is this is because my images aren't taken the way they should be?
Any hints on how I'd have to set this whole thing up so my model gets trained well?
My bet would be on this being the problem:
I cropped the images so only the card is visible, no surroundings
You want your training images to be as similar as possible to the images your model will see in the wild. If it's trained only on images of cards with no surroundings and you then show it images of cards with things around them, it won't know what to do.
This UNO scoring example is extremely similar to your problem and might provide some ideas and guidance.

Sprite images for object detection

Let's say that I want to detect a character on a background image. The character can be tilted/moved in a number of ways, which will slightly change how we see him. Luckily, I have the sprite sheet for all of his possible positions. Is there a way to train tensorflow to detect objects/characters based on sprite sheets?
You can take different approaches:
1) I would first try out template matching. You slide your sprite over the image and see where it matches best. You do the same for the tilts: you rotate the sprite image and slide the rotated sprite over the image. You do this for, let's say, every tenth of a degree and take the best-matching template (see the sketch after this list).
2) If that's too computationally intensive, I would still use template matching, but only to gather data for a machine learning model. You can run template matching, record the best match for each frame and the bounding box for that best match, and then use those as labels to train an object detection network. There is more state-of-the-art stuff than this, but for ease of use I would go with YOLOv3; it also has a tiny version, which is less accurate but much faster.
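Here is a minimal template-matching sketch with OpenCV for option 1, assuming grayscale images on disk (the file names are placeholders, and the 10-degree rotation step is only to keep the example cheap):

    import cv2

    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)    # placeholder file
    sprite = cv2.imread("sprite.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

    best = (-1.0, None, None)  # (score, top-left corner, angle)
    h, w = sprite.shape
    center = (w / 2, h / 2)

    # Try a coarse set of rotations (10-degree steps just to keep this cheap).
    for angle in range(0, 360, 10):
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(sprite, rot, (w, h))
        result = cv2.matchTemplate(frame, rotated, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val > best[0]:
            best = (max_val, max_loc, angle)

    score, (x, y), angle = best
    print(f"best match at ({x}, {y}), angle {angle}, score {score:.3f}")
    # The corresponding bounding box is roughly (x, y, x + w, y + h).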

How is robust background removal implemented?

I found that a deep-learning-based method (e.g., [1]) is much more robust than a non-deep-learning-based method (e.g., [2], using OpenCV).
[1] https://www.remove.bg
[2] How do I remove the background from this kind of image?
In the OpenCV example, Canny is used to detect the edges. But this step can be very sensitive to the image. The contour detection may end up with wrong contours. It is also difficult to determine which contours should be kept.
How is a robust deep-learning method implemented? Is there any good example code? Thanks.
For that to work you need a U-Net. You can search for implementations on GitHub.
A U-Net is an image-to-image transform (I -> I): the input image is mapped to an output image of the same or similar size.
You need, say, 10,000 images with the background already removed: people (including people with long hair), cats, cars, shoes, T-shirts, etc.
Then you composite these foregrounds onto different backgrounds to create the source images, and the prediction targets are the corresponding images with the background removed.
You can also train a segmentation model instead; once you have found the foreground you can remove the background.
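As a sketch of that last step, assuming you already have a trained segmentation model that maps an RGB image in [0, 1] to a per-pixel foreground probability (the model file name below is hypothetical):

    import numpy as np
    import tensorflow as tf

    def remove_background(model, image):
        # `model` is assumed to output a (H, W, 1) foreground probability map,
        # e.g. a U-Net with a sigmoid on its last layer.
        probs = model.predict(image[np.newaxis, ...])[0]   # (H, W, 1)
        mask = (probs > 0.5).astype(np.float32)            # hard foreground mask
        return np.concatenate([image, mask], axis=-1)      # RGBA: background becomes transparent

    # model = tf.keras.models.load_model("unet_foreground.h5")  # hypothetical file
    # cutout = remove_background(model, my_image)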

Train a model with the same image in different orientations

Is it a good idea to train the model with the same images, but in different orientations? I have a small set of images for training; that's why I'm trying to cover all the mobile camera/gallery user scenarios.
For example, the image example.png plus 3 rotated copies: example90.png, example180.png and example270.png. And also with different background colors, shadows, etc.
By the way, my goal is to identify the type of animal.
Is that a good idea??
If you use Core ML with the Vision framework (and you probably should), Vision will automatically rotate the image so that "up" is really up. In that case it doesn't matter how the user held their camera when they took the picture (assuming the picture still has the EXIF data that describes its orientation).
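If you preprocess your training images in Python instead of relying on Vision, you can apply the same idea up front by baking the EXIF orientation into the pixels; a small sketch with Pillow (the file name is a placeholder):

    from PIL import Image, ImageOps

    def load_upright(path):
        # Rotate/flip the image according to its EXIF orientation tag,
        # so that "up" in the pixel data really is up.
        image = Image.open(path)
        return ImageOps.exif_transpose(image)

    # img = load_upright("photo.jpg")  # placeholder file name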