Need help regarding object detection in Azure Custom Vision - object-detection

I need to detect a chair, but only when it is in the center of the frame.
So I captured a video in which the chair covers all parts of the image in every frame.
I need to classify between two classes: chair is in center, and chair is not in center.
How should I tag each image?
As seen in the image below, should the tag region cover the entire frame?

You might want to rethink the formulation of your problem. If you want to classify the entire image frame according to whether there is a chair in the center or not, you can cast it as an image classification problem rather than an object detection problem. It then becomes a binary (two-class) classification of the whole image.
This would be simpler to train, because you would not have to supply bounding boxes, and it would result in a simpler and more portable model.
To build classification models easily in Watson Studio, you could check out https://cloud.ibm.com/docs/visual-recognition?topic=visual-recognition-tutorial-custom-classifier (programmatically) or https://dzone.com/articles/build-custom-visual-recognition-model-using-watson (with Watson Studio GUI)
If you would like to continue with object detection, check out https://medium.com/@vincent.perrin/watson-visual-recognition-object-detection-in-action-in-5-minutes-8f97c4b613c3

Once you know where the chair is using object detection, you can do simple math to tell whether it is in the center or not.
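The "simple math" could look something like the sketch below. It assumes the detector returns each box as normalized (left, top, width, height) values between 0 and 1; that format, the function name `is_centered` and the `tolerance` value are assumptions for illustration, so adapt them to whatever your service actually returns.

```python
def is_centered(box, tolerance=0.2):
    """Return True if the box center lies close to the image center.

    box: (left, top, width, height), each normalized to the range 0..1
         (assumed format, check your detector's output).
    tolerance: maximum allowed offset from the image center, as a
         fraction of the image size (hypothetical default).
    """
    left, top, width, height = box
    center_x = left + width / 2
    center_y = top + height / 2
    # The image center is (0.5, 0.5) in normalized coordinates.
    return abs(center_x - 0.5) <= tolerance and abs(center_y - 0.5) <= tolerance


print(is_centered((0.4, 0.4, 0.2, 0.2)))  # box in the middle of the frame
print(is_centered((0.0, 0.0, 0.2, 0.2)))  # box in the top-left corner
```

If your boxes are in pixels instead, divide by the image width and height first.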

Related

Capture a live video of handwriting using pen and paper and replace the hand in video with some object or cursor

I want to process captured video. I will capture video of handwriting or drawing on paper, but I do not want the hand or pen to be visible on the paper while live streaming via p5.js.
Can this be done using machine learning?
Any idea how to implement this?
If I understand you correctly, you want to detect where the hand is in the image and draw an overlay at that position, right?
If so, you can use YOLO to detect where the hand is.
There are some pretrained networks that you can download; maybe they are good enough, or maybe you will have to train your own just for hands.
There is also a YOLO library for JavaScript: https://github.com/ModelDepot/tfjs-yolo-tiny
You may not need to go the full ML object segmentation route.
If the paper's position and illumination are constant (or at least knowable) you could try some simple heuristic comparing the pixels in the current frame with a short history and using the most constant pixel values. There might be some lag as new parts of your drawing 'become constant' so maybe you could try some modification to the accumulation, such as if the pixel was white and is going black.
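A rough sketch of that heuristic, in Python with NumPy rather than p5.js: it keeps a short history of frames and outputs the per-pixel median, which I am using here as a stand-in for "the most constant pixel value". The class name `StablePaper` and the history length are made up for illustration.

```python
from collections import deque

import numpy as np


class StablePaper:
    """Keeps a short frame history and returns the per-pixel median."""

    def __init__(self, history=15):
        # Old frames fall off automatically once the deque is full.
        self.frames = deque(maxlen=history)

    def update(self, frame):
        """Add a frame (uint8 array) and return the stabilized image."""
        self.frames.append(np.asarray(frame, dtype=np.uint8))
        stack = np.stack(self.frames, axis=0)
        # A hand that passes over a pixel for only a few frames is an
        # outlier in the history, so the median ignores it. A pen stroke
        # stays, so it wins once it fills more than half of the history.
        return np.median(stack, axis=0).astype(np.uint8)
```

The lag mentioned above shows up here directly: a new stroke only appears after it has been present for more than half of the history, so a shorter history reacts faster but hides the hand less reliably.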

How is robust background removal implemented?

I found that a deep-learning-based method (e.g., 1) is much more robust than a non-deep-learning-based method (e.g., 2, using OpenCV).
https://www.remove.bg
How do I remove the background from this kind of image?
In the OpenCV example, Canny is used to detect the edges. But this step can be very sensitive to the image. The contour detection may end up with wrong contours. It is also difficult to determine which contours should be kept.
How is a robust deep-learning method implemented? Is there any good example code? Thanks.
For that to work you need to use a U-Net. You can search for implementations on GitHub.
A U-Net is an image-to-image transform (I -> I): the input image is mapped to an output image of the same or similar size.
You need, say, 10,000 images with the background removed: people (including people with long hair), cats, cars, shoes, T-shirts, etc.
You then place different backgrounds behind all these images to create the source images, and the prediction targets are the images with the background removed.
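Building one such training pair might look roughly like this. It assumes the cutout is an RGBA array whose alpha channel marks the foreground, and that the background has the same height and width; the function name `make_training_pair` is only for illustration.

```python
import numpy as np


def make_training_pair(cutout_rgba, background_rgb):
    """Composite an RGBA cutout over an RGB background.

    Returns (source, target): source is the composited RGB image the
    network sees, target is the original cutout it should predict.
    Both inputs are uint8 arrays with the same height and width.
    """
    rgb = cutout_rgba[..., :3].astype(np.float32)
    alpha = cutout_rgba[..., 3:4].astype(np.float32) / 255.0
    background = background_rgb.astype(np.float32)
    # Standard alpha blending: foreground where alpha is 1,
    # background where alpha is 0.
    source = alpha * rgb + (1.0 - alpha) * background
    return source.round().astype(np.uint8), cutout_rgba
```

Calling this with many different backgrounds per cutout multiplies the size of the data set.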
Alternatively, you can train a segmentation model; once it has found the foreground, you can remove the background.
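For the segmentation route, the last step is small. The sketch below assumes the model outputs a per-pixel foreground probability map with the same height and width as the image; the function name and the 0.5 threshold are assumptions, not part of any particular library.

```python
import numpy as np


def remove_background(image_rgb, foreground_prob, threshold=0.5):
    """Turn a foreground probability map into a transparent background.

    image_rgb: uint8 array of shape (H, W, 3).
    foreground_prob: float array of shape (H, W), values in 0..1
        (assumed to come from your segmentation model).
    Returns a uint8 RGBA array of shape (H, W, 4).
    """
    mask = foreground_prob >= threshold
    # Opaque where the model sees foreground, fully transparent elsewhere.
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    return np.dstack([image_rgb, alpha])
```

A hard threshold gives jagged edges on hair and fur; using the probability itself as the alpha value is a common softer variant.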

Train a model with the same image in different orientations

Is it a good idea to train the model with the same images but in different orientations? I have a small set of images for training, which is why I'm trying to cover all the mobile camera and gallery user scenarios.
For example, the image example.png would have 3 copies, example90.png, example180.png and example.270.png, each with a different rotation, and also with different background colors, shadows, etc.
By the way, my test is to identify the type of animal.
Is that a good idea?
If you use Core ML with the Vision framework (and you probably should), Vision will automatically rotate the image so that "up" is really up. In that case it doesn't matter how the user held their camera when they took the picture (assuming the picture still has the EXIF data that describes its orientation).
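If you still want the rotated copies described in the question (for instance because the EXIF data may be missing), generating them is cheap. This is a minimal sketch in Python with NumPy, assuming the image is already loaded as an array; the function name `rotated_copies` is made up.

```python
import numpy as np


def rotated_copies(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees.

    image: array of shape (H, W) or (H, W, channels).
    The result is a dict keyed by the counterclockwise angle in degrees.
    """
    # np.rot90 rotates counterclockwise in the plane of the first two axes.
    return {angle: np.rot90(image, k=angle // 90) for angle in (0, 90, 180, 270)}
```

Note that the 90 and 270 degree copies swap height and width, so resize or pad them if your model expects a fixed input shape.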

TensorFlow: Collecting my own training data set & Using that training dataset to find the location of object

I'm trying to collect my own training data set for image detection (only recognition so far). Right now, I have 4 classes and 750 images for each. Each image is just a regular image of its class; however, some of the images are blurry or contain extraneous content, such as different backgrounds or other factors (but nothing distinctive). Using that training data set, image recognition is really bad.
My questions are:
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way to find the location using image recognition alone, so if I use a bounding box, how and where in the code can I see the location of the bounding box?
Thank you in advance!
It is difficult to know in advance what features your program will learn for each class. But then again, if your unseen images will have the same background, the background will play no role. I would suggest data augmentation in training: random color distortion, random flipping, random cropping.
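As a rough idea of what those three augmentations do, here is a sketch using plain NumPy instead of TensorFlow ops. The function name, the crop fraction and the size of the color shift are all arbitrary choices for illustration.

```python
import numpy as np


def augment(image, rng, crop_fraction=0.9, max_shift=30):
    """Randomly flip, crop and color-shift one uint8 image of shape (H, W, 3)."""
    height, width = image.shape[:2]

    # Random horizontal flip, applied half of the time.
    if rng.random() < 0.5:
        image = image[:, ::-1]

    # Random crop to a fixed fraction of the original size.
    crop_h = int(round(height * crop_fraction))
    crop_w = int(round(width * crop_fraction))
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    image = image[top:top + crop_h, left:left + crop_w]

    # Random color distortion: shift each channel by a small offset.
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    shifted = np.clip(image.astype(np.int16) + shift, 0, 255)
    return shifted.astype(np.uint8)


rng = np.random.default_rng(0)
sample = np.full((100, 80, 3), 128, dtype=np.uint8)
print(augment(sample, rng).shape)
```

Apply it freshly on every epoch so the network rarely sees exactly the same pixels twice.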
You can't see in the code where the bounding box is. You have to label/annotate the boxes yourself first in your collected data, using a tool such as LabelMe. Then you train the object detector on those annotations.

extract an object from image using some image processing filter

I am working on an application where I have an image containing, for example, a glass, a cup or a chair. The object can be of any type.
My question is: is there any way to apply some image processing filters, or something similar, that returns an image containing just the object, with the background transparent?
You can use object detection methods such as
http://opencv.willowgarage.com/documentation/object_detection.html
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
to detect the object, plot a bounding box around it and extract it from the image.
Depending on your application, you can also use image differencing (background subtraction) to get the object.
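A minimal sketch of the image difference idea, in Python with NumPy. It assumes you have a reference picture of the empty background taken from the same camera position; the function name `extract_object` and the threshold of 30 are illustrative only.

```python
import numpy as np


def extract_object(frame, background, threshold=30):
    """Cut the changed region out of a frame, with a transparent background.

    frame, background: uint8 arrays of shape (H, W, 3), same size.
    Returns an RGBA crop around the object, or None if nothing changed.
    """
    # Largest per-channel absolute difference at each pixel.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16)).max(axis=2)
    mask = diff > threshold
    if not mask.any():
        return None

    # Bounding box of all changed pixels.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    # Unchanged pixels become fully transparent.
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    rgba = np.dstack([frame, alpha])
    return rgba[top:bottom, left:right]
```

This only works when the lighting and camera stay fixed between the two pictures; shadows cast by the object will also show up in the mask.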
Actually, I have solved the problem.
The issue was that I did not want to use any advanced method involving template matching, neural networks or anything like that.
In my case the aim was to recognize an object in an image, and that object could be anything (e.g. a table, a cellphone, a person, a shirt, etc.); the catch was that there could be at most one object in an image.
So, just using the watershed segmentation in OpenCV, I was able to separate the object from the background.
However, the threshold used for the watershed varies with the frequency content of the image and with how much the object's shades differ from the background.