I made the program with MediaPipe Iris, and it detects well when the image shows a face that is not wearing a mask.
However, if I wear a mask and try to detect the iris, it catches neither the face nor the iris. Is there any solution for detecting the iris when a person is wearing a mask?
As you can see in the picture, ABSCENT means the face couldn't be detected and ATTEND means the face was detected.
The ABSCENT image files show faces wearing masks, and the ATTEND image files show faces not wearing masks.
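For reference, a minimal Python sketch of the kind of pipeline described above, using MediaPipe Face Mesh with refine_landmarks=True to expose the iris landmarks (the file name and confidence threshold are placeholders, not the asker's actual values):

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

# refine_landmarks=True adds the 10 iris landmarks (indices 468-477).
with mp_face_mesh.FaceMesh(static_image_mode=True,
                           max_num_faces=1,
                           refine_landmarks=True,
                           min_detection_confidence=0.5) as face_mesh:
    image = cv2.imread("face.jpg")  # placeholder file name
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if not results.multi_face_landmarks:
        print("ABSCENT: no face detected, so no iris landmarks either")
    else:
        for face in results.multi_face_landmarks:
            iris = face.landmark[468:478]  # the 10 iris landmarks
            print("ATTEND:", [(round(p.x, 3), round(p.y, 3)) for p in iris])
```

If the face detector itself fails on masked faces, the iris landmarks are never produced, which matches the ABSCENT cases described above.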
I have a set of icons and a screen recording. The icons are not annotated and have no bounding boxes; they are just PNG icons with image-level labels, e.g. "instagram", "facebook", "chrome".
The task is to find the icons within the screen recording and draw a bounding box around them, given the above prerequisites.
My idea of an approach so far is:
1. Use selective search to find ROIs
2. Use a CNN to classify the regions
3. Filter out non-icon regions
4. Draw bounding boxes around positively labelled ROIs
5. Use the resulting screen images with bounding boxes to train a Fast R-CNN
But I am stuck at step 2: I have no idea how to train the CNN with the image-level labelled icons.
If I make a dataset of all the possible icon images, with no background or context information, is it possible to train the CNN to answer the question "Does the ROI include a possible icon?"
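Not a definitive answer, but a rough Python sketch of steps 1-2 under one possible reading: treat it as a binary "icon vs. background" classifier trained on the labelled icon crops plus random screen patches as negatives. The 64x64 size and the layer choices are arbitrary, and selective search needs opencv-contrib-python.

```python
import cv2
import numpy as np
import tensorflow as tf

# Step 1: selective search region proposals (needs opencv-contrib-python).
def propose_regions(frame):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(frame)
    ss.switchToSelectiveSearchFast()
    return ss.process()  # array of (x, y, w, h) boxes

# Step 2: a small binary "icon vs. background" classifier. It would be
# trained on 64x64 icon crops (positives) and random screen patches
# (negatives); compile/fit are omitted here.
def build_classifier():
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Steps 3-4: keep only the proposals the classifier scores as "icon".
def detect_icons(frame, model, threshold=0.9):
    boxes = propose_regions(frame)[:500]  # cap the number of ROIs for speed
    crops = np.stack([
        cv2.resize(frame[y:y + h, x:x + w], (64, 64)) for x, y, w, h in boxes
    ]).astype("float32") / 255.0
    scores = model.predict(crops, verbose=0).ravel()
    return [tuple(map(int, b)) for b, s in zip(boxes, scores) if s > threshold]
```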
I'm working on an Android app with TensorFlow Lite to perform semantic segmentation.
https://www.tensorflow.org/lite/api_docs/java/org/tensorflow/lite/InterpreterApi
I'm also using the U2-Net model and processing images that are already masked, like the one below.
The problem is that the masked area can be a circle or an ellipse, so it can't be handled with the normal process.
Feeding this image into the interpreter just returns the same image, with the circular area in white.
So I have to cut out the circle and feed only that area to the interpreter, but feeding the cut-out image doesn't work, because it has the wrong shape:
the input arrays must be square, just like normal images.
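To make the square-input constraint concrete, here is a sketch in Python (rather than the Java InterpreterApi) that crops the bounding square of the circle and zeroes out everything outside it, so the interpreter still receives a square array. The 320x320 input size and the model path are assumptions about the U2-Net export, not facts from the question.

```python
import cv2
import numpy as np
import tensorflow as tf

def run_on_circle(image_bgr, center, radius, model_path="u2net.tflite"):
    """Feed only a circular region to a TFLite model that expects a square
    input. Input size and model path are guesses and may need adjusting."""
    cx, cy = center
    x0, y0 = max(cx - radius, 0), max(cy - radius, 0)
    crop = image_bgr[y0:cy + radius, x0:cx + radius]  # bounding square of the circle
    crop = cv2.resize(crop, (320, 320))               # assumed model input size

    # Zero out the pixels outside the circle so only the circular area
    # contributes, while the array itself stays square.
    mask = np.zeros((320, 320), dtype=np.uint8)
    cv2.circle(mask, (160, 160), 160, 255, -1)
    crop = cv2.bitwise_and(crop, crop, mask=mask)

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    tensor = np.expand_dims(crop.astype(np.float32) / 255.0, 0)
    interpreter.set_tensor(inp["index"], tensor)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```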
I couldn't find any lead on how to solve this problem.
Any advice would be appreciated.
Thank you.
I want to process the captured video. I will be capturing video of handwriting or drawing on paper, but I do not want to show the hand or the pen on the paper while live streaming via p5.js.
Can this be done using machine learning?
Any idea how to implement this?
If I understand you right, you want to detect where in the image the hand is and draw an overlay at that position, right?
If so, you can use YOLO to detect where the hand is.
There are some pre-trained networks you can download; maybe they are good enough, or maybe you have to train your own just for hands.
There is also a library for YOLO in JS: https://github.com/ModelDepot/tfjs-yolo-tiny
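The link above is for JS; just to make the detect-and-cover idea concrete, here is a minimal Python sketch assuming the ultralytics YOLO package and a hypothetical hand-detection weights file hand.pt (neither comes from the answer above):

```python
import cv2
from ultralytics import YOLO  # assumption: any YOLO detector with a "hand" class

model = YOLO("hand.pt")       # hypothetical hand-detection weights

def hide_hand(frame, paper_color=(255, 255, 255)):
    """Cover every detected hand with a paper-coloured rectangle."""
    results = model(frame, verbose=False)
    for x1, y1, x2, y2 in results[0].boxes.xyxy.int().tolist():
        cv2.rectangle(frame, (x1, y1), (x2, y2), paper_color, thickness=-1)
    return frame
```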
You may not need to go the full ML object segmentation route.
If the paper's position and illumination are constant (or at least knowable), you could try a simple heuristic: compare the pixels in the current frame with a short history and keep the most constant pixel values. There might be some lag as new parts of your drawing 'become constant', so you could also modify the accumulation, for example only accepting a pixel once it has gone from white (paper) to black (ink).
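A rough NumPy sketch of that accumulation idea (the history length and tolerance are arbitrary guesses, assuming dark ink on light paper):

```python
import numpy as np
from collections import deque

HISTORY = 10     # number of recent frames to compare (arbitrary)
STABLE_TOL = 8   # max per-pixel spread to count as "constant" (arbitrary)

history = deque(maxlen=HISTORY)

def update_canvas(canvas, frame_gray):
    """canvas starts as a white (255) image; frame_gray is the current
    grayscale frame. Only pixels that have stayed constant over the last
    few frames AND are darker than the canvas (white going black = fresh
    ink) are copied in, so the moving hand/pen never makes it through."""
    history.append(frame_gray)
    stack = np.stack(history)
    stable = np.ptp(stack, axis=0) <= STABLE_TOL  # constant over the history
    going_black = frame_gray < canvas             # darker than what we kept so far
    keep = stable & going_black
    canvas[keep] = frame_gray[keep]
    return canvas  # stream this instead of the raw frame
```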
This is a more generic question about training an ML model to detect cards.
The cards are from a kids' game with 4 different colors, numbers, and symbols. I don't need to detect the color, just the value (i.e. the symbol) of the cards.
I took pictures of every card with my iPhone and used RectLabel to draw rectangles around the symbols in the upper left corner (the cards have an upside-down symbol in the lower right corner, too; I didn't mark those, as they'll be hidden during detection).
I cropped the images so only the card is visible, no surroundings.
Then I uploaded my images to app.roboflow.ai and let them do their magic (using Auto-Orient, Resize to 416x416, Grayscale, Auto-Adjust Contrast, Rotation, Shear, Blur and Noise).
That gave me another set of images which I used to train my model with CreateML from Apple.
However, when I use that model in my app (I'm using Apple's Breakfast Finder demo), the cards' values aren't detected. Well, sometimes it works, but only at a certain distance from the phone, and the labels are either upside down or sideways.
My guess is that this is because my images weren't taken the way they should have been?
Any hints on how I'd have to set this whole thing up so my model gets trained well?
My bet would be on this being the problem:
"I cropped the images so only the card is visible, no surroundings"
You want your training images to be as similar as possible to the images your model will see in the wild. If it's trained only on images of cards with no surroundings, and you then show it images of cards with things around them, it won't know what to do.
This UNO scoring example is extremely similar to your problem and might provide some ideas and guidance.
Is it a good idea to train the model with the same images, but with different orientations? I have a small set of images for training; that's the reason why I'm trying to cover all the mobile camera/gallery user scenarios.
For example, the image example.png with 3 copies, example90.png, example180.png and example270.png, for the different rotations, and also with different background colors, shadows, etc.
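For what it's worth, a small Python sketch of generating those rotated copies with Pillow (the file names are just the ones from the example above):

```python
from PIL import Image

# Generate example90.png, example180.png and example270.png
# from example.png, as described above.
original = Image.open("example.png")
for angle in (90, 180, 270):
    original.rotate(angle, expand=True).save(f"example{angle}.png")
```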
By the way, my test is to identify the type of animal.
Is that a good idea?
If you use Core ML with the Vision framework (and you probably should), Vision will automatically rotate the image so that "up" is really up. In that case it doesn't matter how the user held their camera when they took the picture (assuming the picture still has the EXIF data that describes its orientation).