I am new to MediaPipe and face detection, and I am trying to extract the landmarks of the lip region of the face. This was quite easy in dlib, since the landmark indices were roughly contiguous, but in MediaPipe they seem quite scattered and I cannot find the landmarks I need.
For example, in dlib the landmark indices of the left eye are: [37, 38, 39, 40, 41, 42]
How do I get the same for mediapipe?
I'm not aware of any mapping between MediaPipe landmarks and Dlib landmarks. You will have to run an image and plot the landmarks with their indices. Because there are a lot of landmarks, I recommend using a very large image so you will be able to read the landmark indices.
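A minimal sketch of that "plot the indexes" approach, assuming the Python Face Mesh API; the file names are illustrative, not part of the original post:

```python
# Run Face Mesh on a large image and draw each landmark's index next to it,
# so you can read off the indices you need for the lip region.
import cv2
import mediapipe as mp

image = cv2.imread("face_large.jpg")
h, w = image.shape[:2]

with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as fm:
    results = fm.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    for i, lm in enumerate(results.multi_face_landmarks[0].landmark):
        x, y = int(lm.x * w), int(lm.y * h)
        cv2.circle(image, (x, y), 2, (0, 255, 0), -1)
        cv2.putText(image, str(i), (x + 2, y), cv2.FONT_HERSHEY_SIMPLEX,
                    0.3, (0, 0, 255), 1)

cv2.imwrite("face_indexed.jpg", image)
```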
Not sure if I understand your question, but MediaPipe uses the same face mesh as Sceneform or ARCore. Here is the link to the original face mesh.
Besides, here is the close-up version, which you can use to choose your landmark index.
So basically, the MediaPipe result is a list of 468 landmarks, and you can access each landmark by its index.
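If you only need the lip region, one hedged option (assuming a recent mediapipe release that exports the FACEMESH_LIPS connection set, and an illustrative input file face.jpg) is to collect the indices used by the lip connections and read those landmarks out of the result list:

```python
# Sketch: extract the lip landmarks from the 468-point face mesh by index.
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

# FACEMESH_LIPS is a set of (start, end) index pairs; collect the unique indices.
lip_indices = sorted({idx for pair in mp_face_mesh.FACEMESH_LIPS for idx in pair})

image = cv2.imread("face.jpg")
h, w = image.shape[:2]

with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark
    # Landmark coordinates are normalized; convert to pixel positions.
    lip_points = [(int(landmarks[i].x * w), int(landmarks[i].y * h))
                  for i in lip_indices]
    print(lip_points)
```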
I have a dataset made up of images of faces, with the corresponding landmarks that make up the mouth.
These landmarks are sets of 2D points (x,y pixel position).
Each image-landmark set data pair is tagged as either a smile, or neutral.
What I would like to do is train a deep learning model to return a smile intensity for a new image-landmark data pair.
What should I be searching for to help me with the next step?
Is it a CNN that I need? In my limited understanding, the usual training input is just an image, whereas I would also be passing in the landmark sets to train with. Or would an SVM approach be more accurate?
I am looking for as much accuracy as possible.
What is the approach that I need called?
I am happy to use PyTorch, Dlib or any framework, I am just a little stuck on the search terms to help me move forward.
Thank you.
It's hard to tell without looking into the dataset and experimenting. But hopefully, the following research materials will guide you in the right direction.
Machine learning-based approach:
https://www.researchgate.net/publication/266672947_Estimating_smile_intensity_A_better_way
Deep learning (CNN): https://arxiv.org/pdf/1602.00172.pdf
A list of awesome papers for smile and smile intensity detection:
https://github.com/EvelynFan/AWESOME-FER/blob/master/README.md
SmileNet project: https://sites.google.com/view/sensingfeeling/
Now, I'm assuming you don't have any label for actual smile intensity.
In that scenario, the existing smile detection methods can be used directly: take the last activation output (the sigmoid) as a confidence score for smiling. The higher the confidence, the higher the intensity.
You can also use the facial landmark points as separate features (for example, pass them through an LSTM block) and concatenate them with the CNN features at an early or later stage to improve the performance of your model.
If you do have labels for smile intensity, you can simply solve it as a regression problem: the CNN will have a single output and will try to regress the smile intensity (the normalized smile intensity through a sigmoid in this case).
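As a rough PyTorch sketch of that regression setup (the question mentions PyTorch): a small CNN branch for the face crop plus a branch for the 2D mouth landmarks, fused and regressed to an intensity in [0, 1]. The layer sizes are illustrative assumptions, and the landmark branch here is a plain MLP rather than the LSTM mentioned above.

```python
import torch
import torch.nn as nn

class SmileIntensityNet(nn.Module):
    def __init__(self, num_landmarks=20):
        super().__init__()
        # Image branch: a deliberately tiny CNN, just to show the fusion idea.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Landmark branch: flattened (x, y) points through a small MLP.
        self.landmark_mlp = nn.Sequential(
            nn.Linear(num_landmarks * 2, 64), nn.ReLU(),
        )
        # Fusion head: single sigmoid output used as the smile intensity.
        self.head = nn.Sequential(
            nn.Linear(32 + 64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, image, landmarks):
        # image: (B, 3, H, W), landmarks: (B, num_landmarks, 2)
        feats = torch.cat(
            [self.cnn(image), self.landmark_mlp(landmarks.flatten(1))], dim=1)
        return self.head(feats).squeeze(1)

# Training would minimize e.g. nn.MSELoss() against a normalized intensity
# label, or nn.BCELoss() against the 0/1 smile/neutral tags if that is all
# the dataset provides.
```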
I was wondering what my best approach to this problem would be. I have a 6000-by-6000 image with a 6000-by-6000 mask. I wanted to crop the image into several sub-images before training and came across extract_patches_2d in scikit-learn. It looks like the tool to get the job done, but I have one issue: if I run this on a single image, how can I be sure that it will use the same patch locations for the image mask as well?
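In case it helps, here is a hedged sketch of two ways to keep the image and mask patches aligned with extract_patches_2d; the patch size, max_patches value, and the small stand-in arrays are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

image = np.random.rand(600, 600, 3)  # small stand-in for the real 6000x6000 image
mask = np.random.rand(600, 600)      # small stand-in for the real 6000x6000 mask

# Option 1: pass the same random_state (and max_patches) to both calls; the
# sampled patch locations depend only on the image height/width, patch size,
# max_patches and random_state, so both calls pick the same windows.
img_patches = extract_patches_2d(image, (256, 256), max_patches=100, random_state=0)
mask_patches = extract_patches_2d(mask, (256, 256), max_patches=100, random_state=0)

# Option 2: stack the mask as an extra channel, extract once, then split;
# this guarantees correspondence by construction.
stacked = np.dstack([image, mask])
patches = extract_patches_2d(stacked, (256, 256), max_patches=100, random_state=0)
img_patches2, mask_patches2 = patches[..., :3], patches[..., 3]
```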
I'm looking for a way to detect parts of a face (like eyes, nose, lips) with Tensorflow Lite. So far, I haven't seen much info. Is there a way to actually retrieve coordinates that describe this kind of data?
Thanks
I'm about to start developing a neural net here with TensorFlow, but before I get into it too deep, I was hoping I could get some feedback on exactly what type of neural net I will need for this (if a net is the right way to go about this at all).
I need the NN to input an image, and output another image. This will be used for path-mapping on a robot I'm working on. The input image will be a disparity map, and the output will be a "driveable map" (an image that displays what in the scene can be driven on, and what can't)
I have built a dataset using Unity 3d. Here is an example from the set:
disparity map:
driveable map:
As you can probably see, white represents the area where my robot can drive and black is where it can't. I will need the NN to take a disparity map, and give me back a "driveable map". Can this be done?
Thanks!
Sorry, I'm not an expert. Since there hasn't been a response on this, and in case you are still looking: the vocabulary I would use to describe this type of problem is disparity networks and segmentation. Your best bet may be a specific type of segmentation network: the U-Net.
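As a hedged illustration of what a U-Net-style model for this task could look like in PyTorch, here is a minimal encoder/decoder that maps a 1-channel disparity map to a 1-channel driveable mask; the depth and channel counts are illustrative assumptions, not a tuned architecture:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.out = nn.Conv2d(16, 1, 1)  # per-pixel "driveable" logit

    def forward(self, x):
        # Encoder with skip connections, then decoder that re-uses them.
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)  # train with nn.BCEWithLogitsLoss against the mask

# Example: TinyUNet()(torch.randn(1, 1, 256, 256)).shape -> (1, 1, 256, 256)
```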
We have been using TensorFlow for image classification, and we have all seen the results for the Admiral Grace Hopper example image, where we get:
military uniform (866): 0.647296
suit (794): 0.0477196
academic gown (896): 0.0232411
bow tie (817): 0.0157356
bolo tie (940): 0.0145024
I was wondering if there is any way to get the coordinates for each category within the image.
TensorFlow doesn't have sample code yet for image detection and localization, but it's an open research problem with different approaches using deep nets; for example, you can look up the papers on the algorithms OverFeat and YOLO (You Only Look Once).
Also, there is usually some preprocessing of the object-coordinate labels, and postprocessing (such as non-maximum suppression) to remove duplicate detections. Often a second, different network is used to classify the object once it has been detected.
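For the duplicate-suppression step mentioned above, the standard technique is non-maximum suppression; here is a minimal NumPy sketch of it (the box format and threshold are illustrative):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) as (x1, y1, x2, y2)."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of box i with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        # Drop boxes that overlap the kept box too much; keep the rest.
        order = rest[iou <= iou_threshold]
    return keep
```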