face alignment algorithm on images - objective-c

How can I do basic face alignment on a 2-dimensional image, assuming I have the positions/coordinates of the mouth and eyes?
Is there any algorithm that I could implement to correct the face alignment on images?

Face (or image) alignment refers to aligning one image (or face in your case) with respect to another (or a reference image/face). It is also referred to as image registration. You can do that using either appearance (intensity-based registration) or key-point locations (feature-based registration). The second category stems from image motion models where one image is considered a displaced version of the other.
In your case the landmark locations (3 points for the eyes and mouth?) provide a good reference set for straightforward feature-based registration. Assuming you have the locations of a set of points in both of the 2D images, x_1 and x_2, you can estimate a similarity transform (rotation, translation, scaling), i.e. a planar 2D transform S that maps x_1 to x_2. You can additionally add reflection to that, though for faces this will most likely be unnecessary.
Estimation can be done by forming the normal equations and solving a linear least-squares (LS) problem for the system x_1 = S x_2. For the 4 unknown parameters (1 rotation, 2 translation, 1 scaling) you need at least 2 point correspondences, since each correspondence contributes 2 equations. The solution to this LS problem can be obtained through a Direct Linear Transform (e.g. by applying SVD or a matrix pseudo-inverse). For a sufficiently large number of reference points (e.g. automatically detected ones), a RANSAC-type method can be used for point filtering and outlier removal (though this is not your case here).
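As a rough illustration of that least-squares step, here is a minimal NumPy sketch assuming the usual parameterisation a = s*cos(theta), b = s*sin(theta); the function and variable names are mine, not from any library:

import numpy as np

def estimate_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points to dst points.
    src, dst: (N, 2) arrays of corresponding landmark locations (N >= 2)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # Each correspondence contributes two equations in (a, b, tx, ty):
        #   a*u - b*v + tx = x
        #   b*u + a*v + ty = y
        rows += [[u, -v, 1, 0], [v, u, 0, 1]]
        rhs += [x, y]
    (a, b, tx, ty), *_ = np.linalg.lstsq(np.asarray(rows, float),
                                         np.asarray(rhs, float), rcond=None)
    return np.array([[a, -b, tx],
                     [b,  a, ty]])  # 2x3 matrix usable with cv2.warpAffine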
After estimating S, apply image warping on the second image to get the transformed grid (pixel) coordinates of the entire image 2. The transform will change pixel locations but not their appearance. Unavoidably some of the transformed regions of image 2 will lie outside the grid of image 1, and you can decide on the values for those null locations (e.g. 0, NaN etc.).
For more details: R. Szeliski, "Image Alignment and Stitching: A Tutorial" (Section 4.3 "Geometric Registration")
In OpenCV see: Geometric Image Transformations, e.g. cv::getRotationMatrix2D, cv::getAffineTransform and cv::warpAffine. Note, though, that you should estimate and apply a similarity transform (a special case of an affine transform) in order to preserve angles and shapes.
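For the estimation itself, OpenCV's cv::estimateAffinePartial2D fits exactly such a similarity (partial affine) transform. A minimal Python sketch with placeholder landmark coordinates and file name:

import cv2
import numpy as np

# Hypothetical landmark coordinates (left eye, right eye, mouth) in each image.
pts_src = np.float32([[120, 140], [200, 138], [162, 220]])  # face to be aligned
pts_ref = np.float32([[100, 120], [180, 120], [140, 200]])  # reference face

# Estimate a similarity transform (rotation + translation + uniform scale).
S, _inliers = cv2.estimateAffinePartial2D(pts_src, pts_ref)

# Warp the whole image onto the reference grid; pixels falling outside the
# source image are filled with the border value (0 here).
img = cv2.imread("face.jpg")
out_size = (img.shape[1], img.shape[0])  # (width, height) of the output grid
aligned = cv2.warpAffine(img, S, out_size, borderValue=0)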

For faces there is a lot of variability in feature points, so it won't be possible to fit all feature points perfectly with a single affine transform. The only way to align all the points exactly is to warp the image based on the points: triangulate the image using the points and apply an affine warp to each triangle, producing a warped image in which all the points are aligned.
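A hedged sketch of that idea using scikit-image, which triangulates the landmarks (Delaunay) and fits one affine transform per triangle; the landmark arrays and file name are placeholders:

import numpy as np
from skimage import io
from skimage.transform import PiecewiseAffineTransform, warp

img = io.imread("face.jpg")

# Hypothetical corresponding landmarks in the image to warp (src) and in the
# reference face (dst); in practice you would also add the image corners so
# the triangulation covers the whole frame.
src = np.array([[140, 120], [200, 118], [162, 200], [150, 230]], dtype=float)
dst = np.array([[120, 100], [180, 100], [140, 180], [135, 210]], dtype=float)

# warp() expects the inverse mapping, i.e. a transform from output
# (reference) coordinates back to input coordinates, hence estimate(dst, src).
tform = PiecewiseAffineTransform()
tform.estimate(dst, src)

warped = warp(img, tform, output_shape=img.shape[:2])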

Face alignment can be handled based on just the eye positions.
OpenCV, Dlib and MTCNN can all detect faces and eyes. The deepface framework for Python wraps those methods and offers an out-of-the-box detection and alignment function.
Its detectFace function applies detection and then alignment in the background.
#!pip install deepface
from deepface import DeepFace
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
DeepFace.detectFace("img.jpg", detector_backend = backends[0])
You can also apply detection and alignment manually.
from deepface.commons import functions
import matplotlib.pyplot as plt  # needed for the plt.imshow calls below

img = functions.load_image("img.jpg")
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']

detected_face = functions.detect_face(img = img, detector_backend = backends[3])
plt.imshow(detected_face)

aligned_face = functions.align_face(img = img, detector_backend = backends[3])
plt.imshow(aligned_face)

processed_img = functions.detect_face(img = aligned_face, detector_backend = backends[3])
plt.imshow(processed_img)

There's a section Aligning Face Images in OpenCV's Face Recognition guide:
http://docs.opencv.org/trunk/modules/contrib/doc/facerec/facerec_tutorial.html#aligning-face-images
The script aligns given images at the eyes. It's written in Python, but should be easy to translate to other languages. I know of a C# implementation by Sorin Miron:
http://code.google.com/p/stereo-face-recognition/
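The core of such a script, rotating about the midpoint between the eyes so that the eye line becomes horizontal, can be sketched in a few lines of Python with OpenCV (this is a sketch, not the tutorial's code; the coordinates and file name are placeholders):

import cv2
import numpy as np

def align_by_eyes(img, left_eye, right_eye):
    """Rotate about the midpoint between the eyes so the eyes end up level."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))           # tilt of the eye line
    centre = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(centre, angle, 1.0)  # rotation only, no scaling
    return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

img = cv2.imread("img.jpg")
aligned = align_by_eyes(img, left_eye=(120, 140), right_eye=(200, 138))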

Related

Camera calibration using Direct Linear Transformation in python

I'm using a NumPy implementation of camera calibration by direct linear transformation (DLT) in Python.
I'm trying to use it for 3-dimensional camera calibration.
My problem is that the mean error of the DLT (the mean residual of the DLT transformation, in units of camera coordinates) is very high in my example, in the thousands of pixels, especially compared to the examples provided by the original author (see here).
These are the 3D points I use:
objpoints = [[86.438, -174.922, 51.316], [-27.519, -215.460, 39.154],
             [73.601, 107.800, 120.455], [87.602, 133.413, 34.023],
             [101.276, -55.204, 108.884], [88.509, -68.038, 116.634],
             [27.518, -215.460, 39.154], [-31.355, -207.334, 85.184],
             [87.601, -131.059, 33.881], [-60.234, -23.833, 148.269],
             [62.162, -23.042, 148.715]]
These are the pixels I use:
imgpoints = [[576.0, 861.0], [660.0, 996.0], [253.0, 1383.0], [575.0, 1481.0],
             [276.0, 1217.0], [241.0, 1139.0], [665.0, 461.0], [231.0, 411.0],
             [660.0, 226.0], [141.0, 684.0], [111.0, 1123.0]]
I extracted these points manually: the 3D points from a point cloud model (.ply format) and the matching 2D points by picking pixels in the image.
Something must be wrong with my coordinates at a very basic level, but I'm not sure what it is and how to find it.
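For reference, here is a generic NumPy sketch of what the DLT step computes and how such a mean residual can be measured (this is not the original author's implementation; it simply forms the DLT system for the objpoints/imgpoints lists above and reports the reprojection error):

import numpy as np

def dlt_calibrate(obj_pts, img_pts):
    """Estimate the 3x4 projection matrix P from 3D-2D correspondences
    (at least 6 of them) and return P with the mean reprojection error."""
    A = []
    for (X, Y, Z), (u, v) in zip(obj_pts, img_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)                       # smallest singular vector
    proj = (P @ np.c_[obj_pts, np.ones(len(obj_pts))].T).T
    proj = proj[:, :2] / proj[:, 2:3]              # perspective division
    err = np.linalg.norm(proj - np.asarray(img_pts), axis=1).mean()
    return P, err

P, err = dlt_calibrate(objpoints, imgpoints)
print("mean reprojection error (px):", err)

Normalising the 3D and 2D points before building the system (Hartley-style normalisation) usually improves the conditioning of the SVD considerably.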

CGAL Mean_curvature_flow_skeletonization contract_until_convergence function produces branches that do not exist in the input polygon

I use the contract_until_convergence function from CGAL Mean_curvature_flow_skeletonization to produce a skeleton from an input polygon.
https://doc.cgal.org/latest/Surface_mesh_skeletonization/classCGAL_1_1Mean__curvature__flow__skeletonization.html
In some cases the skeleton has branches (see the top of the image above, skeleton in red) that do not exist in the input polygon. Are there any parameters to set to prevent this?
using Skeletonization = CGAL::Mean_curvature_flow_skeletonization<Polyhedron>;
Skeletonization mean_curve_skeletonizer(polyhedron);
mean_curve_skeletonizer.contract_until_convergence();
There are two parameters controlling the quality of the skeleton:
quality_speed_tradeoff()
medially_centered_speed_tradeoff()
Also, one thing that affects the skeleton is the sampling of the input surface that is used to compute the Voronoi poles. In the original paper it is said: "Given a sufficiently good sampling, the Voronoi poles [ACK00] form a provably convergent sampling of the medial axis."
[ACK00] Amenta N., Choi S., Kolluri R. K.: The power crust, unions of balls, and the medial axis transform. Computational Geometry: Theory and Applications 19 (2000), 127–153.
You can use the function isotropic_remeshing with a sufficiently small target edge length to improve the Voronoi pole computation.

try to implement cv2.findContours for person detection

I'm new to OpenCV and I'm trying to detect people through cv2.findContours with a morphological transformation of the video. Here is the code snippet:
import numpy as np
import imutils
import cv2 as cv
import time

cap = cv.VideoCapture(0)
avg = None  # running-average background model

while cap.isOpened():
    ret, frame = cap.read()
    #frame = imutils.resize(frame, width=700, height=100)
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    gray = cv.GaussianBlur(gray, (21, 21), 0)
    if avg is None:
        avg = gray.astype("float")  # initialise the background on the first frame
    cv.accumulateWeighted(gray, avg, 0.5)
    mask2 = cv.absdiff(gray, cv.convertScaleAbs(avg))
    # OpenCV 4.x returns (contours, hierarchy)
    contours0, hierarchy = cv.findContours(mask2, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    for cnt in contours0:
        .
        .
        .
The rest of the code has the logic of a contour passing a line and incrementing the count.
The problem I'm encountering is that cv.findContours detects every movement/change in the frame (including the person). What I want is for cv.findContours to detect only the person and not any other movement. I know that person detection can be achieved with a Haar cascade, but is there any way I can implement detection using cv2.findContours?
If not, is there a way I can still do a morphological transformation and detect people? The project I'm working on requires filtering out noise and much of the background in order to detect the person and increment its count on passing the line.
I will show you two options to do this.
The first is the method I mentioned in the comments, which you can use with YOLO to detect humans:
Use saliency to detect the standout parts of the video
Apply K-Means Clustering to cluster the objects into individual clusters.
Apply background subtraction and erosion or dilation (or both; it depends on the video, so try them all and see which one does the best job).
Crop the objects
Send the cropped objects to Yolo
If the class name is pedestrian or human, draw bounding boxes on them.
The second is OpenCV's built-in pedestrian detection, which is much easier (see the sketch after this list):
Convert frames to black and white
Use pedestrian_cascade.detectMultiScale() on the grey frames.
Draw a bounding box over each pedestrian
The second method is much simpler, but it depends on what is expected of you for this project.
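A minimal sketch of that second option, assuming the haarcascade_fullbody.xml cascade bundled with OpenCV; the detectMultiScale parameters are illustrative and usually need tuning:

import cv2 as cv

pedestrian_cascade = cv.CascadeClassifier(
    cv.data.haarcascades + "haarcascade_fullbody.xml")

cap = cv.VideoCapture(0)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)               # greyscale frame
    pedestrians = pedestrian_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                      minNeighbors=4)
    for (x, y, w, h) in pedestrians:                           # one box per detection
        cv.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv.imshow("pedestrians", frame)
    if cv.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv.destroyAllWindows()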

How to calculate the Horizontal and Vertical FOV for the KITTI cameras from the camera intrinsic matrix?

I would like to calculate the Horizontal and Vertical field of view from the camera intrinsic matrix for the cameras used in the KITTI dataset. The reason I need the Field of view is to convert a depth map into 3D point clouds.
Though this question was asked quite a long time ago, I felt it needed an answer, as I ran into the same issue and was unable to find any info on it.
I have, however, solved it using the information available in this document and some more general camera calibration documents.
Firstly, we need to convert the supplied disparity into distance. This can be done by first converting the disparity map into floats using the method in the dev_kit, where they state:
disp(u,v) = ((float)I(u,v))/256.0;
This disparity can then be converted into a distance through the standard stereo vision equation:
Depth = Baseline * focal_length / Disparity
Now come some tricky parts. I searched high and low for the focal length and was unable to find it in the documentation.
I realised just now, while writing this, that the baseline is documented in the aforementioned source; however, from section IV.B we can see that it can also be found indirectly in P(i)_rect.
The P_rect matrices can be found in the calibration files and are used both for calculating the baseline and for the translation from uv in the image to xyz in the real world.
The steps are as follows:
For each pixel in the depth map:
xyz_normalised = P_rect \ [u, v, 1]
where u and v are the x and y coordinates of the pixel respectively. This gives you an xyz_normalised of the form [x, y, z, 0] with z = 1.
You can then multiply it by the depth given at that pixel to obtain an xyz coordinate.
For completeness: for the depth maps used here, you need P_3 from the cam_cam calibration txt files to get the baseline (as it contains the baseline between the colour cameras), while P_2 belongs to the left camera, which is used as the reference for the occ_0 files.
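Putting the above together, here is a hedged NumPy sketch; the projection matrix values, baseline, pixel values and image size are illustrative rather than taken from a specific KITTI sequence, and the FOV computation the original question asked about is included:

import numpy as np

# 3x4 rectified projection matrix (of the form found in KITTI calib files).
P_rect = np.array([[721.5, 0.0, 609.6, 44.86],
                   [0.0, 721.5, 172.9, 0.22],
                   [0.0, 0.0, 1.0, 0.0]])
fx, fy = P_rect[0, 0], P_rect[1, 1]
width, height = 1242, 375

# Horizontal/vertical field of view straight from the intrinsics.
h_fov = 2.0 * np.degrees(np.arctan(width / (2.0 * fx)))
v_fov = 2.0 * np.degrees(np.arctan(height / (2.0 * fy)))

# Raw disparity (uint16 PNG value) -> float, then depth = baseline * f / disparity.
baseline = 0.54                     # metres, assumed stereo baseline
disp = 11000 / 256.0                # disp(u,v) = I(u,v) / 256.0
depth = baseline * fx / disp

# Back-project one pixel, following xyz_normalised = P_rect \ [u, v, 1]
# (here via the pseudo-inverse), then scale by the metric depth.
u, v = 620.0, 180.0
xyz_norm = np.linalg.pinv(P_rect) @ np.array([u, v, 1.0])
xyz_norm = xyz_norm[:3] / xyz_norm[2]   # normalise so that z = 1
xyz = xyz_norm * depth                  # 3D point in camera coordinates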

meshlab- how to transfer uvs from source .objs onto poisson reconstruction model

I've been struggling for some time to find a way in Meshlab to include or transfer UV’s onto a poisson model from source meshes. I will try to explain more of what I’m trying to accomplish below.
My source meshes have uv’s along with texture data. I need to build a fused model and include the texture data. It is for facial expression scan data reconstruction for a production pipeline which ultimately builds a facial rig for animation. Our source scan data includes marker information which we use to register, build a fused scan model which is used to generate a retopologized mesh for blendshapes.
Previously, we were using David3D. http://www.david-3d.com/en/support/downloads
David 3D used Poisson surface reconstruction to create a fused model. The fused model it created brought along the UVs and optimized the source textures into one UV tile. I'll post a picture of the result below that I'm looking to recreate in MeshLab.
My need to find this solution in MeshLab is to build tools to help automate this process. David3D version 5 does not have a development kit to program around.
Is it possible in MeshLab to apply the UVs from the regions used from the source mesh onto the Poisson model? Could I use a filter to transfer them? Reproject them?
Or is there another reconstruction method/ process from within Meshlab that will keep the uv’s?
Here is an image of what the resulting uv parameter looks like from David. The uvs are white on the left half of the image.
[Image: David3D UV Layout Result]
Thank you,
Dan
No, in MeshLab there is no direct way to transfer UV mapping between two layers.
This is because UV transfer is not, in the general case, a trivial task. It is not simply a matter of assigning to the new surface the "closest" UV of the original mesh: this would not work on UV discontinuities, which are present in the example you linked. Additionally, the two meshes should be almost coincident, otherwise you would also have problems in defining the "closest" UV.
There are a couple of ways to do it, but they require manual work and a re-sampling of the texture:
create a UV mapping of the re-meshed model using whatever tool you may have, then resample the existing texture on the new parametrization using "transfer: vertex attributes to Texture (1 or 2 meshes)", using texture color as source
load the original mesh and, using the screenshot function, create "virtual" photos of the model (turn off illumination and do NOT use ortho views), adding them as raster layers until the model surface has been fully covered. Load the new model, which should be in the same space, and texture-map it using "parametrization + texturing" with those registered images
In MeshLab it is also possible to create a new texture from the original images, if you have a way to import the registered cameras...
TL;DR: UV coords to color channels → Vertex Attribute Transfer → Color channels back to UV coords
I have had very good results kludging it through the color channels, like this (say you are transferring from layer A to layer B):
Make sure A and B are roughly aligned with each other (you can use the ICP filter if needed).
Select layer A, then:
Texture → Convert Per Wedge UV to Per Vertex UV (if you've got wedge coords)
Color Creation → Per Vertex Color Function, and transfer the tex coords to the color channels (assuming UV range 0-1, you'll want to tweak these if your range is larger):
func r = 255.0 * vtu
func g = 255.0 * vtv
func b = 0
Sampling → Vertex Attribute Transfer, and use this to transfer the vertex colors (which now hold texture coordinates) from layer A to layer B.
source mesh = layer A
target mesh = layer B
check Transfer Color
set distance large enough to not miss any spots
Now select layer B, which contains the mapped vertex colors, and do the opposite that you did for A:
Texture → Per Vertex Texture Function
func u = r / 255.0
func v = g / 255.0
Texture → Convert Per Vertex UV to Per Wedge UV
And that's it.
The results aren't going to be perfect, but in practice I often find them sufficient. In particular:
If the texture is not continuously mapped to layer A (e.g. maybe you've got patches of image mapped to certain areas, etc.), it's very possible for the attribute transfer to B (especially when upsampling) to have some vertices be interpolated across patch boundaries, which will probably lead to visual artifacts along patch boundaries.
UV coords may be quantized by conversion to a color channel and back. (You could maybe eliminate this by stretching U out over all three color channels, then transferring U, then repeating for V -- never tried it though.)
That said, there's a lot of cases it works in.
I may or may not add images / video to this post another day.
PS Meshlab is pretty straightforward to build from source; it might be possible to add a UV coordinate option to the Vertex Attribute Transfer filter. But, to make it more useful, you'd want to make sure that you didn't interpolate across boundary edges in the mapped UV projection. Definitely a project I'd like to work on some day... in theory. If that ever happens I'll post a link here.