What algorithm do I need to convert a 2D image file into a representative 2D triangle mesh file? - mesh

I am looking for advice on which algorithm I would need to convert an image file into a mesh. Note that I am not asking to convert from 2D into 3D; the output mesh is not required to have any depth.
By image file I mean a black-and-white image of a relatively simple shape, such as a stick figure, stored in a simple-to-read uncompressed bitmap file. The image would have high contrast between the black and white areas to help an algorithm detect the edges of the shape.
By static mesh I mean the data that can be used to construct a typical indexed triangle mesh (a list of vertices and a list of indices) in a modern 3D game engine such as Unreal. The mesh needs to represent the shape of the image in 2D but is not required to have any 3D depth itself, i.e. zero thickness. The mesh will ultimately be used in a 3D environment like a cardboard cut-out; for example, imagine it standing on a ground plane.
This conversion is not required to run in real time; it can be batch processed, and the resulting mesh data is then read in by the game engine.
Thanks in advance.

Related

How to highlight the desk in RGB with assistance of 3D point cloud?

What I describe next is a preprocessing step for object detection.
I have a desk with some snacks and pasters on it, and a depth camera (Intel Realsense R300). I want to exclude the pasters on the desk (set their pixel values to 0) and also exclude interfering objects around the desk (not on the desk, but on the ground). To make full use of the depth camera, I think I can capture a depth image and use the RANSAC algorithm to recognize the plane of the desk, which would exclude the pasters at the same time.
However, the problem is that I have successfully found the desk plane with RANSAC in the 3D point cloud, but I don't know how to map it onto the 2D RGB image so that I can set the pixel values of the plane to 0.

Camera's extrinsic matrix

I am trying to use MATLAB's camera calibrator to calibrate an infrared camera. I was able to get the intrinsic matrix by just feeding around 100 images to the calibrator. But I'm struggling with how to get the extrinsic matrix [R|t].
Because the extrinsic matrix maps the world frame to the camera frame, in theory there will be many extrinsic matrices when the camera (or the object) is moving.
In the picture below, if the intrinsic matrix is determined using 50 images, then there are 50 extrinsic matrices, one corresponding to each image. Am I correct?
You are right. Usually, a by-product of an intrinsic calibration is the extrinsic matrix for each pattern observed; this is mostly used to draw the patterns with respect to the camera as in the picture you posted.
What you usually do afterwards is define some external reference frame that makes sense for your application, also known as the 'world' reference frame, and compute the pose of the camera with respect to it. That's the extrinsic matrix you always hear about.
For this, you:
Define the reference frame and take some points with known 3D coordinates on it; this can be a grid drawn on the floor, for example.
Take a picture of the 3D points with the calibrated camera and get a list of the corresponding 2D (image) coordinates of the points.
Use a pose estimation function that takes the camera intrinsic parameters, the 3D points and the corresponding 2D image points. I am more familiar with OpenCV, but the MATLAB function that seems to do the job is: https://www.mathworks.com/help/vision/ref/estimateworldcamerapose.html
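To make the role of [R|t] concrete, here is a minimal pure-Python sketch of the forward model that a pose estimation function inverts: given intrinsics K and an extrinsic pose (R, t), it maps a 3D point in the world frame to a pixel. The function name project_point and all numeric values are made up for illustration; they do not come from any real calibration, and lens distortion is ignored.

```python
def project_point(K, R, t, X):
    """Project a 3D world point X to pixel coordinates with a pinhole model.

    K is the 3x3 intrinsic matrix, R (3x3) and t (3-vector) form the
    extrinsic matrix [R|t]. Lens distortion is ignored (assumption).
    """
    # World frame -> camera frame: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        return None  # point is behind the camera, no valid projection
    # Camera frame -> pixels: perspective divide, then apply K
    xn, yn = Xc[0] / Xc[2], Xc[1] / Xc[2]
    u = K[0][0] * xn + K[0][1] * yn + K[0][2]
    v = K[1][1] * yn + K[1][2]
    return u, v


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Hypothetical pose: camera axes aligned with the world axes,
# world origin 2 units in front of the camera.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

print(project_point(K, R, t, [0.0, 0.0, 0.0]))  # world origin
print(project_point(K, R, t, [0.5, 0.0, 0.0]))  # a grid point to the right
```

Pose estimation runs this the other way round: given K, the known 3D grid points and their measured pixels, it searches for the R and t that reproduce those pixels. In OpenCV the function commonly used for this is cv2.solvePnP.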

convert RGBD image to a polygon mesh

I have seen this post on how to convert a depth image into a point cloud. What I need is to convert it into a ply file with triangle and vertices (full triangular mesh).
Is this even possible without any special algorithm?

iOS - 2d image turn into a 3d

I was checking out this cool app called Morfo. According to their product description -
Use Morfo to quickly turn a photo of your friend's face into a
talking, dancing, crazy 3D character! Once captured, you can make your
friend say anything you want in a silly voice, rock out, wear makeup,
sport a pair of huge green cat eyes, suddenly gain 300lbs, and more.
So if you take a normal 2D image of Steve Jobs and feed it to this app, it converts it into a 3D model of that image, and the user can interact with it.
My questions are as follows -
How are they doing this?
How is this possible in iPad?
Isn't it computationally intensive to render and convert 2D image into 3D?
Any pointers or links to websites or libraries in Objective-C which do this are very much appreciated.
UPDATE: the demo of this product shows how Morfo uses a template mechanism to do the conversion, i.e. after a 2D image is fed in, one needs to set the boundaries of the face, where the eyes are located, and the size and length of the lips. Then it goes off to convert it into a 3D model. How is this part done? What frameworks or libraries might they be using?
This is a broad question, but I can point you in the right direction on how 3D rendering works. Trust me, this is a huge subject with decades of work behind it, and too much to put here. I'm not sure how up to speed you are on 3D rendering techniques, so I will give you a basic idea of texturing and point you to a good set of tutorials.
How are they doing this?
The idea is that in 3D rendering, 3D models can be textured with a 2D image known as a texture map. You take a 2D image and wrap it around a 3D model, be that a simple primitive like a sphere or a cube, or something more advanced such as the classic teapot or a model of a human head, etc. A texture can be taken from anywhere: I have used the camera feed in the past to texture meshes with the video from the camera stream, and I have used photos from the camera, which is how they're doing it. So this is how the face is rendered onto the 3D model.
Is this efficient?
On iOS and most mobile devices, 3D rendering is hardware accelerated through OpenGL ES. With regard to your question, this is really fast, depending on how you implement your render code.
The mapping step (the scale/rotate template in the video), as mentioned by anticyclope, allows you to make the texture fit a model and also to place the eyes, which are part of their render code.
So if you want to pick this up, I recommend reading Jeff LaMarche's tutorial "From the Ground Up" as a primer:
http://iphonedevelopment.blogspot.com/2009/05/opengl-es-from-ground-up-table-of.html
Beyond that, I have read about four books on OpenGL ES, covering general design and platform specifics. I recommend this book:
http://www.amazon.co.uk/iPhone-Programming-Developing-Graphical-Applications/dp/0596804822/ref=sr_1_1?ie=UTF8&qid=1331114559&sr=8-1
In my opinion, this is how they are doing it. These are just my thoughts; I haven't seen the application in real life.
They have a 3D model of a human head. When you click on certain points on the 2D image, they adjust the corresponding points in the 3D model so that it represents a specific face's features, like the distance between the eyes, the lip width and so on. Next, the texture from the 2D image is applied to the 3D model using those control points, so we have a textured 3D model of a human head. Given that our perception is able to reconstruct a 3D shape from 2D images (say, we look at a 2D photo and still imagine a 3D person), there's no need to reconstruct the 3D shape accurately; the texture will do the work.
There is a technique in 3D rendering called UV mapping. It takes the 3D model and defines a set of edges along which the surface is unwrapped, which produces a 2D image layout that can be used to apply different textures to the model.
Now, if you notice, in Morfo you define the edge of the head, the eyes, the mouth and the nose. With this information, Morfo knows how to place the texture onto the model it has defined.
The process of loading a texture onto a model is not very complex, and it can be done on any device that supports a technology such as OpenGL.
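As a rough illustration of the idea in the answers above, the sketch below turns user-marked 2D landmarks into texture (UV) coordinates that could be assigned to the matching vertices of a template head. This is only a guess at the general approach, not Morfo's actual implementation; the landmark names, pixel positions and image size are hypothetical, and it assumes pixel coordinates with the origin at the top-left and the OpenGL-style convention of V increasing upwards.

```python
def pixel_to_uv(x, y, width, height):
    """Convert a pixel position (origin top-left, y down) to UV in [0, 1].

    Assumes the OpenGL-style convention where V = 0 is the bottom of the
    texture, so the y axis is flipped.
    """
    return x / width, 1.0 - y / height


# Hypothetical landmarks the user marked on a 512x512 photo (pixels).
photo_size = (512, 512)
landmarks_px = {
    "left_eye": (180, 200),
    "right_eye": (332, 200),
    "nose_tip": (256, 128),
    "mouth_center": (256, 380),
}

# Each landmark corresponds to a known vertex of the template head; giving
# that vertex this UV pins the photo to the model at that feature.
landmark_uvs = {
    name: pixel_to_uv(x, y, *photo_size)
    for name, (x, y) in landmarks_px.items()
}

for name, (u, v) in landmark_uvs.items():
    print(f"{name}: u={u:.3f}, v={v:.3f}")
```

Vertices between landmarks would need interpolated UVs, and the template geometry itself could be scaled so that distances such as the eye spacing match the photo; both are left out of this sketch.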
Isn't it computationally intensive to render and convert 2D image into 3D?
Apple is sinking billions of dollars into developing custom chipsets, and recent models have impressive performance, considering the battery life and low operating temperature (no fans).

World space to screen space (perspective projection)

I'm using a 3d engine and need to translate between 3d world space and 2d screen space using perspective projection, so I can place 2d text labels on items in 3d space.
I've seen a few posts of various answers to this problem but they seem to use components I don't have.
I have a Camera object, and can only set its current position and lookat position; it cannot roll. The camera is moving along a path, and certain target objects may appear in its view and then disappear.
I have only the following values
lookat position
position
vertical FOV
Z far
Z near
and obviously the position of the target object.
Can anyone please give me an algorithm that will do this using just these components?
Many thanks.
Virtually all graphics engines use matrices to transform between different coordinate systems. OpenGL and DirectX both use them, because they are the standard way.
Cameras usually construct the matrices using the parameters you have:
view matrix (transforms the world so that you look at it from the camera position); it uses the lookat position and the camera position (plus the up vector, which is usually 0,1,0)
projection matrix (transforms from 3D coordinates to 2D coordinates); it uses the FOV, near, far and aspect ratio.
You can find information on how to construct the matrices by searching the internet for the OpenGL functions that create them:
gluLookAt: creates the view matrix
gluPerspective: creates the projection matrix
But I can't imagine an engine that doesn't allow you to get these matrices; I can assure you they are in there somewhere, because the engine is using them.
Once you have those matrices, you multiply them to get the view-projection matrix. This matrix transforms from world coordinates to screen coordinates. So just multiply the matrix by the position you want to know (as a 4-component vector, with the 4th component set to 1.0).
But wait, the result will be in homogeneous coordinates: you need to divide X, Y and Z of the resulting vector by W, and then you have the position in normalized screen coordinates (0 means the center, 1 means right, -1 means left, etc.).
From here it is easy to convert to pixels by scaling by the screen width and height.
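The steps above can be sketched in plain Python using only the values listed in the question (position, lookat position, vertical FOV, Z near, Z far) plus the screen size. The matrices follow the conventions documented for gluLookAt and gluPerspective (right-handed, camera looking down -Z). The up vector of (0, 1, 0), the top-left pixel origin and all function names here are assumptions for illustration; your engine may use different conventions.

```python
import math

def _sub(a, b): return [a[i] - b[i] for i in range(3)]
def _dot(a, b): return sum(a[i] * b[i] for i in range(3))
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def _normalize(v):
    n = math.sqrt(_dot(v, v))
    return [c / n for c in v]

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """View matrix in the style of gluLookAt (fails if looking straight up/down)."""
    f = _normalize(_sub(target, eye))   # forward
    s = _normalize(_cross(f, up))       # right
    u = _cross(s, f)                    # true up
    return [[s[0], s[1], s[2], -_dot(s, eye)],
            [u[0], u[1], u[2], -_dot(u, eye)],
            [-f[0], -f[1], -f[2], _dot(f, eye)],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, z_near, z_far):
    """Projection matrix in the style of gluPerspective."""
    t = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[t / aspect, 0.0, 0.0, 0.0],
            [0.0, t, 0.0, 0.0],
            [0.0, 0.0, (z_far + z_near) / (z_near - z_far),
             2.0 * z_far * z_near / (z_near - z_far)],
            [0.0, 0.0, -1.0, 0.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def world_to_screen(point, eye, target, fov_y_deg, z_near, z_far, width, height):
    """Return (x, y) in pixels (origin top-left), or None if behind the camera."""
    view_proj = mat_mul(perspective(fov_y_deg, width / height, z_near, z_far),
                        look_at(eye, target))
    x, y, z, w = mat_vec(view_proj, [point[0], point[1], point[2], 1.0])
    if w <= 0.0:
        return None                      # target is behind the camera
    ndc_x, ndc_y = x / w, y / w          # perspective divide -> [-1, 1]
    return ((ndc_x + 1.0) * 0.5 * width,
            (1.0 - ndc_y) * 0.5 * height)  # flip y: pixel rows grow downwards

# Camera 5 units back on +Z looking at the origin, 800x600 screen.
print(world_to_screen((0, 0, 0), (0, 0, 5), (0, 0, 0), 60.0, 0.1, 100.0, 800, 600))
```

A result of None, or coordinates outside the 0..width and 0..height range, means the label should not be drawn because the target is behind the camera or off screen.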
I have some slides explaining all this here: https://docs.google.com/presentation/d/13crrSCPonJcxAjGaS5HJOat3MpE0lmEtqxeVr4tVLDs/present?slide=id.i0
Good luck :)
P.S.: when you work with 3D, it is really important to understand the three matrices (model, view and projection); otherwise you will stumble every time.
so I can place 2d text labels on items in 3d space
Have you looked up "billboard" techniques? Sometimes just knowing the right term to search under is all you need. This refers to polygons (typically rectangles) that always face the camera, regardless of camera position or orientation.