2D image to 3D world coordinate in perspective view - object-detection

I have been trying to locate objects detected in a 2D image in 3D space, for a single fixed camera installed at a known height.
I went through similar questions, but the perspective view is not mentioned.
What I have:
The height of the camera
Calibration parameters
The exact location of one fixed object in view

I've written a set of solutions to this kind of problem. 3D points are reconstructed from 2D coordinates (yes, "in perspective") by means of the extrinsics matrix. See https://github.com/rodolfoap/screen2world-k. Other methods are linked from there.
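To illustrate the general idea (this is a minimal sketch, not code from the linked repository): if you have the intrinsics K and the extrinsics R, t, and you assume the detected objects sit on the ground plane Z = 0 in world coordinates, a pixel can be back-projected by intersecting its viewing ray with that plane. All names here are hypothetical, and R, t are assumed to map world to camera coordinates:

```cpp
#include <opencv2/core.hpp>

// Back-project a pixel onto the ground plane Z = 0 (world frame).
// K: camera intrinsics, R/t: extrinsics (world -> camera), uv: pixel coordinates.
cv::Vec3d pixelToGround(const cv::Matx33d& K, const cv::Matx33d& R,
                        const cv::Vec3d& t, const cv::Point2d& uv)
{
    // Ray direction through the pixel, in camera coordinates
    cv::Vec3d dCam = K.inv() * cv::Vec3d(uv.x, uv.y, 1.0);
    // Rotate the ray into the world frame and compute the camera centre C = -R^T t
    cv::Vec3d dWorld = R.t() * dCam;
    cv::Vec3d C = (R.t() * t) * (-1.0);
    // Intersect the ray C + s * dWorld with the plane Z = 0
    double s = -C[2] / dWorld[2];
    return C + s * dWorld;
}
```

The known height of the camera and the exact location of one fixed object in view are what let you verify (or solve for) R and t in the first place.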

Related

QSGGeometryNode depth (z) problems with 3 vertices

I am drawing a 3D geometry (Point3D vertices) in a Qml scene graph with a custom QSGGeometryNode and QSGTransformNode. This works except that the 3D model is cut off at a certain z-coordinate (z is the depth axis in Qml). First I expected that the problem is due to intersection with the Qml 2D plane. But I tried to move the model along the z axis and it gets always cut off (as if there is a local model frustum clipping plane).
What could be the source of this problem?
Unfortunately you can't "just" render 3D content inside the scene, as the scene graph will compress your Z values to make them honour proper stacking of the items.
If you have a 3D object, you may want to use QQuickFramebufferObject instead (see also this blog post).
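A minimal sketch of that approach, assuming a Qt 5 setup with the OpenGL scene graph backend (the class names here are hypothetical): the item renders into an offscreen FBO with its own depth buffer, and the scene graph composites the result as an ordinary 2D item, so the 3D content is no longer subject to the item-stacking Z compression.

```cpp
#include <QQuickFramebufferObject>
#include <QOpenGLFramebufferObject>
#include <QOpenGLFunctions>

class ModelView : public QQuickFramebufferObject
{
    Q_OBJECT
public:
    Renderer *createRenderer() const override;
};

class ModelRenderer : public QQuickFramebufferObject::Renderer,
                      protected QOpenGLFunctions
{
protected:
    QOpenGLFramebufferObject *createFramebufferObject(const QSize &size) override
    {
        QOpenGLFramebufferObjectFormat fmt;
        fmt.setAttachment(QOpenGLFramebufferObject::CombinedDepthStencil); // real depth buffer
        return new QOpenGLFramebufferObject(size, fmt);
    }

    void render() override
    {
        initializeOpenGLFunctions();
        glClearColor(0.f, 0.f, 0.f, 0.f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        // ... draw the 3D geometry here with full depth testing ...
        update(); // schedule another frame if the model animates
    }
};

QQuickFramebufferObject::Renderer *ModelView::createRenderer() const
{
    return new ModelRenderer;
}
```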

iOS - 2d image turn into a 3d

I was checking out this cool app called Morfo. According to their product description -
Use Morfo to quickly turn a photo of your friend's face into a
talking, dancing, crazy 3D character! Once captured, you can make your
friend say anything you want in a silly voice, rock out, wear makeup,
sport a pair of huge green cat eyes, suddenly gain 300lbs, and more.
So if you take a normal 2D image of Steve Jobs and feed it to this app, it converts it into a 3D model of that image and the user can interact with it.
My questions are as following -
How are they doing this?
How is this possible in iPad?
Isn't it computationally intensive to render and convert 2D image into 3D?
Any pointers, links to websites, or Objective-C libraries which do this are very much appreciated.
UPDATE: this demo of the product shows how Morfo uses a template mechanism to do the conversion, i.e. after a 2D image is fed in, one needs to set the boundaries of the face, where the eyes are located, and the size and length of the lips. Then it goes off and converts it into a 3D model. How is this part done? What frameworks or libraries might they be using?
This is a broad question, but I can point you in the right direction of how 3D rendering works. Trust me, this is a huge subject with decades of work behind it and too much to put here. I'm not sure how up to speed you are on 3D rendering techniques, so I will give you a basic idea of texturing and point you to a good set of tutorials.
How are they doing this?
The idea is that in 3D rendering, 3D models can be textured with a 2D image known as a texture map. You take a 2D image and wrap it around a 3D model, be that a simple primitive like a sphere or a cube, or something more advanced such as the classic teapot or the model of a human head, etc. A texture can be taken from anywhere: I have used the camera feed in the past to texture meshes with the video from the camera stream, and I have used photos from the camera, which is how they're doing it. So this is how the face is rendered onto the 3D model.
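To make the idea concrete, here is a minimal sketch of what texture mapping looks like in code (the vertex layout and shader string are illustrative, not Morfo's actual code): each vertex of a quad carries a UV coordinate into the photo, and the fragment shader samples the photo at the interpolated UV.

```cpp
// A textured quad: each vertex carries a 3D position and a 2D texture
// coordinate (UV). The rasterizer interpolates the UVs across the surface
// and the fragment shader looks the colour up in the 2D image.
struct TexturedVertex {
    float position[3];
    float uv[2];
};

static const TexturedVertex quad[4] = {
    //   x      y     z       u     v
    { {-1.f, -1.f, 0.f}, {0.f, 0.f} },
    { { 1.f, -1.f, 0.f}, {1.f, 0.f} },
    { {-1.f,  1.f, 0.f}, {0.f, 1.f} },
    { { 1.f,  1.f, 0.f}, {1.f, 1.f} },
};

// Minimal GLSL ES fragment shader: sample the photo at the interpolated UV.
static const char *fragmentShader =
    "precision mediump float;\n"
    "varying vec2 vUv;\n"
    "uniform sampler2D photo;\n"
    "void main() { gl_FragColor = texture2D(photo, vUv); }\n";
```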
Is this efficient?
On iOS and most mobile devices, 3D rendering uses hardware acceleration via OpenGL ES. In regard to your question, this is really fast, depending on how you implement your render code.
The way it uses the mapping (the scale/rotate template in the video), as mentioned by anticyclope, allows you to make the texture fit a model and also place the eyes, which are part of their render code.
So if you want to pick this up, I recommend reading Jeff LaMarche's tutorial series "From the Ground Up" as a primer:
http://iphonedevelopment.blogspot.com/2009/05/opengl-es-from-ground-up-table-of.html
Second to that, I have read about four books on OpenGL ES, for general design and for platform specifics. I recommend this book:
http://www.amazon.co.uk/iPhone-Programming-Developing-Graphical-Applications/dp/0596804822/ref=sr_1_1?ie=UTF8&qid=1331114559&sr=8-1
In my opinion, here is how they are doing it. Just my thoughts; I haven't seen the application in real life.
They have a 3D model of a human head. When you click on certain points in the 2D image, they adjust the corresponding points in the 3D model, so that it represents the specific face's features such as the distance between the eyes, the width of the lips, and so on. Next, the texture from the 2D image is applied to the 3D model using those control points, so we have a textured 3D model of a human head. Given that our perception is able to reconstruct a 3D shape from 2D images (say, we look at a 2D photo and still imagine a 3D person), there's no need to reconstruct the 3D shape accurately; the texture will do the work.
There is a step in rendering 3D models, called UV mapping, which takes the 3D model, defines a set of seam edges, and unwraps it into a 2D layout that is used to apply different textures to the model.
Now, if you notice, in Morfo you define the edges of the head, eyes, mouth and nose. With this information Morfo knows how to place its texture onto the model it has defined.
The process of loading a texture onto a model is not very complex, and it can be done on any device that supports a technology such as OpenGL.
Isn't it computationally intensive to render and convert 2D image into 3D?
Apple is sinking billions of dollars into developing custom chipsets, and recent models have impressive performance, considering the battery life and low operating temperature (no fans).

Simple algorithm for tracking a rectangular blob

I have created an experimental fast rectangular object tracking system; it will be used for head tracking and controlling objects in a 3D engine (Ogre3D).
For now I am able to show the webcam any kind of brightly colored rectangle (text markers are good objects) and the system registers basic properties of this object (hue/value/lightness and initial width and height at 0 degrees rotation).
After I have registered the trackable object, I do some simple frame processing to create a grayscale probability map.
So now I have 2 known things:
1) 4 corners for the last object position (it's always a rectangle but it may be rotated)
2) a pretty rectangular (but still far from perfect) blob which is the brightest in the frame. I can get coordinates of any point of the blob without problems, point detection is stable enough.
I can find a bounding rectangle of the object without problems, but I have a problem with detecting the object corners themselves.
I need the simplest possible (quick & dirty would be great) algorithm to scan the image starting from some known coordinates (a point inside the blob) and detect the 4 new x,y coordinates of the "blobish" rectangle's corners (not the corners of a bounding box, but the corners of the rectangular blob itself).
Ready-to-use C++ function would be awesome, but somehow Google doesn't like me today :(
I think it would be overkill to use some complicated function from the OpenCV library just to extract 4 points of a single rectangular blob. But if you know a quick and efficient way to do it using OpenCV (it must be real-time and light on the CPU because I'll be running the 3D engine at the same time), then I would be really grateful.
You can apply the Hough transform to the segmented image to detect lines. Using the detected lines, you can calculate their intersections to find the corner coordinates of the blob.
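A quick & dirty sketch of that idea with OpenCV (the function name and thresholds are mine and will need tuning for your footage): run Canny plus the standard Hough transform on the blob mask, then intersect pairs of roughly perpendicular lines to get corner candidates.

```cpp
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>

// Detect straight edges of a binary blob mask with the Hough transform and
// intersect non-parallel line pairs to recover corner candidates.
std::vector<cv::Point2f> blobCorners(const cv::Mat& blobMask)
{
    cv::Mat edges;
    cv::Canny(blobMask, edges, 50, 150);

    std::vector<cv::Vec2f> lines;                        // (rho, theta) pairs
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 60);

    std::vector<cv::Point2f> corners;
    for (size_t i = 0; i < lines.size(); ++i)
        for (size_t j = i + 1; j < lines.size(); ++j) {
            float t1 = lines[i][1], t2 = lines[j][1];
            if (std::fabs(t1 - t2) < CV_PI / 4)          // skip near-parallel lines
                continue;
            // Solve  x*cos(t) + y*sin(t) = rho  for both lines simultaneously.
            cv::Matx22f A(std::cos(t1), std::sin(t1),
                          std::cos(t2), std::sin(t2));
            cv::Vec2f b(lines[i][0], lines[j][0]);
            cv::Vec2f p = A.inv() * b;
            corners.emplace_back(p[0], p[1]);
        }
    return corners;   // in practice, cluster/merge these down to exactly 4 points
}
```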

World space to screen space (perspective projection)

I'm using a 3d engine and need to translate between 3d world space and 2d screen space using perspective projection, so I can place 2d text labels on items in 3d space.
I've seen a few posts of various answers to this problem but they seem to use components I don't have.
I have a Camera object, and can only set its current position and look-at position; it cannot roll. The camera is moving along a path and a certain target object may appear in its view then disappear.
I have only the following values
lookat position
position
vertical FOV
Z far
Z near
and obviously the position of the target object.
Can anyone please give me an algorithm that will do this using just these components?
Many thanks.
All graphics engines use matrices to transform between different coordinate systems. Indeed, OpenGL and DirectX use them, because they are the standard way.
Cameras usually construct the matrices using the parameters you have:
view matrix (transforms the world so that you are looking at it from the camera position); it uses the look-at position and the camera position (also the up vector, which usually is 0,1,0)
projection matrix (transforms from 3D coordinates to 2D coordinates); it uses the FOV, near, far, and aspect ratio.
You can find information on how to construct these matrices on the internet by searching for the OpenGL functions that create them:
gluLookAt: creates the view matrix
gluPerspective: creates the projection matrix
But I can't imagine an engine that doesn't let you get these matrices, because I can assure you they are somewhere; the engine is using them.
Once you have those matrices, you multiply them to get the view-projection matrix. This matrix transforms from world coordinates to screen coordinates. So just multiply the matrix with the position you want to know (in vector-4 format, with the 4th component being 1.0).
But wait, the result will be in homogeneous coordinates; you need to divide X, Y, Z of the resulting vector by W, and then you have the position in normalized screen coordinates (0 means the center, 1 means right, -1 means left, etc.).
From here it is easy to transform to pixel coordinates by multiplying by the width and height.
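Putting those steps together, here is a minimal sketch using GLM (not any particular engine's API; the function name is hypothetical) that projects a world point to pixel coordinates from exactly the values you listed: position, look-at, vertical FOV, Z near and Z far.

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// World position -> pixel coordinates, assuming an up vector of +Y.
glm::vec2 worldToScreen(const glm::vec3& cameraPos, const glm::vec3& lookAt,
                        float fovYDegrees, float zNear, float zFar,
                        float screenW, float screenH, const glm::vec3& target,
                        bool& inFront)
{
    glm::mat4 view = glm::lookAt(cameraPos, lookAt, glm::vec3(0, 1, 0));
    glm::mat4 proj = glm::perspective(glm::radians(fovYDegrees),
                                      screenW / screenH, zNear, zFar);

    // View-projection multiply, in homogeneous coordinates (w = 1).
    glm::vec4 clip = proj * view * glm::vec4(target, 1.0f);
    inFront = clip.w > 0.0f;                    // target is behind the camera if w <= 0

    glm::vec3 ndc = glm::vec3(clip) / clip.w;   // homogeneous divide -> [-1, 1]

    // NDC to pixels: x grows right, y grows down (flip Y since NDC +Y points up).
    return glm::vec2((ndc.x * 0.5f + 0.5f) * screenW,
                     (1.0f - (ndc.y * 0.5f + 0.5f)) * screenH);
}
```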
I have some slides explaining all this here: https://docs.google.com/presentation/d/13crrSCPonJcxAjGaS5HJOat3MpE0lmEtqxeVr4tVLDs/present?slide=id.i0
Good luck :)
P.S: when you work with 3D it is really important to understand the three matrices (model, view and projection), otherwise you will stumble every time.
so I can place 2d text labels on items in 3d space
Have you looked up "billboard" techniques? Sometimes just knowing the right term to search under is all you need. This refers to polygons (typically rectangles) that always face the camera, regardless of camera position or orientation.
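For completeness, here is a small sketch of the billboard idea (hypothetical helper, GLM types): the camera's right and up directions can be read from the view matrix, and the quad is spanned along them so it always faces the viewer.

```cpp
#include <glm/glm.hpp>

// Build the four corners of a camera-facing (billboard) quad around `center`.
// The camera's right and up vectors are the first two rows of the view
// matrix's rotation block, so the quad stays parallel to the screen.
void billboardCorners(const glm::mat4& view, const glm::vec3& center,
                      float halfW, float halfH, glm::vec3 out[4])
{
    glm::vec3 right(view[0][0], view[1][0], view[2][0]); // row 0 of rotation
    glm::vec3 up   (view[0][1], view[1][1], view[2][1]); // row 1 of rotation

    out[0] = center - right * halfW - up * halfH;
    out[1] = center + right * halfW - up * halfH;
    out[2] = center + right * halfW + up * halfH;
    out[3] = center - right * halfW + up * halfH;
}
```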

Perspective Rotation about Y axis

I have a 2D image and I want to create an anaglyph image from this single 2D image. To do this I need to create left and right views. I will consider my 2D image as the left view, and I want to create the right view now.
I came to know that a perspective rotation (about the Y axis) and perspective skews will give the right image.
I know that the perspective projection is related to 3D.
Basically I am new to 3D programming.
Can you please explain how to do a perspective rotation about the Y-axis, and how I can apply this to my 2D image? I am using C++.
You can't create an anaglyph from a 2D image. You need either two 2D images that were taken slightly apart from each other or you need a 3D image. You can try and generate 3D information from a 2D image but that's almost impossible and an active area of research.