How does the Kinect calculate depth?

I'm getting a little bit confused here.
How does the Kinect calculate depth? What I understand is this:
The IR projector throws out a pattern which is reflected back and read by the IR camera.
Now the IR camera knows what this pattern looks like at a particular reference depth. The difference between the observed and the known pattern is exploited to calculate the depth using triangulation (the proportionality of similar triangles).
Question 1: Does it take the distance between the IR projector and the IR camera into account? I would guess not, because they are too close together to matter.
Question 2: If we get the depth directly from the pattern, when do we use a disparity map to calculate depth?

The disparity map is basically the difference between the known and the observed pattern that you mention in the beginning. You use this during the depth computation.
The distance between the projector and the camera gets taken into account too.
Check out the following figure:
Pr is the position of a speckle at a reference depth Zr, and Po is the same speckle captured by the Kinect at a depth Zo (the depth we want to calculate). D is the 3D disparity between the two points, while d is the disparity on the 2D image plane. f is the focal length, and b is the distance between the camera C and the laser projector L.
As you mentioned, the depth is calculated using similar triangles.
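In the figure's notation this works out to the following (a reconstruction along the lines of Khoshelham's derivation; the sign of d depends on the convention used):

    \frac{D}{b} = \frac{Z_r - Z_o}{Z_r}, \qquad
    \frac{d}{f} = \frac{D}{Z_o}
    \quad\Longrightarrow\quad
    Z_o = \frac{Z_r}{1 + \dfrac{Z_r\,d}{f\,b}}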
The figure is from the paper Accuracy Analysis of Kinect Depth Data by K. Khoshelham; I'd suggest reading it for a more thorough explanation of the depth calculation process.

Related

2D shape detection in 3D pointcloud

I'm working on a project to detect the position and orientation of a paper plane.
To collect the data, I'm using an Intel Realsense D435, which gives me accurate, clean depth data to work with.
Now I have arrived at the problem of detecting the 2D paper-plane silhouette in the 3D point cloud data.
Here is an example of the data (I put the plane on a stick for testing; this will not be in the final implementation):
https://i.stack.imgur.com/EHaEr.gif
Basically, I have:
A 3D point cloud with points on the plane
A 2D shape of the plane
I would like to calculate what rotations/translations are needed to align the 2D shape to the 3D point cloud as accurately as possible.
I've searched online, but couldn't find a good way to do it. One approach would be to use Iterative Closest Point (ICP): first take a calibration point cloud of the plane in a known orientation, then align it with the current orientation. But from what I've heard, ICP doesn't perform well if the point clouds aren't already roughly aligned at the start.
Any help is appreciated! Coding language doesn't matter.
Does your 3D point cloud have outliers? How many, and of what kind?
How exactly did you use ICP?
One way would be to use ICP with a hand-crafted initial guess, applied via
pcl::transformPointCloud (*cloud_in, *cloud_icp, transformation_matrix);
(to mitigate the problem that ICP needs a close initial alignment to work).
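For illustration, a minimal sketch of this idea in Python with Open3D rather than PCL (the file names, the correspondence threshold and the hand-crafted initial transform below are assumptions):

    import numpy as np
    import open3d as o3d

    # Calibration cloud of the plane in a known orientation, and the current
    # capture (hypothetical file names).
    source = o3d.io.read_point_cloud("plane_calibration.ply")
    target = o3d.io.read_point_cloud("plane_current.ply")

    # Hand-crafted initial guess (here: 90-degree rotation about z plus a shift),
    # playing the role of pcl::transformPointCloud with transformation_matrix.
    init_T = np.array([[0.0, -1.0, 0.0, 0.10],
                       [1.0,  0.0, 0.0, 0.00],
                       [0.0,  0.0, 1.0, 0.05],
                       [0.0,  0.0, 0.0, 1.00]])

    result = o3d.pipelines.registration.registration_icp(
        source, target,
        max_correspondence_distance=0.02,   # assumed threshold in metres
        init=init_T,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    print(result.transformation)            # refined pose of the calibration cloud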
What you actually want is the plane model that describes the position and orientation of your point cloud, right?
A good estimate of the underlying plane can be obtained with PCL's sample consensus (RANSAC) module, e.g. fitting a pcl::SampleConsensusModelPlane with pcl::RandomSampleConsensus.
You can then read off the computed model coefficients.
Now finding the correct transformation is just: How to calculate transformation matrix from one plane to another?
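As a rough sketch of this plane-model route (Python with Open3D and NumPy instead of PCL; the input file, the thresholds and the reference normal below are assumptions): fit the plane with RANSAC, then build the rotation that takes its normal onto the reference plane's normal.

    import numpy as np
    import open3d as o3d

    # Load the captured cloud (hypothetical file name).
    pcd = o3d.io.read_point_cloud("paper_plane.ply")

    # RANSAC plane fit: returns (a, b, c, d) of ax + by + cz + d = 0 plus inlier indices.
    plane_model, inliers = pcd.segment_plane(distance_threshold=0.005,
                                             ransac_n=3,
                                             num_iterations=1000)
    a, b, c, d = plane_model
    n_src = np.array([a, b, c])          # normal of the fitted plane
    n_dst = np.array([0.0, 0.0, 1.0])    # assumed reference plane normal (z up)

    def rotation_between(u, v):
        """Rotation matrix mapping unit vector u onto unit vector v (u != -v)."""
        u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
        w = np.cross(u, v)
        cos = float(np.dot(u, v))
        K = np.array([[0.0, -w[2], w[1]],
                      [w[2], 0.0, -w[0]],
                      [-w[1], w[0], 0.0]])
        return np.eye(3) + K + K @ K / (1.0 + cos)

    T = np.eye(4)
    T[:3, :3] = rotation_between(n_src, n_dst)
    pcd.transform(T)                      # cloud's plane is now parallel to z = const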

Compute road plane normal with an embedded camera

I am developing some computer vision algorithms for vehicle applications.
I am in front of a problem and some help would be appreciated.
Let's say we have a calibrated camera attached to a vehicle which captures a frame of the road ahead of the vehicle:
Initial frame
We apply a first filter to keep only the road markers and return a binary image:
Filtered image
Once the road lanes are separated, we can approximate them with linear expressions and detect the vanishing point:
Objective
What I am looking to recover is the equation of the normal n from the image, without any prior knowledge of the rotation matrix and the translation vector. Nevertheless, I assume L1, L2 and L3 lie on the same plane.
In 3D space the problem is quite simple. In the 2D image plane it is more complex, since the camera's projective transformation does not preserve angles, and I am not able to find a way to work out the equation of the normal.
Do you have any idea about how I could compute the normal?
Thanks,
Pm
No can do, you need a minimum of two independent vanishing points (i.e. vanishing points representing the images of the points at infinity of two different pencils of parallel lines).
If you have them, the answer is trivial: express the image positions of said vanishing points in homogeneous coordinates. Then their cross product is equal (up to scale) to the normal vector of the 3D plane said pencils define, decomposed in camera coordinates.
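A minimal NumPy sketch of that recipe, assuming the calibration matrix K is known (K and the two vanishing points below are placeholder values): back-project each vanishing point to a 3D direction and take the cross product.

    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],      # assumed calibration matrix
                  [0.0, 800.0, 240.0],
                  [0.0,   0.0,   1.0]])

    v1 = np.array([350.0, 200.0, 1.0])      # vanishing point 1 (homogeneous pixels)
    v2 = np.array([ 50.0, 210.0, 1.0])      # vanishing point 2 (homogeneous pixels)

    d1 = np.linalg.inv(K) @ v1              # 3D direction of the first pencil of lines
    d2 = np.linalg.inv(K) @ v2              # 3D direction of the second pencil
    n = np.cross(d1, d2)                    # plane normal (up to scale), camera frame
    n /= np.linalg.norm(n)
    print(n)

Equivalently, n is proportional to K^T (v1 x v2), i.e. the back-projection of the vanishing line through the two vanishing points.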
Your information is insufficient, as the others have stated. If your data comes from a video, a common way to get the road ground plane is to take two or more images, compute the associated homography, then decompose the homography matrix into the surface normal and the relative camera motion. You can do the decomposition with OpenCV's decomposeHomographyMat method. You can compute the homography by associating four or more point correspondences using OpenCV's findHomography method. If it is hard to determine these correspondences, it is also possible to do it with a combination of point and line correspondences (see the linked paper); however, this is not implemented in OpenCV.
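A hedged OpenCV sketch of that pipeline (the calibration matrix and the point correspondences are placeholders; in practice they come from feature matching restricted to the road surface across the two frames):

    import numpy as np
    import cv2

    K = np.array([[800.0, 0.0, 320.0],      # assumed calibration matrix
                  [0.0, 800.0, 240.0],
                  [0.0,   0.0,   1.0]])

    # Ground-plane point correspondences between two frames (placeholder values).
    pts1 = np.float32([[100, 400], [500, 410], [320, 300], [220, 350],
                       [150, 380], [450, 330]])
    pts2 = np.float32([[110, 420], [510, 430], [325, 310], [228, 365],
                       [162, 398], [458, 342]])

    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)

    # Each candidate solution contains a rotation, a translation (up to scale)
    # and the normal of the plane that induced the homography.
    num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    for n in normals:
        print(n.ravel())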
You do not have sufficient information in the example you provide.
If you are wondering "which way is up", one thing you might be able to do is to detect the line on the horizon. If K is the calibration matrix, then K^T l will give you the plane normal in 3D relative to your camera. (The general equation for backprojecting a line l in the image to a plane E through the center of projection is E = P^T l, with a 3x4 projection matrix P.)
A better alternative might be to establish a homography to rectify the ground-plane. To do so, however, you need at least four non-collinear points with known coordinates - or four lines, no three of which may be parallel.

Are there cameras that capture not only color but the actual depth?

Getting into 3D reconstruction techniques, I'm curious whether there are cameras that capture not only color but also depth at the moment the image is captured. It would appear that getting the depth of a particular pixel directly on the sensor would be far more accurate than needing to reconstruct it after the fact from many images.
Thoughts? Suggestions?
Yes, there are sensors which are not based on the triangulation principle. They use time-of-flight or similar principles to capture the depth for a particular pixel. Take a look at PMD sensors.
There are also the Microsoft Kinect and the Intel RealSense that you can take a look at.

Distance estimation based on signal strength

I have a set of data which includes the position of a car and the signal level of an unknown emitter. I have to estimate the distance based on this. Basically, signal level varies inversely with the square of the distance, but when we include effects like multipath and reflections we need a different model. This is where the Hata-Okumura model comes in, which gives the path loss as a function of distance. However, the distance is unknown, as I don't know where the emitter is. I only have access to different lat/long positions and the received signal level.
What I am asking is: could you please point me to techniques which would help me estimate the distance based on the current position and signal strength? All I am asking for is guidance towards a technique which might be useful.
I have looked into How to calculate distance from Wifi router using Signal Strength?, but there the asker has three fixed WiFi signals and can use the FSPL model. In an urban environment that does not work.
Since the car is moving, using any diffraction model would be very difficult. The multipath environment is constantly changing due to the moving car, and any reflection/diffraction model requires well-known object geometry around the car. In your problem you have the moving car's position time series [x(t), y(t)], which is known. You also have a time series of rough measurements of the distance between the car and the emitter, [r(t)], whose position is unknown. You need to solve for the stationary unknown emitter position (X, Y). So you have many noisy measurements and two unknown parameters to estimate. This is a classic least-squares estimation problem. You can formulate r(ti) = sqrt((x(ti)-X)^2 + (y(ti)-Y)^2), feed your data into this equation and do a least-squares estimation. The data is obviously noisy due to multipath, but the emitter is stationary, and over time, during the estimation process, the noise can be more or less smoothed out.
Least Square Estimation
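As a sketch of this formulation (SciPy/NumPy; the car track, the path-loss-derived ranges and the noise level below are synthetic placeholders):

    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(0)

    # Known car track (e.g. lat/long converted to a local metric frame).
    x = np.linspace(0.0, 500.0, 200)
    y = 0.3 * x + 20.0 * np.sin(x / 50.0)

    # Rough ranges derived from the path-loss model, corrupted by multipath noise.
    X_true, Y_true = 300.0, 150.0
    r = np.hypot(x - X_true, y - Y_true) + rng.normal(0.0, 20.0, x.size)

    def residuals(p):
        X, Y = p
        return np.hypot(x - X, y - Y) - r   # sqrt((x(ti)-X)^2 + (y(ti)-Y)^2) - r(ti)

    sol = least_squares(residuals, x0=[0.0, 0.0])
    print("estimated emitter position:", sol.x)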

How can I compute optical flow from a depth image stream from a depth camera?

I already have a depth camera feed set up, and to make it more interesting I want to compute some data from it, like normals, motion/optical flow and other data sets, to use for visual effects. I am particularly interested in optical flow and whether it can be computed from a depth-only stream.
Has this been implemented? If so, I'd like to know what the methods are and understand which one would be the easiest to use.
I worked with the Kinect depth camera and implemented a patient-tracking algorithm. The algorithm itself is commercial and I cannot disclose the details, but I can give my two cents here.
The depth feed from the Kinect should not be used directly for optical flow (motion tracking), because of the no-depth (invalid) pixels. You can use inpainting to fill in the gaps in the depth image. If you are using OpenCV, you can refer to the implementation here:
http://www.morethantechnical.com/2011/03/05/neat-opencv-smoothing-trick-when-kineacking-kinect-hacking-w-code/
I suggest using a smoothing filter after inpainting to get smooth depth data near object edges. You can use the simple filters available in OpenCV on the depth stream. It also helps to scale the 16-bit depth down to an 8-bit image to visualize the disparity image.
I believe you can then use the resulting stream with an optical flow algorithm from OpenCV. Here is an example:
http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_lucas_kanade.html#gsc.tab=0
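A hedged sketch of this whole pipeline in Python/OpenCV (the synthetic frames, the assumed 4 m depth range and the filter sizes are placeholders; the linked tutorial shows sparse Lucas-Kanade, while this uses dense Farneback flow):

    import numpy as np
    import cv2

    # Synthetic stand-ins for two consecutive 16-bit depth frames from the camera,
    # with a block of zero-valued (no-depth) pixels and a small horizontal shift.
    rng = np.random.default_rng(0)
    prev_depth = rng.integers(500, 4000, (240, 320)).astype(np.uint16)
    prev_depth[60:80, 100:120] = 0                      # fake no-depth holes
    next_depth = np.roll(prev_depth, 2, axis=1)         # fake motion

    def preprocess(depth16):
        """Inpaint invalid pixels, scale to 8 bit, and smooth a depth frame."""
        invalid = (depth16 == 0).astype(np.uint8)                    # no-depth mask
        depth8 = cv2.convertScaleAbs(depth16, alpha=255.0 / 4000.0)  # assumes ~4 m range
        depth8 = cv2.inpaint(depth8, invalid, 3, cv2.INPAINT_TELEA)  # fill the holes
        return cv2.medianBlur(depth8, 5)                 # smooth edges after inpainting

    prev8 = preprocess(prev_depth)
    next8 = preprocess(next_depth)

    # Dense Farneback optical flow on the preprocessed 8-bit depth images.
    flow = cv2.calcOpticalFlowFarneback(prev8, next8, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    print(flow.shape)                                    # (H, W, 2) per-pixel motion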
You can also use the dense trajectories implementation, but I believe it is processor-intensive and the final frame rate might be really slow.
https://lear.inrialpes.fr/people/wang/dense_trajectories
Hope this helps.