Microsoft Kinect SDK depth data to real world coordinates - kinect

I'm using the Microsoft Kinect SDK to get the depth and color information from a Kinect and then convert that information into a point cloud. I need the depth information to be in real world coordinates with the centre of the camera as the origin.
I've seen a number of conversion functions but these are apparently for OpenNI and non-Microsoft drivers. I've read that the depth information coming from the Kinect is already in millimetres, and is contained in the 11bits... or something.
How do I convert this bit information into real world coordinates that I can use?
Thanks in advance!

This is catered for within the Kinect for Windows library by the Microsoft.Research.Kinect.Nui.SkeletonEngine class and the following method:
public Vector DepthImageToSkeleton (
    float depthX,
    float depthY,
    short depthValue
)
This method maps a pixel of the depth image produced by the Kinect to a vector in skeleton space, which is based on real-world measurements.
From there (this is how I've created a mesh in the past), you enumerate the byte array in the bitmap created by the Kinect depth image and build a new list of Vector points, similar to the following:
var width = image.Image.Width;
var height = image.Image.Height;
var greyIndex = 0;
var points = new List<Vector>();
for (var y = 0; y < height; y++)
{
    for (var x = 0; x < width; x++)
    {
        short depth;
        switch (image.Type)
        {
            case ImageType.DepthAndPlayerIndex: // low 3 bits hold the player index
                depth = (short)((image.Image.Bits[greyIndex] >> 3) | (image.Image.Bits[greyIndex + 1] << 5));
                if (depth <= maximumDepth)
                    points.Add(nui.SkeletonEngine.DepthImageToSkeleton(((float)x / image.Image.Width), ((float)y / image.Image.Height), (short)(depth << 3)));
                break;
            case ImageType.Depth: // depth comes back mirrored
                depth = (short)((image.Image.Bits[greyIndex] | image.Image.Bits[greyIndex + 1] << 8));
                if (depth <= maximumDepth)
                    points.Add(nui.SkeletonEngine.DepthImageToSkeleton(((float)(width - x - 1) / image.Image.Width), ((float)y / image.Image.Height), (short)(depth << 3)));
                break;
        }
        greyIndex += 2; // two bytes per depth pixel
    }
}
The end result is a list of vectors in skeleton space. Skeleton space is measured in meters, so multiply by 100 if you want centimeters (and so on).
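The bit arithmetic used in the loop can be isolated into a small helper. The following is a minimal C++ sketch, not SDK code: it assumes the layout implied by the answer's code (two bytes per pixel, little-endian, with the low 3 bits holding the player index for the DepthAndPlayerIndex format), and the names unpackDepthMm and unpackPlayerIndex are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads the 16-bit little-endian value stored at byte
// offset i. When hasPlayerIndex is true, the low 3 bits (player index) are
// shifted out so that only the depth bits remain.
inline std::uint16_t unpackDepthMm(const std::vector<std::uint8_t>& bits,
                                   std::size_t i, bool hasPlayerIndex) {
    assert(i + 1 < bits.size());
    const std::uint16_t raw =
        static_cast<std::uint16_t>(bits[i] | (bits[i + 1] << 8));
    return hasPlayerIndex ? static_cast<std::uint16_t>(raw >> 3) : raw;
}

// Hypothetical helper: extracts the 3-bit player index from the same pixel.
inline std::uint8_t unpackPlayerIndex(const std::vector<std::uint8_t>& bits,
                                      std::size_t i) {
    assert(i < bits.size());
    return static_cast<std::uint8_t>(bits[i] & 0x07);
}
```

Shifting the combined 16-bit value right by 3 gives the same result as the expression `(low >> 3) | (high << 5)` used above; it is only written in a different order.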


Shift in Point Cloud acquired using Kinect v2 API

I am acquiring a point cloud using the Kinect v2 API on 64-bit Windows 10. Below is the code snippet:
depthFrame = multiSourceFrame.DepthFrameReference.AcquireFrame();
colorFrame = multiSourceFrame.ColorFrameReference.AcquireFrame();
if (depthFrame == null || colorFrame == null) return;
coordinateMapper.MapDepthFrameToCameraSpace(depthData, cameraSpacePoints);
coordinateMapper.MapDepthFrameToColorSpace(depthData, colorSpacePoints);
colorFrame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Rgba);
for (var index = 0; index < depthData.Length; index++)
{
    int u = (int)Math.Floor(colorSpacePoints[index].X);
    int v = (int)Math.Floor(colorSpacePoints[index].Y);
    // skip depth points that map outside the color frame
    if (u < 0 || u >= COLOR_FRAME_WIDTH || v < 0 || v >= COLOR_FRAME_HEIGHT) continue;
    int pixelsBaseIndex = (v * COLOR_FRAME_WIDTH + u) * COLOR_BYTES_PER_PIXEL;
    float x = cameraSpacePoints[index].X;
    float y = cameraSpacePoints[index].Y;
    float z = cameraSpacePoints[index].Z;
    byte red = pixels[pixelsBaseIndex + 0];
    byte green = pixels[pixelsBaseIndex + 1];
    byte blue = pixels[pixelsBaseIndex + 2];
    byte alpha = pixels[pixelsBaseIndex + 3];
    PointXYZRGB point = new PointXYZRGB(); // Color point in 3D
    point.position(x, y, z);
    point.color(red, green, blue, alpha);
}
Please see below a screenshot of the point cloud:
Please look around the orange ball in the picture above. On close inspection, there is a visible shift in the point cloud.
Why does this shift exist, and how can I remove or minimize it? Any workaround, please.
The shift between the color overlay and the depth map can have a number of causes.
The depth and color frames are not acquired at the same instant (that is how the _reader_MultiSourceFrameArrived function in the Kinect SDK works). The timestamps of the two cameras differ slightly, hence the slight shift. This is more prominent if the object in view is moving.
The coordinateMapper function in the SDK that maps between the color frame and the depth frame uses the camera calibration parameters. The default calibration parameters are hard-coded in the SDK, but every device differs slightly. You could try to recalibrate the Kinect cameras and use the updated calibration parameters to get a correct overlay of the color and depth maps. Note, however, that simply replacing the camera calibration parameters in the Kinect Fusion code and recompiling does not work, because the parameters are taken from the closed-source Kinect Fusion DLL. So you'll have to write your own code to update each frame at runtime.
Hope this helps.

converting meter into pixel unit

I am trying to convert distances from meters to pixels in a ROS node, using the PCL library and an Xbox Kinect. I am using the code below to access the Euclidean coordinates of every point from the Kinect inside the ROS node; these are in meters. But I want these measurements in pixel units. What should I do?
void cloud_cb (const sensor_msgs::PointCloud2ConstPtr& input)
{
    pcl::PointCloud<pcl::PointXYZRGB> output;
    pcl::fromROSMsg(*input, output);
    for(int i=0;i<=400;i++){
        for(int j=0;j<=400;j++){
            p[i][j] = output.at(i,j); // point at column i, row j of the organized cloud
            ROS_INFO("\n p.z = %f \t p.x = %f \t p.y = %f",p[i][j].z,p[i][j].x,p[i][j].y);
        }
    }
    sensor_msgs::PointCloud2 cloud;
    pub.publish (cloud);
}
Here p[row][col] is a Point structure containing the x, y, z coordinate values in meters, which I want to convert to pixel units. As far as I can see, the size of a pixel is not constant, so I can't use a value found on Google.
I found a similar question here: Kinect depth conversion from mm to pixels, but it has no solution.
There's a problem with trying to convert meters to pixels. Pixels aren't a standard unit. The physical size of 1 pixel varies on different devices depending on screen resolution and size of a screen.
If you know the resolution of the screen the conversion is still non-trivial.
const int L = 1920; //screen width
const int H = 1280; //screen height
for(int i=0;i<=L;i++){
    for(int j=0;j<=H;j++){
        p[i][j] = output.at(i*400/L, j*400/H); // scale screen indices to the 400x400 range read from the cloud
    }
}
Thus for every pixel you'll have a depth value corresponding to the depth value in the map. This will need some int conversion and improvement.
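If the goal is the pixel position in the camera image rather than on a screen, a common approach is to project each 3D point through a pinhole camera model. The following is a minimal sketch under stated assumptions, not part of the original answer: the Intrinsics struct and projectToPixel are hypothetical names, the point is assumed to be in the camera's optical frame (z forward, in meters), and the focal lengths and principal point must come from your own camera calibration (in ROS, typically the K matrix of the sensor_msgs/CameraInfo message).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for pinhole intrinsics, all in pixel units.
struct Intrinsics {
    double fx, fy; // focal lengths
    double cx, cy; // principal point
};

struct Pixel {
    int u, v; // column and row in the image
};

// Projects a camera-frame point (meters, z pointing forward) to the nearest
// pixel using u = fx * x / z + cx and v = fy * y / z + cy.
inline Pixel projectToPixel(double x, double y, double z, const Intrinsics& k) {
    assert(z > 0.0); // points at or behind the camera cannot be projected
    const long u = std::lround(k.fx * x / z + k.cx);
    const long v = std::lround(k.fy * y / z + k.cy);
    return { static_cast<int>(u), static_cast<int>(v) };
}
```

The division by z is why no single meters-to-pixels factor exists: the same metric offset covers fewer pixels the farther the point is from the camera.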

Bidirectional path tracing

I'm making a bidirectional path tracer and I have some troubles.
To be clear :
1) One point light
2) All objects are diffuse
3) All objects are spheres, even walls (they are very large)
The light emission is a 3D vector. The BRDF of a sphere is a 3D vector. Hard coded.
In the main function below I generate EyePath and LightPath, then I connect them. At least I try.
In this post I will talk about the main function, then EyePath, then LightPath. The discussion of the connecting function will come once EyePath and LightPath are good.
First questions:
Is the generation of the first light point correct?
Do I need to compute this point according to the emission of the light source, or is it just the emission? See the commented line where I fill the Vertices structure.
Do I need to translate fromLight in order to put it on the sphere?
The code below is taken from the main function. Above it there are two for loops going through all pixels. camera.o is the eye. cameraRayDir is the direction to the current pixel.
//The path light starting point is at the same position as the light
Ray fromLight(Vec(0, 24.3, 0), Vec());
Sphere light = spheres[7];
#define PDF 0.15915494309 // 1 / (2 * PI)
for(int i = 0; i < samps; ++i)
std::vector<Vertices> PathEye;
std::vector<Vertices> PathLight;
Vec cameraRayDir = cx * (double(x) / w - .5) + cy * (double(y) / h - .5) + camera.d;
Ray rayEye(camera.o, cameraRayDir.norm());
// Hemisphere oriented towards the top
fromLight.d = generateRayInHemisphere(fromLight.o,Vec(0,1,0)).d;
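double f = clamp(Vec(0,1,0).dot(fromLight.d)); // argument reconstructed (assumed): cosine between the light normal and the sampled direction
Vertices vert;
vert.d = fromLight.d;
vert.x = fromLight.o;
vert.id = 7; // index of the light sphere (field name assumed)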
vert.cos = f;
vert.n = Vec(0,1,0).norm();
// this one ?
//vert.couleur = spheres[7].e * f / PDF;
// Or this one ?
vert.couleur = spheres[7].e;
int sizeEye = generateEyePath(PathEye, rayEye, maxDepth);
int sizeLight = generateLightPath(PathLight, fromLight, maxDepth);
for (int s = 0; s < sizeLight; ++s) {
    for (int t = 1; t < sizeEye; ++t) {
        int depth = t + s - 1;
        if ((s == 0 && t == 0) || depth < 0 || depth > maxDepth) continue; // skip invalid or over-long combinations
        pixelValue = pixelValue + connectPaths(PathEye, PathLight, s, t);
    }
}
For the EyePath, I intersect the geometry, then compute the illumination according to the distance to the light. The colour is black if the point is in shadow.
Second question: for the eye path and the direct illumination, is the computation correct? I've seen that in a lot of code people use the pdf even for direct illumination. But I'm only using a point light and spheres.
int generateEyePath(std::vector<Vertices>& v, Ray eye, int maxDepth)
double t;
int id = 0;
Vertices vert;
int RussianRoulette;
while(v.size() <= maxDepth)
if(distribRREye(generatorRREye) < 10)
// Intersect all the geometry
// id is the id of the intersected geometry in an array
intersect(eye, t, id);
const Sphere& obj = spheres[id];
// Intersection point
Vec x = eye.o + eye.d * t;
// normal
Vec n = (x - obj.p).norm();
Vec direction = light.p - x;
// Shadow ray
Ray RaytoLight = Ray(x, direction.norm());
const float distance = direction.length();
// shadow
const bool visibility = intersect(RaytoLight, t, id);
const Sphere &lumiere = spheres[id];
float degree = clamp(n.dot((light.p - x).norm())); // cosine between the normal and the direction to the light
// If the intersected geometry is not a light, then in shadow
if(lumiere.e.x == 0)
vert.couleur = Vec();
else // else we compute the colour
// obj.c is the brdf, lumiere.e is the emission
vert.couleur = (obj.c).mult(lumiere.e / (distance * distance)) * degree;
vert.x = x;
vert.id = id; // field name assumed
vert.n = n;
vert.d = eye.d.norm();
vert.cos = degree;
eye = generateRayInHemisphere(x,n);
return v.size();
For the LightPath, I compute the value at a given point from the previous point and the values at this point, as in ordinary path tracing.
Third question: is the colour computation correct?
int generateLightPath(std::vector<Vertices>& v, Ray fromLight, int maxDepth)
double t;
int id = 0;
Vertices vert;
Vec previous;
while(v.size() <= maxDepth)
if(distribRRLight(generatorRRLight) < 10)
previous = v.back().couleur;
intersect(fromLight, t, id);
// intersected geometry
const Sphere& obj = spheres[id];
// Intersection point
Vec x = fromLight.o + fromLight.d * t;
// normal
Vec n = (x - obj.p).norm();
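double f = clamp(n.dot(fromLight.d)); // argument reconstructed (assumed): cosine between the normal and the ray direction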
// obj.c is the brdf
vert.couleur = previous.mult(((obj.c / M_PI) * f) / PDF);
vert.x = x;
vert.id = id; // field name assumed
vert.n = n;
vert.d = fromLight.d.norm();
vert.cos = f;
fromLight = generateRayInHemisphere(x,n);
return v.size();
For the moment I get this result.
The connecting function will come once EyePath and LightPath are good.
Thank you all
Try the spherical reference scene mentioned in this paper. Since it has an analytical solution, I think you can then work out most of your questions by yourself.
It would save you time to implement plain path tracing and light tracing first, verify your understanding with each, and only then try to combine them with weights.
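The same idea of checking against an analytical result can be applied to individual building blocks before rendering anything. The following is a minimal sketch, not the asker's code: it assumes uniform hemisphere sampling around +Y, whose density 1 / (2 * PI) matches the PDF constant in the question, and uses it to estimate the integral of cos(theta) over the hemisphere, whose exact value is PI. The names Dir, sampleUniformHemisphere and estimateCosineIntegral are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Dir { double x, y, z; };

constexpr double kPi = 3.14159265358979323846;
constexpr double kUniformHemispherePdf = 1.0 / (2.0 * kPi);

// Maps two uniform random numbers to a direction distributed uniformly
// over the hemisphere around the +Y axis.
inline Dir sampleUniformHemisphere(double u1, double u2) {
    assert(u1 >= 0.0 && u1 <= 1.0 && u2 >= 0.0 && u2 <= 1.0);
    const double cosTheta = u1; // uniform in solid angle means uniform in cos(theta)
    const double sinTheta = std::sqrt(1.0 - cosTheta * cosTheta);
    const double phi = 2.0 * kPi * u2;
    return { sinTheta * std::cos(phi), cosTheta, sinTheta * std::sin(phi) };
}

// Monte Carlo estimate of the integral of cos(theta) over the hemisphere:
// the average of f(sample) / pdf(sample). The analytical value is pi.
inline double estimateCosineIntegral(int n, unsigned seed) {
    assert(n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const Dir d = sampleUniformHemisphere(uni(rng), uni(rng));
        sum += d.y / kUniformHemispherePdf; // d.y is cos(theta) relative to +Y
    }
    return sum / n;
}
```

If an estimate like this does not converge to the known value, the sampling routine or its pdf is inconsistent, and any path weights built on top of it will be wrong as well.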

In Kinect SDK v2.0 how do I map a pixel in the color image to a voxel in the depth image?

I know how to go the other way around. What I am looking for is, given a (x,y) coordinate in the pixel space (of the 1920x1080 image), how do I get the corresponding (if available) (x,y,z) (in meters) of the depth image. I realize that there are more pixels than voxels and it could be possible not to find any, but Microsoft's SDK has a CoordinateMapper class. This exposes the MapColorFrameToCameraSpace function. If I use that, I can get an array of points in the camera space (x,y,z) but I am unable to figure out how to extract the mapping for a specific pixel.
You probably need to use MapDepthFrameToColorSpace to find the color space coordinates of all depth points.
Then compare your pixel coordinate (x, y) with these color space coordinates. My solution is to find the closest point (there might be a better way), because the mapped coordinates are floats.
If you use C#, here is the code. Hope it helps!
private ushort GetDepthValueFromPixelPoint(KinectSensor kinectSensor, ushort[] depthData, float PixelX, float PixelY)
{
    ushort depthValue = 0;
    if (null != depthData)
    {
        ColorSpacePoint[] depP = new ColorSpacePoint[512 * 424];
        kinectSensor.CoordinateMapper.MapDepthFrameToColorSpace(depthData, depP);
        int depthIndex = FindClosestIndex(depP, PixelX, PixelY);
        if (depthIndex >= 0) // -1 means no depth point was found
        {
            depthValue = depthData[depthIndex];
        }
    }
    return depthValue;
}
private int FindClosestIndex(ColorSpacePoint[] depP, float PixelX, float PixelY)
{
    int depthIndex = -1;
    float closestPoint = float.MaxValue;
    for (int j = 0; j < depP.Length; ++j)
    {
        float dis = DistanceOfTwoPoints(depP[j], PixelX, PixelY);
        if (dis < closestPoint)
        {
            closestPoint = dis;
            depthIndex = j;
        }
    }
    return depthIndex;
}
private float DistanceOfTwoPoints(ColorSpacePoint colorSpacePoint, float PixelX, float PixelY)
{
    float x = colorSpacePoint.X - PixelX;
    float y = colorSpacePoint.Y - PixelY;
    return (float)Math.Sqrt(x * x + y * y);
}

Explain code in Kinect SDK

I am working with the Kinect and reading the DepthWithColor-D3D example. It contains some code that I don't understand yet.
// loop over each row and column of the color
for (LONG y = 0; y < m_colorHeight; ++y)
LONG* pDest = (LONG*)((BYTE*)msT.pData + msT.RowPitch * y);
for (LONG x = 0; x < m_colorWidth; ++x)
// calculate index into depth array
int depthIndex = x/m_colorToDepthDivisor + y/m_colorToDepthDivisor * m_depthWidth;
// retrieve the depth to color mapping for the current depth pixel
LONG colorInDepthX = m_colorCoordinates[depthIndex * 2];
LONG colorInDepthY = m_colorCoordinates[depthIndex * 2 + 1];
How are the values of colorInDepthX and colorInDepthY in the code above calculated?
colorInDepthX and colorInDepthY are a mapping between the depth and color images so that they align. Because the Kinect's cameras are slightly offset from each other, their fields of view do not line up perfectly.
m_colorCoordinates is defined at the top of the file as such:
m_colorCoordinates = new LONG[m_depthWidth*m_depthHeight*2];
This is a one-dimensional array representing a 2-dimensional image. It is populated just above the code block you posted in your question:
// Get of x, y coordinates for color in depth space
// This will allow us to later compensate for the differences in location, angle, etc between the depth and color cameras
As described in the comment, this runs a calculation provided by the SDK to map the color and depth coordinates onto each other. The result is placed in m_colorCoordinates.
colorInDepthX and colorInDepthY are simply values within the m_colorCoordinates array that are being acted upon in the current cycle of the loop. They are not "calculated", per se, but just point to what already exists in m_colorCoordinates.
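The lookup itself is plain index arithmetic on a flat array that stores an (x, y) pair per depth pixel. The following is a minimal standalone sketch of that arithmetic, not the sample's code: ColorCoord and lookupColorCoord are hypothetical names, and the array is assumed to be laid out as in the question, with x at index depthIndex * 2 and y at depthIndex * 2 + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ColorCoord { long x, y; };

// Returns the stored coordinate pair for the depth pixel that corresponds
// to the given color pixel. The color position is first scaled down to the
// depth resolution, then converted to a row-major index.
inline ColorCoord lookupColorCoord(const std::vector<long>& colorCoordinates,
                                   int depthWidth, int colorX, int colorY,
                                   int colorToDepthDivisor) {
    assert(depthWidth > 0 && colorToDepthDivisor > 0);
    const int depthX = colorX / colorToDepthDivisor;
    const int depthY = colorY / colorToDepthDivisor;
    const std::size_t depthIndex =
        static_cast<std::size_t>(depthX) +
        static_cast<std::size_t>(depthY) * static_cast<std::size_t>(depthWidth);
    assert(depthIndex * 2 + 1 < colorCoordinates.size());
    // Two entries per depth pixel: x first, then y.
    return { colorCoordinates[depthIndex * 2], colorCoordinates[depthIndex * 2 + 1] };
}
```

With a 4-pixel-wide depth image and a divisor of 2, color pixel (5, 3) maps to depth pixel (2, 1), which is index 6 in the flat array, so entries 12 and 13 are returned.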
The function that handles the mapping between color and depth images is explained in the Kinect SDK at MSDN. Here is a direct link: