How to get raw depth values from Kinect depth stream? - tracking

I am tracking the motion of a human body using the Lucas-Kanade algorithm, working on the RGB and depth streams of a Kinect. I have to place some points on the body, track those points, and finally calculate their XYZ coordinates.
My question is: how do I get the z coordinate for a particular pixel from the depth stream? I know the raw depth values from the Kinect lie in the 0-2047 range, and applying formulas to them can give me the distance in any units I want. But how do I get this raw depth? Can anyone help?
I am using the libfreenect driver for the Kinect, and OpenCV.

With OpenCV 2.3.x you can retrieve all the information you want, but you will need to compile it from source with OpenNI support for this to work. Look at this code:
VideoCapture capture( CV_CAP_OPENNI );
for(;;)
{
    Mat depthMap;
    Mat rgbImage;
    capture.grab();
    capture.retrieve( depthMap, CV_CAP_OPENNI_DEPTH_MAP ); // CV_16UC1, depth in mm
    capture.retrieve( rgbImage, CV_CAP_OPENNI_BGR_IMAGE );
    if( waitKey( 30 ) >= 0 )
        break;
}
Yes, it's that easy. depthMap will store a depth value at each pixel; this value represents the distance from the sensor to the surface where the depth was measured with the IR. More details at: http://opencv.itseez.com/doc/user_guide/ug_highgui.html
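For reference, here is a rough Python equivalent of the same loop; this is only a sketch, assuming an OpenCV build with OpenNI support (newer builds expose the constants as cv2.CAP_OPENNI_*). The depth map comes back as a 16-bit image in millimetres, so reading the z value for a tracked pixel is a simple index:
import cv2

# Sketch only: requires OpenCV compiled with OpenNI support.
capture = cv2.VideoCapture(cv2.CAP_OPENNI)
while True:
    if not capture.grab():
        break
    ok_d, depth_map = capture.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)   # CV_16UC1, depth in mm
    ok_c, bgr_image = capture.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)
    if ok_d:
        u, v = 320, 240                     # (x, y) of a tracked point -- placeholder values
        print("z at ({}, {}) = {} mm".format(u, v, int(depth_map[v, u])))
    if cv2.waitKey(30) >= 0:
        break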

Related

How to calculate the Horizontal and Vertical FOV for the KITTI cameras from the camera intrinsic matrix?

I would like to calculate the horizontal and vertical field of view from the camera intrinsic matrix for the cameras used in the KITTI dataset. The reason I need the field of view is to convert a depth map into 3D point clouds.
Though this question has been asked quite a long time ago, I felt it needed an answer as I ran into the same issue and was unable to find any info on it.
I have, however, solved it using the information available in this document and some more general camera calibration documents.
Firstly, we need to convert the supplied disparity into distance. This can be done by first converting the disparity map to floats through the method in the dev_kit, where they state:
disp(u,v) = ((float)I(u,v))/256.0;
This disparity can then be converted into a distance through the standard stereo vision equation:
Depth = Baseline * Focal length / Disparity
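As a small illustration, a Python sketch of that conversion; the focal length, baseline, and file path below are placeholders (the next paragraphs describe where to find the real values):
import cv2
import numpy as np

focal_px = 721.5        # fx in pixels -- placeholder, read it from P_rect (see below)
baseline_m = 0.54       # stereo baseline in metres -- placeholder, see below
disp_png = cv2.imread("disp_occ_0/000000_10.png", cv2.IMREAD_UNCHANGED)  # uint16 disparity image (placeholder path)
disparity = disp_png.astype(np.float32) / 256.0
valid = disparity > 0                        # zero-valued pixels have no disparity estimate
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]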
Now come some tricky parts. I searched high and low for the focal length and was unable to find it in the documentation.
I realised just now, while writing this, that the baseline is documented in the aforementioned source; moreover, from section IV.B we can see that it can be found indirectly in P_rect^(i).
The P_rect matrices can be found in the calibration files and will be used both for calculating the baseline and for the translation from (u, v) in the image to (x, y, z) in the real world.
The steps are as follows:
For each pixel in the depth map:
xyz_normalised = P_rect \ [u, v, 1]
where u and v are the x and y coordinates of the pixel, respectively,
which will give you an xyz_normalised of the form [x, y, z, 0] with z = 1.
You can then multiply it by the depth given at that pixel to obtain an XYZ coordinate.
For completeness: as the depth map here is given relative to the colour cameras, you need to use P_3 from the cam_cam calibration txt files to get the baseline (as it contains the baseline between the colour cameras), while P_2 belongs to the left colour camera, which is used as the reference for the occ_0 files.
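Putting the pieces together, here is a hedged Python sketch of reading the focal length and baseline from the P_rect matrices, computing the horizontal and vertical FOV asked about in the question, and back-projecting a depth map to 3D points. The matrix names and the baseline relation P_rect[0, 3] = -fx * b_x follow my reading of the calibration files and section IV.B, so treat them as assumptions:
import numpy as np

def fov_and_baseline(P2, P3, width, height):
    # P2, P3: 3x4 rectified projection matrices of the two colour cameras,
    # parsed from the cam_cam calibration file; width/height: image size in pixels.
    fx, fy = P2[0, 0], P2[1, 1]
    fov_h = 2.0 * np.degrees(np.arctan(width / (2.0 * fx)))
    fov_v = 2.0 * np.degrees(np.arctan(height / (2.0 * fy)))
    # Assuming P_rect[0, 3] = -fx * b_x for each camera, the colour-camera baseline is:
    baseline = P2[0, 3] / P2[0, 0] - P3[0, 3] / P3[0, 0]
    return fov_h, fov_v, baseline

def depth_to_points(depth_m, P2):
    # Back-project every pixel: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth.
    fx, fy, cx, cy = P2[0, 0], P2[1, 1], P2[0, 2], P2[1, 2]
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.dstack((x, y, depth_m))        # H x W x 3 array of XYZ coordinates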

Pointcloud and RGB Image alignment on RealSense ROS

I am working on a dog detection system using deep learning (TensorFlow object detection) and a RealSense D425 camera. I am using the Intel(R) RealSense(TM) ROS wrapper in order to get images from the camera.
I am executing "roslaunch rs_rgbd.launch", and my Python code is subscribed to the "/camera/color/image_raw" topic in order to get the RGB image. Using this image and the object detection library, I am able to infer (at 20 fps) the location of a dog in an image as a bounding box (xmin, xmax, ymin, ymax).
I would like to crop the point cloud information with the object detection output (xmin, xmax, ymin, ymax) and determine whether the dog is far away from or near the camera. I would like to use the pixel-by-pixel alignment between the RGB image and the point cloud.
How can I do it? Is there any topic for that?
Thanks in advance.
Intel publishes a Python notebook on the same problem at: https://github.com/IntelRealSense/librealsense/blob/jupyter/notebooks/distance_to_object.ipynb
What they do is as follows:
get the color frame and depth frame (point cloud in your case)
align the depth to the color frame
use SSD to detect the dog inside the color frame
get the average depth for the detected dog and convert it to meters
depth = np.asanyarray(aligned_depth_frame.get_data())
# Crop depth data:
depth = depth[xmin_depth:xmax_depth,ymin_depth:ymax_depth].astype(float)
# Get data scale from the device and convert to meters
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
depth = depth * depth_scale
dist,_,_,_ = cv2.mean(depth)
print("Detected a {0} {1:.3} meters away.".format(className, dist))
Hope this helps.
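As a concrete starting point, here is a minimal rospy sketch of the same idea; it assumes depth-to-colour alignment is enabled in your launch file, so the wrapper publishes the aligned depth on /camera/aligned_depth_to_color/image_raw as 16UC1 in millimetres, and the bounding box values are placeholders for your detector output:
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

BOX = (200, 150, 400, 350)        # (xmin, ymin, xmax, ymax) -- placeholder detector output
bridge = CvBridge()

def on_depth(msg):
    depth = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough").astype(np.float32)
    xmin, ymin, xmax, ymax = BOX
    roi = depth[ymin:ymax, xmin:xmax]     # crop the aligned depth with the detection box
    roi = roi[roi > 0]                    # drop pixels with no depth reading
    if roi.size:
        rospy.loginfo("dog is roughly %.2f m away", float(np.median(roi)) / 1000.0)

rospy.init_node("dog_distance")
rospy.Subscriber("/camera/aligned_depth_to_color/image_raw", Image, on_depth)
rospy.spin()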

What is the unit for raw data in Kinect V2?

I am trying to figure out what the raw data from the Kinect V2 is. ... I know we can convert this raw data to meters and to a gray image to represent the depth, but what is the unit of these raw values?
And why are all the images captured by the Kinect mirrored?
The raw values stored in the depth image are in millimeters. You can get the X and Y values using the pixel position along with the depth camera intrinsic parameters. If you want, I could share Matlab code that converts the depth image into X, Y, Z values.
Yes, the images are mirrored in the Windows SDK and in "libfreenect2", which is an open-source version of the SDK. I couldn't get a solid answer as to why, but you could look at the discussion available in the link.
There are different kinds of frames that can be captured with the Kinect V2, and each kind of raw data has a different unit: for depth frames it is millimeters; for color it is RGB (0-255, 0-255, 0-255); for body frames it is 0 or 1 (same resolution as the depth frame, but able to identify up to a maximum number of human bodies at a time); and so on.
Ref: https://developer.microsoft.com/en-us/windows/kinect
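For the X, Y, Z conversion mentioned in the first answer, here is a minimal Python sketch of the usual pinhole back-projection; fx, fy, cx, cy stand in for the depth-camera intrinsics (get them from the SDK or your own calibration), so treat the details as assumptions:
import numpy as np

def depth_to_xyz(depth_mm, fx, fy, cx, cy):
    # depth_mm: raw 512x424 Kinect V2 depth frame, values in millimetres.
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0      # millimetres -> metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack((x, y, z))                   # H x W x 3 point cloud in metres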

How can I use the depth data in CAD-60?

I want to get distances from a depth image. I have a 3D position and I convert this point to 2D; now I read the coordinate data from the RGB and depth images.
In this picture I marked 5 points and looked at their values in the depth image.
Why is this coordinate data not correct?
Or why is this data 0?
How can I get real data from the depth image?
The data returned by the Kinect has 0 values at pixels where it fails to get a depth measurement. This can happen for several reasons, but it especially happens on reflective surfaces, glass, etc.
Also, the values around the edges are more error-prone, so there are more 0 values around the edges than at the center of the image.
So, (a) and (d) can be zero, but if you check the nearby pixels there will be depth values. (e) is a bad measurement, as it has a large value even though it is closer than the rest of the points.
Before using the data, you could try a hole-filling technique, such as depth normalization, to remove most of the zero values.
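One simple way to do such hole filling (an illustration, not part of the CAD-60 tooling) is to replace every zero pixel with its nearest valid neighbour; a short Python sketch using SciPy:
import numpy as np
from scipy import ndimage

def fill_depth_holes(depth):
    # For every invalid (zero) pixel, take the depth of the nearest valid pixel.
    invalid = depth == 0
    idx = ndimage.distance_transform_edt(invalid, return_distances=False,
                                         return_indices=True)
    return depth[tuple(idx)]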

Comparing Kinect depth to OpenGL depth efficiently

Background:
This problem is related with 3D tracking of object.
My system projects objects/samples from known parameters (X, Y, Z) to OpenGL and
tries to match them with the image and depth information obtained from the Kinect sensor to infer the object's 3D position.
Problem:
Kinect depth->process-> value in millimeters
OpenGL->depth buffer-> value between 0-1 (which is nonlinearly mapped between near and far)
I could recover the Z value from OpenGL using the method described at http://www.songho.ca/opengl/gl_projectionmatrix.html, but this would be very slow.
I am sure this is a common problem, so I hope some clever solution exists.
Question:
Is there an efficient way to recover the eye-space Z coordinate from OpenGL?
Or is there any other way to solve the above problem?
Now my problem is Kinect depth is in mm
No, it is not. The Kinect reports its depth as a value in an 11-bit range of arbitrary units. Only after some calibration has been applied can the depth value be interpreted as a physical unit. You're right insofar as OpenGL perspective-projection depth values are nonlinear.
So if I understand you correctly, you want to emulate a Kinect by retrieving the contents of the depth buffer, right? Then the easiest solution would be to use a combination of vertex and fragment shader, in which the vertex shader passes the linear depth as an additional varying to the fragment shader, and the fragment shader then overwrites the fragment's depth value with the passed value. (You could also use an additional render target for this.)
Another method would be to use a 1D texture, projected into the depth range of the scene, where the texture values encode the depth; the desired value would then end up in the color buffer.
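For completeness, the nonlinear depth-buffer value can also be converted back to eye-space distance in one vectorised pass on the CPU; a minimal NumPy sketch, assuming the buffer was read back (e.g. with glReadPixels) into floats in [0, 1] and that near/far are the planes used to build the projection matrix:
import numpy as np

def linearize_depth(d, near, far):
    # Invert the standard perspective depth mapping (depth range [0, 1]).
    z_ndc = 2.0 * d - 1.0
    return 2.0 * near * far / (far + near - z_ndc * (far - near))   # eye-space distance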