How can I compute optical flow from a depth image stream from a depth camera? - kinect

I already have a depth camera feed set up. To make it more interesting, I want to compute additional data from it, such as normals and motion/optical flow, to use for visual effects. I am particularly interested in optical flow and whether it can be computed from a depth-only stream.
Has this been implemented? If so, I'd like to know what the methods are and which one would be the easiest to use.

I worked with the Kinect depth camera and implemented a patient tracking algorithm. The algorithm itself is commercial and I cannot disclose the details, but I can give my two cents here.
The depth feed from Kinect should not be used directly for optical flow (motion tracking), because it contains pixels with no depth reading. You can use inpainting to fill in the gaps in the depth image. If you are using OpenCV, you can refer to the implementation here.
I suggest applying a smoothing filter after inpainting to get smooth depth data near object edges. The simple filters available in OpenCV work with the depth stream. It also helps to convert the 16-bit depth to an 8-bit RGB image, which makes the disparity image easier to visualize.
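As a rough sketch of the clean-up steps above, here is the idea in plain Python without OpenCV. The row-wise fill is only a crude stand-in for real inpainting, and the function names, the 0 value for missing pixels and the 4500 mm range limit are assumptions made for illustration.

```python
def fill_holes_rowwise(depth, invalid=0):
    """Replace invalid pixels with the last valid value seen in the same row.

    A crude stand-in for inpainting; `invalid` marks pixels with no reading.
    """
    filled = []
    for row in depth:
        out = list(row)
        # Seed with the first valid value so leading holes are filled too.
        last = next((v for v in row if v != invalid), invalid)
        for i, v in enumerate(out):
            if v == invalid:
                out[i] = last
            else:
                last = v
        filled.append(out)
    return filled


def to_uint8(depth, max_depth_mm=4500):
    """Linearly map depth in millimetres to 0..255, clamping at max_depth_mm."""
    return [[min(v, max_depth_mm) * 255 // max_depth_mm for v in row]
            for row in depth]


frame = [[0, 1000, 0, 2000],
         [0, 0, 0, 0]]
print(to_uint8(fill_holes_rowwise(frame)))
```

A row with no valid pixel at all is left untouched, so a real pipeline would still need a proper inpainting step for large holes.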
I believe you can then use the resulting stream with an optical flow algorithm from OpenCV. Here is an example.
You can also use a dense trajectory implementation, but I believe it is processor intensive and the final frame rate might be really slow.
Hope this helps.


Is programming a voxel based graphics API theoretically possible?

This is entirely a theoretical question, because I understand that the time it would take to do such a thing would be ridiculous.
I've been working with "voxels" a lot lately, and the only ways I can display them to a user are to either triangulate the visible surfaces or write a CPU ray tracer, but both come with their own problems.
Simply put: if we ignore the storage space needed for voxel meshes and target one very specific GPU, could someone create a graphics API like OpenGL but with "true" voxel primitives that don't need to be converted? Or are GPUs designed specifically for triangles, with no way to introduce a new base primitive?
It's possible, and it has already been done many times:
games like Minecraft, Space Engineers, ...
3D printing tools and slicers
MRI/PET scan tools
Yes, rendering on the GPU is possible with the two basic methods you mention. Games usually convert the voxels to a boundary representation (3D geometry). With the rise of shaders, even ray tracers are now possible. Here is mine:
simple GLSL voxel ray tracer
using native OpenGL architecture and passing geometry as 3D texture. In order to obtain speed you need to add BVH or similar spatial subdivision of geometry...
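To give an idea of what such a ray tracer does per ray, here is a minimal CPU sketch in Python of a grid traversal (a DDA in the style of Amanatides and Woo). It is not the GLSL code linked above. It assumes unit-sized voxels, a nested list `grid[x][y][z]` instead of a 3D texture, and a ray origin that already lies inside the grid; the function name is made up for this example.

```python
import math


def raycast_voxels(grid, origin, direction, max_steps=256):
    """Return the (x, y, z) index of the first solid voxel hit, or None.

    grid[x][y][z] is truthy for solid voxels; each voxel is a unit cube.
    """
    size = (len(grid), len(grid[0]), len(grid[0][0]))
    cell = [int(math.floor(c)) for c in origin]
    step, t_max, t_delta = [], [], []
    for c, d, i in zip(origin, direction, cell):
        if d > 0:
            step.append(1)
            t_max.append((i + 1 - c) / d)   # ray distance to next +face
            t_delta.append(1 / d)           # ray distance per voxel
        elif d < 0:
            step.append(-1)
            t_max.append((i - c) / d)       # ray distance to next -face
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)          # never crosses this axis
            t_delta.append(math.inf)
    for _ in range(max_steps):
        if not all(0 <= cell[a] < size[a] for a in range(3)):
            return None                     # ray left the grid
        if grid[cell[0]][cell[1]][cell[2]]:
            return tuple(cell)
        axis = t_max.index(min(t_max))      # nearest voxel boundary
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return None


# 4x4x4 grid with a single solid voxel at (3, 1, 1).
grid = [[[False] * 4 for _ in range(4)] for _ in range(4)]
grid[3][1][1] = True
print(raycast_voxels(grid, (0.5, 1.5, 1.5), (1.0, 0.0, 0.0)))
```

This visits every voxel along the ray, which is why a BVH or similar spatial subdivision matters for speed: it lets the ray skip large empty regions.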
However voxel based tools have been here for quite some time. For example many isometric games/engines are voxel based (tile is a voxel) like this one:
Improving performance of click detection on a staggered column isometric grid
Also, do you remember UFO? It was playable on an x286 and it was also a "voxel/tile" based isometric game.

Are there cameras that capture not only color but the actual depth?

Getting into 3D reconstruction techniques, I'm curious whether there are cameras that capture not only color but the depth at the moment of image being captured. It would appear that getting the depth of a particular pixel on the sensor would be far more accurate than needing to reconstruct after the fact by using many images.
Thoughts? Suggestions?
Yes, there are sensors which are not based on the triangulation principle. They use the time of flight or similar principles to capture the depth for a particular pixel. Take a look at PMD-Sensors
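For a pulsed time-of-flight sensor the basic relation is simple: light travels to the surface and back, so the depth is half the round-trip distance. A minimal sketch of that relation follows; the function name is made up for illustration, and real sensors often measure a phase shift of modulated light rather than timing a pulse directly.

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # exact by definition of the metre


def tof_depth_m(round_trip_s):
    """Depth in metres from the round-trip time of a light pulse in seconds."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2


# A round trip of 20 nanoseconds corresponds to roughly 3 metres.
print(tof_depth_m(20e-9))
```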
You can also take a look at Microsoft Kinect and Intel® RealSense™.

how to reconstruct scene from different views' point clouds

I am facing a problem with 3D reconstruction, since I am new to this field. I have depth maps (point clouds) from several different views, and I want to use them to reconstruct the scene and get a result like KinectFusion. Is there any paper or source code that addresses this problem? Any ideas are welcome.
PS: the point cloud is stored as a file with (x,y,z) entries; you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages: Integration, Rendering and Pose Estimation.
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
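To make the Integration stage a little more concrete, here is a minimal Python sketch of its two building blocks: unprojecting a depth pixel with a pinhole camera model, and updating one voxel with a weighted running average of a truncated signed distance. The function names, the intrinsics in the usage example, the truncation distance and the weight cap are all assumptions chosen for illustration, not values taken from the papers below.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with the given depth to a 3D camera-space point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def truncated_sdf(measured_depth, voxel_depth, truncation):
    """Signed distance from voxel to surface along the ray, clamped to [-1, 1].

    Positive values mean the voxel lies in front of the measured surface.
    """
    sdf = (measured_depth - voxel_depth) / truncation
    return max(-1.0, min(1.0, sdf))


def integrate(tsdf, weight, sample, sample_weight=1.0, max_weight=100.0):
    """Fold one new sample into a voxel's running average.

    Returns the updated (tsdf, weight); the weight is capped so that the
    model can still adapt to change after many frames.
    """
    new_weight = weight + sample_weight
    new_tsdf = (tsdf * weight + sample * sample_weight) / new_weight
    return new_tsdf, min(new_weight, max_weight)


# The pixel at the principal point maps onto the optical axis.
print(unproject(320, 240, 2.0, 525.0, 525.0, 320.0, 240.0))

# A voxel observed twice: once well in front of the surface, once on it.
tsdf, weight = integrate(0.0, 0.0, truncated_sdf(2.0, 1.5, 0.2))
tsdf, weight = integrate(tsdf, weight, truncated_sdf(2.0, 2.0, 0.2))
print(tsdf, weight)
```

In a full system the unprojected points are first transformed by the current pose, and the update runs over every voxel in the camera frustum, typically on the GPU.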
With regards to pertinent literature, I would advise the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices

How to segment depth image faster?

I need to segment a depth image captured from a Kinect device in real time (30 fps).
Currently I am using EuclideanClusterExtraction from PCL. It works, but it is very slow (1 fps).
Here is a paragraph in the PCL tutorial:
“Unorganized” point clouds are characterized by non-existing point references between points from different point clouds due to varying size, resolution, density and/or point ordering. In case of “organized” point clouds often based on a single 2D depth/disparity images with fixed width and height, a differential analysis of the corresponding 2D depth data might be faster.
So I think there must be a faster method for segmenting a depth image.
The project doesn't use the RGB camera, so I need a segmentation method that uses only the depth image.
PCL provides segmentation algorithms optimised for organised point clouds.
For details, see:
The tutorial here, which describes them and shows how to use them.
The example code in the PCL distribution (relatively recent versions): organized_segmentation_demo and openni_organized_multi_plane_segmentation.
In the API, OrganizedConnectedComponentSegmentation and OrganizedMultiPlaneSegmentation. The latter builds on the former.
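The reason organized clouds are faster is that neighbours are known from the image grid, so no nearest-neighbour search is needed. The following plain Python sketch shows that idea as a connected-component labelling over the depth image. It is not the PCL implementation; the function name, the 30 mm jump threshold and the 0 value for missing pixels are assumptions made for this example.

```python
from collections import deque


def segment_depth(depth, max_jump=30, invalid=0):
    """Label 4-connected regions whose neighbouring depths differ by <= max_jump.

    Returns (labels, count); label 0 marks pixels with no depth reading.
    """
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if depth[sy][sx] == invalid or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and depth[ny][nx] != invalid
                            and abs(depth[ny][nx] - depth[y][x]) <= max_jump):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


frame = [[1000, 1010, 2000],
         [1005,    0, 2010]]
print(segment_depth(frame))
```

Each pixel is visited a constant number of times, so the cost is linear in the image size. Pure Python will not reach 30 fps on a full Kinect frame, but the same scheme in C++ or on the GPU can.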

Kinect SDK skeletonization method

I was wondering if there's a way to modify the depth map before it is sent to the skeletonization algorithm used by the Kinect, for example to run the skeletonization on the output of a segmented depth image. So far I have reviewed the methods in the SDK, but I haven't been able to find an exposed skeletonization method. It seems you can only turn the skeleton on or off, with no control over its inputs.
If anyone has any idea regarding this topic I will be much obliged.
Shamita: skeletonization means tracking the joints of the user in real time. I am editing because I can't comment (not enough reputation).
Every joint provides a depth coordinate, and I don't think you can alter the Kinect hardware input stream. You can, however, categorize the joints by depth segment. For example, a joint whose depth is above 5 and below 10 falls into category A. This can be done on the live stream itself because it is just a simple calculation.
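A minimal Python sketch of that categorization is shown below. The boundaries 5 and 10 come from the example above, with units left as in the answer; the category names other than "A" and the function name are made up for illustration.

```python
from bisect import bisect_right


def depth_category(depth, boundaries=(5.0, 10.0), names=("near", "A", "far")):
    """Return the name of the depth segment that a joint's depth falls into."""
    # bisect_right counts how many boundaries are <= depth.
    return names[bisect_right(boundaries, depth)]


for joint_depth in (3.0, 7.0, 12.0):
    print(joint_depth, depth_category(joint_depth))
```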