Kinect Fusion - data format - object segmentation - kinect

I have started working with Kinect Fusion recently for my 3D reconstruction project. I have two questions in this field:
What is inside .STL file? is it the vertices for different objects in the scene?
How can I segment a specific object (e.g. my hand) in the reconstructed file? Is there a way to do so using Kinect Fusion ?
Thank you in advance!

Yes, there is a Wikipedia article on the STL format:
http://en.wikipedia.org/wiki/STL_format
Firstly, you would want your hand to be fixed, because Kinect Fusion will only reconstruct static scenes. Secondly, you could use the min and max depth threshold values to filter out the rest of the scene. Or if that does not work too well, you could reconstruct the whole scene, get the mesh, and then filter out the vertices that are beyond a certain depth for example.

Related

how to reconstruct scene from different views' point clouds

I am facing a problem on 3D reconstruction since I am a new to this filed. I have some different views' depth map(point clouds), I want to use them to reconstruct the scene to get the effect like using the kinect fusion. Is there any paper of source code to settle this problem. Or any ideas on this problem.
PS:the point cloud is stored as a file with (x,y,z), you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages:
Integration
Rendering
Pose Estimation
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
With regards to pertinent literature, I would advise the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on
Mobile Devices

Creat mesh from point cloud on a 2D grid

I am using the new Kinect v2 and I am getting the depth map of the Kinect.
After I get the depth map I convert the depth data from Depth Space to Camera Space.
As far as I understand this is done, by converting all the X,Y coordinate of each pixel to Camera Space + adding the depth value as Z coordinate (also Kinect gives the depth value in millimetres so it is also converted to hold meters).
Because of this, the point cloud is actually on 2D grid extended with the depth value. The visualization also confirms this, since it is easy to notice that the points are ordered in a grid due to the above conversation.
For visualization I am using OpenGL the old fashion way (glBegin(...) and glEnd()).
I want to create a mesh out of the points. I kind of managed to do it with GL_TRIANGLES, but then I have lot of duplicated vertices and edges. So I thought I should create a better triangulation with GL_TRIANGLE_STRIP, but I am stuck here because I can't come up with a good algorithm which can go through my 2D grid in a way that I can feed it to the GL_TRIANGLE_STRIP so it creates a nice surface.
The problems:
For each triangle's vertices I am checking the Z coordinate. If it exceeds a certain threshold I disregard the triangle => this might create holes in my 2D grid.
Some depth values are NaN, because the Kinect can't "see" there nothing (for example an object is too far or too close) => this also creates holes in the 2D grid.
Anybody has any suggestion what would be the best method to solve this issue?
If you're able to use the point cloud library, you could use the
class pcl::OrganizedFastMesh< PointInT >.
http://docs.pointclouds.org/trunk/classpcl_1_1_organized_fast_mesh.html
I use it to triangulate complete depth frames.
You can try also a delanauy triangulation in 3d and look for the tetrahedons on the exterior. An easy algorithm is the bowyer-watson with tetrahedons and circumspheres. Cgal is a good example.

It is possible to recognize all objects from a room with Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (like big ones eg. bed, table, etc.) and a person in that room. It is possible something like that using Microsoft Kinect?
Thank you!
Kinect provides you following
Depth Stream
Color Stream
Skeleton information
Its up to you how you use this data.
To answer your question - Official Micorosft Kinect SDK doesnt provides shape detection out of the box. But it does provide you skeleton data/face tracking with which you can detect distance of user from kinect.
Also with mapping color stream to depth stream you can detect how far a particular pixel is from kinect. In your implementation if you have unique characteristics of different objects like color,shape and size you can probably detect them and also detect the distance.
OpenCV is one of the library that i use for computer vision etc.
Again its up to you how you use this data.
Kinect camera provides depth and consequently 3D information (point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor (by fitting a plane) of the room and possibly walls and the ceiling. This step is important since these surfaces often connect separate objects making them a one big object.
The remaining parts of point cloud can be segmented by depth if they don't touch each other physically. Using color one can separate the objects even further. Note that we implicitly define an object as 3D dense and color consistent entity while other definitions are also possible.
As soon as you have your objects segmented you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge however a Skeleton library can recognize humans after they moved for a few seconds. Below is a simple depth map that was broken on a few segments using depth but not color information.

Transformation between point clouds

I hope to find some hints where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3d point clouds. I created a 3d object detector which is already working.
Here my task:
Lets say I have a point cloud 1. I detected a object in cloud A and I know the centroid position of my object (x1,y1,z1). Now I move my sensor around a path and create new clouds (e.g. cloud 2). In that cloud 2 I see the same object but e.g. from the side, where the object detection is not working fine.
I would like to transform the detected object form cloud 1 to cloud 2, to get the centroid also in cloud 2. For me it sound like I need a matrix (Translation, Rotation) to transform point from 1 to 2.
And ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimation of which points in cloud 1 correspond to which clouds in point 2 (more specifically, which given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc) in your point clouds, computing a rotation and translation as above, and then performing ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned because it estimates correspondences using one of two metrics, minimal point-to-point distance or minimal point to plane distance, according to wikipedia, the latter works better in practice, but it does involve estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
I think what you were asking about was in particular to the Kinect Sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in Sensor Fusion namespace. This should take care of it automatically, in methods similar to the answer given by #pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of which can be found out about, here.

Does Miscrosoft Kinect SDK provide any API that I can input detph image then return skeleton?

I need a help about how to get skeleton data from my modified Depth Image using KinectSDK.
I have two Kinect. And I can get the DepthImage from both of them. I transform the two Depth Image into a 3D-coordinate and combine them together by using OpenGL. Finally, I reproject the 3D scene and get a new DepthImage. (The resolution is also 640*480, and the FPS of my new DepthImage is about 20FPS)
Now, I want to get the skeleton from this new DepthImage by using KinectSDK.
Anyone can help me about this?
Thanks
This picture is my flow path:
Does Miscrosoft Kinect SDK provide any API that I can input detph image then return skeleton?
No.
I tried to find some libraries (not made by Microsoft) to accomplish this task, but it is not simple at all.
Try to follow this discussion on ResearchGate, there are some useful links (such as databases or early stage projects) from where you can start if you want to develop your library and share it with us.
I was hoping to do something similar, feed post-processed depth image back to Kinect for skeleton segmentation, but unfortunately there doesn't seem to be a way of doing this with existing API.
Why would you reconstruct a skeleton based upon your 3d depth data ?
The kinect Sdk can record a skeleton directly without such reconstruction.
And one camera is enough to record a skeleton.
if you use the kinect v2, then out of my head i can track 3 or 4 skeletons or so.
kinect provides basically 3 streams, rgb video, depth, skeleton.