Point cloud sensor position estimation - Kinect

Let's say I have point cloud data for which I don't know the sensor position (and it is not [0,0,0]). Is there a good way to estimate the original sensor position from it?
For instance, given the point cloud below,
Point cloud data:
I want to find the viewpoint from which the point cloud looks like the picture below (like a depth map), i.e. with no holes (or as few as possible) visible.
Point cloud as seen from the viewpoint I want to find:

Typically a point cloud reconstruction is done by capturing the object from different angles and aligning the captures together; in point cloud processing this is called registration. In that case there is no single static sensor position, since the sensor position in global coordinate space changes from frame to frame.
So I assume you are not asking about a point cloud that has gone through a reconstruction process, but rather one generated from a static sensor position. If that point cloud has not gone through any transformation (rotation, translation) and was generated from raw depth data, then it is always expressed in the depth camera coordinate space, whose origin is the position of the Kinect sensor.
Your question says the sensor position is not [0,0,0]. That means the data has been transformed into a new coordinate space after the depth image was captured (I assume the origin of the new coordinate space is the centre of the given point cloud). Without knowing this transformation matrix, you can't recover the original camera position relative to the new coordinate space.
Most importantly, when you say "position of the sensor", you need to specify which coordinate space you are referring to; a position is always relative to a specific coordinate space.
You can find the position of the camera as @Atif Anwer said, BUT under one condition: one of the two point clouds has NOT gone through any transformation and is therefore still in the original coordinate space. Then you can find the transformation of the second point cloud relative to the first point cloud and apply that transformation matrix to [0,0,0] to get the camera position of the second point cloud relative to the initial depth camera coordinate space.
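Here is a minimal sketch of that last step with Eigen, assuming the rigid transform between the two clouds has already been estimated (the transform values below are placeholders only):

    #include <Eigen/Geometry>
    #include <iostream>

    int main() {
      // Hypothetical rigid transform recovered from registering the two clouds;
      // the rotation and translation values are placeholders only.
      Eigen::Affine3f T = Eigen::Affine3f::Identity();
      T.translate(Eigen::Vector3f(0.2f, 0.0f, 1.5f));
      T.rotate(Eigen::AngleAxisf(0.1f, Eigen::Vector3f::UnitY()));

      // The sensor sits at the origin of its own depth camera frame, so its
      // position expressed in the other frame is simply T applied to [0,0,0].
      Eigen::Vector3f sensorPosition = T * Eigen::Vector3f::Zero();
      std::cout << "Estimated sensor position: " << sensorPosition.transpose() << std::endl;
      return 0;
    }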

Related

How do I use Meshlab to produce meshes of an entire cloud?

Help! I'm trying to create a mesh from a point cloud (created via on-site laser-scanning), and Meshlab is giving me difficulty.
I'm able to clean up and subsample the raw point cloud in CloudCompare, and have been trying to create a mesh in Meshlab. I've assigned normals (Filters > Normals... > Compute Normals...), which seems to work.
I then used the Screened Poisson filter to create a mesh (Filters > Remeshing... > Surface Reconstruction: Screened Poisson), which produced a good result for about two-thirds of my point cloud. The remaining third of my point cloud didn't seem to be meshed at all, and the bounding boxes of the two layers (point cloud and mesh) are radically different, with the mesh cutting off a big chunk of the cloud.
Here's the point cloud I'm starting with:
Here's the cloud and mesh overlaid. You can clearly see the different Bounding Boxes.
And here's the mesh on its own. I have no idea why the mesh stopped where it did.
I tried to replicate the issue on a different point cloud, and produced a very similar result, albeit with a mesh that represents only about 1/5th of the point cloud this time:
Fresh attempt with a different point cloud.
Any advice on how I can avoid this?

how to reconstruct scene from different views' point clouds

I am facing a problem with 3D reconstruction, since I am new to this field. I have depth maps (point clouds) from several different views, and I want to use them to reconstruct the scene, similar to the result of KinectFusion. Is there any paper or source code that addresses this problem, or any ideas on how to approach it?
PS: the point cloud is stored as a file of (x,y,z) values; you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages:
Integration
Rendering
Pose Estimation
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
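As an illustration of the running-average idea, here is a minimal sketch of integrating one signed-distance sample into a voxel of a dense TSDF volume (the names and the simple weighting scheme are illustrative, not tied to any particular implementation):

    struct Voxel {
      float tsdf   = 0.0f;  // weighted running average of truncated signed distances
      float weight = 0.0f;  // accumulated integration weight
    };

    // Fold one new observation into a voxel. 'sdf' is the signed distance of the
    // voxel centre to the surface seen in the current depth frame under the
    // current pose; 'truncation' is the width of the band kept around the surface.
    void integrateSample(Voxel& v, float sdf, float truncation, float newWeight = 1.0f) {
      if (sdf < -truncation) return;                  // far behind the surface: skip
      float d = sdf > truncation ? truncation : sdf;  // clamp to the truncation band
      v.tsdf   = (v.tsdf * v.weight + d * newWeight) / (v.weight + newWeight);
      v.weight += newWeight;
    }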
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
With regards to pertinent literature, I would advise the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices

Create mesh from point cloud on a 2D grid

I am using the new Kinect v2 and I am getting the depth map of the Kinect.
After I get the depth map I convert the depth data from Depth Space to Camera Space.
As far as I understand, this is done by converting the X,Y coordinates of each pixel to Camera Space and adding the depth value as the Z coordinate (the Kinect gives the depth value in millimetres, so it is also converted to metres).
Because of this, the point cloud is actually a 2D grid extended with depth values. The visualization also confirms this, since it is easy to notice that the points are ordered in a grid due to the above conversion.
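For reference, this is roughly what the Depth Space to Camera Space mapping does under a pinhole camera model; the intrinsics fx, fy, cx, cy below are placeholders, not the real Kinect v2 calibration (the SDK's coordinate mapper uses the actual values):

    struct Point3 { float x, y, z; };

    // Back-project one depth pixel (column u, row v, depth in millimetres)
    // into camera space using a pinhole model.
    Point3 unproject(int u, int v, unsigned short depthMm,
                     float fx, float fy, float cx, float cy) {
      Point3 p;
      p.z = depthMm * 0.001f;          // millimetres -> metres
      p.x = (u - cx) * p.z / fx;       // pixel column -> camera-space X
      p.y = (v - cy) * p.z / fy;       // pixel row    -> camera-space Y
      return p;
    }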
For visualization I am using OpenGL the old-fashioned way (glBegin(...) and glEnd()).
I want to create a mesh out of the points. I kind of managed to do it with GL_TRIANGLES, but then I have a lot of duplicated vertices and edges. So I thought I should create a better triangulation with GL_TRIANGLE_STRIP, but I am stuck because I can't come up with a good algorithm that walks my 2D grid in a way I can feed to GL_TRIANGLE_STRIP so that it creates a nice surface.
The problems:
For each triangle's vertices I am checking the Z coordinate. If it exceeds a certain threshold I disregard the triangle => this might create holes in my 2D grid.
Some depth values are NaN, because the Kinect can't "see" anything there (for example, an object is too far away or too close) => this also creates holes in the 2D grid.
Does anybody have a suggestion for the best method to solve this issue?
If you're able to use the Point Cloud Library, you could use the class pcl::OrganizedFastMesh<PointInT>:
http://docs.pointclouds.org/trunk/classpcl_1_1_organized_fast_mesh.html
I use it to triangulate complete depth frames.
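Roughly along these lines (the parameter values are only an assumption, tune them to your data):

    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/PolygonMesh.h>
    #include <pcl/surface/organized_fast_mesh.h>

    // Mesh an organized (grid-ordered) cloud straight from a depth frame.
    pcl::PolygonMesh meshOrganizedCloud(
        const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& cloud) {
      pcl::OrganizedFastMesh<pcl::PointXYZ> ofm;
      ofm.setInputCloud(cloud);                 // cloud must keep its width x height layout
      ofm.setTrianglePixelSize(1);              // triangulate every grid cell
      ofm.setTriangulationType(
          pcl::OrganizedFastMesh<pcl::PointXYZ>::TRIANGLE_ADAPTIVE_CUT);
      pcl::PolygonMesh mesh;
      ofm.reconstruct(mesh);                    // NaN points simply leave holes in the mesh
      return mesh;
    }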
You can also try a Delaunay triangulation in 3D and look for the tetrahedra on the exterior. An easy algorithm is Bowyer-Watson with tetrahedra and circumspheres. CGAL is a good example.

Is it possible to recognize all objects in a room with the Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. bed, table, etc.) and a person in that room. Is something like that possible using the Microsoft Kinect?
Thank you!
Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of a user from the Kinect.
Also, by mapping the color stream to the depth stream, you can detect how far a particular pixel is from the Kinect. In your implementation, if different objects have unique characteristics like color, shape and size, you can probably detect them and also measure their distance.
OpenCV is one of the libraries I use for computer vision, etc.
Again, it's up to you how you use this data.
The Kinect camera provides depth, and consequently 3D information (a point cloud), about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane) and possibly the walls and the ceiling. This step is important since these surfaces often connect separate objects, making them appear as one big object.
The remaining parts of the point cloud can be segmented by depth if they don't physically touch each other. Using color, one can separate the objects even further. Note that we implicitly define an object as a spatially dense and color-consistent entity, while other definitions are also possible.
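If PCL is an option, a sketch of these two steps (RANSAC floor fit, then Euclidean clustering of what remains) could look like this; the thresholds are assumptions and need tuning to your scene:

    #include <vector>
    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/ModelCoefficients.h>
    #include <pcl/PointIndices.h>
    #include <pcl/filters/extract_indices.h>
    #include <pcl/search/kdtree.h>
    #include <pcl/segmentation/sac_segmentation.h>
    #include <pcl/segmentation/extract_clusters.h>

    // Fit and remove the dominant plane (e.g. the floor), then split what is
    // left into clusters that are separated in 3D.
    std::vector<pcl::PointIndices> segmentObjects(
        const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud) {
      // 1. RANSAC plane fit for the floor.
      pcl::SACSegmentation<pcl::PointXYZ> seg;
      seg.setModelType(pcl::SACMODEL_PLANE);
      seg.setMethodType(pcl::SAC_RANSAC);
      seg.setDistanceThreshold(0.02);                 // 2 cm tolerance to the plane
      seg.setInputCloud(cloud);
      pcl::PointIndices::Ptr planeInliers(new pcl::PointIndices);
      pcl::ModelCoefficients coefficients;
      seg.segment(*planeInliers, coefficients);

      // 2. Remove the plane so it no longer connects separate objects.
      pcl::PointCloud<pcl::PointXYZ>::Ptr objects(new pcl::PointCloud<pcl::PointXYZ>);
      pcl::ExtractIndices<pcl::PointXYZ> extract;
      extract.setInputCloud(cloud);
      extract.setIndices(planeInliers);
      extract.setNegative(true);                      // keep everything except the plane
      extract.filter(*objects);

      // 3. Cluster the remaining points by Euclidean distance.
      pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZ>);
      tree->setInputCloud(objects);
      pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
      ec.setSearchMethod(tree);
      ec.setClusterTolerance(0.05);                   // a 5 cm gap separates objects
      ec.setMinClusterSize(200);
      ec.setInputCloud(objects);
      std::vector<pcl::PointIndices> clusters;
      ec.extract(clusters);
      return clusters;
    }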
Once you have your objects segmented, you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, the skeleton library can only recognize humans after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.

Transformation between point clouds

I hope to find some hints on where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3D point clouds. I created a 3D object detector which is already working.
Here is my task:
Let's say I have point cloud 1. I detected an object in cloud 1 and I know the centroid position of my object (x1,y1,z1). Now I move my sensor along a path and create new clouds (e.g. cloud 2). In cloud 2 I see the same object, but e.g. from the side, where the object detection does not work well.
I would like to transform the detected object from cloud 1 to cloud 2, to get the centroid in cloud 2 as well. It sounds to me like I need a matrix (translation, rotation) to transform the point from 1 to 2.
Any ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimate of which points in cloud 1 correspond to which points in cloud 2 (more specifically, given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation.
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
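With known correspondences, PCL's SVD-based estimator can recover the transform directly; a rough sketch, assuming the i-th point of cloud 1 corresponds to the i-th point of cloud 2:

    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/registration/transformation_estimation_svd.h>

    // Given two clouds whose i-th points already correspond to each other,
    // recover the rigid transform that maps cloud 1 onto cloud 2.
    Eigen::Matrix4f transformFromCorrespondences(
        const pcl::PointCloud<pcl::PointXYZ>& cloud1,
        const pcl::PointCloud<pcl::PointXYZ>& cloud2) {
      pcl::registration::TransformationEstimationSVD<pcl::PointXYZ, pcl::PointXYZ> est;
      Eigen::Matrix4f transform;
      est.estimateRigidTransformation(cloud1, cloud2, transform);
      return transform;
    }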
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc.) in your point clouds, computing a rotation and translation as above, and then performing ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned, because it estimates correspondences using one of two metrics: minimal point-to-point distance or minimal point-to-plane distance. According to Wikipedia, the latter works better in practice, but it involves estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
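Here is a hedged sketch of that ICP step with PCL, reusing the recovered transform to carry the centroid detected in cloud 1 into cloud 2's frame (the function name and parameter values are illustrative):

    #include <pcl/point_types.h>
    #include <pcl/point_cloud.h>
    #include <pcl/registration/icp.h>

    // Estimate the transform that aligns cloud 1 onto cloud 2, then reuse it to
    // carry the centroid detected in cloud 1 into cloud 2's coordinate frame.
    Eigen::Vector4f centroidInCloud2(
        const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud1,
        const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud2,
        const Eigen::Vector4f& centroid1)            // (x1, y1, z1, 1)
    {
      pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
      icp.setInputSource(cloud1);                    // works best if roughly pre-aligned
      icp.setInputTarget(cloud2);
      icp.setMaximumIterations(50);

      pcl::PointCloud<pcl::PointXYZ> aligned;        // cloud 1 expressed in cloud 2's frame
      icp.align(aligned);

      // The same transform moves the object centroid from cloud 1 to cloud 2.
      return icp.getFinalTransformation() * centroid1;
    }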
I think what you are asking about relates in particular to the Kinect sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in the Sensor Fusion namespace. This should take care of it automatically, with methods similar to the answer given by @pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of this can be read about here.