I hope to find some hints where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3d point clouds. I created a 3d object detector which is already working.
Here my task:
Lets say I have a point cloud 1. I detected a object in cloud A and I know the centroid position of my object (x1,y1,z1). Now I move my sensor around a path and create new clouds (e.g. cloud 2). In that cloud 2 I see the same object but e.g. from the side, where the object detection is not working fine.
I would like to transform the detected object form cloud 1 to cloud 2, to get the centroid also in cloud 2. For me it sound like I need a matrix (Translation, Rotation) to transform point from 1 to 2.
And ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimation of which points in cloud 1 correspond to which clouds in point 2 (more specifically, which given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc) in your point clouds, computing a rotation and translation as above, and then performing ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned because it estimates correspondences using one of two metrics, minimal point-to-point distance or minimal point to plane distance, according to wikipedia, the latter works better in practice, but it does involve estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
I think what you were asking about was in particular to the Kinect Sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in Sensor Fusion namespace. This should take care of it automatically, in methods similar to the answer given by #pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of which can be found out about, here.
Related
I'm working on a project to detect the position and orientation of a paper plane.
To collect the data, I'm using an Intel Realsense D435, which gives me accurate, clean depth data to work with.
Now I arrived at the problem of detecting the 2D paper plane silhouette from the 3D point cloud data.
Here is an example of the data (I put the plane on a stick for testing, this will not be in the final implementation):
https://i.stack.imgur.com/EHaEr.gif
Basically, I have:
A 3D point cloud with points on the plane
A 2D shape of the plane
I would like to calculate what rotations/translations are needed to align the 2D shape to the 3D point cloud as accurate as possible.
I've searched online, but couldn't find a good way to do it. One way would be to use Iterative Closest Point (ICP) to first take a calibration pointcloud of the plane in a known orientation, and align it with the current orientation. But from what I've heard, ICP doesn't perform well if the pointclouds aren't kind of already closely aligned at the start.
Any help is appreciated! Coding language doesn't matter.
Does your 3d point cloud have outliers? How many in what way?
How did you use ICP exactly?
One way would be using ICP, with a hand-crafted initial guess using
pcl::transformPointCloud (*cloud_in, *cloud_icp, transformation_matrix);
(to mitigate the problem that ICP needs to be close to work.)
What you actually want is the plane-model that describes the position and orientation of your point-cloud right?
A good estimator of your underlying function can be found with: pcl::ransac
pcl::ransace model consensus
You can then get the computedModel coefficents.
Now finding the correct transformation is just: How to calculate transformation matrix from one plane to another?
I am having problems with mesh for the merged point clouds using ICP algorithm.
Firstly, I have several point clouds of different view points for the same scene. And I register them using ICP(Iterative closest point). Then I use the software such as Meshlab and CloudcompareStero to get the mesh for the registered point cloud. And I found the registered point cloud is hierarchy. It behaves as the registered point cloud have several layers in depth direction. Although the difference is very small(I set high accuracy in ICP), it affects the visual quality greatly.
Basically, I think that I have several views' point cloud and I would get a more denser and completed point cloud which contains more information than the one point cloud using ICP. And the visual quality of the registered point cloud should be better than the single views' point cloud. However, because of the layer effect, the visual quality of registered point cloud' mesh is worse than the single view point cloud's mesh.
I think the kind of question should appear in the kinect fusion or some point cloud registration process. But I don't know where to find the cue.
So I want to ask greatest as you for some advice or thoughts to reduce the hierarchy effect and improve the visual quality.
I am facing a problem on 3D reconstruction since I am a new to this filed. I have some different views' depth map(point clouds), I want to use them to reconstruct the scene to get the effect like using the kinect fusion. Is there any paper of source code to settle this problem. Or any ideas on this problem.
PS:the point cloud is stored as a file with (x,y,z), you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages:
Integration
Rendering
Pose Estimation
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
With regards to pertinent literature, I would advise the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on
Mobile Devices
Background
I'm working on a project where a user gets scanned by a Kinect (v2). The result will be a generated 3D model which is suitable for use in games.
The scanning aspect is going quite well, and I've generated some good user models.
Example:
Note: This is just an early test model. It still needs to be cleaned up, and the stance needs to change to properly read skeletal data.
Problem
The problem I'm currently facing is that I'm unsure how to place skeletal data inside the generated 3D model. I can't seem to find a program that will let me insert the skeleton in the 3D model programmatically. I'd like to do this either via a program that I can control programmatically, or adjust the 3D model file in such a way that skeletal data gets included within the file.
What have I tried
I've been looking around for similar questions on Google and StackOverflow, but they usually refer to either motion capture or skeletal animation. I know Maya has the option to insert skeletons in 3D models, but as far as I could find that is always done by hand. Maybe there is a more technical term for the problem I'm trying to solve, but I don't know it.
I do have a train of thought on how to achieve the skeleton insertion. I imagine it to go like this:
Scan the user and generate a 3D model with Kinect;
1.2. Clean user model, getting rid of any deformations or unnecessary information. Close holes that are left in the clean up process.
Scan user skeletal data using the Kinect.
2.2. Extract the skeleton data.
2.3. Get joint locations and store as xyz-coordinates for 3D space. Store bone length and directions.
Read 3D skeleton data in a program that can create skeletons.
Save the new model with inserted skeleton.
Question
Can anyone recommend (I know, this is perhaps "opinion based") a program to read the skeletal data and insert it in to a 3D model? Is it possible to utilize Maya for this purpose?
Thanks in advance.
Note: I opted to post the question here and not on Graphics Design Stack Exchange (or other Stack Exchange sites) because I feel it's more coding related, and perhaps more useful for people who will search here in the future. Apologies if it's posted on the wrong site.
A tricky part of your question is what you mean by "inserting the skeleton". Typically bone data is very separate from your geometry, and stored in different places in your scene graph (with the bone data being hierarchical in nature).
There are file formats you can export to where you might establish some association between your geometry and skeleton, but that's very format-specific as to how you associate the two together (ex: FBX vs. Collada).
Probably the closest thing to "inserting" or, more appropriately, "attaching" a skeleton to a mesh is skinning. There you compute weight assignments, basically determining how much each bone influences a given vertex in your mesh.
This is a tough part to get right (both programmatically and artistically), and depending on your quality needs, is often a semi-automatic solution at best for the highest quality needs (commercial games, films, etc.) with artists laboring over tweaking the resulting weight assignments and/or skeleton.
There are algorithms that get pretty sophisticated in determining these weight assignments ranging from simple heuristics like just assigning weights based on nearest line distance (very crude, and will often fall apart near tricky areas like the pelvis or shoulder) or ones that actually consider the mesh as a solid volume (using voxels or tetrahedral representations) to try to assign weights. Example: http://blog.wolfire.com/2009/11/volumetric-heat-diffusion-skinning/
However, you might be able to get decent results using an algorithm like delta mush which allows you to get a bit sloppy with weight assignments but still get reasonably smooth deformations.
Now if you want to do this externally, pretty much any 3D animation software will do, including free ones like Blender. However, skinning and character animation in general is something that tends to take quite a bit of artistic skill and a lot of patience, so it's worth noting that it's not quite as easy as it might seem to make characters leap and dance and crouch and run and still look good even when you have a skeleton in advance. That weight association from skeleton to geometry is the toughest part. It's often the result of many hours of artists laboring over the deformations to get them to look right in a wide range of poses.
I have a project where I have to recognize an entire room so I can calculate the distances between objects (like big ones eg. bed, table, etc.) and a person in that room. It is possible something like that using Microsoft Kinect?
Thank you!
Kinect provides you following
Depth Stream
Color Stream
Skeleton information
Its up to you how you use this data.
To answer your question - Official Micorosft Kinect SDK doesnt provides shape detection out of the box. But it does provide you skeleton data/face tracking with which you can detect distance of user from kinect.
Also with mapping color stream to depth stream you can detect how far a particular pixel is from kinect. In your implementation if you have unique characteristics of different objects like color,shape and size you can probably detect them and also detect the distance.
OpenCV is one of the library that i use for computer vision etc.
Again its up to you how you use this data.
Kinect camera provides depth and consequently 3D information (point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor (by fitting a plane) of the room and possibly walls and the ceiling. This step is important since these surfaces often connect separate objects making them a one big object.
The remaining parts of point cloud can be segmented by depth if they don't touch each other physically. Using color one can separate the objects even further. Note that we implicitly define an object as 3D dense and color consistent entity while other definitions are also possible.
As soon as you have your objects segmented you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge however a Skeleton library can recognize humans after they moved for a few seconds. Below is a simple depth map that was broken on a few segments using depth but not color information.