I need to segment depth images captured from a Kinect device in real time (30 fps).
Currently I am using EuclideanClusterExtraction from PCL; it works, but it is very slow (1 fps).
Here is a paragraph in the PCL tutorial:
“Unorganized” point clouds are characterized by non-existing point references between points from different point clouds due to varying size, resolution, density and/or point ordering. In case of “organized” point clouds often based on a single 2D depth/disparity images with fixed width and height, a differential analysis of the corresponding 2D depth data might be faster.
So I think there must be a faster method to segment depth images.
The project doesn't use the RGB camera, so I need a segmentation method that uses only the depth image.
PCL provides segmentation algorithms optimised for organised point clouds.
For details see:
The tutorial describing them and showing how to use them:
http://www.pointclouds.org/assets/icra2012/segmentation.pdf
The example code in the PCL distribution (relatively recent versions): organized_segmentation_demo and openni_organized_multi_plane_segmentation
In the API, OrganizedConnectedComponentSegmentation and OrganizedMultiPlaneSegmentation. The latter builds on the former.
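To illustrate why organised data can be segmented quickly, here is a minimal pure-Python sketch of connected-component labelling on a depth image. It is not PCL's implementation: the comparison rule (a fixed depth jump, `max_jump`) and the function name `segment_depth` are assumptions made for this example. Because neighbours are found by pixel index, no nearest-neighbour search is needed.

```python
from collections import deque

def segment_depth(depth, max_jump=0.05):
    """Label 4-connected regions whose neighbouring depths differ by < max_jump.

    depth: list of rows of depth values in metres; 0 marks a missing reading.
    Returns (labels, count) where labels has 0 for invalid pixels and
    1..count for the segments.
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if depth[r0][c0] <= 0 or labels[r0][c0]:
                continue  # missing reading or already labelled
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill over the pixel grid
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and depth[nr][nc] > 0 and not labels[nr][nc]
                            and abs(depth[nr][nc] - depth[r][c]) < max_jump):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count
```

Each pixel is visited a constant number of times, so the cost is linear in the image size.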
I am facing a problem with 3D reconstruction, since I am new to this field. I have depth maps (point clouds) from several different views, and I want to use them to reconstruct the scene, to get an effect like KinectFusion. Is there any paper or source code that addresses this problem? Or any ideas on how to approach it?
PS: the point cloud is stored as a file of (x, y, z) points; you can check here to get the data.
Thank you very much.
As you have stated that you are new to this field, I shall attempt to keep this high level. Please do comment if there is something that is not clear.
The pipeline you refer to has three key stages:
Integration
Rendering
Pose Estimation
The Integration stage takes the unprojected points from a Depth Map (Kinect image) under the current pose and "integrates" them into a spatial data structure (a Voxel Volume such as a Signed Distance Function or a hierarchical structure like an Octree), often by maintaining per Voxel running averages.
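As a rough illustration of the per-voxel running average, the sketch below fuses depth measurements into the voxels along a single camera ray. The function name `integrate`, the unit weight per observation, and the truncation distance `trunc` are assumptions for this example; a real system does this for a full 3D volume on the GPU.

```python
def integrate(tsdf, weight, voxel_depths, measured_depth, trunc=0.2):
    """Fuse one depth measurement into the voxels along a single camera ray.

    tsdf, weight: per-voxel running average and observation count (modified in place).
    voxel_depths: distance of each voxel centre from the camera along the ray.
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z          # positive in front of the surface
        if sdf < -trunc:
            continue                      # far behind the surface: not observed
        d = min(1.0, sdf / trunc)         # truncate and normalise to [-1, 1]
        tsdf[i] = (tsdf[i] * weight[i] + d) / (weight[i] + 1)
        weight[i] += 1
```

After several frames the surface lies where the averaged values change sign.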
The Rendering stage takes the inverse pose for the current frame and produces an image of the visible parts of the model currently in view. For the common volumetric representations this is achieved by Raycasting. The output of this stage provides the points of the model to which the next live frame is registered (the next stage).
The Pose Estimation stage registers the previously extracted model points to those of the live frame. This is commonly achieved by the Iterative Closest Point algorithm.
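A toy version of Iterative Closest Point can be sketched in 2D, where the best rigid transform for a set of matched pairs has a closed form. This is a simplified point-to-point variant with brute-force matching, written for illustration only; the function names are made up for this example, and real pipelines work in 3D, usually with a point-to-plane error.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centred source point
        bx, by = qx - dx, qy - dy          # centred target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(model, live, iterations=10):
    """Align model points to live points by alternating matching and fitting."""
    cur = list(model)
    for _ in range(iterations):
        # Match every current point to its nearest live point (brute force).
        matches = [min(live, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matches)
        cur = transform(cur, theta, tx, ty)
    return cur
```

ICP only converges when the initial misalignment is small, which is why the model is rendered under the previous frame's pose before registration.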
With regard to pertinent literature, I would suggest the following papers as a starting point.
KinectFusion: Real-Time Dense Surface Mapping and Tracking
Real-time 3D Reconstruction at Scale using Voxel Hashing
Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices
I have a high-resolution polygon mesh of a room, and I want to extract the vertex color information and map it to a UV map, so I can generate a texture atlas of the room.
After that, I want to remesh the model in order to reduce the number of polygons, and map the hi-res texture onto the new lower-resolution mesh.
So far I've found this link to do it in Blender, but I would like to do it programmatically. Do you know of any library/code that could help me with my task?
I guess that first of all I have to segment the model (a criterion based on normals could be helpful) and then cut each mesh segment, so that I am able to parameterize it. As for parameterization, LSCM seems to provide good results for simple models. Once the texture atlas is available, I think the problem becomes a simple task of texture mapping.
My main problem is segmentation and mesh cutting. I'm using the CGAL library for that purpose, but the algorithm is too simple to cut complex shapes. Any hint about a better segmentation/cutting algorithm that performs well for room-sized models?
EDIT:
The mesh is a room reconstructed with an RGB-D camera, with 2.5 million vertices and 4.7 million faces. The point is to extract a high-resolution texture, remesh the model to reduce the number of polygons, and then remap the texture onto it. It's not a closed mesh, and there are holes due to the reconstruction, so I'm wondering whether my task is possible at all.
I attach a capture of the mesh.
I would suggest using the following 4-step procedure:
Step 1: remesh
For this type of mesh that comes from computer vision, you need a remesher that is robust to holes, overlaps, skinny triangles etc... You can use my GEOGRAM software [1]. Use the following command:
vorpalite my_input.obj my_output.obj pre=false post=false pts=30000
where 30000 is the number of desired points (adapt it to the complexity of your input). Note: I am deactivating pre- and post-processing (pre=false post=false), which may remove too many parts of the mesh for this type of input.
Step 2: segment the remesh
My favourite method is "Variational Shape Approximation" [3]. I like it because it is simple to implement and gives reasonable results in most cases.
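To give a feel for the idea, here is a heavily simplified sketch of the alternating "assign, then refit" loop: each face is assigned to the proxy whose normal is closest, and each proxy normal is then replaced by the area-weighted average of its faces. This is only a k-means style approximation under stated assumptions: it ignores mesh connectivity (the paper grows connected regions from seed faces), and the function name and arguments are made up for this example.

```python
def cluster_normals(normals, areas, seeds, iterations=5):
    """Group faces by normal direction.

    normals: per-face unit normals, areas: per-face areas,
    seeds: indices of the faces used to initialise the proxies.
    Returns (labels, proxies).
    """
    proxies = [normals[s] for s in seeds]
    labels = [0] * len(normals)
    for _ in range(iterations):
        # Assignment step: nearest proxy normal for every face.
        labels = [min(range(len(proxies)),
                      key=lambda k: sum((n[i] - proxies[k][i]) ** 2 for i in range(3)))
                  for n in normals]
        # Fitting step: area-weighted mean normal of each cluster.
        for k in range(len(proxies)):
            acc = [0.0, 0.0, 0.0]
            for n, a, label in zip(normals, areas, labels):
                if label == k:
                    for i in range(3):
                        acc[i] += a * n[i]
            length = sum(c * c for c in acc) ** 0.5
            if length > 0:
                proxies[k] = tuple(c / length for c in acc)
    return labels, proxies
```

For a room, the resulting clusters tend to correspond to the floor, the walls and other large planar pieces, which are convenient charts to parameterize.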
Step 3: parameterize
Besides my LSCM method, you may use ABF++ [4], which we developed afterwards and which gives much better results in most cases. You may also try ARAP [5].
Step 4: bake the texture
Once the simplified mesh is parameterized, you need to copy the colors from the original mesh onto the new one. This means determining for each pixel of the texture where it goes in 3D, and finding the nearest point in the original 3D mesh.
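The baking loop can be sketched for a single triangle as follows. This is a simplified illustration, not Graphite's code: it looks up the nearest original vertex by brute force rather than the nearest point on the original surface, and the function names and arguments are assumptions for this example.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of the 2D point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1

def bake(size, tri_uv, tri_xyz, src_points, src_colors):
    """Fill a size x size texture for one triangle of the simplified mesh.

    tri_uv / tri_xyz: the triangle's three UV and 3D corners.
    src_points / src_colors: vertices and vertex colors of the original mesh.
    Texels outside the triangle stay None.
    """
    texture = [[None] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            uv = ((col + 0.5) / size, (row + 0.5) / size)   # texel centre
            w = barycentric(uv, *tri_uv)
            if min(w) < 0:
                continue                                    # outside the triangle
            # Where this texel lies in 3D on the simplified mesh.
            pos = tuple(sum(w[k] * tri_xyz[k][i] for k in range(3)) for i in range(3))
            # Nearest original vertex (brute force; use a k-d tree for real meshes).
            nearest = min(range(len(src_points)),
                          key=lambda j: sum((src_points[j][i] - pos[i]) ** 2
                                            for i in range(3)))
            texture[row][col] = src_colors[nearest]
    return texture
```

With millions of source vertices, the nearest-neighbour search dominates the cost, so a spatial index is needed in practice.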
Segmentation, parameterization and baking are implemented in my Graphite software [2] (use the old version 2.x, the newer version 3.x does not have all the texturing functionalities).
[1] geogram: http://alice.loria.fr/software/geogram/doc/html/index.html
[2] graphite: http://alice.loria.fr/software/graphite/doc/html/
[3] Variational Shape Approximation (Cohen-Steiner, Alliez, Desbrun, SIGGRAPH 2004): http://www.geometry.caltech.edu/pubs/CAD04.pdf
[4] ABF++: http://alice.loria.fr/index.php/publications.html?redirect=1&Paper=ABF_plus_plus#2004
[5] ARAP: cs.harvard.edu/~sjg/papers/arap.pdf
For reducing the number of polygons, I prefer mesh decimation. My recommended workflow (input: a high-resolution mesh, mesh0, with vertex colors):
Compute UV coordinates for mesh0.
Generate a texture image (textureImage) from the vertex colors. You then have a textured mesh (mesh0 with UV coordinates, plus textureImage).
Apply mesh decimation to mesh0; the decimation should take the UV coordinates into consideration.
I have an example of this workflow on my site; see the example image: Decimation of texture mesh.
Or you can refer to my site for details.
I have a depth camera feed already set up and in order to make it more interesting I want to compute some data out of it like normals, motion/optical flow and other data sets to use them for visual effects. I am particularly interested in optical flow and whether it can be computed from a depth only stream.
Has this been implemented? If so I'd like to know what are the methods and understand which one would be the easiest to use.
I worked with the Kinect depth camera and implemented a patient tracking algorithm. The algorithm itself is commercial and I cannot disclose the details. But I can give my two cents here.
The depth feed from Kinect should not be used directly for optical flow (motion tracking), because some pixels have no depth value. You can use inpainting to fill in the gaps in the depth image. If you are using OpenCV, you can refer to the implementation here.
http://www.morethantechnical.com/2011/03/05/neat-opencv-smoothing-trick-when-kineacking-kinect-hacking-w-code/
I suggest applying a smoothing filter after inpainting to get smooth depth data near object edges. You can use the simple filters available in OpenCV on the depth stream. It also helps to convert the 16-bit depth to an 8-bit RGB image to visualize the disparity image.
I believe you can then use the resulting stream with an optical flow algorithm from OpenCV. Here is an example.
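The preprocessing can be sketched without any library as below. This is a simple stand-in for OpenCV's inpainting, not its algorithm: missing pixels (value 0) are filled with the mean of their valid neighbours, and the 16-bit depth is then scaled to 8 bits. The function names and the 4000 mm maximum range are assumptions for this example.

```python
def fill_holes(depth):
    """Fill zero (missing) pixels with the mean of their valid 8-neighbours.

    Passes repeat until no more pixels can be filled, so larger gaps are
    closed from their borders inwards.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    changed = True
    while changed:
        changed = False
        prev = [row[:] for row in out]     # read from a snapshot of the last pass
        for r in range(rows):
            for c in range(cols):
                if prev[r][c] != 0:
                    continue
                vals = [prev[nr][nc]
                        for nr in range(max(0, r - 1), min(rows, r + 2))
                        for nc in range(max(0, c - 1), min(cols, c + 2))
                        if prev[nr][nc] != 0]
                if vals:
                    out[r][c] = sum(vals) // len(vals)
                    changed = True
    return out

def to_8bit(depth, max_depth_mm=4000):
    """Scale integer depth in millimetres to 0..255, saturating at max_depth_mm."""
    return [[min(255, d * 255 // max_depth_mm) for d in row] for row in depth]
```

The 8-bit image produced this way can be passed to feature trackers that expect ordinary grayscale input.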
http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_lucas_kanade.html#gsc.tab=0
You can also use the dense trajectories implementation, but I believe it is processor-intensive and the final frame rate might be really slow.
https://lear.inrialpes.fr/people/wang/dense_trajectories
Hope this helps.
I have an accurate mesh surface model of an implant that I'd like to optimally and rigidly align to a computed tomography scan (scalar volume) that contains the exact same object. I've tried detecting edges in the image volume with a Canny filter and doing an iterative closest point alignment between the edges and the vertices of the mesh, but it's not working. I also tried voxelizing the mesh and using image volume alignment methods (Mattes mutual information), which yields very inconsistent results.
Any other suggestions?
Thank you.
Generally, mesh and volume are two different data structures. You have to either convert mesh to volume or convert volume to mesh.
I would recommend segmenting the volume data first, to isolate the structures you want to register. A Canny filter alone might not be enough to delineate the border clearly. I would recommend the level-set method and active contour models. These two are frequently used in medical image processing. For these two topics, I would recommend Professor Chunming Li's work.
After you have segmented the volume data, you can reconstruct a mesh model of that volume with marching cubes. The vertices of the two meshes can then be registered with a simple ICP algorithm.
However, this is a workaround rather than a true registration, and the segmentation always takes a lot of time.
I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. bed, table, etc.) and a person in that room. Is something like that possible using Microsoft Kinect?
Thank you!
Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can determine the distance of the user from the Kinect.
Also, by mapping the color stream to the depth stream you can determine how far a particular pixel is from the Kinect. In your implementation, if the objects have distinguishing characteristics such as color, shape and size, you can probably detect them and also measure their distance.
OpenCV is one of the libraries that I use for computer vision.
Again, it's up to you how you use this data.
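As a minimal sketch of the distance calculation, a depth pixel can be turned into a 3D point with the pinhole camera model, after which distances are plain Euclidean distances. The function names and the intrinsics used in the usage note (fx, fy, cx, cy) are assumptions for illustration; use the calibration of your own device.

```python
import math

def unproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with depth in metres to a 3D point in camera space."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, with assumed intrinsics fx = fy = 525 and (cx, cy) = (319.5, 239.5), `distance(unproject(u1, v1, d1, 525, 525, 319.5, 239.5), unproject(u2, v2, d2, 525, 525, 319.5, 239.5))` gives the metric distance between two pixels, and the distance to `(0, 0, 0)` gives the distance from the sensor.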
The Kinect camera provides depth, and consequently 3D information (a point cloud), for matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane) and possibly the walls and the ceiling. This step is important since these surfaces often connect separate objects, making them one big object.
The remaining parts of the point cloud can be segmented by depth if they don't touch each other physically. Using color, one can separate the objects even further. Note that we implicitly define an object as a dense 3D, color-consistent entity, while other definitions are also possible.
As soon as you have your objects segmented you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, a skeleton library can recognize humans after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.
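One common way to fit the floor plane is RANSAC, sketched below in plain Python. The answer does not name a specific fitting method, so treat this as one possible choice; the function names, the 2 cm inlier threshold and the iteration count are assumptions for this example.

```python
import random

def fit_plane(p, q, r):
    """Plane through three points as (normal, d) with normal . x + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = tuple(c / norm for c in n)
    return n, -sum(n[i] * p[i] for i in range(3))

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return the indices of the points on the best plane found."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, pt in enumerate(points)
                   if abs(sum(n[k] * pt[k] for k in range(3)) + d) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Removing the returned inliers from the cloud leaves the objects standing on the floor as separate clusters.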
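Once the segments exist, the distance between two of them can be taken as the smallest distance between any pair of their points. The brute-force sketch below is for illustration only (the function name is made up for this example); for full Kinect clouds a spatial index such as a k-d tree would be needed.

```python
def min_distance(seg_a, seg_b):
    """Smallest Euclidean distance between any point of seg_a and any point of seg_b."""
    return min(sum((a[i] - b[i]) ** 2 for i in range(3))
               for a in seg_a for b in seg_b) ** 0.5
```

Depending on the application, the distance between segment centroids may be a more stable measure than the closest-point distance.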