It is possible to recognize all objects from a room with Microsoft Kinect? - kinect

I have a project where I have to recognize an entire room so I can calculate the distances between objects (like big ones eg. bed, table, etc.) and a person in that room. It is possible something like that using Microsoft Kinect?
Thank you!

Kinect provides you following
Depth Stream
Color Stream
Skeleton information
Its up to you how you use this data.
To answer your question - Official Micorosft Kinect SDK doesnt provides shape detection out of the box. But it does provide you skeleton data/face tracking with which you can detect distance of user from kinect.
Also with mapping color stream to depth stream you can detect how far a particular pixel is from kinect. In your implementation if you have unique characteristics of different objects like color,shape and size you can probably detect them and also detect the distance.
OpenCV is one of the library that i use for computer vision etc.
Again its up to you how you use this data.

Kinect camera provides depth and consequently 3D information (point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor (by fitting a plane) of the room and possibly walls and the ceiling. This step is important since these surfaces often connect separate objects making them a one big object.
The remaining parts of point cloud can be segmented by depth if they don't touch each other physically. Using color one can separate the objects even further. Note that we implicitly define an object as 3D dense and color consistent entity while other definitions are also possible.
As soon as you have your objects segmented you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge however a Skeleton library can recognize humans after they moved for a few seconds. Below is a simple depth map that was broken on a few segments using depth but not color information.

Related

Is programming a voxel based graphics API theoretically possible?

This is entirely a theoretical question because I understand the time it would take to do such a thing would be ridiculous
I've been working with "voxels" a lot lately and the only way I can display them to a user is to either triangulate the visible surfaces or make a CPU ray-tracer but both come with their own problems.
Simply put, if we dismiss the storage space needed for voxel meshs and targeted a very specific GPU would someone who was wanting to create a graphics API like OpenGL but with "true" voxel primitives that don't need to be converted be able to make such thing or are GPUs designed specifically for triangles with no way to introduce a new base primitive?
Its possible and it was already done many times
games like Minecraft,SpaceEngineers...
3D printing tools and slicers
MRI/PET scans tools
Yes rendering on GPU is possible with the two base methods you mention. Games usually use the transform to boundary representation 3D geometry. With rise of shaders even ray tracers are now possible here mine:
simple GLSL voxel ray tracer
using native OpenGL architecture and passing geometry as 3D texture. In order to obtain speed you need to add BVH or similar spatial subdivision of geometry...
However voxel based tools have been here for quite some time. For example many isometric games/engines are voxel based (tile is a voxel) like this one:
Improving performance of click detection on a staggered column isometric grid
Also do you remember UFO ? It was playable on x286 and it was also "voxel/tile" based isometric.

kinect SKD skeletonization method

I was wondering if there's a way to modify the depth map prior to sending it to the skeletonization algorithm used by the kinect, for example, if we want to run the skeletonization on the output of a segmented depth image. So far I have reviewed the methods in the sdk but I haven't been able to find a skeletonization method exposed. It's like you either turn the skeleton on or off but you have no control on its inputs.
If anyone has any idea regarding this topic I will be much obliged.
Shamita: skeletonization means tracking the joints of the user in real time. I edit because I can't comment (not enought reputation).
All the joints' give a depth coordinate and I don't think you can mess with the Kinect hardware input stream. But you can categorize the joints regarding to depth segments. For example with the live stream you categorize it with the corresponding category if it is below 10 and above five it is in category A. this can be done with the live stream itself because it is just a simple calculation.

Transformation between point clouds

I hope to find some hints where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3d point clouds. I created a 3d object detector which is already working.
Here my task:
Lets say I have a point cloud 1. I detected a object in cloud A and I know the centroid position of my object (x1,y1,z1). Now I move my sensor around a path and create new clouds (e.g. cloud 2). In that cloud 2 I see the same object but e.g. from the side, where the object detection is not working fine.
I would like to transform the detected object form cloud 1 to cloud 2, to get the centroid also in cloud 2. For me it sound like I need a matrix (Translation, Rotation) to transform point from 1 to 2.
And ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimation of which points in cloud 1 correspond to which clouds in point 2 (more specifically, which given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc) in your point clouds, computing a rotation and translation as above, and then performing ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned because it estimates correspondences using one of two metrics, minimal point-to-point distance or minimal point to plane distance, according to wikipedia, the latter works better in practice, but it does involve estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
I think what you were asking about was in particular to the Kinect Sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in Sensor Fusion namespace. This should take care of it automatically, in methods similar to the answer given by #pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of which can be found out about, here.

how can I detect floor movements such as push-ups and sit-ups with Kinect?

I have tried to implement this using skeleton tracking provided by Kinect. But it doesn't work when I am lying down on a floor.
According to Blitz Games CTO Andrew Oliver, there are specific ways to implement with depth stream or tracking silhouette of a user instead of using skeleton frames from Kinect API. You may find a good example in video game Your Shape Fitness. Here is a link showing floor movements such as push-ups!
Do you guys have any idea how to implement this or detect movements and compare them with other movements using depth stream?
What if a 360 degree sensor was developed, one that recognises movements not only directly in front, to the left, or right of it, but also optimizes for movement above(?)/below it? The image that I just imagined was the spherical, 360 degree motion sensors often used in secure buildings and such.
Without another sensor I think you'll need to track the depth data yourself. Here's a paper with some details about how MS implements skeletal tracking in the Kinect SDK, that might get you started. They implement object matching while parsing the depth data to capture joints in the body, you may need to implement some object templates and algorithms to do the matching yourself. Unless you can reuse some of the skeletal tracking libraries to parse out objects from the depth data for you.

Tutorials for controlling 3D modeling objects

I have some experience with Blender such that I can make a semitransparent cylinder of specified dimensions and small spheres. I want to (for a chemistry tutorial video explaining temperature and heat concepts) write a program that will:
Set up the cylinder and some spheres in a coordinate space
Set up a camera and lighting
Get the spheres moving around in random directions while keeping track of their positions and making them bounce when necessary (this I can figure out given a coordinate space; and I'm not going to get bone-crunchingly accurate trying to do accelerations, taking "mass" into account, etc. just going to send balls in another direction at the "speed" all the balls are going)
Record what this would look like through the camera for a set amount of time (thinking command line option in seconds)
In other words, by #4, this program doesn't even need to be GUI at all. I just want the program to make a video.
It may take me a very long time to actualize this because though I have a lot of experience with C, C++, and Java, I don't know how to take a 3D model file and programmatically control it. I don't even know the infrastructure of libraries and accompanying API to control 3D objects and record the camera to a file.
Are there any tutorials that would go from starting with some 3D models to programmatically setting up a scene (objects, camera, lights), programmatically moving the objects in the coordinate space, and recording the video to a file?
Knowing some programming already, I want to point you to Unity, www.unity3d.com
Unity is a 3d game engine, though it can be used for a number of different things, including this program you have in mind.
It's programmed with C# or Javascript, and I think you could pick these languages up easily enough.
Basically what you described in your last paragraph is exactly what Unity does.