Kinect skeleton algorithm from depth images

I am curious whether it is possible to use the skeleton algorithm from the Kinect sensor.
More specifically, I have some depth images and I want to extract the skeleton from them. Is that possible?

Yes, it is possible, but it is probably not simple.
The algorithm behind the skeletal tracking is described in the paper "Real-Time Human Pose Recognition in Parts from a Single Depth Image". In other words, it was designed to estimate the skeletal joints from a single depth image, which is exactly what you need.
The advantage of using an SDK (the Microsoft one, or any other you prefer) is that you don't need to reimplement the skeletal tracking algorithm yourself. It is quite complex, and it also requires a large amount of synthetic training data to be generated and used properly.
However, if you want to know more about it, you can find everything you need on this page, which links to the original paper and some supplementary material about building the training data set used to implement the algorithm.

To track skeletons with the Kinect, you have to enable the SkeletonStream and get frames with skeletal information in them (as opposed to getting the information from depth frames, which store no skeletal information).
First you have to enable the skeleton stream in your application, just as you would the depth stream or the color stream (I'm assuming you understand that, since you already have depth images).
// Enable the skeleton stream. The TransformSmoothParameters block is optional: it is mainly
// used to stabilize the skeleton data (e.g. predict where a joint is if it briefly disappears).
sensor.SkeletonStream.Enable(new TransformSmoothParameters()
{
    Smoothing = 0.5f,
    Correction = 0.5f,
    Prediction = 0.5f,
    JitterRadius = 0.5f,
    MaxDeviationRadius = 0.04f
});

// This array will hold your skeletal data; it is the equivalent of the short array that holds your depth data.
skeletonData = new Skeleton[sensor.SkeletonStream.FrameSkeletonArrayLength];

sensor.SkeletonFrameReady += this.SkeletonFrameReady;
Then you need a method that is fired every time the Kinect has a skeleton frame ready (a frame with all the skeletal information):
private void SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    using (SkeletonFrame skeletonFrame = e.OpenSkeletonFrame()) // get the skeleton frame
    {
        if (skeletonFrame == null) // if there's no data, then exit
        {
            return;
        }

        // Copy the skeleton data into our array; it holds data for up to 6 skeletons.
        skeletonFrame.CopySkeletonDataTo(skeletonData);

        Skeleton skeletonOfInterest = (from s in skeletonData
                                       where s.TrackingState == SkeletonTrackingState.Tracked
                                       select s).FirstOrDefault(); // take the first skeleton that is tracked (null if none)

        // Put your code to work with the skeleton here. You will have to do some reading
        // to find out how to work with the skeletons.
    }
}
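For example (a minimal sketch using the Kinect SDK 1.x types shown above; remember that skeletonOfInterest can be null if nobody is tracked), reading a single joint position in place of the comment inside SkeletonFrameReady looks roughly like this:
if (skeletonOfInterest != null)
{
    Joint head = skeletonOfInterest.Joints[JointType.Head];
    if (head.TrackingState == JointTrackingState.Tracked)
    {
        // Position is a SkeletonPoint in meters, relative to the sensor.
        float x = head.Position.X;
        float y = head.Position.Y;
        float z = head.Position.Z;
        // ... use the coordinates, e.g. to drive a cursor or an avatar joint.
    }
}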
MSDN is typically my go-to resource for learning how to work with the Kinect. If you installed the Kinect SDK, there are also some good samples in the Developer Toolkit Browser. Finally, another good resource is Beginning Kinect Programming with the Microsoft Kinect SDK by Apress, which I've relied on extensively; you can find it on Amazon.

Related

Hide an object for a specific camera

I use Godot to create my 3D game. I ran into a problem while creating portals using a camera viewport rendered to a texture. The problem is that the camera captures unnecessary objects that are behind the portal. I partially solved this by setting the camera's "near" parameter to the distance from the camera to the portal, but then the part of the scene just behind the portal gets cut off.
The question is: is it possible to hide objects from a particular camera so that other cameras can still see them? Or perhaps there is another way to do this, for example by creating a static clipping plane?
Proximity Fade
Probably not what you are looking for, but I'll mention it for completeness' sake.
The default material has proximity fade and distance fade, which you can use to make the material disappear if it is too close to or too distant from the camera, respectively.
It is important to note that this is not a cull plane, and that the fading is gradual.
Thus, using proximity fade you can make objects near the camera appear semitransparent.
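As a rough illustration (this assumes the Mono/C# build of Godot 3.x and that the mesh uses a SpatialMaterial; the node path is a placeholder):
var mat = (SpatialMaterial)GetNode<MeshInstance>("MeshInstance").GetSurfaceMaterial(0); // placeholder node path
mat.ProximityFadeEnable = true;   // fade the material out as it gets close to the camera
mat.ProximityFadeDistance = 1.0f; // distance (in world units) over which the fade happens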
Using Visibility layers and cull mask
is it possible to hide objects for a particular camera so that other cameras can see them?
Every VisualInstance (you know, all things that are visible in 3D) has layers. And every Camera has a cull_mask. If the cull_mask of the Camera does not include any of the layers of a VisualInstance, then the Camera does not see that VisualInstance.
A VisualInstance with no layers will not show on any Camera, even if the Camera has all the layers in its cull_mask (which is the default).
You can either edit the cull_mask of the camera to not include the layers of the VisualInstance, or edit the layers of the VisualInstance, or both.
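For example (a sketch assuming the Mono/C# build of Godot 3.x; the node paths are placeholders), you could put the objects you want hidden on layer 2 and clear that bit only on the portal camera's cull mask:
// Put the object on layer 2 only (layers are a bit mask; bit 0 is layer 1).
var hidden = GetNode<VisualInstance>("HiddenObject"); // placeholder node path
hidden.Layers = 1u << 1;

// The portal camera does not render layer 2; other cameras keep the default mask and still see the object.
var portalCamera = GetNode<Camera>("PortalViewport/Camera"); // placeholder node path
portalCamera.CullMask &= ~(1u << 1);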
Using a custom shader cull plane
Perhaps there is another way to do this, for example by creating a static clipping plane?
You can use a custom spatial shader to cut things out based on a plane.
You need to define the plane as uniforms. For this answer I'll use the point-normal form of a plane:
n·(r - r_0) = 0
The sign of the left-hand side tells you on which side of the plane a point lies. In shader terms, that is:
dot(plane_normal, world_position - plane_point)
Thus, we define plane_normal and plane_point uniforms:
uniform vec3 plane_normal;
uniform vec3 plane_point;
The plane_normal gives us the orientation of the plane, while the plane_point is a point on the plane which allows us to position it.
And then use this logic:
vec3 world_position = (CAMERA_MATRIX * vec4(VERTEX, 1.0)).xyz; // in fragment(), VERTEX is in view space; CAMERA_MATRIX converts it to world space
ALPHA = clamp(sign(dot(plane_normal, world_position - plane_point)), 0.0, 1.0);
Here we convert the coordinates of the current point to world space, and then use the plane definition to find the points on one side (using sign) and set ALPHA based on that, so that everything on one side of the plane becomes invisible.
Note: This is not the only way to define the plane. Another popular definition is a 4D vector, where xyz is the normal and w is the distance from the plane to the origin.
Sadly, I don't think there is a way to make this work with multiple material passes, because ALPHA controls the blending of the passes, and will not result in transparency. And no, using discard; does not solve it either, because the other passes can write the fragment regardless. Thus, you are going to need to modify your materials to include that.
Also, sadly, Godot 3.x does not support global uniforms (see Godot 4.0 gets global and per-instance shader uniforms), which means you will have to set these parameters on every material that needs them.
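A quick sketch of that per-material bookkeeping (C# for the Mono build of Godot 3.x; cullMaterial is a placeholder reference to a ShaderMaterial that uses the spatial shader above):
cullMaterial.SetShaderParam("plane_normal", new Vector3(0, 0, 1)); // plane facing +Z
cullMaterial.SetShaderParam("plane_point", new Vector3(0, 0, 5));  // a point on the plane
// Repeat for every material that needs the clipping plane, since 3.x has no global uniforms.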
Using Constructive Solid Geometry (CSG)
Add a CSGCombiner and build the geometry that needs to disappear out of other CSG nodes added as children.
Then you can, for example, add a CSGSphere with its operation set to "Subtraction" and move it with the Camera (for this purpose, I suggest adding a RemoteTransform node as a child of the Camera and setting its remote path to the CSGSphere).
Of course, it does not have to be a CSGSphere, you can use any CSG nodes for this purpose. For the portal, I imagine you could use a CSGBox and align it to the portal plane.
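Purely as an illustration of that setup (Mono/C# build of Godot 3.x, placeholder node paths; in practice you would most likely wire this up in the editor instead):
// A subtractive CSGBox that is driven by the camera via a RemoteTransform.
var combiner = GetNode<CSGCombiner>("CSGCombiner"); // placeholder path
var cutter = new CSGBox();
cutter.Operation = CSGShape.OperationEnum.Subtraction; // carve this shape out of the combined geometry
combiner.AddChild(cutter);

var follow = new RemoteTransform();
GetNode<Camera>("Camera").AddChild(follow); // placeholder path
follow.RemotePath = cutter.GetPath();       // the RemoteTransform now drives the cutter's transform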
Note: Currently on Godot 3.3 CSG nodes do not support baking lights. This is a regression. See: Unable to bake lightmap with CSG due to the lack of ability to generate UV2 for CSG nodes.
Portals, actually
Bartleby Lawnjelly has a portal (godot-lportal) module for Godot 3.x.
Being a module, it requires building Godot from source. See Compiling in the official Godot documentation. It is not that bad, I promise. Or use a build from godot-titan.
I have to explain that these portals are not portals in the sense of the Valve Portal video game series… The module lets you define areas as "rooms" and planes as "portals" that connect those rooms, in such a way that you can look from one into the other. The purpose of this is to cull entire rooms unless you are looking through one of the portals.
Hopefully that makes more sense with a video. This one is somewhat old, but good for getting the idea across: Portal rendering module in Godot 3.2 - Improved performance. Seeing shadow popping in the video? Bartleby Lawnjelly also has a custom lightmapper.

First-person game in OpenGL ES 2.0

I know I need to use a frustum projection for a first-person game I'm writing. However, I'm not sure what the most efficient way to move around in the world is.
Currently I'm using
Matrix.setLookAtM(mVMatrix, 0, eyex, eyey, eyez, lookx, looky, lookz, upx, upy, upz);
Matrix.multiplyMM(mMVPMatrix, 0, mProjMatrix, 0, mVMatrix, 0);
every time the display is redrawn. User input changes the "eye", "look" position vectors, and the "up" direction vector.
However, I've read elsewhere that one should translate/rotate the world and not the "camera".
My question is: should I rotate the objects about a fixed "camera" (i.e. only use setLookAtM once at set up) or should I carry on using my current method?
There is no such thing as a camera in OpenGL. In fact, moving the objects instead of the viewpoint, as you describe, is what OpenGL actually does internally to give you the feeling that the camera moves. There is a great article and tutorial that covers cameras in OpenGL ES 2, and the logic is basically the same in other versions:
http://db-in.com/blog/2011/04/cameras-on-opengl-es-2-x/
It might seem long and confusing, but I have searched a lot about cameras in OpenGL ES 2, and this article is almost perfect for a beginner.
To answer your question: your current method already does exactly what you are describing. The camera is always fixed in OpenGL, and the matrix multiplications you are doing already provide the effect you want. When you multiply your model matrix with the view matrix and the projection matrix, you end up defining a new position for your object depending on your camera parameters.

How to map kinect skeleton data to a model?

I have set up a Kinect device and written a simple program that reads the stream into a QImage using OpenNI 2.0. I have set up skeleton tracking with NiTE 2.0, so I have access to the coordinates of all 15 joints. I have also set up a simple scene using SceniX. The hand coordinates provided by the skeleton tracking are being used to draw two boxes representing the hands.
I would like to bind the whole skeleton to a (rigged) model, and I can't seem to find any good tutorials. Does anyone have any idea how I should proceed?
Depending on your requirements, you could look at something like this for the Unity engine: https://www.assetstore.unity3d.com/en/#!/content/10693
There is also a plugin for the Unreal 4 engine called Kinect 4 Unreal from Opaque Multimedia.
But if you have to write it all by hand yourself, I have done something similar using OpenGL.
I used Assimp (http://assimp.sourceforge.net/) to load animated Collada models, and OpenNI with NiTE for skeletal tracking. I then took the rotation data from the NiTE skeleton and applied it to the corresponding bones of my rigged mesh, overwriting the rotation values of the animation. Don't use positional data: it will stretch your bones and distort the mesh.
There are many sources of free 3D models, like TF3DM.com. I used a custom rig for my models so they would be suitable for my code, so you might look into using Blender and how to rig a model.
Also remember that the NiTE skeleton has no joint for the pelvis, and that NiTE joints don't inherit their parent's rotation, unlike the bones in a rigged model.
I hope this gives you something to go on.
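As a very rough sketch of that retargeting idea (C# with hypothetical dictionaries just to show the data flow; the real NiTE and Assimp APIs look different):
// trackedRotations: tracker joint name -> orientation from the tracker, already converted to quaternions (hypothetical).
// boneLocalRotations: rig bone name -> the local bone rotation the renderer reads each frame (hypothetical).
// jointToBone: which tracker joint drives which rig bone, e.g. "left_elbow" -> "LeftForeArm" (hypothetical).
foreach (KeyValuePair<string, string> map in jointToBone)
{
    // Overwrite only the bone's rotation; keep its own translation (bone length),
    // so the user's different proportions do not stretch or distort the mesh.
    boneLocalRotations[map.Value] = trackedRotations[map.Key];
}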
You can try DigitalRune; they have examples of binding a rigged model to joints: http://www.digitalrune.com/Support/Blog/tabid/719/EntryId/155/Research-Augmented-Reality-with-Microsoft-Kinect.aspx
You would also need to know how to animate a model in Blender and export it to XNA or to your working graphics framework, e.g. http://www.codeproject.com/Articles/230540/Animating-single-bones-in-a-Blender-3D-model-with#SkinningSampleProject132

simple movement tracking (& saving of coordinates) with Kinect

I'm looking for a simple Kinect app which allows me to a) detect and b) track a single moving object in an otherwise static background.
I don't need any fancy skeleton or other features, just the center of mass of the moving object will do it.
Any pointers?
I would look at Comparing a saved movement with other movement with Kinect for tracking the entire body; the answer links to code that shows how to save skeleton data. If you only want to track individual joints rather than the entire body, see mapping an ellipse to a joint in kinect sdk 1.5 (joint tracking currently works better, but once full-body tracking works for you, use that, because it is more effective and efficient).
Your case is pretty simple, but it requires initialization for the object, since in general the term "object" is ill-defined. It can be the closest object, a moving object, or even the object that was touched, or one with a certain color, size or shape.
Let's assume that you define the object by motion, that is, whatever moves in your point cloud is an object. I suggest doing this:
1. Object detection is easy if the object moves by more than its own size: you can simply subtract consecutive depth maps and end up with your object (depth1 - depth2 > T). But if the object moves slowly and shifts only by a fraction of its size, you have to use whatever high-frequency information you have, which can be depth, colour or both, and the result is going to be noisy.
2. As soon as you have your object selected, you may want to clean it up by running some morphological filters (erode + dilate) to erase noise and get a single blob. After that you just need to find some features of the blob, such as average depth or mean color, and look for them in a small window around the object's previous location in order to rediscover the object.
3. Finally, don't forget to update these features as the object moves around.
Some other ideas you may want to use: depth gradient, connected components in depth, pre-recording the background depth for cleaner subtraction, running grabCut on a depth area selected by a mouse click, etc.
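As a minimal sketch of step 1 plus the centre-of-mass part (plain C# over two raw depth buffers, such as the short[] arrays the Kinect SDK gives you; it assumes the buffers already contain plain depth values in millimeters, and the threshold is a value you would tune):
// Returns the centroid (in pixel coordinates) of everything that changed between two depth frames,
// or null if nothing moved.
static (int X, int Y)? FindMovingCentroid(short[] depthA, short[] depthB, int width, int thresholdMm = 100)
{
    long sumX = 0, sumY = 0, count = 0;
    for (int i = 0; i < depthA.Length; i++)
    {
        if (Math.Abs(depthA[i] - depthB[i]) > thresholdMm) // this pixel changed -> treat it as part of the moving object
        {
            sumX += i % width; // x coordinate
            sumY += i / width; // y coordinate
            count++;
        }
    }
    if (count == 0)
    {
        return null; // nothing moved between the two frames
    }
    // Centre of mass of the changed pixels.
    return ((int)(sumX / count), (int)(sumY / count));
}
In a real pipeline you would still want the erode/dilate cleanup from step 2 before trusting the centroid.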

How do I rotate an OpenGL view relative to the center of the view as opposed to the center of the object being displayed?

I'm working on a fork of Pleasant3D.
When rotating an object being displayed, the object always rotates around the same point relative to itself, even if that point is not at the center of the view (e.g. because the user has panned to move the object in the view).
I would like to change this so that the view always rotates the object around the point at the center of the view as it appears to the user instead of the center of the object.
Here is the core of the current code that rotates the object around its center (slightly simplified) (from here):
glLoadIdentity();

// midPlatform is the offset to reach the "middle" of the object (or, more specifically,
// the platform on which the object sits) in the x/y dimensions.
// This is the point around which the view is currently rotated.
Vector3 *midPlatform = [self.currentMachine calcMidBuildPlatform];
glTranslatef((GLfloat)cameraTranslateX - midPlatform.x,
             (GLfloat)cameraTranslateY - midPlatform.y,
             (GLfloat)cameraOffset);

// trackBallRotation and worldRotation come from trackball.h/c, which appears to be
// from an Apple OpenGL sample.
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}

// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);

glTranslatef(midPlatform.x, midPlatform.y, 0.);

// Now draw the object...
What transformations do I need to apply in what order to get the effect I desire?
Some of what I've tried so far
As I understand it this is what the current code does:
"OpenGL performs matrices multiplications in reverse order if multiple transforms are applied to a vertex" (from here). This means that the first transformation to be applied is actually the last one in the code above. It moves the center of the view (0,0) to the center of the object.
This point is then used as the center of rotation for the next two transformations (the rotations).
Finally, the midPlatform translation is done in reverse to move the center back to its original location, and the XY translations (panning) done by the user are applied. This is also where the "camera" is moved away from the object to the proper location (indicated by cameraOffset).
This seems straightforward enough. So what I need to change is instead of translating the center of the view to the center of the object (midPlatform) I need to translate it to the current center of the view as seen by the user, right?
Unfortunately this is where the transformations start affecting each other in interesting ways and I am running into trouble.
I tried changing the code to this:
glLoadIdentity();

glTranslatef(0,
             0,
             (GLfloat)cameraOffset);

if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}

// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);

glTranslatef(cameraTranslateX, cameraTranslateY, 0.);
In other words, I translate the center of the view to the previous center, rotate around that, and then apply the camera offset to move the camera away to the proper position. This makes the rotation behave exactly the way I want it to, but it introduces a new issue: now any panning done by the user is relative to the object. For example, if the object is rotated so that the camera is looking along the X axis end-on, then when the user pans left to right the object appears to move closer to or further from the user instead of left or right.
I think I understand why this happens (the XY camera translations are being applied before the rotation). I think what I need to do is cancel out the pre-rotation translation after the rotation (to avoid the weird panning effect) and then apply another translation that is relative to the viewer (eye coordinate space) instead of the object (object coordinate space), but I'm not sure exactly how to do this.
I found what I think are some clues in the OpenGL FAQ (http://www.opengl.org/resources/faq/technical/transformations.htm), for example:
9.070 How do I transform my objects around a fixed coordinate system rather than the object's local coordinate system?
If you rotate an object around its Y-axis, you'll find that the X- and Z-axes rotate with the object. A subsequent rotation around one of these axes rotates around the newly transformed axis and not the original axis. It's often desirable to perform transformations in a fixed coordinate system rather than the object’s local coordinate system.
The root cause of the problem is that OpenGL matrix operations postmultiply onto the matrix stack, thus causing transformations to occur in object space. To affect screen space transformations, you need to premultiply. OpenGL doesn't provide a mode switch for the order of matrix multiplication, so you need to premultiply by hand. An application might implement this by retrieving the current matrix after each frame. The application multiplies new transformations for the next frame on top of an identity matrix and multiplies the accumulated current transformations (from the last frame) onto those transformations using glMultMatrix().
You need to be aware that retrieving the ModelView matrix once per frame might have a detrimental impact on your application’s performance. However, you need to benchmark this operation, because the performance will vary from one implementation to the next.
And
9.120 How do I find the coordinates of a vertex transformed only by the ModelView matrix?
It's often useful to obtain the eye coordinate space value of a vertex (i.e., the object space vertex transformed by the ModelView matrix). You can obtain this by retrieving the current ModelView matrix and performing simple vector / matrix multiplication.
But I'm not sure how to apply these in my situation.
You need to transform/translate the "center of view" point to the origin, rotate, and then invert that translation, back into the object's transform. This is known as a change of basis in linear algebra.
This is much easier to work with if you have a proper 3D-math library (I'm assuming you do have one), and it also helps to stay away from the deprecated fixed-pipeline APIs (more on that later).
Here's how I'd do it:
Find the transform for the center-of-view point in world coordinates (figure it out, then draw it to make sure it's correct, with the x, y, z axes too, since the axes are supposed to be correct w.r.t. the view). If you use the center-of-view point and the rotation (usually the inverse of the camera's rotation), this will be a transform from the world origin to the view center. Store it in a 4x4 matrix.
Apply the inverse of that transform, so the view center becomes the origin: glMultMatrixf(center_of_view_tf.inverse());
Rotate about this point however you want (glRotatef()).
Transform everything back to world space: glMultMatrixf(center_of_view_tf);
Apply the object's own world transform (glTranslate/glRotate or glMultMatrix) and draw it.
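To make the "bracket the rotation with the pivot translation and its inverse" idea concrete, here is a small sketch using C#'s System.Numerics types (purely illustrative, not what Pleasant3D uses; viewCenter and the angle are placeholders, and System.Numerics composes matrices with the row-vector convention, so the leftmost matrix is applied first):
using System.Numerics;

// Rotate about an arbitrary point (the current center of the view) instead of the object's own center.
Vector3 viewCenter = new Vector3(1.0f, 2.0f, 0.0f);            // placeholder: the point to rotate around
Matrix4x4 toOrigin = Matrix4x4.CreateTranslation(-viewCenter); // bring the view center to the origin
Matrix4x4 rotation = Matrix4x4.CreateRotationY(0.5f);          // placeholder rotation (radians)
Matrix4x4 back = Matrix4x4.CreateTranslation(viewCenter);      // move it back where it was

// Row-vector convention: toOrigin is applied first, then rotation, then back.
Matrix4x4 rotateAboutViewCenter = toOrigin * rotation * back;
The only thing that changes between math libraries and the legacy GL matrix stack is which side new transforms are multiplied onto; the translation to the pivot and its inverse always bracket the rotation.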
About the fixed function pipeline
Back in the old days, there were separate transistors for transforming a vertex (or its texture coordinates), computing where light was in relation to it, applying lights (up to 8), and texturing fragments in many different ways. Simply put, glEnable() enabled fixed blocks of silicon to do some computation in the hardware graphics pipeline. As performance grew, die sizes shrank and people demanded more features, the amount of dedicated silicon grew too, and much of it went unused.
Eventually, it got so advanced that you could program it in rather obscene ways (register combiners, anyone?). And then it became feasible to upload a small assembler program for all vertex-level transforms. At that point it made no sense to keep a lot of silicon around that just did one thing (especially as those transistors could have been used to make the programmable parts faster), so everything became programmable. If "fixed function" rendering was requested, the driver just converted the state (X lights, texture projections, etc.) into shader code and uploaded that as a vertex shader.
So currently, where even the fragment processing is programmable, there are still a lot of fixed-function options used by tons and tons of OpenGL applications, but the silicon on the GPU just runs shaders (and lots of them, in parallel).
...
To make OpenGL more efficient, the drivers less bulky, and the hardware simpler and usable on mobile/console devices, and to take full advantage of the programmable hardware that OpenGL runs on these days, many functions in the API are now marked deprecated. They are not available in OpenGL ES 2.0 and beyond (mobile), and you won't get the best performance out of them even on desktop systems (where they will still be in the driver for ages to come, serving equally ancient code bases dating back to the dawn of accelerated 3D graphics).
The fixed-functionness mostly concerns how transforms/lighting/texturing etc. are done "by default" in OpenGL (i.e. glEnable(GL_LIGHTING)), instead of you specifying these operations in your own shaders.
In the new, programmable OpenGL, transform matrices are just uniforms in the shader. Any rotate/translate/multiply/inverse (like the above) should be done in client code (your code) before being uploaded to OpenGL. (Using only glLoadMatrix is one way to start thinking about it, but instead of using gl_ModelViewProjectionMatrix and its ilk in your shader, use your own uniforms.)
It's a bit of a bother, since you have to implement quite a bit of what the GL driver used to do, but if you already have your own object list/graph with transforms somewhere, it's not that much work. (On the other hand, if you have a lot of glTranslate/glRotate calls scattered through your code, it might be...) As I said, a good 3D-math library is indispensable here.
...
So, to change the above code to "programmable pipeline" style, you'd do all these matrix multiplications in your own code (instead of the GL driver doing it, still on the CPU) and then send the resulting matrix to OpenGL as a uniform before you activate the shaders and draw your object from VBOs.
(Note that modern cards do not have fixed-function hardware, just a lot of code in the driver that compiles fixed-function rendering state into a shader that does the job. No wonder "classic" GL drivers are huge...)
...
Some info about this process is available at Tom's Hardware Guide and probably Google too.