simple movement tracking (& saving of coordinates) with Kinect - kinect

I'm looking for a simple Kinect app which allows me to a) detect and b) track a single moving object in an otherwise static background.
I don't need any fancy skeleton or other features, just the center of mass of the moving object will do it.
Any pointers?

To track the entire body, I would look at "Comparing a saved movement with other movement with Kinect"; the answer there includes code that shows how to save skeleton data. If you want to track individual joints rather than the entire body, see "mapping an ellipse to a joint in kinect sdk 1.5". Joint tracking currently works better, but once whole-body tracking works, use that because it is more effective and efficient.

Your case is pretty simple, but it requires initialization for the object, since in general the term "object" is ill-defined. It can be the closest object, a moving object, or even an object that was touched or that has a certain color, size or shape.
Let's assume that you define the object by motion, that is, whatever moves in your point cloud is an object. I suggest the following:
Object detection is easy if the object moves by more than its own size, since then you can just subtract depth maps and end up with your object: depth1-depth2 > T. If the object moves slowly and shifts by only a fraction of its size, you have to use whatever high-frequency information you have, which can be depth or colour or both. The result is going to be noisy, as the figure below shows.
As soon as you have your object selected, you may want to clean it up by running some morphological filters (erode + dilate) to erase noise and get a single blob. After that you just need to find some features in the blob, such as average depth or mean color, and look for them in a small window around the object's previous location in order to rediscover the object.
Finally, don't forget to update these features as the object moves.
Some other ideas you may want to use are: depth gradient, connected components in depth, pre-recording background depth for cleaner subtraction, running grabCut on depth area selected by mouse click, etc.
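The subtraction and centre-of-mass steps can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming the depth frames are already available as equally sized 2D lists of numbers and using the pre-recorded background variant mentioned above; the function name and parameters are hypothetical, and the morphological clean-up is left out.

```python
def foreground_centroid(background, depth, threshold):
    """Centre of mass (row, col) of the pixels whose depth differs from the
    pre-recorded background by more than threshold, or None if none do."""
    row_sum = col_sum = count = 0
    for r, (bg_row, depth_row) in enumerate(zip(background, depth)):
        for c, (bg, d) in enumerate(zip(bg_row, depth_row)):
            if abs(bg - d) > threshold:  # pixel belongs to the moving object
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


# Hypothetical 4x4 frames: a flat background and a 2x2 object that is closer.
background = [[100] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (1, 2):
        frame[r][c] = 50
print(foreground_centroid(background, frame, threshold=10))
```

Calling this once per frame and appending the result to a list would give the saved coordinates the question asks about; real sensor data would need the noise filtering described above before the centroid is stable.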

Related

GODOT: What is an efficient calculation for the AABB of a simple 3D model from a camera's view

I am attempting to come up with a quick and efficient means of translating a 3D mesh into a projected AABB. In the end, I would like to accomplish something similar to figure 1, wherein only the area of the screen covered by the cube is located inside the bounding box highlighted in red. (If it is at all possible, getting the area as small as possible, highlighted in blue, would increase efficiency down the road.)
Figure 1. https://i.imgur.com/pd0E20C.png
Currently, I have tried:
Calculating the point position on the screen using camera.unproject_position(). This failed largely due to my inability to wrap my head around the pixel positions trending towards infinity. I understand it has something to do with tan, but frankly, it is too late for my brain to function anymore.
Getting the area of collision between the view frustum and the AABB of the mesh instance. This method seems convoluted, and to get it in a usable format I would need to project the result into 2d coordinates again.
Using the MeshInstance VisualInstance to create a texture wherein a pixel is white if it contains the mesh instance, and black otherwise. Visual instances in general just baffle me, and I did not think it would be efficient to have another viewport just to output this texture.
What I am looking for:
An output that can be passed to a shader informing where to complete certain calculations. Right now this is set up to use a bounding box, but it could easily be rewritten to also use a texture. It also could be rewritten to use polygons, but I am trying to keep calculations to a minimum in the shader.
Certain solutions I have tried before have worked, slightly, but this must be robust. The camera interfacing with the 3d object will be able to move completely around and through it, meaning at times the view will be completely surrounded by the 3d model with points both in front, and behind.
Thank you for any help you can provide.
I will try my best to update this post with information if needed.
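On the first attempt listed above: in a perspective projection the screen position is obtained by dividing by the distance along the view axis, so positions blow up as a point approaches the camera plane and are meaningless behind it. One common way around this is to clip the box edges against a near plane before projecting. The following is a rough sketch in plain Python, not Godot code, assuming the box is already expressed in camera space with z as the distance in front of the camera and a pinhole model with a focal length in pixels; all names are hypothetical.

```python
from itertools import product


def clip_to_near(p, q, near):
    """Point where the segment p->q crosses the plane z == near (p is behind it)."""
    t = (near - p[2]) / (q[2] - p[2])
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]), near)


def projected_bounds(box_min, box_max, focal, near=0.05):
    """Screen-space (min_x, min_y, max_x, max_y) of a camera-space box,
    or None if the whole box lies behind the near plane."""
    corners = list(product(*zip(box_min, box_max)))  # the 8 corners
    # Box edges join corners that differ in exactly one coordinate.
    edges = [(a, b) for i, a in enumerate(corners) for b in corners[i + 1:]
             if sum(p != q for p, q in zip(a, b)) == 1]
    visible = []
    for a, b in edges:
        if a[2] < near and b[2] < near:
            continue  # edge is entirely behind the near plane
        if a[2] < near:
            a = clip_to_near(a, b, near)
        elif b[2] < near:
            b = clip_to_near(b, a, near)
        visible += [a, b]
    if not visible:
        return None
    xs = [focal * x / z for x, y, z in visible]
    ys = [focal * y / z for x, y, z in visible]
    return (min(xs), min(ys), max(xs), max(ys))


print(projected_bounds((-1, -1, 2), (1, 1, 4), focal=100))
```

When the camera is inside the box the result stays finite but can be much larger than the screen, so the caller would still clamp it to the viewport rectangle before handing it to a shader.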

Detect multiple bodies in Kinect?

I am working with Kinect in openFrameworks using the ofxKinect addon, which is great and plenty fun!
Anyway I am looking for some pointers or a direction when dealing with multiple bodies on the screen. I was thinking of making a rect around each detected body and when the rects intersect something could happen, an effect or anything.
So what I am looking for are ideas or something that could point me to the right direction of detecting multiple bodies when using a kinect.
Right now, based on the depth image I get from the Kinect, I go through each pixel and create a bunch of smaller rectangles with padding, and group them into a larger bounding rectangle if they are separate from another rectangle group. This is not ideal, as it only deals with the pixel values, is not really separating bodies from each other, and is not giving me the results I am looking for.
So any ideas would be greatly appreciated!
If you want to use ofxKinect, a quick solution would be to threshold on depth and assume that bodies, and no other objects, will be within a given depth range. This should make it easy to use OpenCV's contour finder to isolate the outlines of the bodies and get the bounding rectangles. If the rectangles intersect (and ofRectangle already does the math for you), trigger the reaction you need. Also, don't forget to trigger it only once, when the effect isn't showing already; otherwise you will trigger the effect multiple times per second while the two bodies' bounding rectangles intersect.
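The "trigger only once" part of this can be sketched as a small edge-triggered check. This is a language-neutral illustration in plain Python rather than openFrameworks code; rectangles are assumed to be (x, y, width, height) tuples, and the function and class names are hypothetical.

```python
def rects_intersect(a, b):
    """True if two (x, y, width, height) rectangles overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


class OverlapTrigger:
    """Fires only on the frame where the overlap starts, not on every frame."""

    def __init__(self):
        self.active = False

    def update(self, a, b):
        overlapping = rects_intersect(a, b)
        fired = overlapping and not self.active
        self.active = overlapping
        return fired


trigger = OverlapTrigger()
body_a = (0, 0, 10, 20)
for body_b in [(30, 0, 10, 20), (8, 0, 10, 20), (6, 0, 10, 20), (30, 0, 10, 20)]:
    print(trigger.update(body_a, body_b))
```

Keeping the previous overlap state per pair of bodies is what prevents the effect from restarting on every frame while the rectangles stay overlapped.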
You could try something a bit more hardcore and use ofxCv (not just ofxOpenCV) to tap into the HoG functionality. This is slow in itself and not ideal with the depth map, but hopefully you can run it every few seconds just to detect a person and the depth, then keep tracking that movement.
Personally, if you want to track people with the Kinect, I recommend using ofxOpenNI, as it already provides the scene segmentation feature, and even if you don't track the skeletons you can still get useful information like the pixels pertaining to each body and their centre of mass. I'm guessing the Microsoft Kinect SDK has a similar feature and there should be an oF addon, but it's Windows only.
ofxKinect/libfreenect does not offer any people detection features, so you will need to roll your own.

How do I rotate an OpenGL view relative to the center of the view as opposed to the center of the object being displayed?

I'm working on a fork of Pleasant3D.
When rotating an object being displayed, the object always rotates around the same point relative to itself, even if that point is not at the center of the view (e.g. because the user has panned to move the object in the view).
I would like to change this so that the view always rotates the object around the point at the center of the view as it appears to the user instead of the center of the object.
Here is the core of the current code that rotates the object around its center (slightly simplified) (from here):
glLoadIdentity();
// midPlatform is the offset to reach the "middle" of the object (or more specifically the platform on which the object sits) in the x/y dimension.
// This is the point around which the view is currently rotated.
Vector3 *midPlatform = [self.currentMachine calcMidBuildPlatform];
glTranslatef((GLfloat)cameraTranslateX - midPlatform.x,
             (GLfloat)cameraTranslateY - midPlatform.y,
             (GLfloat)cameraOffset);
// trackBallRotation and worldRotation come from trackball.h/c, which appears to be
// from an Apple OpenGL sample.
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(midPlatform.x, midPlatform.y, 0.);
// Now draw object...
What transformations do I need to apply in what order to get the effect I desire?
Some of what I've tried so far
As I understand it this is what the current code does:
"OpenGL performs matrices multiplications in reverse order if multiple transforms are applied to a vertex" (from here). This means that the first transformation to be applied is actually the last one in the code above. It moves the center of the view (0,0) to the center of the object.
This point is then used as the center of rotation for the next two transformations (the rotations).
Finally the midPlatform translation is done in reverse to move the center back to the original location and the XY translations (panning) done by the user is applied. Here also the "camera" is moved away from the object to the proper location (indicated by cameraOffset).
This seems straightforward enough. So what I need to change is instead of translating the center of the view to the center of the object (midPlatform) I need to translate it to the current center of the view as seen by the user, right?
Unfortunately this is where the transformations start affecting each other in interesting ways and I am running into trouble.
I tried changing the code to this:
glLoadIdentity();
glTranslatef(0,
             0,
             (GLfloat)cameraOffset);
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(cameraTranslateX, cameraTranslateY, 0.);
In other words, I translate the center of the view to the previous center, rotate around that, and then apply the camera offset to move the camera away to the proper position. This makes the rotation behave exactly the way I want it to, but it introduces a new issue. Now any panning done by the user is relative to the object. For example if the object is rotated so that the camera is looking along the X axis end-on, if the user pans left to right the object appears to be moving closer/further from the user instead of left or right.
I think I can understand why this is (the XY camera translations are applied before the rotation), and I think what I need to do is figure out a way to cancel out the pre-rotation translation after the rotation (to avoid the weird panning effect) and then do another translation relative to the viewer (eye coordinate space) instead of the object (object coordinate space), but I'm not sure exactly how to do this.
I found what I think are some clues in the OpenGL FAQ (http://www.opengl.org/resources/faq/technical/transformations.htm), for example:
9.070 How do I transform my objects around a fixed coordinate system rather than the object's local coordinate system?
If you rotate an object around its Y-axis, you'll find that the X- and Z-axes rotate with the object. A subsequent rotation around one of these axes rotates around the newly transformed axis and not the original axis. It's often desirable to perform transformations in a fixed coordinate system rather than the object’s local coordinate system.
The root cause of the problem is that OpenGL matrix operations postmultiply onto the matrix stack, thus causing transformations to occur in object space. To affect screen space transformations, you need to premultiply. OpenGL doesn't provide a mode switch for the order of matrix multiplication, so you need to premultiply by hand. An application might implement this by retrieving the current matrix after each frame. The application multiplies new transformations for the next frame on top of an identity matrix and multiplies the accumulated current transformations (from the last frame) onto those transformations using glMultMatrix().
You need to be aware that retrieving the ModelView matrix once per frame might have a detrimental impact on your application’s performance. However, you need to benchmark this operation, because the performance will vary from one implementation to the next.
And
9.120 How do I find the coordinates of a vertex transformed only by the ModelView matrix?
It's often useful to obtain the eye coordinate space value of a vertex (i.e., the object space vertex transformed by the ModelView matrix). You can obtain this by retrieving the current ModelView matrix and performing simple vector / matrix multiplication.
But I'm not sure how to apply these in my situation.
You need to translate the "center of view" point to the origin, rotate, then invert that translation to get back to the object's transform. This is known as a change of basis in linear algebra.
This is way easier to work with if you have a proper 3D math library (I'm assuming you do have one), and that also helps you stay away from the deprecated fixed-function pipeline APIs (more on that later).
Here's how I'd do it:
Find the transform for the center-of-view point in world coordinates (figure it out, then draw it to make sure it's correct, with x, y, z axes too, since the axes are supposed to be correct w.r.t. the view). If you use the center-of-view point and the rotation (usually the inverse of the camera's rotation), this will be a transform from world origin to the view center. Store this in a 4x4 matrix transform.
Apply the inverse of the above transform, so that it becomes the origin: glMultMatrixf(center_of_view_tf.inverse());
Rotate about this point however you want (glRotate()).
Transform everything back to world space (glMultMatrixf(center_of_view_tf);).
Apply the object's own world transform (glTranslate/glRotate or glMultMatrix) and draw it.
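The core of these steps is the product "translate back × rotate × translate to origin". A minimal sketch of that in plain Python, assuming a pivot that is a pure translation (no rotation of its own) and a rotation about the z axis only; the helper names are hypothetical, and a real program would use its 3D math library instead. The steps are written in the order they act on a vertex; with the legacy matrix stack the calls are issued in the reverse of that order.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def rotation_z(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def rotate_about(pivot, rotation):
    """Move the pivot to the origin, rotate, then move it back."""
    px, py, pz = pivot
    return matmul(translation(px, py, pz),
                  matmul(rotation, translation(-px, -py, -pz)))


def transform(m, point):
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


m = rotate_about((1, 1, 0), rotation_z(90))
print(transform(m, (2, 1, 0)))  # the point swings around the pivot
print(transform(m, (1, 1, 0)))  # the pivot itself does not move
```

The same composition works when the pivot transform also carries a rotation, as long as a true matrix inverse replaces the negated translation.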
About the fixed function pipeline
Back in the old days, there were separate transistors for transforming a vertex (or its texture coordinates), computing where a light was in relation to it, applying lights (up to 8) and texturing fragments in many different ways. Simply put, glEnable() enabled fixed blocks of silicon to do some computation in the hardware graphics pipeline. As performance grew, die sizes shrank and people demanded more features, the amount of dedicated silicon grew too, and much of it wasn't used.
Eventually, it got so advanced that you could program it in rather obscene ways (register combiners, anyone?). And then it became feasible to actually upload a small assembler program for all vertex-level transforms. Then it made no sense to keep a lot of silicon there that just did one thing (especially as you could've used those transistors to make the programmable stuff faster), so everything became programmable. If "fixed function" rendering was called for, the driver just converted the state (X lights, texture projections, etc.) to shader code and uploaded that as a vertex shader.
So, currently, where even the fragment processing is programmable, there are just a lot of fixed-function options that are used by tons and tons of OpenGL applications, but the silicon on the GPU just runs shaders (and lots of them, in parallel).
...
To make OpenGL more efficient, and the drivers less bulky, and the hardware simpler and useable on mobile/console devices and to take full advantage of the programmable hardware that OpenGL runs on these days, many functions in the API are now marked deprecated. They are not available on OpenGL ES 2.0 and beyond (mobile) and you won't be getting the best performance out of them even on desktop systems (where they will still be in the driver for ages to come, serving equally ancient code bases originating back to the dawn of accelerated 3d graphics)
The fixed-functionness mostly concerns how transforms/lighting/texturing etc. are done by "default" in OpenGL (i.e. glEnable(GL_LIGHTING)), instead of you specifying these ops in your custom shaders.
In the new, programmable, OpenGL, transform matrices are just uniforms in the shader. Any rotate/translate/mult/inverse (like the above) should be done by client code (your code) before being uploaded to OpenGL. (Using only glLoadMatrix is one way to start thinking about it, but instead of using gl_ModelViewProjectionMatrix and the ilk in your shader, use your own uniforms.)
It's a bit of a bother, since you have to implement quite a bit of what was done by the GL driver before, but if you have your own object list/graph with transforms and a transform somewhere etc, it's not that much work. (OTOH, if you have a lot of glTranslate/glRotate in your code, it might be...). As I said, a good 3d-math library is indispensable here.
...
So, to change the above code to "programmable pipeline" style, you'd just do all these matrix multiplications in your own code (instead of the GL driver doing it, still on the CPU) and then send the resulting matrix to opengl as a uniform before you activate the shaders and draw your object from VBOs.
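As a rough illustration of "doing the matrix multiplications in your own code", the sketch below composes a model-view matrix on the CPU and flattens it into the 16-float column-major array that OpenGL expects when a matrix uniform is uploaded with the transpose flag off (for example via glUniformMatrix4fv). It is plain Python with hypothetical helper and variable names, and the upload call itself is only mentioned in a comment.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def to_column_major(m):
    """Flatten a 4x4 matrix column by column, the layout OpenGL uses."""
    return [float(m[row][col]) for col in range(4) for row in range(4)]


# Hypothetical values: the camera is pulled back, the object is panned.
view = translation(0, 0, -10)
model = translation(2, 3, 0)
model_view = matmul(view, model)  # model is applied to a vertex first

uniform_data = to_column_major(model_view)
# uniform_data is what would be passed to glUniformMatrix4fv(..., GL_FALSE, ...)
print(uniform_data[12:15])  # the translation part sits in elements 12..14
```

The shader then declares its own mat4 uniform and multiplies vertices by it, instead of relying on the built-in fixed-function matrices.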
(Note that modern cards do not have fixed-function code, just a lot of code in the driver to compile fixed-function rendering state to a shader that does the job. No wonder "classic" GL drivers are huge...)
...
Some info about this process is available at Tom's Hardware Guide and probably Google too.

How to code a random movement in limited area

I have a limited area (screen) populated with a few moving objects (3-20 of them, so it's not like 10.000 :). Those objects should be moving at a constant speed and in a random direction. But there are a few limitations:
objects shouldn't exit the area - so if it's close to the edge, it should move away from it
objects shouldn't bump into each other - so when one is close to another one it should move away (but not get too close to a different one).
On the image below I have marked the allowed moves in this situation - for example object D shouldn't move straight up, as it would bring it to the "wall".
What I would like to have is a way to move them (one by one). Is there any simple way to achieve it, without too much calculations?
The density of objects in the area would be rather low.
There are a number of ways you might programmatically enforce your desired behavior, given that you have such a small number of objects. However, I'm going to suggest something slightly different.
What if you ran the whole thing as a physics simulation? For instance, you could set up a Box2D world with no gravity, no friction, and perfectly elastic collisions. You could model your enclosed region and populate it with objects that are proportionally larger than their on-screen counterparts so that the on-screen versions never get too close to each other (because the underlying objects in the physics simulation will collide and change direction before that can happen), and assign each object a random initial position and velocity.
Then all you have to do is step the physics simulation, and map its current state into your UI. All the tricky stuff is handled for you, and the result will probably be more believable/realistic than what you would get by trying to come up with your own movement algorithm (or if you wanted it to appear more random and less believable, you could also just periodically apply a random impulse to a random object to keep things changing unpredictably).
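For comparison, here is a hand-rolled stand-in for the physics engine. It is not Box2D and not a physically accurate collision model, just a minimal steering rule under stated assumptions: every object keeps its speed, bounces off the walls, and turns directly away from any neighbour that comes closer than a minimum distance. All names and parameters are hypothetical.

```python
import math


def step(objects, width, height, min_dist, dt=1.0):
    """Advance all objects by one time step, in place.
    Each object is a dict with keys x, y, vx, vy."""
    # Steer apart any pair that is too close, keeping each speed unchanged.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            dx, dy = b["x"] - a["x"], b["y"] - a["y"]
            dist = math.hypot(dx, dy)
            if 0 < dist < min_dist:
                for obj, sign in ((a, -1), (b, 1)):
                    speed = math.hypot(obj["vx"], obj["vy"])
                    obj["vx"] = sign * speed * dx / dist
                    obj["vy"] = sign * speed * dy / dist
    # Move, then bounce off the walls of the area.
    for o in objects:
        o["x"] += o["vx"] * dt
        o["y"] += o["vy"] * dt
        if o["x"] < 0 or o["x"] > width:
            o["vx"] = -o["vx"]
            o["x"] = min(max(o["x"], 0), width)
        if o["y"] < 0 or o["y"] > height:
            o["vy"] = -o["vy"]
            o["y"] = min(max(o["y"], 0), height)


balls = [{"x": 5.0, "y": 5.0, "vx": 1.0, "vy": 0.0},
         {"x": 6.0, "y": 5.0, "vx": -1.0, "vy": 0.0}]
step(balls, width=100, height=100, min_dist=2.0)
print(balls)
```

With 3-20 objects the pairwise check is cheap; the random initial directions would come from something like random.uniform(0, 2 * math.pi) applied to a fixed speed.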
You can use the hitTest:withEvent: method of UIView:
UIView* touchedView=[self.superview hitTest:currentOrigin withEvent:nil];
Pass the ball's current origin as the first argument; the second argument can be nil.
The method returns the view that the ball has hit.
If there is a hit view, just change the direction of the ball.
For the border, check the ball's frame: if the ball goes outside the boundary, just change its direction.

Objective-C draw a path and detect when it closes (forms a closed shape)

I'm fairly new to game programming (but not to programming) and I want to create a space ship which leaves a trail on the screen. Now my problem is to come up with a solution how to detect if the trail left from the ship forms a closed shape - eg. if the ship left a trail around an object, the object is caught inside its trail so to speak.
The direction I'm thinking is to draw the path of the trail on an image not visible on the screen and every now and then try to fill it with certain color and then check if fill is caught within the trail path. However it seems like a lot of overhead.
Any ideas how to do that? I'm using cocos2d if that's of any help
In game programming you often need to think more mathematically than visually.
First, does your ship continuously leave a trail on the screen? If yes, then it will be easier to know when the shape closes: you just have to remember the coordinate where your ship started to leave a trail, then wait for the trail to approach this coordinate again (for example within a radius of 10 pixels; otherwise the user would need to hit exactly the same pixel to close the shape).
The visual representation of the trail is only here for the user, you'll never use it to compute anything. What you will do is to keep in memory the path followed by the ship's trail : a polygon, which is nothing else than the list of coordinates it followed.
Then, once you know that your shape is closed, you have to determine whether an object is inside your polygon or not. It's possible that Objective-C or cocos2d (I don't know much about them) already contains a built-in function to test whether a point is inside a polygon. In Java there is the Polygon class, which makes this really easy. If you don't find anything, you can do it yourself; there are already great answers on this subject on SO, and here is a nice one: How can I determine whether a 2D Point is within a Polygon?
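Both checks described above can be sketched in a few lines. This is an illustrative plain-Python version, not cocos2d code: the trail is assumed to be a list of (x, y) points, closure is detected when the trail returns to within a radius of its start after having left it, and the inside test uses the standard ray-casting (even-odd) rule. Function names are hypothetical.

```python
import math


def trail_closed(trail, radius=10.0):
    """True once the trail has left the start area and come back within radius."""
    if len(trail) < 3:
        return False
    sx, sy = trail[0]
    left_start = any(math.hypot(x - sx, y - sy) > radius for x, y in trail)
    ex, ey = trail[-1]
    return left_start and math.hypot(ex - sx, ey - sy) <= radius


def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


trail = [(0, 0), (50, 0), (50, 50), (0, 50), (3, 4)]
if trail_closed(trail):
    print(point_in_polygon((25, 25), trail))  # object surrounded by the trail
    print(point_in_polygon((80, 25), trail))  # object outside the trail
```

Points lying exactly on an edge can go either way with this rule, which is usually acceptable for a game.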