Kinect joint detection from top - kinect

I'm wondering, does the Kinect detects joints correctly when it's put on the top (on the ceiling).
I don't have necessary equipment to attach it to ceiling and test, but was wondering whether it reliably detects human. I'm ok even if it confuses the joints, actually.
Has anybody tested this?

From what I've seen while using it, the skeleton detection is iffy from any angle other than directly pointing at a person's front or back. A Kinect pointed straight down with people walking under it would almost certainly not detect anyone, because the human form from above does not look anything like it does from the front. I have had the Kinect pick up random people around me in odd positions (sitting, viewed from the side, etc), but the joints were largely spastic. If you have it mounted on the ceiling and pointed downwards at a sufficient angle to still see people from the front instead of from above.. it could do a fairly good job of picking them up.
So when you say on the ceiling do you mean pointing straight down or still looking at a fairly horizontal angle?

I did a little bit of testing with the Kinect mounted in a very high position (2.5 m, 70° to the ground). As answered by Coeffect it just doesn't work. It doesn't work with Microsoft SDK nor with OpenNI. What I can add is that the skeleton recognition only works if the user is facing the camera with her/his whole body-front. Even worse, both frameworks seem to expect the head at the top of the depth-frame.

Related

3D Objects are not being in their regular shape at distance

I am working on a game which was developed by some other guy earlier. I am facing a problem that when player(with camera) start running on the road the buildings are not being shown up in their regular shape and as we move forward (more closer to the buildings) they gain their original shapes, and some times the buildings present on either side of the road are not visible by camera ( empty space ) and when we move closer to the building it comes up as visible object suddenly. I think it may be some unity3d setting problem (rendering , camera or quality). May be, it was being done due to increase performance on mobile devices.
can anybody know what may be the issue or how to resolve it.
Any help will be appreciated. Thanks in advance
This sounds like it's a problem with the available LODs for each building's 3D model.
Basically, 3d games work by having 2-3 different versions of each 3D model, with varying *L*evels *O*f *D*etail. So for example, if you have a house model which uses 500 polygons, you'll probably have another 2 versions (eg 250 polys and 100 polys), which are used depending on the distance between the player and the object. The farther away he is, the simpler the version used will be.
The issue occurs when developers use automatically generated LOD models, which will look distorted or won't appear at all. Unity probably auto generates them, but I'm unsure where you'll find the settings for this in unity. However I've seen 3d models on the unity store offering models with different LODs, so unity probably gives you the option to set your own. The simplest solution would be to increase the distance the LODs change at, while the complicated solution would be to fix custom versions of the 3D models for larger distances, with a lower poly count.
I have resolved the problem. This was due to the LOD (level of details) used for objects (buildings) in Unity3d to enhance the performance on the slower device. LOD provides many level of details (of an object) which you can adjust according to your need . In my specific problem the buildings were suddenly appear due to the different (wrong) position for LOD1, i.e. for LOD1 the building was at wrong place but for LOD0 it was at its right place. So when my camera see from the distance it see LOD1 which was at wrong place thence it sees empty space with no building at the expected position. But when camera comes closer it sees LOD0 in which building is at the right position and it seems that buildings are suddenly come or become visible.

Separate noise from skeleton with Kinect

Looking to do a proof of concept, and new to Kinect. I believe this is possible, but trying to gauge difficulty with links to tutorials etc explaining how this may work.
Looking to have the Kinect look at a walkway, and essentially detect people movement. This does not mean Skeletal movement, but essentially "foot traffic". I want to determine the noise of traffic, i.e. are there alot of people walking past, or a few. (Note this does not mean counting, just a rough indication. Can this be just pixel movement etc?)
Secondly, if a person then stops and faces the Kinect, pick them up as a user, and track rudimentary movements.
The second part I'm relatively comfortable with, the first I'm not.
Any help is appreciated is pointing me in the right direction. We are a Microsoft house, so any indication if Microsoft SDK, or OpenKinect is the best path would be great too.

Shape (preferably human) recognition API for use with standard webcam

I am interested in getting into user interaction/shape detection with a simple usb webcam. I can use multiple webcams, but don't want to be restricted to using something like the kinect sensor. My detection cameras need to be set up on either side of a helmet (or if an individual one, on top). I have found some, but they don't really have the functionality I need and most are angled towards facial recognition. I need to be able to detect a basic human skeletal structure and determine if something is obstructing it. I would really rather be able to do it without using any sort of marker system on the target person. I would like for it to be able to target multiple structures. Obviously I am willing to do tweaking if necessary, but want to see how close I can get to what I need before I rebuild the wheel. I am trying to design an ai system that can determine how many people are in an area and where they are.
Doubt there will be anything like this since Microsoft spent a ton of money on the R&D for Kinect and it's probably all locked behind an NDA. I'm also guessing there's a lot of hardware within the Kinect that is not available in a standard webcam.
The closest thing that I could find to what you're looking for is the OpenKinect project, might be a good place to start your research.

Is the depth image returned by Microsoft Kinect SDK already undistorted?

Supposedly the Microsoft SDK has access to the Kinect's intrinsic parameters but does anyone have any idea if the depth image it returns is actually undistorted? I couldn't find anything relevant.
Let me know if I'm out of topic although I consider this as an implicit programming question :)
edit: some other useful links I found that support #Coeffect's answer
http://ros.org/wiki/kinect_calibration/technical
Accuracy Analysis of Kinect Depth Data, by K.Khoshelham
So the IR pattern that the Kinect displays isn't your normal grid. For an example, check out this blog post. The Kinect handles making a normal depth map out of this. Thinking about focal lengths and such for this system is just going to dig yourself a hole. Your thoughts about precision are probably misplaced. The Kinect isn't accurate enough to BE picky about such things. Having used the Kinect for motion detection, there's a lot of noise. If you have a certain situation in mind, you might want to post about it.
edit: Here's a post showing that the depth isn't linear, and more of the precision is focused on closer objects. So the farther away you are, the less precise the data is, and the more severe the noise will become (because having the returned depth change by 1 nearby is pretty much nothing, but farther away that accounts for a larger distance change).

Kinect - Techniques required to achieve the following display

Does anyone have any idea what technique I should use to make the display video shift left, right, up and down as in the video below? I want to achieve this with a Kinect but with a different idea.
Thanks in advance.
http://www.youtube.com/watch?v=V2hxaijuZ6w
EDIT:
Now that I'm awake, I'll go into better detail about this (it apparently took me a week to wake up).
So the Winscape project connects a real and virtual world by giving windows from the real world into a virtual world. The way it does this is act like the real world is part of the virtual world, and then changes the display of the monitors (disguised to look like windows) to replicate the view a person should see if they existed in the virtual world.
Imagine your virtual world. It doesn't necessarily have an end to it, but there's a point where you stop trying to render stuff into it, so let's say the world in enclosed in a box that contains all the rendered elements. Now what Winscape does is make it appear that the virtual world actually exists in the real world, and that you can see it through the monitors.
First step is obviously to create your virtual world. For starters, I'd suggest just creating a literal box. Make each wall a difference color, or put color gradients on the walls. Make something simple. If you haven't already decided on a 3D framework to handle this, I'd suggest XNA. It's C#, which works with the Kinect SDK, and it's got a ton of tutorials online to help you. Once you've created your world, use XNA to place a camera inside the box and add some simple controls to rotate the camera. This will allow you to look around the box from the inside, to make sure the rendering is working as expected.
Once you've done that, you need to decide where to put your windows. These will be the viewpoints into your 3D scene. To demonstrate this concept, here's a picture I took from an XNA camera tutorial.
Note that, if you read the actual tutorial, they won't say the exact same thing as me because I'm just hijacking the picture to demonstrate my meaning. So, the (0,0,0) point is where your "eye" is. The pink rectangle would represent your window. Looking at the window, four lines are drawn from the eye to the corners of the pink window. These four lines are extended forward until they collide with the background, creating the green rectangle. This would be the rectangle that your eye can see through the window.
Note that XNA will actually handle a LOT of this for you. You simply need to create a camera in your virtual scene and move it around, doing some math to aim it directly at your window. You'll want that camera to be in the virtual space in a way that represent your location in the real world. You can do this by using the Kinect to get your real world coordinates in relation to itself, then configure your application to know where your Kinect is in relation to your windows. Combing that data, you can get the location of your eyes in relation to your monitors in the real world, and since the monitors are represented by the windows in the virtual world, you can figure out where you exist in the virtual world. So place the virtual camera where your head is in the virtual world, point it at the windows, and do some magic to make sure only the window is viewed by the camera.
Original semi-lucid rant:
Okay, I'm going to take a shot at this (it's almost 1 AM, so let me know if I did a less than brilliant job and I'll come back to it when I wake up).
First, it'll involve quite a bit of math that I'm just going to skim over. You have, essentially, three layers.
Person ---- "Windows" (Monitors) ---- Scene
The scene, of course, doesn't really exist. You have to kind of incorporate the person into a virtual world where the scene, which is really just a flat image, exists behind a wall. The only way the person can see said scene is through the windows in the wall, which in reality is faked by monitors.
So, here comes the math. The Kinect can calculate where you're standing in the room, and more importantly, where your head is. From this you can get a general sense of where your eyes are. You'll need to take this point (your eyes) and translate it into the coordinates you're using in your virtual world. Then, calculate what those eyes should be able to see through the virtual windows. You can do this by projecting lines from the eyes to each corner of a window, all the way through until it hits the "scene" picture. Each window will correspond to a rectangular area on the background picture. This rectangle is what needs to be drawn to the screen.
The trickiest part is going to be setting up virtual world to nearly perfectly mimic the real world. Essentially, a lot of configuration ("okay, this window is 1.5 meters above the Kinect.. and .25 meters to its left.."). I'm also not sure how far back you should put the scene picture. If I think of something, I'll come back to this, but you can certainly just try it out and figure out a distance that works well for your set up.
Oh wait, now I know why I couldn't figure out the distance. It's because that example is using a 3D simulation. Pretty nifty. So you'd just need to figure out where you want to play your windows in the simulation or whatnot.
There are multiple techniques based on what setup you want to use (KinectDSK, libfreenect, OpenNI, etc.) and how accurate you want this to be.
OpenNI for example has a function called GetCoM which returns the centre of mass for a user (it doesn't need to track a skeleton at this point) which can be used. It looks like OpenNI was used in the video but they still use an old version. The newer version allows skeleton tracking without the 'psi'(ψ) pose.
Note that it doesn't look like it takes the user's head direction. The body could point in one direction and the head in another for example. G.Fanelli and his team have done quite a bit of research in the area. For Kinect check out Real Time Head Pose Estimation from Consumer Depth Cameras
I've played a bit with the KinectSDK and a Kinect for Windows and noticed there's a Face Tracker included.
In the end, based on to how loose or precise do you want the tracking to be, what's your ideal setup (maximum distance covered, content used, etc.) you can figure out what SDK/library will suit you best. Also, I imagine this also depends a bit on your experience with programming, in which case, also look for wrappers easier to tackle (e.g. Unity, MaxMSP/Jitter, Processing, openFrameworks, etc.)