Kinect - Techniques required to achieve the following display - kinect

Does anyone have any idea what technique I should use to make the display video shift left, right, up and down as in the video below? I want to achieve this with a Kinect but with a different idea.
Thanks in advance.
http://www.youtube.com/watch?v=V2hxaijuZ6w

EDIT:
Now that I'm awake, I'll go into better detail about this (it apparently took me a week to wake up).
So the Winscape project connects a real and virtual world by giving windows from the real world into a virtual world. The way it does this is act like the real world is part of the virtual world, and then changes the display of the monitors (disguised to look like windows) to replicate the view a person should see if they existed in the virtual world.
Imagine your virtual world. It doesn't necessarily have an end to it, but there's a point where you stop trying to render stuff into it, so let's say the world in enclosed in a box that contains all the rendered elements. Now what Winscape does is make it appear that the virtual world actually exists in the real world, and that you can see it through the monitors.
First step is obviously to create your virtual world. For starters, I'd suggest just creating a literal box. Make each wall a difference color, or put color gradients on the walls. Make something simple. If you haven't already decided on a 3D framework to handle this, I'd suggest XNA. It's C#, which works with the Kinect SDK, and it's got a ton of tutorials online to help you. Once you've created your world, use XNA to place a camera inside the box and add some simple controls to rotate the camera. This will allow you to look around the box from the inside, to make sure the rendering is working as expected.
Once you've done that, you need to decide where to put your windows. These will be the viewpoints into your 3D scene. To demonstrate this concept, here's a picture I took from an XNA camera tutorial.
Note that, if you read the actual tutorial, they won't say the exact same thing as me because I'm just hijacking the picture to demonstrate my meaning. So, the (0,0,0) point is where your "eye" is. The pink rectangle would represent your window. Looking at the window, four lines are drawn from the eye to the corners of the pink window. These four lines are extended forward until they collide with the background, creating the green rectangle. This would be the rectangle that your eye can see through the window.
Note that XNA will actually handle a LOT of this for you. You simply need to create a camera in your virtual scene and move it around, doing some math to aim it directly at your window. You'll want that camera to be in the virtual space in a way that represent your location in the real world. You can do this by using the Kinect to get your real world coordinates in relation to itself, then configure your application to know where your Kinect is in relation to your windows. Combing that data, you can get the location of your eyes in relation to your monitors in the real world, and since the monitors are represented by the windows in the virtual world, you can figure out where you exist in the virtual world. So place the virtual camera where your head is in the virtual world, point it at the windows, and do some magic to make sure only the window is viewed by the camera.
Original semi-lucid rant:
Okay, I'm going to take a shot at this (it's almost 1 AM, so let me know if I did a less than brilliant job and I'll come back to it when I wake up).
First, it'll involve quite a bit of math that I'm just going to skim over. You have, essentially, three layers.
Person ---- "Windows" (Monitors) ---- Scene
The scene, of course, doesn't really exist. You have to kind of incorporate the person into a virtual world where the scene, which is really just a flat image, exists behind a wall. The only way the person can see said scene is through the windows in the wall, which in reality is faked by monitors.
So, here comes the math. The Kinect can calculate where you're standing in the room, and more importantly, where your head is. From this you can get a general sense of where your eyes are. You'll need to take this point (your eyes) and translate it into the coordinates you're using in your virtual world. Then, calculate what those eyes should be able to see through the virtual windows. You can do this by projecting lines from the eyes to each corner of a window, all the way through until it hits the "scene" picture. Each window will correspond to a rectangular area on the background picture. This rectangle is what needs to be drawn to the screen.
The trickiest part is going to be setting up virtual world to nearly perfectly mimic the real world. Essentially, a lot of configuration ("okay, this window is 1.5 meters above the Kinect.. and .25 meters to its left.."). I'm also not sure how far back you should put the scene picture. If I think of something, I'll come back to this, but you can certainly just try it out and figure out a distance that works well for your set up.
Oh wait, now I know why I couldn't figure out the distance. It's because that example is using a 3D simulation. Pretty nifty. So you'd just need to figure out where you want to play your windows in the simulation or whatnot.

There are multiple techniques based on what setup you want to use (KinectDSK, libfreenect, OpenNI, etc.) and how accurate you want this to be.
OpenNI for example has a function called GetCoM which returns the centre of mass for a user (it doesn't need to track a skeleton at this point) which can be used. It looks like OpenNI was used in the video but they still use an old version. The newer version allows skeleton tracking without the 'psi'(ψ) pose.
Note that it doesn't look like it takes the user's head direction. The body could point in one direction and the head in another for example. G.Fanelli and his team have done quite a bit of research in the area. For Kinect check out Real Time Head Pose Estimation from Consumer Depth Cameras
I've played a bit with the KinectSDK and a Kinect for Windows and noticed there's a Face Tracker included.
In the end, based on to how loose or precise do you want the tracking to be, what's your ideal setup (maximum distance covered, content used, etc.) you can figure out what SDK/library will suit you best. Also, I imagine this also depends a bit on your experience with programming, in which case, also look for wrappers easier to tackle (e.g. Unity, MaxMSP/Jitter, Processing, openFrameworks, etc.)

Related

Live streaming model of a person in a VR environment

Given that a user is static in a VR environment, which of the two camera types below would be better to create a more 'real' looking representation of an live-streamed presenter in the VR world?
1) Kinect (can measure depth)
2) Normal 2D camera such as a high end webcam (maybe something like the pointgrey Flea3) (software assisted 3D illusion from a static angle)
Would be grateful if anyone with any experience with the relevant technologies or fields would be able to help out!
Your question lacks the necessary information to provide a single correct answer. Is it your intent to provide a full 3D VR experience, or are you content with just 2D content? Is the presenter static, or are they moving around the viewer? Towards them? Away from them? Will you be using full spherical projection or something less complete, like cylindrical projection? And what sort of lighting do you think you'll need? These are all nontrivial questions, because the answers determine the best camera package to get your content.
You also fail to consider capturing with a 360º camera, which would be advantageous if the presenter is indeed moving around in the 360º space. My personal bias is towards capturing with these, but there's no single production solution unless you constrain the problem more thoroughly.

3D Objects are not being in their regular shape at distance

I am working on a game which was developed by some other guy earlier. I am facing a problem that when player(with camera) start running on the road the buildings are not being shown up in their regular shape and as we move forward (more closer to the buildings) they gain their original shapes, and some times the buildings present on either side of the road are not visible by camera ( empty space ) and when we move closer to the building it comes up as visible object suddenly. I think it may be some unity3d setting problem (rendering , camera or quality). May be, it was being done due to increase performance on mobile devices.
can anybody know what may be the issue or how to resolve it.
Any help will be appreciated. Thanks in advance
This sounds like it's a problem with the available LODs for each building's 3D model.
Basically, 3d games work by having 2-3 different versions of each 3D model, with varying *L*evels *O*f *D*etail. So for example, if you have a house model which uses 500 polygons, you'll probably have another 2 versions (eg 250 polys and 100 polys), which are used depending on the distance between the player and the object. The farther away he is, the simpler the version used will be.
The issue occurs when developers use automatically generated LOD models, which will look distorted or won't appear at all. Unity probably auto generates them, but I'm unsure where you'll find the settings for this in unity. However I've seen 3d models on the unity store offering models with different LODs, so unity probably gives you the option to set your own. The simplest solution would be to increase the distance the LODs change at, while the complicated solution would be to fix custom versions of the 3D models for larger distances, with a lower poly count.
I have resolved the problem. This was due to the LOD (level of details) used for objects (buildings) in Unity3d to enhance the performance on the slower device. LOD provides many level of details (of an object) which you can adjust according to your need . In my specific problem the buildings were suddenly appear due to the different (wrong) position for LOD1, i.e. for LOD1 the building was at wrong place but for LOD0 it was at its right place. So when my camera see from the distance it see LOD1 which was at wrong place thence it sees empty space with no building at the expected position. But when camera comes closer it sees LOD0 in which building is at the right position and it seems that buildings are suddenly come or become visible.

Shape (preferably human) recognition API for use with standard webcam

I am interested in getting into user interaction/shape detection with a simple usb webcam. I can use multiple webcams, but don't want to be restricted to using something like the kinect sensor. My detection cameras need to be set up on either side of a helmet (or if an individual one, on top). I have found some, but they don't really have the functionality I need and most are angled towards facial recognition. I need to be able to detect a basic human skeletal structure and determine if something is obstructing it. I would really rather be able to do it without using any sort of marker system on the target person. I would like for it to be able to target multiple structures. Obviously I am willing to do tweaking if necessary, but want to see how close I can get to what I need before I rebuild the wheel. I am trying to design an ai system that can determine how many people are in an area and where they are.
Doubt there will be anything like this since Microsoft spent a ton of money on the R&D for Kinect and it's probably all locked behind an NDA. I'm also guessing there's a lot of hardware within the Kinect that is not available in a standard webcam.
The closest thing that I could find to what you're looking for is the OpenKinect project, might be a good place to start your research.

Tutorials for controlling 3D modeling objects

I have some experience with Blender such that I can make a semitransparent cylinder of specified dimensions and small spheres. I want to (for a chemistry tutorial video explaining temperature and heat concepts) write a program that will:
Set up the cylinder and some spheres in a coordinate space
Set up a camera and lighting
Get the spheres moving around in random directions while keeping track of their positions and making them bounce when necessary (this I can figure out given a coordinate space; and I'm not going to get bone-crunchingly accurate trying to do accelerations, taking "mass" into account, etc. just going to send balls in another direction at the "speed" all the balls are going)
Record what this would look like through the camera for a set amount of time (thinking command line option in seconds)
In other words, by #4, this program doesn't even need to be GUI at all. I just want the program to make a video.
It may take me a very long time to actualize this because though I have a lot of experience with C, C++, and Java, I don't know how to take a 3D model file and programmatically control it. I don't even know the infrastructure of libraries and accompanying API to control 3D objects and record the camera to a file.
Are there any tutorials that would go from starting with some 3D models to programmatically setting up a scene (objects, camera, lights), programmatically moving the objects in the coordinate space, and recording the video to a file?
Knowing some programming already, I want to point you to Unity, www.unity3d.com
Unity is a 3d game engine, though it can be used for a number of different things, including this program you have in mind.
It's programmed with C# or Javascript, and I think you could pick these languages up easily enough.
Basically what you described in your last paragraph is exactly what Unity does.

Kinect joint detection from top

I'm wondering, does the Kinect detects joints correctly when it's put on the top (on the ceiling).
I don't have necessary equipment to attach it to ceiling and test, but was wondering whether it reliably detects human. I'm ok even if it confuses the joints, actually.
Has anybody tested this?
From what I've seen while using it, the skeleton detection is iffy from any angle other than directly pointing at a person's front or back. A Kinect pointed straight down with people walking under it would almost certainly not detect anyone, because the human form from above does not look anything like it does from the front. I have had the Kinect pick up random people around me in odd positions (sitting, viewed from the side, etc), but the joints were largely spastic. If you have it mounted on the ceiling and pointed downwards at a sufficient angle to still see people from the front instead of from above.. it could do a fairly good job of picking them up.
So when you say on the ceiling do you mean pointing straight down or still looking at a fairly horizontal angle?
I did a little bit of testing with the Kinect mounted in a very high position (2.5 m, 70° to the ground). As answered by Coeffect it just doesn't work. It doesn't work with Microsoft SDK nor with OpenNI. What I can add is that the skeleton recognition only works if the user is facing the camera with her/his whole body-front. Even worse, both frameworks seem to expect the head at the top of the depth-frame.