Minimum working distance to track a moving object with Kinect

I'm wondering what the minimum working distance for the Kinect is.
I'd like to track a moving object (10cm x 10cm) from a distance of 1m. The area that the object will be moving in is 120cm x 60cm.
Given Kinect's specs, will it be possible to track the object across the entire area?

wikipedia says:
The sensor has an angular field of view of 57° horizontally and 43°
vertically
so the answer to your question would be: no, 120 cm would be too wide, since the maximum horizontal viewing field at 1 m is tan(57°/2) × 2 × 1 m ≈ 1.08 m.
At a distance of ~1.2 m it should work, though: tan(57°/2) × 2 × 1.2 m ≈ 1.30 m.
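For quick what-if checks, here is a minimal sketch of that calculation (Python; the 57° horizontal field of view is the figure quoted above):

    import math

    def visible_width(distance_m, fov_deg=57.0):
        """Width covered by the sensor's horizontal field of view at a given distance."""
        return 2 * distance_m * math.tan(math.radians(fov_deg / 2))

    print(visible_width(1.0))   # ~1.09 m -- narrower than the 1.20 m area
    print(visible_width(1.2))   # ~1.30 m -- wide enough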

The distance range handled by the sensor is 850 mm minimum and 4000 mm maximum, so this should be possible.
I strongly recommend watching the Kinect SDK Quickstarts videos, as they cover all the basics to get you started. In fact, the "Working with depth data" video probably contains exactly the kind of info you're looking for.

Related

Is it possible to recognize all objects in a room with Microsoft Kinect?

I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. bed, table, etc.) and a person in that room. Is something like that possible using Microsoft Kinect?
Thank you!
Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of the user from the Kinect.
Also, by mapping the color stream to the depth stream you can detect how far a particular pixel is from the Kinect (see the sketch below). In your implementation, if your objects have unique characteristics like color, shape, and size, you can probably detect them and also detect their distance.
OpenCV is one of the libraries I use for computer vision, etc.
Again, it's up to you how you use this data.
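As a rough illustration of that last point, here is a hypothetical sketch (Python) that assumes you already have a depth frame as an array of millimetre values and some color-to-depth pixel mapping obtained from the SDK; the names are placeholders, not SDK calls:

    import numpy as np

    def distance_of_color_pixel(depth_mm, color_to_depth, x, y):
        """Distance (in metres) of whatever is seen at color pixel (x, y).

        depth_mm       -- depth frame as a NumPy array of millimetre values
        color_to_depth -- function mapping a color pixel to its depth pixel
                          (placeholder for whatever mapping your SDK provides)
        """
        dx, dy = color_to_depth(x, y)
        return depth_mm[dy, dx] / 1000.0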
The Kinect camera provides depth and consequently 3D information (a point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room by fitting a plane (a sketch follows this answer), and possibly the walls and the ceiling as well. This step is important since these surfaces often connect separate objects, making them one big object.
The remaining parts of the point cloud can be segmented by depth if they don't touch each other physically. Using color, one can separate the objects even further. Note that we implicitly define an object as a 3D-dense and color-consistent entity, while other definitions are also possible.
As soon as you have your objects segmented, you can measure the distances between your segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, the skeleton library can recognize humans only after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.
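A minimal sketch of the floor-segmentation step mentioned above, assuming the point cloud is already an N x 3 NumPy array in metres (RANSAC plane fitting; the iteration count and threshold are illustrative only):

    import numpy as np

    def ransac_plane(points, iters=200, thresh=0.02, rng=np.random.default_rng(0)):
        """Fit a dominant plane (e.g. the floor) to an N x 3 point cloud with RANSAC.
        Returns (normal, d, inlier_mask) for the plane n.x + d = 0."""
        best_inliers, best_model = None, None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:
                continue  # degenerate sample (collinear points)
            n = n / norm
            d = -np.dot(n, sample[0])
            inliers = np.abs(points @ n + d) < thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers, best_model = inliers, (n, d)
        return best_model[0], best_model[1], best_inliers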

Transformation between point clouds

I hope to find some hints on where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3d point clouds. I created a 3d object detector which is already working.
Here is my task:
Let's say I have point cloud 1. I detected an object in cloud 1 and I know the centroid position of my object (x1, y1, z1). Now I move my sensor along a path and create new clouds (e.g. cloud 2). In cloud 2 I see the same object, but e.g. from the side, where the object detection does not work well.
I would like to transform the detected object from cloud 1 to cloud 2, to get the centroid in cloud 2 as well. It sounds to me like I need a matrix (translation, rotation) to transform points from cloud 1 to cloud 2.
Any ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimate of which points in cloud 1 correspond to which points in cloud 2 (more specifically: given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation.
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation as demonstrated here.
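For reference, the closed-form solution for known correspondences looks roughly like this (a sketch of the standard SVD/Kabsch method, assuming the matched points are already in two N x 3 NumPy arrays):

    import numpy as np

    def rigid_transform(P, Q):
        """Least-squares rotation R and translation t with R @ P[i] + t ~= Q[i],
        for two N x 3 arrays of corresponding points (SVD/Kabsch method)."""
        cP, cQ = P.mean(axis=0), Q.mean(axis=0)
        H = (P - cP).T @ (Q - cQ)          # 3x3 cross-covariance of centred points
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:           # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = cQ - R @ cP
        return R, t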
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc.) in both clouds, computing a rotation and translation from those as above, and then refining with ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned, because it estimates correspondences using one of two metrics: minimal point-to-point distance or minimal point-to-plane distance. According to Wikipedia, the latter works better in practice, but it involves estimating normals, which can be tricky. If the correspondences are far off, the estimated transform is likely to be as well.
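To see what a bare-bones point-to-point ICP loop does, here is a sketch (it reuses rigid_transform() from the previous snippet, takes two N x 3 NumPy arrays, and uses SciPy's cKDTree for nearest-neighbour search; in practice you would use PCL's ready-made ICP instead):

    import numpy as np
    from scipy.spatial import cKDTree

    def icp(src, dst, iters=30):
        """Minimal point-to-point ICP: repeatedly match each source point to its
        nearest destination point, then re-estimate the rigid transform."""
        R_total, t_total = np.eye(3), np.zeros(3)
        tree = cKDTree(dst)
        cur = src.copy()
        for _ in range(iters):
            _, idx = tree.query(cur)               # nearest-neighbour correspondences
            R, t = rigid_transform(cur, dst[idx])  # best rigid fit for this matching
            cur = cur @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total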
I think what you were asking about relates in particular to the Kinect sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in the Sensor Fusion namespace. This should take care of it automatically, using methods similar to the answer given by @pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of which can be read about here.

Indoor positioning

I am trying to get indoor GPS by orienting my floor plan against the actual building from Google Maps. I know perfect accuracy is not possible. Any idea how to do this? Do the maps need to be converted to KML format?
Forget that!
Only with luck can you get indoor GPS signals, probably only near a window, and even then the error is likely to be larger than the size of your building.
You can only try to get the coordinates outside, at the corners of the building.
For precise measurements you would need some averaging of the readings, which only a few GPS devices offer. For less precision, take the coordinate once, or measure it at different hours or on different days.
Otherwise, you should think about geolocation using Wi-Fi/HF and any other wireless/radio sources that you can locate precisely, since you probably installed them yourself, or at least someone from your company/service is responsible for them and could give you the complete list with coordinates. Then, once you have the radio locations, you can geolocate the devices using radio propagation.
I know that's not the answer you were looking for, but think of it as an alternative if you really need to locate people inside your building.
PS: I did this at work and it works pretty well (except in some areas where the radio emitters are broken).

Calculate user parameters using Microsoft kinect

I want to get the following information about a user captured with a Microsoft Kinect in a WPF application.
Shoulder width
Height
Waist width
Hip width
Arm length
Bust size
I couldn't find any standard way of doing this except calculating the x, y coordinates of the user. Is there an efficient and accurate way of doing this?
You can follow the article at http://www.codeproject.com/Articles/380152/Kinect-for-Windows-Find-user-height-accurately
The easiest way to accomplish this task is to use the Pythagorean theorem to compute the distance between two skeleton joints.
To get the shoulder width, you would use the joints JointType.ShoulderLeft and JointType.ShoulderRight. To get the length of the left arm, you would add the distance between JointType.ShoulderLeft and JointType.ElbowLeft to the distance between JointType.ElbowLeft and JointType.WristLeft.
Please note that the joint names above are from the Kinect for Windows SDK. On its own, OpenKinect does not provide a method for skeleton tracking, since it specializes in accessing the device only. A popular alternative to the Kinect for Windows SDK is OpenNI.
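As a small illustration, the calculation itself is just a 3-D Euclidean distance; the sketch below (Python) uses made-up joint positions standing in for whatever your skeleton stream delivers in metres:

    import math

    def joint_distance(a, b):
        """3-D Pythagorean distance between two joints given as (x, y, z) in metres."""
        return math.dist(a, b)

    # Hypothetical joint positions, as they might come from the skeleton stream.
    shoulder_left, shoulder_right = (-0.18, 0.45, 2.1), (0.17, 0.46, 2.1)
    elbow_left, wrist_left = (-0.22, 0.18, 2.1), (-0.24, -0.05, 2.1)

    shoulder_width = joint_distance(shoulder_left, shoulder_right)
    left_arm_length = (joint_distance(shoulder_left, elbow_left)
                       + joint_distance(elbow_left, wrist_left))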

Algorithm for reducing GPS track data to discard redundant data?

We're building a GIS interface to display GPS track data, e.g. imagine the raw data set from a guy wandering around a neighborhood on a bike for an hour. A data set like this, with perhaps a new point recorded every 5 seconds, will be large, and displaying it in a browser or on a handheld device will be challenging. Also, displaying every single point is usually not necessary, since a user can't visually resolve that much data anyway.
So, for performance reasons, we are looking for algorithms that are good at "reducing" data like this so that the number of points being displayed is reduced significantly, but in such a way that it doesn't risk misinterpretation of the data. For example, if our fictional bike rider stops for a drink, we certainly don't want to draw 100 lat/lon points in a cluster around the 7-Eleven.
We are aware of clustering, which is good when looking at a bunch of disconnected points; however, what we need is something that applies to tracks as described above. Thanks.
A more scientific and perhaps more math-heavy solution is to use the Ramer-Douglas-Peucker algorithm to generalize your path. I used it when I studied for my Master of Surveying, so it's a proven thing. :-)
Given your path and the maximum deviation you can tolerate from it, the algorithm simplifies the path by reducing the number of points.
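A compact recursive sketch of the algorithm (Python, 2-D planar coordinates; for lat/lon you would project first or accept the small-area approximation):

    import math

    def rdp(points, epsilon):
        """Ramer-Douglas-Peucker: simplify a polyline, keeping every point whose
        perpendicular distance from the chord between the endpoints exceeds epsilon."""
        if len(points) < 3:
            return list(points)
        (x1, y1), (x2, y2) = points[0], points[-1]
        seg_len = math.hypot(x2 - x1, y2 - y1)

        def dist(p):
            # Perpendicular distance of p from the line through the endpoints.
            if seg_len == 0:
                return math.hypot(p[0] - x1, p[1] - y1)
            return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

        idx, dmax = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                        key=lambda t: t[1])
        if dmax > epsilon:
            return rdp(points[:idx + 1], epsilon)[:-1] + rdp(points[idx:], epsilon)
        return [points[0], points[-1]]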
Typically the best way of doing that is:
1. Determine the minimum number of screen pixels you want between displayed GPS points.
2. Determine the distance represented by each pixel at the current zoom level.
3. Multiply answer 1 by answer 2 to get the minimum distance you want between displayed coordinates.
4. Starting from the first coordinate in the journey path, read each next coordinate until you've reached the required minimum distance from the current point. Keep that point and repeat (see the sketch below).
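A minimal sketch of that loop, assuming the track has already been projected to planar metre coordinates (the pixel count and scale value are placeholders for steps 1 and 2):

    import math

    def thin_track(points, min_px=5, metres_per_px=2.0):
        """Drop points closer than min_px * metres_per_px metres to the last kept point.

        points -- list of (x, y) coordinates in metres (already projected)
        """
        min_dist = min_px * metres_per_px
        kept = [points[0]]
        for p in points[1:]:
            if math.dist(p, kept[-1]) >= min_dist:
                kept.append(p)
        return kept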