Precision of the Kinect depth camera

How precise is the depth camera in the Kinect?
Range?
Resolution?
Noise?
In particular, I'd like to know:
Are there any official specs about it from Microsoft?
Are there any scientific papers on the subject?
Investigations from TechBlogs?
Personal experiments that are easy to reproduce?
I've been collecting data for about a day now, but most writers don't name their sources, and the values seem to differ quite a bit...

Range: ~50 cm to 5 m. Parts of the scene can get closer (~40 cm), but the full view can't be < 50 cm.
Spatial resolution: 640 x 480 pixels, with a 58-degree horizontal and 45-degree vertical FOV. Simple geometry shows this is roughly 0.75 mm per pixel (in x and y) at 50 cm, and roughly 3 mm per pixel at 2 m.
Depth resolution: ~1.5 mm at 50 cm, about 5 cm at 5 m.
Noise: about ±1 DN at all depths, but the DN-to-depth conversion is non-linear. This means roughly ±1 mm up close and ±5 cm far away.
There are official specs from the sensor developer (PrimeSense), not from Microsoft. No scientific papers that I know of yet. Plenty of investigations and experiments (see Google). The OpenKinect community has a lot more discussion of these things than this site, for now.
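For anyone who wants to reproduce the per-pixel figures above, here is a minimal C# sanity check, assuming the 58-degree x 45-degree FOV and 640 x 480 resolution quoted in this answer (exact FOV numbers vary between sources, and the result lands around 0.85 mm per pixel at 50 cm, so in the same ballpark as the estimates above):

using System;

class KinectPixelFootprint
{
    // Rough sanity check of the per-pixel footprint quoted above.
    // The 58 x 45 degree FOV and 640x480 resolution are the estimates from
    // this answer, not official Microsoft numbers.
    static void Main()
    {
        double hFovDeg = 58.0, vFovDeg = 45.0;
        int cols = 640, rows = 480;

        foreach (double distM in new[] { 0.5, 2.0, 5.0 })
        {
            // Width/height of the visible area at this distance.
            double width  = 2.0 * distM * Math.Tan(hFovDeg * Math.PI / 360.0);
            double height = 2.0 * distM * Math.Tan(vFovDeg * Math.PI / 360.0);

            Console.WriteLine("{0:0.0} m: {1:0.0} mm/px horizontal, {2:0.0} mm/px vertical",
                distM, 1000.0 * width / cols, 1000.0 * height / rows);
        }
    }
}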

The Kinect for Windows SDK provides some constants which I've been using, and they seem to be consistent. For range and resolution, these values are:
In default mode:
Minimum range: 80 cm
Maximum range: 400 cm
In near mode:
Minimum range: 40 cm
Maximum range: 300 cm
For the color camera, you may have either of the following resolutions:
80x60
320x240
640x480
1280x960
For the depth camera, you may have either of the following resolutions:
80x60
320x240
640x480
In contrast to the information from Avada Kedavra (and from most sources, by the way), the field-of-view values given by the API are the following:
For the color camera:
Horizontal FOV: 62.0°
Vertical FOV: 48.6°
For the depth camera:
Horizontal FOV: 58.5°
Vertical FOV: 45.6°
Source: http://msdn.microsoft.com/en-us/library/hh855368
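As a small illustration of how these range constants are typically used, here is a hedged C# sketch that flags depth readings outside the documented ranges. The DepthRangeMode enum and IsReliable helper are invented for this example; the real SDK exposes its own constants and types, so treat this only as a sketch of the idea:

using System;

// Illustration only: the min/max values below are the documented ranges from
// this answer; the enum and helper are made up for the example, not part of
// the Kinect SDK.
enum DepthRangeMode { Default, Near }

static class DepthRange
{
    public static bool IsReliable(int depthMm, DepthRangeMode mode)
    {
        int min = (mode == DepthRangeMode.Near) ? 400 : 800;   // 40 cm / 80 cm
        int max = (mode == DepthRangeMode.Near) ? 3000 : 4000; // 300 cm / 400 cm
        return depthMm >= min && depthMm <= max;
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(DepthRange.IsReliable(600, DepthRangeMode.Default)); // False
        Console.WriteLine(DepthRange.IsReliable(600, DepthRangeMode.Near));    // True
    }
}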

The real question here was about resolution and precision. I want to chip in here, as I find the resolution and precision to be not as good as stated. The maximum output resolution of the depth stream is indeed 640x480; however, this is not the effective resolution, and it is not exactly how precise the data is.
The Kinect works by structured-light projection: a light pattern is emitted and cast on the surface, and a camera sees it and triangulates each ray from the emitter, bounced off the object, to the camera.
The thing is that this pattern consists of only 34,749 bright spots that can be triangulated (http://azttm.wordpress.com/2011/04/03/kinect-pattern-uncovered/). If we relate this to a resolution of 640x480 = 307,200 data points, we notice a great difference. Ask yourself whether data with nearly ten times as many points as the underlying source can be considered valid and efficiently sampled; I doubt it. If you were to ask me what the effective resolution of the Kinect is, I would guess it is around 240x180 of honest, pretty good data.
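A quick back-of-the-envelope check of that claim, assuming the 34,749 speckle count from the linked post is accurate:

using System;

class EffectiveResolution
{
    // Compare the number of projected speckles (~34,749, per the linked blog
    // post) with the number of output depth pixels, and see what grid the
    // speckles could honestly fill.
    static void Main()
    {
        const int outputPixels = 640 * 480;   // 307,200
        const int speckles = 34749;

        double ratio = (double)outputPixels / speckles;  // ~8.8 pixels per speckle
        double scale = Math.Sqrt(ratio);                 // ~3x oversampling per axis

        Console.WriteLine("Pixels per speckle: {0:0.0}", ratio);
        Console.WriteLine("Speckle-limited grid: ~{0:0} x {1:0}",
            640 / scale, 480 / scale);                   // roughly 215 x 161
    }
}

That comes out at roughly 215 x 161, not far from the ~240x180 figure guessed above.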

According to "Kinect tech spec finally revealed", the specs for the depth field are as follows (they are also confirmed in the official programming guide posted by Mannimarco):
* Horizontal field of view: 57 degrees
* Vertical field of view: 43 degrees
* Physical tilt range: ± 27 degrees
* Depth sensor range: 1.2m - 3.5m
* Resolution depth stream: 320x240 pixels
* Resolution color stream: 640x480 pixels
But from my own experience the depth sensor range is more like 0.8 m-4.0 m; at least I get good readings in this range. This range matches the PrimeSense data sheet posted by mankoff in the comments below.
It is also important to remember that the depth resolution is much higher close to the sensor than further away. At 3-4 meters the resolution is not nearly as good as at 1.5 m. This becomes important if you, for example, want to calculate the normals of the surface: the result will be better closer to the sensor than further away.
It's not too hard to test the range yourself. The official SDK (currently in beta) will give you a zero (0) depth value when you are out of range, so you could test this with a simple ruler and see at what distance you do or don't get any reading larger than zero. I do not know how the OpenKinect SDK handles out-of-range readings.
A comment about noise: I would say that there is quite a bit of noise in the depth stream, which makes it harder to work with. For example, if you calculate the surface normals you can expect them to be a bit "jumpy", which of course has a negative impact on fake lighting and so on. Furthermore, you have a parallax issue in the depth stream due to the distance between the IR transmitter and the receiver; this can also be hard to work with, as it leaves a large "shadow" in the depth data. This YouTube video demonstrates the problem and discusses a way to resolve the issue using shaders. It's a video worth watching.
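To make the "jumpy normals" point concrete, here is a minimal C# sketch of per-pixel normal estimation from a raw depth frame. This is not Kinect SDK code: the 320x240 frame size, the 58.5/45.6-degree FOV and the assumption that depth arrives in millimetres with 0 meaning "no reading" are taken from this thread, and the simple pinhole back-projection is an approximation.

using System;

static class DepthNormals
{
    const int Width = 320, Height = 240;
    const double FovX = 58.5 * Math.PI / 180.0, FovY = 45.6 * Math.PI / 180.0;

    // Back-project a pixel + depth into a camera-space point (metres),
    // using a simple pinhole model.
    static double[] ToPoint(int x, int y, short depthMm)
    {
        double z = depthMm / 1000.0;
        double px = (x / (double)Width - 0.5) * 2.0 * Math.Tan(FovX / 2.0) * z;
        double py = (0.5 - y / (double)Height) * 2.0 * Math.Tan(FovY / 2.0) * z;
        return new[] { px, py, z };
    }

    // Estimate the normal at (x, y); caller must keep x < Width - 1 and y < Height - 1.
    public static double[] NormalAt(short[] depth, int x, int y)
    {
        short d  = depth[y * Width + x];
        short dx = depth[y * Width + (x + 1)];
        short dy = depth[(y + 1) * Width + x];
        if (d == 0 || dx == 0 || dy == 0) return null; // out of range / parallax shadow

        double[] p  = ToPoint(x, y, d);
        double[] px = ToPoint(x + 1, y, dx);
        double[] py = ToPoint(x, y + 1, dy);

        // Cross product of the two tangent vectors gives the surface normal.
        double[] u = { px[0] - p[0], px[1] - p[1], px[2] - p[2] };
        double[] v = { py[0] - p[0], py[1] - p[1], py[2] - p[2] };
        double[] n = {
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]
        };
        double len = Math.Sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        return len > 0 ? new[] { n[0] / len, n[1] / len, n[2] / len } : null;
    }
}

With real depth data, neighbouring readings differ by a few quantisation steps, which is exactly why these raw normals come out noisy unless you smooth the depth first.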

I think it might be worth mentioning the paper by Khoshelham and Elberink, who proposed a theoretical random-error model of the Kinect's depth sensor in February 2012. It is called "Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications".
The paper can be found here.

If you're looking for something published by Microsoft, check out page 11 of the Kinect Programming Guide. It says pretty much the same thing everyone here has already mentioned.
Range: 1.2 to 3.5 meters
Viewing angle: 43° vertical by 57° horizontal
Mechanized tilt range: ±28°
Frame rate: 30 frames per second
Resolution, depth stream: 320 x 240 (it can actually go higher than this)
Resolution, color stream: 640 x 480 (again, it can go higher)
I don't see anything mentioning noise, but I can say it's pretty minimal except along surface edges where it can become more noticeable.

My experience is that it is not that exact. It's pretty OK, but when you compare it to a tape measure it doesn't match exactly. I made an Excel sheet with measurements every 10 mm, and it just doesn't hold up, especially for things that are more than 2500 mm away, but at closer range too.
Also keep in mind that the number of actual depth pixels is a lot lower than advertised. The electronics inside fill in the gaps; that's why you see small area artifacts rather than something like per-pixel data. In essence this means that at 320x240 only about 1 in 8 pixels is covered by a "real" measurement, and the other pixels are interpolated. So you could use 640x480, but it would only cost CPU/USB resources and will not make your application see any better.
That's just my two cents of experience; I program robotics.

Related

Calibration of a magnetometer attached to a vehicle, as figure-8 calibration isn't possible in such a scenario

I was trying to find a way to calibrate a magnetometer attached to a vehicle, as the figure-8 method of calibration is not really possible on a vehicle.
Also, removing the magnetometer, calibrating it, and fixing it back won't give exact results, as fixing it back to the vehicle introduces more hard-iron distortion, since it was calibrated without the vehicle environment.
My device also has an accelerometer and GPS. Can I use accelerometer or GPS data (these are calibrated) to automatically calibrate the magnetometer?
Given that you are not happy with the results of off-vehicle calibration, I doubt that accelerometer and GPS data will help you a lot unless you measure many times to average out the noise (although technically it really depends on the precision of the sensors, so if you have a 0.001% accelerometer you might get very good data out of it and compensate for the inaccuracy of the GPS data).
From the question, I assume you want just 2D data and you'll be using the Earth's magnetic field as a source (otherwise, GPS wouldn't help). You might be better off renting a vehicle rotation stand for a day: it will have a steady, well-known angular velocity and you can record the magnetometer data for a long period of time (say an hour, over 500 rotations or so) and then process it by averaging out any noise. Your vehicle will produce a different magnetic field while the engine is off, idling and running, so you might want to do three different experiments (or more, to deduce the effect of engine RPM on the magnetic field it produces). Also, if the magnetometer is located close to the passengers, you will have additional influences from them and their devices. If a rotation stand is not available (or not affordable), you can make a calibration experiment with the GPS (whether to use the accelerometers or not will depend on their precision) as follows:
find a large, flat, empty paved surface with no underground magnetic sources (walk around with your magnetometer to check)
put the vehicle into a turn on this surface and fix the steering wheel
use the cruise control to fix the speed
wait for a couple of circles to ensure they are equal
make a recording of 100 circles (or 500 to get better precision) and then average the GPS noise out
You can do this at different speeds to determine the influence of the engine's RPM on the magnetic field it produces.
I performed a similar procedure to calibrate the optical sensor on the steering wheel, to build a model of the vehicle's angular rotation from the steering-wheel angle and current speed. That does not produce very accurate results, due to the tires slipping differently on different surfaces, but it should work okay for your problem.
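As a rough illustration of the averaging idea, here is a small C# sketch, under the assumption that the samples are spread evenly over complete circles. In that case the mean of the magnetometer X/Y readings is an estimate of the hard-iron offset; soft-iron effects and tilt are ignored, and the type names are invented for the example.

using System.Collections.Generic;

// Rough illustration only, not a full calibration: if the vehicle drives
// complete, steady circles, the magnetometer X/Y samples trace a circle whose
// centre is the hard-iron offset, so averaging over whole rotations estimates
// that offset.
static class HardIronEstimator
{
    public static (double offsetX, double offsetY) Estimate(
        IReadOnlyList<(double x, double y)> samples)
    {
        double sumX = 0, sumY = 0;
        foreach (var (x, y) in samples)
        {
            sumX += x;
            sumY += y;
        }
        return (sumX / samples.Count, sumY / samples.Count);
    }
}

Repeating this with the engine off, idling and running (as suggested above) would give you one offset estimate per engine state.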

Forward and backward movement detection with IMU

We have an embedded device mounted in a vehicle. It has accelerometer, gyroscope and GPS sensors on board. The goal is to distinguish when the vehicle is moving forward and when it is moving backward (in reverse gear). The sensor's axes are aligned with the vehicle's axes.
Here are our observations:
It's not enough to check the direction of acceleration, because accelerating backwards and braking while moving forward show up in the same direction.
We could say that if the GPS speed decreased 70 -> 60 km/h it was a braking event. But it becomes tricky when the speed is < 20 km/h; a decrease 20 -> 10 km/h is possible when going in either direction.
We can't rely on the GPS heading at low speeds.
How could we approach this problem? Any ideas, articles or researches would be helpful.
You are looking for an Attitude and Heading Reference System (AHRS) implementation. Here's an open-source code library. It works by fusing the two data sources (IMU and GPS) to determine the location and the heading.
AHRS provides you with roll, pitch and yaw, which are the angles around the X, Y and Z axes of the IMU device.
There are different algorithms to do that. Examples of AHRS algorithms are the Madgwick and Mahony algorithms. They will provide you with quaternions and Euler angles, which can easily help you identify the orientation of the vehicle at any time.
This is a cool video of an AHRS algorithm running in real time.
A similar question is here.
EDIT
Without magnetometer data you won't get high accuracy, and your drift will increase over time.
Still, you can perform AHRS on 6DoF (Acc XYZ and Gyr XYZ) using the Madgwick algorithm. You can find an implementation here. If you want to dive into the theory of things, have a look at Madgwick's internal report.
A Kalman filter could be an option to merge your 6DoF IMU with GPS data, which could dramatically reduce the drift over time. But that requires a good understanding of Kalman filters and probably a custom implementation.
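As a much-simplified sketch of the IMU + GPS fusion idea for this particular forward/backward problem (this is not the Madgwick or Mahony algorithm; the filter structure, class name, constants and noise values below are invented for illustration):

using System;

// Keep a signed longitudinal velocity by integrating the forward accelerometer
// axis, and use the (unsigned) GPS speed to correct its magnitude. The sign of
// the estimate then distinguishes forward from reverse. A real system needs
// accelerometer bias handling, gravity compensation and proper tuning.
sealed class SignedSpeedEstimator
{
    double v;        // signed velocity estimate, m/s (positive = forward)
    double p = 1.0;  // estimate variance

    const double Q = 0.05; // process noise per second (accelerometer drift)
    const double R = 0.5;  // GPS speed measurement noise

    // Call at IMU rate with the gravity-compensated forward acceleration.
    public void Predict(double forwardAccel, double dt)
    {
        v += forwardAccel * dt;
        p += Q * dt;
    }

    // Call when a GPS speed (always >= 0) arrives; keep the current sign.
    public void UpdateWithGpsSpeed(double gpsSpeed)
    {
        double measured = Math.Sign(v) >= 0 ? gpsSpeed : -gpsSpeed;
        double k = p / (p + R);          // simple scalar Kalman gain
        v += k * (measured - v);
        p *= (1 - k);
    }

    public bool MovingForward => v > 0;
}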

Kinect range capability

I'm thinking of trying something with the Kinect, but before I buy the equipment I have a simple feasibility question.
Would the Kinect v2 be able to track hand movements in a very large room with lots of people?
The people would be sitting.
The room might be 10 times the size of a normal living room. Would it be possible to mount the Kinect high up to get maximum range?
Does it work in dark, but not pitch black, rooms?
Thanks!
My living room is about 5m by 5m. So you ask the question for 50m by 50m.
The Kinect v2 has a depth image resolution of 512 x 424 pixels with a FOV of 70.6 x 60 degrees. So at a distance of 50 m, one pixel covers about 2 * 50 m * tan(70.6°/2) / 512 ≈ 14 cm. That is the size of my hand. The Kinect won't be able to track depth at that level of detail; you will just see noise and won't be able to filter out the depth pixels that correspond to hands.
A different way to check this is from the specs of the depth range of the Kinect v2, which give a depth range of 0.5-4.5 meters. Within this range it can track a maximum of six people.
https://www.microsoft.com/en-us/kinectforwindows/meetkinect/features.aspx

What does horizontalAccuracy exactly mean?

I am working on an iOS application using location services. Having a background in experimental physics, I am wondering what exactly horizontalAccuracy in a location found in locationManager:didUpdateToLocation:fromLocation: stands for. The documentation is a bit sparse...
I assume that the accuracy gives a confidence interval based on a Gaussian (or Poisson?) distribution. Thus, with a certain probability, the actual position is within a circle with a radius of horizontalAccuracy, but it could just as well be outside that area. The question is then: how big is that probability? If horizontalAccuracy corresponds to 1σ, I'd have a 68% probability of being within a circle of radius horizontalAccuracy; but looking at it the other way around, in nearly one third of the cases the actual position will be outside that area. Thus, in certain cases, I'd rather calculate with 2σ (2*horizontalAccuracy) or even 3σ (3*horizontalAccuracy).
To put it short: is there any indication somewhere, which confidence interval horizontalAccuracy has?
Comment to all who respond "Apple says it is within":
Well, the measurement cannot be exact. It must have a certain level of uncertainty. If you repeat the measurement very often, you will get a distribution of results, probably a Gaussian distribution. This Gaussian has a certain width, which corresponds to the level of uncertainty of the measurements. Measuring the position more often will reduce the uncertainty and thus increase accuracy, but it will never give you a definite interval that the actual position is guaranteed to be in. You will only get a probability. But if the accuracy is 3σ, we have 99.7%, which is close to certain.
To put it short - I doubt the documentation from Apple.
I have been looking for the same information and could not find any answers. The only pointer I have is that on Android they are using 1σ:
http://developer.android.com/reference/android/location/Location.html#getAccuracy%28%29
To all the non-believers, this link also explains a little bit how the accuracy thing works.
My guess is, the same is true on iOS, but there is no way to be sure - except for asking the guy who wrote the code ;)
Edit:
After some playing around and checking location updates vs. physical location it seems like it is more likely 3σ on iOS. There are two observations that lead me to believe that is true:
On Android locations that come from WiFi triangulation are usually reported as having an accuracy between 20 and 50 meters. On iOS it's between 65 and 165 meters.
When measuring the distance between a reported location and the device's physical location, it has been within the reported accuracy every time so far.
The iOS documentation doesn't specify the probability of containment, but android reports a one-sigma horizontal accuracy, which they define to represent 68% probability that the true location is within the circle.
Their explanation is that location errors follow a normal distribution, and therefore +/- one-sigma represents 68% probability. However, 68% is the probability for a one-dimensional normal distribution. In two dimensions, a one-sigma error represents 39% probability of containment within a circle (the distance error follows a Rayleigh distribution, a.k.a. a chi distribution with two degrees of freedom).
There are two possible explanations.
The circle truly represents 68% probability of containment, in which case android developers have scaled the one-dimensional sigma by a factor of about 1.5 so that the circle happens to represent 68%. In this case, their choice of 68% is completely arbitrary.
The circle actually represents 39% probability of containment. In this case, their description would be correct if you replaced a one-dimensional gaussian with a two-dimensional one and its associated probability.
I think the second explanation is more likely.
iOS: https://developer.apple.com/library/ios/documentation/CoreLocation/Reference/CLLocation_Class/index.html#//apple_ref/occ/instp/CLLocation/horizontalAccuracy
Android: http://developer.android.com/reference/android/location/Location.html#getAccuracy%28%29
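For concreteness, the 39% and 68% figures above can be checked directly from the Rayleigh CDF, P(r <= k*sigma) = 1 - exp(-k^2/2); a tiny C# snippet:

using System;

// Numerical check of the containment figures above: for a circularly symmetric
// 2-D Gaussian error, the distance error follows a Rayleigh distribution, so
// P(r <= k*sigma) = 1 - exp(-k^2 / 2).
class ContainmentProbability
{
    static void Main()
    {
        foreach (double k in new[] { 1.0, 1.5, 2.0, 3.0 })
        {
            double p = 1.0 - Math.Exp(-k * k / 2.0);
            Console.WriteLine("{0:0.0} sigma: {1:P1}", k, p);
        }
        // Prints roughly: 1.0 sigma: 39.3%, 1.5 sigma: 67.5%,
        //                 2.0 sigma: 86.5%, 3.0 sigma: 98.9%
    }
}

Note that ~1.5σ giving ~68% containment is exactly the scaling described in the first explanation above.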
It denotes the accuracy level of the location. For example, a horizontalAccuracy of 0 means high accuracy, and a horizontalAccuracy of 500 means low accuracy.
The location services provider updates the location based on the consolidated best value from cellular, WiFi (when on a WiFi connection) and GPS, so the location value will oscillate depending on coverage. You can filter it using horizontalAccuracy.
A horizontal accuracy of X indicates that your horizontal position can be X meters off. Remember that the location can be determined using GPS, cell-tower triangulation or WiFi location data; CLLocationManager gives you the most accurate location from these three methods, and says there is a chance it may be off by at most X meters.
In what way is the documentation sparse?
The radius of uncertainty for the location, measured in meters. (read-only)
The location’s latitude and longitude identify the center of the circle, and this value indicates the radius of that circle. A negative value indicates that the location’s latitude and longitude are invalid.
So your location is within the circle. It isn't outside the circle, or the radius would be bigger. Your assumption about confidence intervals is incorrect.

Voxel Engine and Optimization

Recently I've started developing a voxel engine. What I need is only colorful voxels without textures, but in very large amounts (still much smaller than Minecraft), and the question is how to draw the scene very fast. I'm using C#/XNA, but in my opinion that is not very important in this case; let's talk about the general case. Look at these two games:
http://www.youtube.com/watch?v=EKdRri5jSMs
http://www.youtube.com/watch?v=in0bavLJ8KQ
In particular, I think video number 2 demonstrates great optimization methods (my graphics card starts choking at just 192 x 192 x 64). How do they achieve this?
What I would like to have in the engine:
colorful voxels without texture, but shaded
many, many voxels, say minimum 512 x 512 x 128 to achieve something like video #2
shadows (smooth shadows will be great but this is not necessary)
optional: dynamic lighting (for example from fireballs flying, which light up near voxel structures)
framerate minimum 40 FPS
the camera has 3 degrees of freedom (movement along the x, y and z axes); no camera rotation is needed
finally, an optional feature may be depth of field (it would be sweet ^^)
Optimizations I already know about:
remove unseen voxels that reside inside a voxel structure (covered from all six directions by other voxels)
remove unseen faces of voxels - because the camera has no rotation and always looks aslant forward as in TPP games, so if we divide the screen with a vertical cut, the left and right voxels will each show only 3 faces
keep voxels in a Dictionary instead of a 3-dimensional array - jumping through an array of size 512 x 512 x 128 takes milliseconds, which is unacceptable, but a dictionary of int:color, where the int encodes the packed 3D position, is much, much faster
use instancing where applicable
occlusion culling? (how to do this?)
space dividing / octree (is it a good idea?)
I'll be very thankful if someone can give me a tip on how to improve the existing optimizations listed above, or can share ideas for new improvements. Thanks.
1) Voxatron uses a software renderer rather than the GPU. You can read some details about it if you read the comments in this blog post:
http://www.lexaloffle.com/bbs/?tid=201
I haven't looked in detail myself so can't tell you much more than that.
2) I've never played 3D Dot Game Heroes but I don't have any reason to believe it uses voxels at all. I mean, I don't see any cubes being added or deleted. Most likely it is just a static polygon mesh with a nice texture applied.
As for implementing it yourself, do not try to draw the world by rendering individual cubes, as this is very slow. Instead, you should process the volume and generate meshes lying on the boundary between solid voxels and empty ones. Break the volume into suitably sized regions (e.g. 32x32x32) and generate a mesh for each.
I have written a book article about this which you might find useful. It's actually about smooth voxel terrain, but a lot of the principles still apply.
You can read it on Google books here: http://books.google.com/books?id=WNfD2u8nIlIC&lpg=PR1&dq=game%20engine%20gems&pg=PA39#v=onepage&q&f=false
And you can find the associated source code here: http://www.thermite3d.org
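To make the meshing idea concrete, here is a small C# sketch (not taken from the book) that walks one chunk and emits a face only where a solid voxel borders an empty one. The chunk size, voxel encoding and face-list format are invented for the example:

using System.Collections.Generic;

// Walk a chunk of the volume and emit a quad only where a solid voxel touches
// an empty one, instead of drawing every cube. Voxel value 0 means empty;
// any other value is treated as a color index.
static class ChunkMesher
{
    const int N = 32; // chunk size, as suggested above

    // One entry per face direction: offset to the neighbouring voxel.
    static readonly (int dx, int dy, int dz)[] Directions =
    {
        (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)
    };

    public static List<(int x, int y, int z, int face, byte color)> BuildFaces(byte[,,] voxels)
    {
        var faces = new List<(int x, int y, int z, int face, byte color)>();
        for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
        for (int z = 0; z < N; z++)
        {
            byte v = voxels[x, y, z];
            if (v == 0) continue; // empty voxel, nothing to draw

            for (int f = 0; f < Directions.Length; f++)
            {
                var (dx, dy, dz) = Directions[f];
                int nx = x + dx, ny = y + dy, nz = z + dz;

                // A face is visible if the neighbour is outside the chunk or empty.
                bool neighbourSolid = nx >= 0 && nx < N && ny >= 0 && ny < N &&
                                      nz >= 0 && nz < N && voxels[nx, ny, nz] != 0;
                if (!neighbourSolid)
                    faces.Add((x, y, z, f, v)); // expand into 4 vertices / 2 triangles later
            }
        }
        return faces;
    }
}

Each visible face then becomes two triangles in the per-chunk vertex buffer, which is what gets drawn instead of individual cubes.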
Since you are using XNA, you can just use instancing to get the desired effect: http://www.float4x4.net/index.php/2010/06/hardware-instancing-in-xna/
http://roecode.wordpress.com/2008/03/17/xna-framework-gameengine-development-part-19-hardware-instancing-pc-only/
The underlying concept is instancing: this feature lets you specify some amount of repeating data and some amount of varying data in a single draw call. In your case, the repeated geometry would be a single solid box, and the per-instance stream would carry the transform and color information.
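For reference, here is a minimal sketch of what that per-instance stream could look like in XNA 4.0. The struct name and layout are just an example of the common pattern (a world matrix plus a color per cube); the box geometry, shader and draw call are omitted:

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Example per-instance data for hardware instancing in XNA 4.0: one world
// transform plus a color per voxel cube. The geometry stream holds the single
// box mesh; this stream would be bound with an instance frequency of 1, e.g.
//   new VertexBufferBinding(instanceBuffer, 0, 1)
// and drawn with GraphicsDevice.DrawInstancedPrimitives.
public struct VoxelInstance : IVertexType
{
    public Matrix World;  // 4 x Vector4 = 64 bytes
    public Color Color;   // 4 bytes

    public static readonly VertexDeclaration Declaration = new VertexDeclaration(
        new VertexElement(0,  VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 1),
        new VertexElement(16, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 2),
        new VertexElement(32, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 3),
        new VertexElement(48, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 4),
        new VertexElement(64, VertexElementFormat.Color,   VertexElementUsage.Color, 1));

    VertexDeclaration IVertexType.VertexDeclaration { get { return Declaration; } }
}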