calculate the depth feature in kinect? - kinect

I am learning how Kinect works, but I am new to this field. Could somebody please help me to better understand the given formula?
fθ(I, x) = dI(x + u/dI(x)) - dI(x + v/dI(x))
In the description it is given that
dI(x) is the depth at pixel x in image I, and
the parameters θ = (u, v) describe the offsets u and v.
The normalization of the offsets by 1/dI(x) ensures the features are depth invariant: at a given point on the body, a fixed world-space offset will result whether the pixel is close or far from the camera.
I am still unable to understand the whole formula: what is the depth at pixel x, what are the offsets u and v, and what does the normalization of the offsets mean? Please help!

The pixel x and the offsets u and v are all vectors, not single numbers.
For example, x is (100, 200), u is (5, 0) and v is (0, 10).
The normalization of the offsets means the offsets are scaled by the depth at x: near the camera the depth value is small, so dividing by it makes the pixel offset larger; far from the camera the depth value is large, so the pixel offset becomes smaller. Either way the probe spans the same world-space distance. In the picture you can see that theta1 is bigger than theta2.
In that case, the feature is depth invariant.
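As a concrete illustration, here is a minimal sketch of the feature in C++/OpenCV (not the paper's code; the CV_32F depth type in meters, the border clamping and the function name are my assumptions, and the paper instead gives background and out-of-image probes a large constant depth):

#include <algorithm>
#include <opencv2/core.hpp>

// depth: CV_32F depth image in meters; x: the pixel being classified;
// u, v: the learned offsets of the feature (in pixel-meter units).
float depthFeature(const cv::Mat& depth, cv::Point x, cv::Point2f u, cv::Point2f v)
{
    float dAtX = depth.at<float>(x);                      // dI(x), depth at pixel x

    // Normalize the offsets by the depth: near pixels (small depth) get larger
    // pixel offsets, far pixels get smaller ones, so the probe always spans the
    // same world-space distance.
    cv::Point pu(x.x + cvRound(u.x / dAtX), x.y + cvRound(u.y / dAtX));
    cv::Point pv(x.x + cvRound(v.x / dAtX), x.y + cvRound(v.y / dAtX));

    auto sample = [&](cv::Point p) {
        // Clamp to the image so out-of-range probes do not crash.
        p.x = std::min(std::max(p.x, 0), depth.cols - 1);
        p.y = std::min(std::max(p.y, 0), depth.rows - 1);
        return depth.at<float>(p);
    };
    return sample(pu) - sample(pv);                       // fθ(I, x)
}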

Related

Is there a simple math solution to sample a disk area light? (Raytracing)

I'm trying to implement different types of lights in my ray-tracer coded in C. I have successfully implemented spot, point, directional and rectangular area lights.
For rectangular area light I define two vectors (U and V) in space and I use them to move into the virtual (delimited) rectangle they form.
Depending on the intensity of the light I take several samples on the rectangle then I calculate the amount of the light reaching a point as though each sample were a single spot light.
With rectangles it is very easy to find the position of the various samples, but things get complicated when I try to do the same with a disk light.
I found little documentation about this, and most of what I found relies on ready-made functions.
The only interesting thing I found is this document (https://graphics.pixar.com/library/DiskLightSampling/paper.pdf) but I'm unable to exploit it.
Would you know how to help me achieve a similar result (like the following image) with vector operations, e.g. given the origin, orientation and radius of the disk and the number of samples?
Any advice or documentation in this regard would help me a lot.
This question reduces to:
How can I pick a uniformly-distributed random point on a disk?
A naive approach would be to generate random polar coordinates and transform them to cartesian coordinates:
Randomly generate an angle θ between 0 and 2π
Randomly generate a distance d between 0 and the radius r of your disk
Transform to cartesian coordinates with x = d cos θ and y = d sin θ
This is incorrect because it causes the points to bunch up in the center of the disk.
A correct, but inefficient, way to do this is via rejection sampling:
Uniformly generate random x and y, each over [-1, 1]
If x^2 + y^2 < 1, return the point (scaled by the radius r)
Otherwise, go to step 1
The correct, direct way to do this is:
Randomly generate an angle θ between 0 and 2π
Randomly generate a value s uniformly between 0 and 1 and set the distance d = r sqrt(s)
Transform to cartesian coordinates with x = d cos θ and y = d sin θ
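Applied to your area light, that last method can be turned into a small sketch (written in C++ here with a stand-in Vec3 type, since your own C vector struct isn't shown; the helper names are mine and it translates directly to C):

#include <cmath>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };   // stand-in for your own vector type

Vec3 add(Vec3 a, Vec3 b)     { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return scale(a, 1.0 / len);
}

// Uniformly sample `count` points on a disk given its center, unit normal and radius.
std::vector<Vec3> sampleDisk(Vec3 center, Vec3 normal, double radius, int count)
{
    const double kTwoPi = 6.283185307179586;

    // Build an orthonormal basis (u, v) spanning the plane of the disk.
    Vec3 helper = std::fabs(normal.x) < 0.9 ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 u = normalize(cross(normal, helper));
    Vec3 v = cross(normal, u);

    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    std::vector<Vec3> samples;
    samples.reserve(count);
    for (int i = 0; i < count; ++i) {
        double theta = kTwoPi * uni(rng);          // uniform angle
        double d = radius * std::sqrt(uni(rng));   // sqrt gives uniform area density
        Vec3 offset = add(scale(u, d * std::cos(theta)),
                          scale(v, d * std::sin(theta)));
        samples.push_back(add(center, offset));
    }
    return samples;
}

Each returned point can then be treated as the position of one of your virtual spot lights, exactly as you already do for the rectangle samples.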

pose estimation: determine whether rotation and translation matrices are right

Recently I'm struggling with a pose estimation problem with a single camera. I have some 3D points and the corresponding 2D points on the image. Then I use solvePnP to get the rotation and translation vectors. The problem is, how can I determine whether the vectors are right results?
Now I use an indirect way to do this:
I use the rotation matrix, the translation vector and the world 3D coordinates of a certain point to obtain the coordinates of that point in the camera system. Then all I have to do is determine whether those coordinates are reasonable. I think I know the directions of the x, y and z axes of the camera system.
Is the camera center the origin of the camera system?
Now consider the x component of that point. Is x equivalent to the distance between the camera and the point in world space along the camera's x-axis direction (the sign can then be determined by which side of the camera the point is on)?
The figure below is in world space, while the axes depicted are in Camera system.
======== How the camera and the point are placed in world space ========
|
|
Camera --------------------------> Z axis
|                  | } Xw?
|                  P(Xw, Yw, Zw)
|
v X axis
My rvec and tvec results seem partly right and partly wrong. For a specified point, the z value seems reasonable; I mean, if this point is about one meter away from the camera in the z direction, then the z value is about 1. But for x and y, according to the location of the point I think x and y should be positive, yet they are negative. What's more, the pattern detected in the original image is like this:
But using the point coordinates calculated in the camera system and the camera intrinsic parameters, I get an image like this:
The target keeps its pattern, but it has moved from the bottom right to the top left. I cannot understand why.
Yes, the camera center is the origin of the camera coordinate system, which agrees with this post.
In camera pose estimation, "the values seem reasonable" can be quantified as the backprojection error: a measure of how well your resulting rotation and translation map the 3D points to the 2D pixels. Unfortunately, solvePnP does not return a residual error measure, so one has to compute it:
// Assumes: std::vector<cv::Point3f> worldPoints; std::vector<cv::Point2f> pixelPoints;
// cv::Mat camIntrinsics, camDistortion  (needs <opencv2/calib3d.hpp> and <cmath>)
cv::Mat rVec, tVec;
cv::solvePnP(worldPoints, pixelPoints, camIntrinsics, camDistortion, rVec, tVec);
// Use the computed solution to project the 3D pattern back into the image.
std::vector<cv::Point2f> projectedPattern;
cv::projectPoints(worldPoints, rVec, tVec, camIntrinsics, camDistortion, projectedPattern);
// Compute the error of each 2D-3D correspondence.
std::vector<float> errors;
for (size_t i = 0; i < pixelPoints.size(); ++i)
{
    float dx = pixelPoints.at(i).x - projectedPattern.at(i).x;
    float dy = pixelPoints.at(i).y - projectedPattern.at(i).y;
    // Euclidean distance between the projected and the measured pixel.
    float err = std::sqrt(dx * dx + dy * dy);
    errors.push_back(err);
}
// Finally, take the max or the average of the "errors".
An average backprojection error for a calibrated camera might be in the range of 0-2 pixels. According to your two pictures, yours would be much higher. To me, it looks like a scaling problem. If I am right, you compute the projection yourself; maybe you can try cv::projectPoints() once and compare.
When it comes to transformations, I learned not to follow my imagination :) The first thing I do with the returned rVec and tVec is usually to create a 4x4 rigid transformation matrix out of them (I posted code for that here once). This makes things even less intuitive, but it is compact and handy.
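For illustration, a minimal sketch of such a helper (my own function name; it assumes rVec and tVec come back from solvePnP as CV_64F, which is the default when you pass in empty cv::Mat objects):

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>

// Build a 4x4 rigid transformation (world -> camera) from solvePnP's output.
cv::Mat rigidTransform(const cv::Mat& rVec, const cv::Mat& tVec)
{
    cv::Mat R;
    cv::Rodrigues(rVec, R);                            // 3x1 rotation vector -> 3x3 matrix

    cv::Mat T = cv::Mat::eye(4, 4, CV_64F);
    for (int r = 0; r < 3; ++r)
    {
        for (int c = 0; c < 3; ++c)
            T.at<double>(r, c) = R.at<double>(r, c);   // rotation block
        T.at<double>(r, 3) = tVec.at<double>(r);       // translation column
    }
    return T;   // X_camera = T * X_world (in homogeneous coordinates)
}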
Now I know the answers.
Yes, the camera center is the origin of the camera coordinate system.
Consider that the coordinates of the point in the camera system are (xc, yc, zc). Then xc should be the distance between the camera and the point in the real world along the x direction.
Next, how to determine whether the output matrices are right?
1. As #eidelen points out, the backprojection error is one indicative measure.
2. Calculate the coordinates of the points in the camera system from their world coordinates and the matrices, and check whether those coordinates are reasonable.
So why did I get a wrong result (the pattern remained, but moved to a different region of the image)?
The parameter cameraMatrix in solvePnP() is the matrix of the camera's intrinsic parameters. In the camera matrix, you should use width/2 and height/2 for cx and cy (the principal point), while I used the full width and height of the image. I think that caused the error. After I corrected that and re-calibrated the camera, everything seems fine.
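For reference, a hedged sketch of how such an intrinsic matrix might be assembled (the function name and the assumption that the principal point sits exactly at the image centre are mine; a proper calibration gives you the exact cx and cy):

#include <opencv2/core.hpp>

// fx, fy: focal lengths in pixels (from calibration); width, height: image size.
cv::Mat makeIntrinsics(double fx, double fy, int width, int height)
{
    return (cv::Mat_<double>(3, 3) <<
        fx,  0.0, width  / 2.0,   // cx is roughly width/2, not the full width
        0.0, fy,  height / 2.0,   // cy is roughly height/2, not the full height
        0.0, 0.0, 1.0);
}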

2D DCT coefficients meaning of a grayscale image

What do the DCT coefficients mean? And what is the difference between a positive and a negative DCT coefficient, for example the coefficients 5 and -5?
Thanks
The DCT is simply a 1-to-1 transformation of the data.
Suppose you have a set of blueprints on paper. You scan them in. Once scanned, they are crooked. You use Photoshop or something like it to rotate the image so that it is aligned to the edges and easier to work with.
The DCT is like a rotation in that it simply makes the image data easier to work with. I have to say that a lot of books make this confusing by adding spectral analysis mumbo-jumbo.
Desirable attributes of the DCT for this purpose are:
That it is a transformation to an orthonormal basis set. If D is the DCT transformation matrix, X is the input and Y is the output so that
X D = Y
Then there is an inverse matrix Q that gives:
Y Q = X
And Q is the transpose of D.
Therefore, it is just as easy to go forwards as it is to go backwards with the DCT.
The DCT transformation tends to concentrate the most important image data in one corner of the output matrix. The data at the opposite corner tends to be discardable without noticeably affecting photographic images.
As to your other question, the JPEG input pixels are shifted to the range -128 to 127. Your starting values usually contain negative values, so it's no surprise that you get negative output values. Even if you did have all positive input values, you could still get negative output values. There is no real significance to a coefficient being positive rather than negative.
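To see this concretely, here is a small demo of my own (not from the question) using OpenCV's cv::dct and cv::idct; it shows that the transform is invertible and that negative coefficients appear naturally even though the input is real image data:

#include <iostream>
#include <opencv2/core.hpp>

int main()
{
    // An 8x8 block of grayscale values, level-shifted around zero as JPEG does.
    cv::Mat block(8, 8, CV_32F);
    cv::randu(block, cv::Scalar::all(-128), cv::Scalar::all(127));

    cv::Mat coeffs;
    cv::dct(block, coeffs);          // forward 2D DCT

    cv::Mat restored;
    cv::idct(coeffs, restored);      // inverse 2D DCT reproduces the block

    // The transform is 1-to-1: the reconstruction error is numerically tiny.
    std::cout << "max reconstruction error: "
              << cv::norm(block, restored, cv::NORM_INF) << std::endl;
    // The sign of a coefficient only says which way its basis pattern is oriented.
    std::cout << "DC coefficient: " << coeffs.at<float>(0, 0) << std::endl;
    return 0;
}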

Need deep explanation for viewport/perspective/frustum calculations

I have a lot of tutorials & books, but I'm unable to understand how my viewport, my near & far distances, etc. are used to calculate the perspective/frustum matrix.
I have the learningwebgl lessons, but I don't understand what viewport & 3D space adjustments are made. What is my initial window projection size? Why do I see the triangle & square placed at z = -7?
Another thing I don't understand: does a near plane of 0.001 create the projection window just in front of my nose? So what are my projection window dimensions?
I need a very deep and basic explanation.
Can anybody help me? Some really useful links? I need graphical examples showing & teaching how the frustum is calculated.
Thanks
There's this
http://games.greggman.com/game/webgl-3d-perspective/
Imagine you're in 2D. You have a canvas that's 200x100 pixels. If you draw at x = 201 it will be off the canvas. Similarly at x = -1 it will be off the canvas.
In WebGL you work in a 3D space that goes from -1 to +1 in x, y and z. The perspective/frustum matrix is the matrix that takes your 3D scene and converts it into this -1 to +1 space. The near and far values define what range of Z in front of the camera gets converted to the -1 / +1 "clip space". Anything outside that range will be clipped, just like in the 2D example. If you set near to 10 and far to 100, then something at Z = 9 will be clipped because it's too near, and something at Z = 101 will also be clipped because it's too far. More specifically, the near and far settings form a matrix such that when a point is at Z = near it becomes -1 when multiplied by the matrix, and when it's at Z = far it becomes +1.
The viewport setting tells WebGL how to convert from the -1 to +1 space back into pixels.
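As a rough illustration of how near and far end up in the matrix, here is a sketch of the standard WebGL-style perspective matrix (written in C++ here rather than JavaScript; the column-major layout mirrors what a library like gl-matrix's mat4.perspective produces):

#include <array>
#include <cmath>

// fovY in radians, aspect = width / height, zNear/zFar > 0.
// Column-major 4x4, as WebGL expects.
std::array<float, 16> perspective(float fovY, float aspect, float zNear, float zFar)
{
    float f = 1.0f / std::tan(fovY / 2.0f);
    float rangeInv = 1.0f / (zNear - zFar);

    return {
        f / aspect, 0.0f, 0.0f,                            0.0f,
        0.0f,       f,    0.0f,                            0.0f,
        0.0f,       0.0f, (zNear + zFar) * rangeInv,      -1.0f,
        0.0f,       0.0f, 2.0f * zNear * zFar * rangeInv,  0.0f
    };
}
// After the matrix multiply and the divide by w, a point at z = -near ends up at
// clip-space z = -1 and a point at z = -far ends up at z = +1; everything outside
// that range is clipped, exactly as described above.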

kinect object measuring

I am currently trying to figure out a way to calculate the size of a given object with the Kinect,
since I have the following data:
the angular field of view of the lens
the distance
and the width in pixels at 800*600 resolution
I believe this should be possible to calculate. Does anyone have the math skills to give me a little help?
With some trigonometry, it should be possible to approximate.
If you draw a right triangle ABC, with the camera at vertex A and the object along the far edge BC, where the right angle is at C, then the height of the object is the length of leg BC. The distance to the pixel might be the length of leg AC or of the hypotenuse AB; the Kinect sensor specifications determine which. If you get the distance to the center of a pixel, it will be AC; if you have distances to pixel corners, the distance will be AB.
With A representing the angle at the camera that the pixel takes up, d the length of the hypotenuse and y the length of the far leg (edge BC):
sin(A) = y / d
y = d sin(A)
y is the length of the pixel projected onto the object plane. You calculate it by multiplying the sine of the angle by the distance to the object.
Here I confess I do not know the Kinect API and what level of detail it provides. You say you have the angular field of view. You might assume that each pixel of your 800x600 grid takes up an equal angle of the camera's field of view. If you do, then you can break that field of view into equal pieces to measure the linear size of your object in each pixel.
You also mentioned that you have the distance to the object. I was assuming you have a distance map for each pixel of the 800x600 grid. If that is not the case, some calculations can approximate a distance grid for the pixels covering the object of interest, if you make some assumptions about the object being measured.
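Putting those assumptions into a small sketch (the function and parameter names are mine, and the equal-angle-per-pixel simplification is the one described above):

#include <cmath>

// fovDegrees: horizontal angular field of view of the lens;
// distanceMeters: depth of the object reported by the Kinect;
// objectWidthPixels: measured width of the object in the image.
double objectWidthMeters(double fovDegrees, double distanceMeters,
                         int objectWidthPixels, int imageWidthPixels = 800)
{
    const double kPi = 3.14159265358979323846;
    // Assume each pixel covers an equal slice of the field of view.
    double objectAngle = (fovDegrees * kPi / 180.0) * objectWidthPixels / imageWidthPixels;
    // Two back-to-back right triangles: half the width on each side of the centre ray.
    return 2.0 * distanceMeters * std::tan(objectAngle / 2.0);
}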