pose estimation: determine whether rotation and transmation matrix are right - camera

Recently I'm struggling with a pose estimation problem with a single camera. I have some 3D points and the corresponding 2D points on the image. Then I use solvePnP to get the rotation and translation vectors. The problem is, how can I determine whether the vectors are right results?
Now I use an indirect way to do this:
I use the rotation matrix, the translation vector and the world 3D coordinates of a certain point to obtain the coordinates of that point in Camera system. Then all I have to do is to determine whether the coordinates are reasonable. I think I know the directions of x, y and z axes of Camera system.
Is Camera center the origin of the Camera system?
Now consider the x component of that point. Is x equavalent to the distance of the camera and the point in the world space in Camera's x-axis direction (the sign can then be determined by the point is placed on which side of the camera)?
The figure below is in world space, while the axes depicted are in Camera system.
========How Camera and the point be placed in the world space=============
|
|
Camera--------------------------> Z axis
| |} Xw?
| P(Xw, Yw, Zw)
|
v x-axis
My rvec and tvec results seems right and wrong. For a specified point, the z value seems reasonable, I mean, if this point is about one meter away from the camera in the z direction, then the z value is about 1. But for x and y, according to the location of the point I think x and y should be positive but they are negative. What's more, the pattern detected in the original image is like this:
But using the points coordinates calculated in Camera system and the camera intrinsic parameters, I get an image like this:
The target keeps its pattern. But it moved from bottom right to top left. I cannot understand why.

Yes, the camera center is the origin of the camera coordinate system, which seems to be right following to this post.
In case of camera pose estimation, value seems reasonable can be named as backprojection error. That's a measure of how well your resulting rotation and translation map the 3D points to the 2D pixels. Unfortunately, solvePnP does not return a residual error measure. Therefore one has to compute it:
cv::solvePnP(worldPoints, pixelPoints, camIntrinsics, camDistortion, rVec, tVec);
// Use computed solution to project 3D pattern to image
cv::Mat projectedPattern;
cv::projectPoints(worldPoints, rVec, tVec, camIntrinsics, camDistortion, projectedPattern);
// Compute error of each 2D-3D correspondence.
std::vector<float> errors;
for( int i=0; i < corners.size(); ++i)
{
float dx = pixelPoints.at(i).x - projectedPattern.at<float>(i, 0);
float dy = pixelPoints.at(i).y - projectedPattern.at<float>(i, 1);
// Euclidean distance between projected and real measured pixel
float err = sqrt(dx*dx + dy*dy);
errors.push_back(err);
}
// Here, compute max or average of your "errors"
An average backprojection error of a calibrated camera might be in the range of 0 - 2 pixel. According to your two pictures, this would be way more. To me, it looks like a scaling problem. If I am right, you compute the projection yourself. Maybe you can try once cv::projectPoints() and compare.
When it comes to transformations, I learned not to follow my imagination :) The first thing I Do with the returned rVec and tVec is usually creating a 4x4 rigid transformation matrix out of it (I posted once code here). This makes things even less intuitive, but instead it is compact and handy.

Now I know the answers.
Yes, the camera center is the origin of the camera coordinate system.
Consider that the coordinates in the camera system are calculated as (xc,yc,zc). Then xc should be the distance between the camera and
the point in real world in the x direction.
Next, how to determine whether the output matrices are right?
1. as #eidelen points out, backprojection error is one indicative measure.
2. Calculate the coordinates of the points according to their coordinates in the world coordinate system and the matrices.
So why did I get a wrong result(the pattern remained but moved to a different region of the image)?
Parameter cameraMatrix in solvePnP() is a matrix supplying the parameters of the camera's external parameters. In camera matrix, you should use width/2 and height/2 for cx and cy. While I use width and height of the image size. I think that caused the error. After I corrected that and re-calibrated the camera, everything seems fine.

Related

Three.js camera tilt up or down and keep horizon level

camera.rotate.y pans left or right in a predictable manner.
camera.rotate.x looks up or down predictably when camera.rotate.y is at 180 degrees.
but when I change the value of camera.rotate.y to some new value, and then I change the value of camera.rotate.x, the horizon rotates.
I've looked for an algorithm to adjust for horizon rotation after camera.rotate.x is changed, but haven't found it.
In three.js, an object's orientation can be specified by its Euler rotation vector object.rotation. The three components of the rotation vector represent the rotation in radians around the object's internal x-axis, y-axis, and z-axis respectively.
The order in which the rotations are performed is specified by object.rotation.order. The default order is "XYZ" -- rotation around the x-axis occurs first, then the y-axis, then the z-axis.
Rotations are performed with respect to the object's internal coordinate system -- not the world coordinate system. This is important. So, for example, after the x-rotation occurs, the object's y- and z- axes will generally no longer be aligned with the world axes. Rotations specified in this way are not unique.
So, for example, if in code you specify,
camera.rotation.y = y_radians; // Y first
camera.rotation.x = x_radians; // X second
camera.rotation.z = 0;
the rotations are applied in the object's rotation.order, not in the order you specified them.
In your case, you may find it more intuitive to set rotation.order to "YXZ", which is equivalent to "heading, pitch, and roll".
For more information about Euler angles, see the Wikipedia article. Three.js follows the Tait–Bryan convention, as explained in the article.
three.js r.61
I've been looking for the same info for few days now, the trick is: use regular rotateX to look up/down, but use rotateOnWorldAxis(new THREE.Vector3(0.0, 1.0, 0.0), angle) for horiz turn (https://discourse.threejs.org/t/vertical-camera-rotation/15334).

Obtaining 3D location of an object being looked at by a camera with known position and orientation

I am building an augmented reality application and I have the yaw, pitch, and roll for the camera. I want to start placing objects in the 3D environment. I want to make it so that when the user clicks, a 3D point pops up right where the camera is pointed (center of the 2D screen) and when the user moves, the point moves accordingly in 3D space. The camera does not change position, only orientation. Is there a proper way to recover the 3D location of this point? We can assume that all points are equidistant from the camera location.
I am able to accomplish this independently for two axes (OpenGL default orientation). This works for changes in the vertical axis:
x = -sin(pitch)
y = cos(pitch)
z = 0
This also works for changes in the horizontal axis:
x = 0
y = -sin(yaw)
z = cos(yaw)
I was thinking that I should just make combine into:
x = -sin(pitch)
y = sin(yaw) * cos(pitch)
z = cos(yaw)
and that seems to be close, but not exactly correct. Any suggestions would be greatly appreciated!
It sounds like you just want to convert from a rotation vector (pitch,yaw,roll) to a rotation matrix. The conversion can bee seen on the Wikipedia article on rotation matrices. The idea is that once you have constructed your matrix, to transform any point simply.
final_pos = rot_mat*initial_pose
where final and initial pose are 3x1 vectors and rot_mat is a 3x3 matrix.

Draw a scatterplot matrix using glut, opengl

I am new to GLUT and opengl. I need to draw a scatterplot matrix for n dimensional array.
I have saved the data from csv to a vector of vectors and each vector corresponds to a row. I have plotted just one scatterplot. And used GL_LINES to draw the grid. My questions
1. How do I draw points in a particular grid? Using GL_POINTS I can only draw points in the entire window.
Please let me know need any further info to answer this question
Thanks
What you need to do is be able to transform your data's (x,y) coordinates into screen coordinates. The most straightforward way to do it actually does not rely on OpenGL or GLUT. All you have to do is use a little math. Determine the screen (x,y) coordinates of the place where you want a datapoint for (0,0) to be on the screen, and then determine how far apart you want one increment to be on the screen. Simply take your original data points, apply the offset, and then scale them, to get your screen coordinates, which you then pass into glVertex2f() (or whatever function you are using to specify points in your API).
For instance, you might decide you want point (0,0) in your data to be at location (200,0) on your screen, and the distance between 0 and 1 in your data to be 30 pixels on the screen. This operation will look like this:
int x = 0, y = 0; //Original data points
int scaleX = 30, scaleY = 30; //Scaling values for each component
int offsetX = 100, offsetY = 100; //Where you want the origin of your graph to be
// Apply the scaling values and offsets:
int screenX = x * scaleX + offsetX;
int screenY = y * scaleY + offsetY;
// Calls to your drawing functions using screenX and screenY as your coordinates
You will have to determine values that make sense for the scalaing and offsets. You can also have your program use different values for different sets of data, so you can display multiple graphs on the same screen. But this is a simple way to do it.
There are also other ways you can go about this. OpenGL has very powerful coordinate transformation functions and matrix math capabilities. Those may become more useful when you develop increasingly elaborate programs. They're most useful if you're going to be moving things around the screen in real-time, or operating on incredibly large data sets, as they allow you to perform these mathematical calculations very quickly using your graphics hardware (which is able to do them much faster than the CPU). However, the time it takes for the CPU to do simple calculations like those where you only are going to do them once or very infrequently on limited sets of data is not a problem for computers today.

Computer vision: Regarding a line through origin in camera coordinate

I've a question regarding a line in camera coordinate.
Suppose the pixel/screen coordinate of a point is (u,v). And the camera coordinate
(coordinate system relative to camera) of (u,v) is (p,q,r) where (u,v) is given and a
line L goes through the point (0,0,0) [origin camera location] and (p,q,r) where r is
given. Is it possible to find (p,q)?
I know that the parametric equation of a line is:
(x-a, y-b, z-c)= t(x_0, y_0, z_0)
But I know only (a,b,c) which is (0,0,0) and z_0 which is r. Can anyone kindly tell me if it is possible to find
the value of (p,q)? Can I use (u,v) in some way?
It's not possible until you have more information about what something like (u, v) represents. Think of it this way. Suppose you claimed you could figure it out just based on (u, v) and r. Now, what if I just relabeled your pixels? A pixel doesn't have to represent any specific distance, so if I said (125, 100) was (250, 200) instead, that would make sense too. Suppose I just swap in a higher resolution chip for a lower resolution chip.
To actually recover (p, q), you'd have to know what physical distance a pixel corresponds to. You'd also have to know whether the pinhole in your camera model is (0,0) in your pixel reference frame, etc.

kinect object measuring

I am currently trying to figure out a way to calcute the size of a given object with kinect
since I have the following data
angular field of view of the lens
distance
and width in pixels from a 800*600 resolution
I believe this can be possible to calculate. Does anyone has math skills to give me a little help?
With some trigonometry, it should be possible to approximate.
If you draw a right trangle ABC, with the camera at one of the legs (A), and the object at the far end (edge BC), where the right angle is (C), then the height of the object is going to be the height of leg BC. the distance to the pixel might be the distance of leg AC or AB. The Kinect sensor specifications are going to regulate that. If you get distance to the center of a pixel, then it will be AC. if you have distances to pixel corners then the distance will be AB.
With A representing the angle at the camera that the pixel takes up, d is the distance of the hypotenuse of a right angle and y is the distance of the far leg (edge BC):
sin(A) = y / d
y = d sin(A)
y is the length of the pixel projected into the object plane. You calculate it by multiplying the sin of the angel by the distance to the object.
Here I confess I do not know the API of the kinect, and what level of detail it provides. You say you have the angle of the field of vision. You might assume each pixel of your 800x600 pixel grid takes up an equal angle of your camera's field of vision. If you do, then you can break up that field of vision into equal pieces to measure the linear size of your object in each pixel.
You also mentioned that you have the distance to the object. I was assuming that you have a distance map for each pixel of the 800x600 grid. If this is incorrect, some calculations can be done to approximate a distance grid for the pixels involving the object of interest if you make some assumptions about the object being measured.