If I have a 2-dimensional (x and y coordinates) polynomial transform function of 1st/affine, 2nd, or 3rd order (i.e. I have the coefficients/transformation matrix A), what is the mathematical or programmatic approach to getting the exact inverse of this function? Ideally, how would I implement this in Numpy? This is in the context of image warping or map georeferencing, i.e. transforming or warping the coordinates from an input image to an output image in a new warped coordinate system.
Attempted Solution
To solve this I have tried a matrix algebra approach for solving sets of equations. Mathematically, the transformation procedure is represented as Au = v. Forward transforming is easy: you build u as a column vector containing the terms of the polynomial equation evaluated at your input coordinates, then matrix-multiply the transformation matrix A with u to get the output column vector v containing the transformed coordinates. Backward transforming, on the other hand, means we know the output coordinates v and want to find the input coordinates u, so we need to rearrange the equation as u = A⁻¹v; by the rules of matrix algebra, A has to be inverted when moving it to the other side. Implementing this in Numpy for a 2nd order polynomial transform, it does seem to work:
import numpy as np
# input coords
x = np.array([13])
y = np.array([13])
# terms of the 2nd order polynomial equation
# (x and y are themselves the 1st order terms)
xx = x*x
xy = x*y
yy = y*y
ones = np.ones(x.shape)
# u consists of each term in the 2nd order polynomial equation,
# each term being an array so that multiple points can be transformed at once
u = np.array([xx,xy,yy,x,y,ones])
print('original input u', u)
## output:
## ('original input u', array([[169.],
## [169.],
## [169.],
## [ 13.],
## [ 13.],
## [ 1.]]))
# forward transform matrix
A = np.array([[1,2,3,1,6,8],
              [5,2,9,2,0,1],
              [8,1,5,8,4,3],
              [1,4,8,2,3,9],
              [9,3,2,1,9,5],
              [4,2,5,6,2,1]])
# get forward coords
v = A.dot(u)
print('output v', v)
## output:
## ('output v', array([[1113.],
## [2731.],
## [2525.],
## [2271.],
## [2501.],
## [1964.]]))
# get backward coords (should exactly reproduce the input coords)
Ainv = np.linalg.inv(A)
u_pred = Ainv.dot(v)
print('backwards predicted input u', u_pred)
## output:
## ('backwards predicted input u', array([[169.],
## [169.],
## [169.],
## [ 13.],
## [ 13.],
## [ 1.]]))
In the above example the output v is actually a 6x1 column vector, where only the top two rows/values represent the transformed x and y coordinates. The problem becomes that we need all the additional values in v in order to exactly invert the coordinates. But in real-world scenarios we only know the transformed x and y values (i.e. the top two rows/values of v); we don't know the full 6x1 v vector.
Maybe I'm thinking about this wrong, or maybe this matrix algebra approach is not the right approach, since 2nd order polynomials and higher are no longer linear? Any alternate programmatic/numpy approaches for inverting the polynomial transformation?
Some context
I've looked up many similar questions and websites as well as numpy functions such as numpy.polynomial.Polynomial.fit, but most of them relate only to inverting 1-dimensional polynomial transforms. The few links I've found that talk about 2-dimensional transforms say there is no exact way to invert them, which doesn't make sense to me, since this is a very common operation in image warping/resampling and map georeferencing. For example, the steps for warping an image are often broken down into:
1. Forward project all original pixel (column-row) coordinates u using the transformation function/matrix A, in order to find the bounds of the transformed coordinate space v.
2. Then, for every coordinate sampled at regular intervals within the bounds found in step 1, backward transform these v coordinates to find their original coordinates u. This determines which original pixels to sample for each location in the transformed image.
My problem then is that I have the forward transformation necessary for step 1, but I need to find the exact inverse of that transformation necessary for backwards sampling in step 2. Either a math answer or a numpy solution would be fine.
Inversion of a 2D affine function is pretty easy. It amounts to solving a 2x2 linear system of equations.
The case of quadratic and cubic polynomials is much more problematic. If I am right, a system in two unknowns is equivalent to a single quartic or nonic (degree 9) polynomial equation. Explicit (though complicated) formulas exist for the quartic case, but none for the nonic case, and you will have to resort to numerical methods (Newton's iterations).
In addition, the solutions of these nonlinear equations are not unique (you can have 4 or 9 solutions), and you need to keep the right ones.
If your transformation remains close to affine (such as when correcting image distortion), I would suggest choosing an affine transformation that approximates the complete equation, using its backward transformation to find initial approximations, and then refining with Newton's iterations.
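A minimal NumPy sketch of that last suggestion, assuming the 2nd-order transform is written with coefficient vectors a and b as x' = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2 (same form for y' with b); the function names and example coefficients here are hypothetical, not taken from the question:
import numpy as np

def poly2_forward(a, b, x, y):
    # evaluate the forward 2nd order transform at a single point (x, y)
    t = np.array([1.0, x, y, x*x, x*y, y*y])
    return a @ t, b @ t

def poly2_inverse(a, b, xt, yt, x0, y0, iters=20, tol=1e-12):
    # Newton's method: start from an initial guess (x0, y0), e.g. from the affine part,
    # and iterate until the forward transform of (x, y) matches the target (xt, yt)
    x, y = float(x0), float(y0)
    for _ in range(iters):
        fx, fy = poly2_forward(a, b, x, y)
        rx, ry = fx - xt, fy - yt
        if max(abs(rx), abs(ry)) < tol:
            break
        # Jacobian of the forward transform at the current estimate
        J = np.array([[a[1] + 2*a[3]*x + a[4]*y, a[2] + a[4]*x + 2*a[5]*y],
                      [b[1] + 2*b[3]*x + b[4]*y, b[2] + b[4]*x + 2*b[5]*y]])
        dx, dy = np.linalg.solve(J, [rx, ry])
        x, y = x - dx, y - dy
    return x, y

# round trip on a transform that is close to affine
a = np.array([5.0, 1.02, 0.05, 1e-4, -2e-4, 1e-4])
b = np.array([-3.0, -0.04, 0.98, 2e-4, 1e-4, -1e-4])
xt, yt = poly2_forward(a, b, 13.0, 13.0)
print(poly2_inverse(a, b, xt, yt, x0=xt - 5.0, y0=yt + 3.0))  # ~ (13.0, 13.0)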
When I have the rotation matrix or quaternion representation of a camera's pose, is there a way to obtain the orientation vector of the camera?
Here the orientation vector means a 3D vector in world coordinates (WC) that represents an orientation.
I read through the commonly used representations like Euler angles and axis-angle, but I didn't find one that represents the orientation of the camera in WC.
Could anyone help? Thank you!
You probably want the 3x1 Rodrigues vector. Just plug in the SO(3) rotation matrix of the camera orientation in world coordinates, and you will get a vector representation. Just to be clear, pose and orientation are different. Pose is orientation + position. If you want the position as well, that can be represented as a 3x1 vector of t = [x y z]' (using Matlab notation).
A typical representation of the pose is a 4x4 matrix in SE(3) (Special Euclidean Group), which is just:
T = [R t; 0 0 0 1]
Where R is the rotation matrix in SO(3).
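Not part of the original answer, but as a quick sketch in Python: SciPy's Rotation class (or cv2.Rodrigues in OpenCV) converts an SO(3) matrix to that 3x1 rotation vector directly; the example matrix below is just an illustration.
import numpy as np
from scipy.spatial.transform import Rotation

# example rotation: 90 degrees about the world z-axis
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])

rotvec = Rotation.from_matrix(R).as_rotvec()
print(rotvec)                  # axis scaled by angle: ~ [0, 0, pi/2]
print(np.linalg.norm(rotvec))  # rotation angle in radians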
I just dug a bit through the Tensorflow Object Detection API code, especially the eval_util part, as I wanted to implement the COCO metrics.
But I noticed that the metrics are solely calculated using the bounding boxes, which have normalized coordinates in [0, 1].
There are no aspect ratios or absolute coordinates used.
So, doesn't this mean that the intersection-over-union values calculated from these results are incorrect?
Let's take a 200x100 pixel image as an example.
If the box would be off by 20px to the left, that's 0.1 in normalized coordinates.
But if it would be off by 20px to the top, that would be 0.2 in normalized coordinates.
Doesn't that mean that being off toward the top penalizes the score more heavily than being off to the side?
I believe the predicted coordinates are resized to the absolute image coordinates in the eval binary.
But the other thing I would say is that IOU is scale invariant in the sense that if you scale two boxes by some factor, they will still have the same IOU overlap. As an example if we scale by 2 in the x-direction and scale by 3 in the y direction:
If A = (x1, y1, x2, y2) and B = (u1, v1, u2, v2), then IOU(A, B) = IOU((2*x1, 3*y1, 2*x2, 3*y2), (2*u1, 3*v1, 2*u2, 3*v2)).
What this means is that evaluating in normalized coordinates should give the same result as evaluating in absolute coordinates.
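A small self-contained check of that claim (the box values below are made up): the IOU computed in normalized coordinates equals the IOU computed after scaling back to a 200x100 pixel image.
def iou(a, b):
    # boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

box_a = (0.1, 0.2, 0.5, 0.6)   # normalized coordinates
box_b = (0.2, 0.3, 0.6, 0.7)
to_pixels = lambda box: (box[0] * 200, box[1] * 100, box[2] * 200, box[3] * 100)
print(iou(box_a, box_b))                        # ~0.391
print(iou(to_pixels(box_a), to_pixels(box_b)))  # same value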
I'm trying to calculate the normal matrix for my GLSL shaders on OpenGL 2.0.
The theory is: the normal matrix is the top-left 3x3 of the ModelView matrix, inverted and transposed.
It seemed to be correct, as my scenes had been rendering correctly, until I imported a model from Maya that has non-uniform scales. The loaded models have weird lighting, while my procedural ones are correct, so I put my money on the normal matrix calculation.
How is it computed with non-uniform scale?
You already figured out that you need the transposed inverted matrix for transforming the normals. For a scaling matrix, that's easy to calculate.
A non-uniform 3x3 scaling matrix looks like this:
[ sx 0 0 ]
[ 0 sy 0 ]
[ 0 0 sz ]
with sx, sy and sz being the scaling factors for the 3 coordinate directions.
The inverse of this is:
[ 1 / sx 0 0 ]
[ 0 1 / sy 0 ]
[ 0 0 1 / sz ]
Transposing it changes nothing, so this is already your normal transformation matrix.
Note that, unlike for example a rotation, this transformation matrix will not keep vectors normalized when it is applied to a normalized vector. So after applying this matrix in your shader, you will have to re-normalize the result before using it for lighting calculations.
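A quick NumPy check of this (not part of the original answer, with made-up scale factors): a tangent and a normal that start out perpendicular stay perpendicular only if the normal is transformed by the inverse of the scale matrix.
import numpy as np

S = np.diag([2.0, 1.0, 0.5])                  # non-uniform scale
t = np.array([1.0, 1.0, 0.0])                 # surface tangent
n = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)   # unit normal, perpendicular to t

normal_matrix = np.linalg.inv(S).T            # inverse transpose, as described above

print(np.dot(S @ t, S @ n))                   # nonzero: scaling the normal directly is wrong
print(np.dot(S @ t, normal_matrix @ n))       # 0: the normal stays perpendicular to the surface
print(np.linalg.norm(normal_matrix @ n))      # not 1: re-normalize before lighting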
I would just like to add a practical example to Reto Koradi's answer.
Let's assume you already have a 4x4 model matrix and want to use it to transform normals as well. You can start by deducing the scale along each axis by taking the lengths of the first three columns of that matrix. If you now divide each column by its corresponding scaling factor, the matrix will no longer affect the model's scale, because the basis vectors will have unit length.
As you pointed out, normals have to be scaled by the inverse of the scale in each axis. Fortunately, we have already derived the scale in the first step, so we can divide the columns again.
All that effectively means is that if you want to derive the transform matrix for normals from your model matrix, all you need to do is divide each of its first three columns by its squared length (which can be rewritten as a dot product). In GLSL you would write:
mat3 mat_n = mat3(mat_model);        // upper-left 3x3 of the model matrix
mat_n[0] /= dot(mat_n[0], mat_n[0]); // divide each basis vector by its squared length
mat_n[1] /= dot(mat_n[1], mat_n[1]);
mat_n[2] /= dot(mat_n[2], mat_n[2]);
vec3 new_normal = normalize(mat_n * normal);
Is there a formula to convert a quaternion to an angle?
I'm looking to do something on the iPhone using the Core Motion API and the gyro, so that based on the data I receive from it (in the form of quaternions) I can project a UIView on the screen.
Thanks
Yes, see Quaternions and spatial rotation. The unit quaternion (w, x, y, z) represents a rotation about the axis (x, y, z) by an angle of 2*arccos(w).
Note that this is only true of unit quaternions. Non-unit quaternions do not represent rotations.
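As a small sketch in Python (the same arithmetic applies to the quaternion values Core Motion gives you): normalize first, then the rotation angle is 2*arccos(w).
import math

def quaternion_angle(w, x, y, z):
    # rotation angle in radians represented by the quaternion (w, x, y, z)
    norm = math.sqrt(w*w + x*x + y*y + z*z)
    w = max(-1.0, min(1.0, w / norm))  # normalize and clamp against rounding error
    return 2.0 * math.acos(w)

# example: 90 degree rotation about the z-axis
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(math.degrees(quaternion_angle(*q)))  # ~90.0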