normal matrix for non uniform scaling - opengl-es-2.0

I'm trying to calculate the normal matrix for my GLSL shaders on OpenGL 2.0.
The theory is : a normal matrix is the top left 3x3 matrix of the ModelView, transposed and inverted.
It seems to be correct as I have been rendering my scenes correctly, until I imported a model from maya and found non-uniform scales. Loaded models have a weird lighting, while my procedural ones are correct, so I put my money on the normal matrix calculation.
How is it computed with non uniform scale?

You already figured out that you need the transposed inverted matrix for transforming the normals. For a scaling matrix, that's easy to calculate.
A non-uniform 3x3 scaling matrix looks like this:
[ sx 0 0 ]
[ 0 sy 0 ]
[ 0 0 sz ]
with sx, sy and sz being the scaling factors for the 3 coordinate directions.
The inverse of this is:
[ 1 / sx 0 0 ]
[ 0 1 / sy 0 ]
[ 0 0 1 / sz ]
Transposing it changes nothing, so this is already your normal transformation matrix.
Note that, unlike for example a rotation, this transformation matrix will not keep vectors normalized when it is applied to a normalized vector. So after applying this matrix in your shader, you will have to re-normalize the result before using it for lighting calculations.

I would just like to add a practical example to Reto Koradi's answer.
Let's assume you already have a 4x4 model matrix and want to use it to transform normals as well. You can start by deducing the scale along each axis by taking the lengths of the first three columns of that matrix. If you now divide each column by its corresponding scaling factor, the matrix will no longer affect the model's scale, because the basis vectors will have unit length.
As you pointed out, normals have to be scaled by the inverse of the scale in each axis. Fortunately, we have already derived the scale in the first step, so we can divide the columns again.
All that effectively means that if you want to derive the transform matrix for normals from your model matrix, all you need to do is divide each of its first three columns by its squared length (which can be written as a dot product). This shortcut assumes the three columns are mutually orthogonal (a rotation applied to an axis-aligned scale, no shear); otherwise you need the full inverse transpose. In GLSL you would write:
mat3 mat_n = mat3(mat_model);
mat_n[0] /= dot(mat_n[0], mat_n[0]);
mat_n[1] /= dot(mat_n[1], mat_n[1]);
mat_n[2] /= dot(mat_n[2], mat_n[2]);
vec3 new_normal = normalize(mat_n * normal);
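As a quick sanity check of the column-division shortcut, here is a minimal NumPy sketch that compares it with the general inverse transpose. The function names and the example model matrix (a rotation about z combined with a non-uniform scale) are made up for illustration, and the shortcut is only expected to match when the columns are orthogonal.

```python
import numpy as np

def normal_matrix_general(model3):
    """Inverse transpose of the 3x3 block (valid for any invertible matrix)."""
    return np.linalg.inv(model3).T

def normal_matrix_by_columns(model3):
    """Divide each column by its squared length (assumes orthogonal columns)."""
    return model3 / np.sum(model3 * model3, axis=0)

# Hypothetical model matrix: non-uniform scale followed by a 30 degree rotation about z.
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
scale = np.diag([2.0, 0.5, 3.0])
model3 = rot @ scale

n_general = normal_matrix_general(model3)
n_columns = normal_matrix_by_columns(model3)

# A normal must stay perpendicular to a transformed surface tangent.
tangent = np.array([1.0, 1.0, 0.0])
normal = np.array([1.0, -1.0, 0.0])
dot_after = np.dot(n_columns @ normal, model3 @ tangent)
```

Both matrices should agree here, and `dot_after` should be zero up to rounding. As in the shader, the transformed normal still has to be re-normalized before lighting.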


Getting the inverse of a 2d polynomial transform with numpy (for image or raster image warping/sampling)

If I have a 2-dimensional (x and y coordinates) polynomial transform function of 1st/affine, 2nd, or 3rd order (i.e. I have the coefficients/transformation matrix A), what is the mathematical or programmatic approach to getting the exact inverse of this function? Ideally, how would I implement this in Numpy? This is in the context of image warping or map georeferencing, i.e. transforming or warping the coordinates from an input image to an output image in a new warped coordinate system.
Attempted Solution
To solve this I have tried a matrix algebra approach for solving sets of equations. Mathematically, the transformation procedure is represented as Au = v. Forward transforming is easy: you calculate u as a column matrix containing the terms of the polynomial equation based on your input coordinates, and then matrix-multiply the transformation matrix A with u to get the transformed output column matrix v containing the output coordinates. Backwards transforming, on the other hand, means we know the output coordinates v and want to find the input coordinates u, so we need to rearrange the equation as u = inv(A) v; by the rules of matrix algebra, A has to be inverted when it is moved to the other side. Implementing this in Numpy for a 2nd order polynomial transform, it does seem to work:
import numpy as np
# input coords
x = np.array([13])
y = np.array([13])
# terms of the 2nd order polynomial equation
x = x
y = y
xx = x*x
xy = x*y
yy = y*y
ones = np.ones(x.shape)
# u consists of each term in 2nd order polynomial equation
# with each term being array if want to transform multiple
u = np.array([xx,xy,yy,x,y,ones])
print('original input u', u)
## output:
## ('original input u', array([[169.],
## [169.],
## [169.],
## [ 13.],
## [ 13.],
## [ 1.]]))
# forward transform matrix
A = np.array([[1,2,3,1,6,8],
[5,2,9,2,0,1],
[8,1,5,8,4,3],
[1,4,8,2,3,9],
[9,3,2,1,9,5],
[4,2,5,6,2,1]])
# get forward coords
v = A.dot(u)
print('output v', v)
## output:
## ('output v', array([[1113.],
## [2731.],
## [2525.],
## [2271.],
## [2501.],
## [1964.]]))
# get backward coords (should exactly reproduce the input coords)
Ainv = np.linalg.inv(A)
u_pred = Ainv.dot(v)
print('backwards predicted input u', u_pred)
## output:
## ('backwards predicted input u', array([[169.],
## [169.],
## [169.],
## [ 13.],
## [ 13.],
## [ 1.]]))
In the above example the output v is actually a 6x1 matrix, where only the top two rows/values represent the transformed x and y coordinates. The problem is that we need all the additional values in v in order to exactly invert the coordinates. But in real-world scenarios we only know the transformed x and y values (i.e. the top two rows/values of v); we don't know the full 6x1 v matrix.
Maybe I'm thinking about this wrong, or maybe this matrix algebra approach is not the right approach, since 2nd order polynomials and higher are no longer linear? Are there any alternate programmatic/numpy approaches for inverting the polynomial transformation?
Some context
I've looked up many similar questions and websites as well as numpy functions such as numpy.polynomial.Polynomial.fit, but most of them relate only to inverting 1-dimensional polynomial transforms. The few links I've found that talk about 2-dimensional transforms say there is no exact way to invert them, which doesn't make sense since this is a very common operation in image warping/resampling and map georeferencing. For example, the steps for warping an image are often broken down to:
1. Forward project all original pixel (column-row) coordinates u using the transformation function/matrix A, in order to find the bounds of the transformed coordinate space v.
2. Then for every coordinate sampled at regular intervals in the transformed coordinate space bounds (found in step 1), backwards sample these v coordinates in the transformed coordinate system to find their original coordinates u. This determines which original pixels to sample for each location in the transformed image.
My problem then is that I have the forward transformation necessary for step 1, but I need to find the exact inverse of that transformation necessary for backwards sampling in step 2. Either a math answer or a numpy solution would be fine.
Inversion of a 2D affine function is pretty easy. It takes the solution of a 2x2 linear system of equations.
The case of quadratic and cubic polynomials is much more problematic. If I am right, a system in two unknowns is equivalent to a single quartic or nonic (degree 9) polynomial equation. Explicit (though complicated) formulas exist for the quartic case, but none for the nonic case, and you will have to resort to numerical methods (Newton's iterations).
In addition, the solutions of these nonlinear equations are not unique (you can have up to 4 or 9 solutions) and you need to keep the right one.
If your transformation remains close to affine (such as when correcting image distortion), I would suggest choosing an affine transformation that approximates the complete equation, using its inverse to find an initial approximation, then refining with Newton.
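The affine-start-plus-Newton idea can be sketched roughly as follows for a 2nd order transform. This is only an illustration under some assumptions: the coefficients are stored as a 2x6 array over the terms [xx, xy, yy, x, y, 1] (the same term order as in the question), the affine approximation is simply the linear and constant part of the polynomial, and the coefficient values and function names are invented.

```python
import numpy as np

def forward(coeffs, x, y):
    """Evaluate the 2nd order transform; coeffs is 2x6 over [xx, xy, yy, x, y, 1]."""
    terms = np.array([x * x, x * y, y * y, x, y, 1.0])
    return coeffs @ terms

def jacobian(coeffs, x, y):
    """2x2 matrix of partial derivatives of the outputs with respect to x and y."""
    d_dx = np.array([2 * x, y, 0.0, 1.0, 0.0, 0.0])
    d_dy = np.array([0.0, x, 2 * y, 0.0, 1.0, 0.0])
    return np.column_stack([coeffs @ d_dx, coeffs @ d_dy])

def inverse(coeffs, target, tol=1e-12, max_iter=50):
    """Newton's method, started from the inverse of the affine part."""
    linear, offset = coeffs[:, 3:5], coeffs[:, 5]
    xy = np.linalg.solve(linear, target - offset)   # affine initial guess
    for _ in range(max_iter):
        residual = forward(coeffs, xy[0], xy[1]) - target
        if np.max(np.abs(residual)) < tol:
            break
        xy = xy - np.linalg.solve(jacobian(coeffs, xy[0], xy[1]), residual)
    return xy

# Hypothetical coefficients: mostly affine with small quadratic terms.
coeffs = np.array([[1e-4, 2e-4, -1e-4, 1.1, 0.2, 5.0],
                   [-2e-4, 1e-4, 3e-4, -0.1, 0.9, -3.0]])
v = forward(coeffs, 13.0, 13.0)   # forward transform of the point (13, 13)
u = inverse(coeffs, v)            # should recover approximately (13, 13)
```

Newton only finds the root nearest to its starting point, so this relies on the transform being close to affine over the region of interest; for strongly nonlinear coefficients it may converge to a different solution or not at all.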

2D DCT coefficients meaning of a gray scale image

What do the DCT coefficients mean? And what is the difference between a positive and a negative DCT coefficient, for example coefficients 5 and -5?
Thanks
The DCT is simply a 1-to-1 transformation of the data.
Suppose you have a set of blueprints on paper. You scan them in. Once scanned they are crooked. You use Photoshop or something like it to rotate the image so it's aligned to the edges and easier to work with.
The DCT is like a rotation in that it simply makes the image data easier to work with. I have to say that a lot of books make this confusing by adding spectral analysis mumbo-jumbo.
Desirable attributes of the DCT for this purpose are:
That it is a transformation to an orthonormal basis set. If D is the DCT transformation matrix, X is the input and Y is the output so that
X D = Y
Then there is an inverse matrix Q that gives:
Y Q = X
And Q is the transpose of D.
Therefore, it is just as easy to go forwards as it is to go backwards with the DCT.
The DCT transformation tends to concentrate the most important image data in one corner of the output matrix. The data at the opposite corner tends to be discardable without noticeably affecting photographic images.
As to your other question, the JPEG input pixels are shifted to the range -128 to 127. Your starting values usually include negative values, so it's no surprise that you get negative output values. Even if you did have all positive input values you could still get negative output values. There is no real significance to the difference between positive and negative values.
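To make the "orthonormal transform" and "sign" points concrete, here is a small NumPy sketch. It builds an orthonormal DCT-II matrix by hand (rows are the cosine basis vectors, so the 2D transform is written D X D^T rather than X D), and the 8x8 block is made-up data, not taken from any real image.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

D = dct_matrix(8)
block = np.arange(64, dtype=float).reshape(8, 8) - 128.0  # hypothetical level-shifted block

coeffs = D @ block @ D.T        # forward 2D DCT
restored = D.T @ coeffs @ D     # the inverse is just the transpose

# Flipping the sign of one coefficient inverts the contribution of its basis pattern.
flipped = coeffs.copy()
flipped[0, 1] = -flipped[0, 1]
difference = D.T @ flipped @ D - restored
```

Each coefficient is the weight of one cosine pattern in the block, so a coefficient of -5 contributes the same pattern as +5 with bright and dark swapped; `difference` above is exactly that pattern scaled by twice the flipped coefficient.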

GLKView GLKMatrix4MakeLookAt description and explanation

For the modelview matrix I understand how to form translation and scale matrices. But I am unable to understand how to form the view matrix using GLKMatrix4MakeLookAt. Can anyone explain how it works and how to give values to the parameters (eye, center, up; X, Y, Z)?
GLK_INLINE GLKMatrix4 GLKMatrix4MakeLookAt(float eyeX, float eyeY, float eyeZ,
float centerX, float centerY, float centerZ,
float upX, float upY, float upZ)
GLKMatrix4MakeLookAt creates a viewing matrix (in the same way as gluLookAt does, in case you look at other OpenGL code). As the parameters suggest, it considers the position of the viewer's eye, the point in space they're looking at (e.g., a point on an object), and the up vector, which specifies which direction is "up" (e.g., pointing towards the sky). The viewing matrix generated is the combination of a rotation matrix (composed of a set of orthonormal basis vectors) and a translation.
Logically, the matrix is basically constructed in a few steps:
1. compute the line-of-sight vector, which is the normalized vector going from the eye's position to the point you're looking at, the center point.
2. compute the cross product of the line-of-sight vector with the up vector, and normalize the resulting vector.
3. compute the cross product of the vector computed in step 2 with the line-of-sight to complete the orthonormal basis.
4. create a 3x3 rotation matrix by setting the first row to the vector created in step 2, the middle row to the vector from step 3, and the bottom row to the negated, normalized line-of-sight vector.
Those four steps produce a rotation matrix that will rotate the world coordinate system into eye coordinates (a coordinate system where the eye is located at the origin, and the line-of-sight is down the -z axis). The final viewing matrix is computed by multiplying in a translation by the negated eye position, which moves the "world coordinate positioned eye" to the origin for eye coordinates.
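The construction described above can be sketched in NumPy as follows. This is an independent illustration written for column vectors, not GLKit's actual source; the function name `look_at` and the example eye/center/up values are made up.

```python
import numpy as np

def look_at(eye, center, up):
    """Build a 4x4 view matrix from the eye position, target point and up vector."""
    eye, center, up = (np.asarray(v, dtype=float) for v in (eye, center, up))
    forward = center - eye
    forward /= np.linalg.norm(forward)      # step 1: line of sight
    side = np.cross(forward, up)
    side /= np.linalg.norm(side)            # step 2: sideways axis
    true_up = np.cross(side, forward)       # step 3: completes the basis
    view = np.eye(4)
    view[0, :3] = side                      # step 4: rows of the rotation
    view[1, :3] = true_up
    view[2, :3] = -forward
    view[:3, 3] = -view[:3, :3] @ eye       # move the eye to the origin
    return view

view = look_at(eye=(0.0, 0.0, 5.0), center=(0.0, 0.0, 0.0), up=(0.0, 1.0, 0.0))
eye_in_view = view @ np.array([0.0, 0.0, 5.0, 1.0])      # the eye lands on the origin
center_in_view = view @ np.array([0.0, 0.0, 0.0, 1.0])   # the target lies down the -z axis
```

In practice: eye is where the camera sits, center is any point it should look at, and up is usually (0, 1, 0) unless the camera looks straight up or down, in which case the cross product in step 2 degenerates and a different up vector is needed.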
Here's a related question showing the code of GLKMatrix4MakeLookAt, and here's a question with more detail about eye coordinates and related coordinate systems: (What exactly are eye space coordinates?) .

Faster way to perform point-wise interpolation of numpy array?

I have a 3D datacube, with two spatial dimensions and the third being a multi-band spectrum at each point of the 2D image.
H[x, y, bands]
Given a wavelength (or band number), I would like to extract the 2D image corresponding to that wavelength. This would be simply an array slice like H[:,:,bnd]. Similarly, given a spatial location (i,j) the spectrum at that location is H[i,j].
I would also like to 'smooth' the image spectrally, to counter low-light noise in the spectra. That is for band bnd, I choose a window of size wind and fit a n-degree polynomial to the spectrum in that window. With polyfit and polyval I can find the fitted spectral value at that point for band bnd.
Now, if I want the whole image of bnd from the fitted value, then I have to perform this windowed-fitting at each (i,j) of the image. I also want the 2nd-derivative image of bnd, that is, the value of the 2nd-derivative of the fitted spectrum at each point.
Running over the points, I could polyfit-polyval-polyder each of the x*y spectra. While this works, this is a point-wise operation. Is there some pytho-numponic way to do this faster?
If you do least-squares polynomial fitting to points (x+dx[i],y[i]) for a fixed set of dx and then evaluate the resulting polynomial at x, the result is a (fixed) linear combination of the y[i]. The same is true for the derivatives of the polynomial. So you just need a linear combination of the slices. Look up "Savitzky-Golay filters".
EDITED to add a brief example of how S-G filters work. I haven't checked any of the details and you should therefore not rely on it to be correct.
So, suppose you take a filter of width 5 and degree 2. That is, for each band (ignoring, for the moment, ones at the start and end) we'll take that one and the two on either side, fit a quadratic curve, and look at its value in the middle.
So, if f(x) ~= ax^2+bx+c and f(-2),f(-1),f(0),f(1),f(2) = p,q,r,s,t then we want 4a-2b+c ~= p, a-b+c ~= q, etc. Least-squares fitting means minimizing (4a-2b+c-p)^2 + (a-b+c-q)^2 + (c-r)^2 + (a+b+c-s)^2 + (4a+2b+c-t)^2, which means (taking partial derivatives w.r.t. a,b,c):
4(4a-2b+c-p)+(a-b+c-q)+(a+b+c-s)+4(4a+2b+c-t)=0
-2(4a-2b+c-p)-(a-b+c-q)+(a+b+c-s)+2(4a+2b+c-t)=0
(4a-2b+c-p)+(a-b+c-q)+(c-r)+(a+b+c-s)+(4a+2b+c-t)=0
or, simplifying,
34a+10c = 4p+q+s+4t
10b = -2p-q+s+2t
10a+5c = p+q+r+s+t
so a,b,c = (2p-q-2r-s+2t)/14, (2(t-p)+(s-q))/10, (p+q+r+s+t)/5-(2p-q-2r-s+2t)/7 = (-3p+12q+17r+12s-3t)/35.
And of course c is the value of the fitted polynomial at 0, and therefore is the smoothed value we want. So for each spatial position, we have a vector of input spectral data, from which we compute the smoothed spectral data by multiplying by a matrix whose rows (apart from the first and last couple) look like [0 ... 0 -3/35 12/35 17/35 12/35 -3/35 0 ... 0], with the central 17/35 on the main diagonal of the matrix. Likewise, the second derivative of the fitted quadratic is 2a = (2p-q-2r-s+2t)/7, which gives the rows of the matrix for the second-derivative image.
So you could do a matrix multiplication for each spatial position; but since it's the same matrix everywhere you can do it with a single call to tensordot. So if S contains the matrix I just described (er, wait, no, the transpose of the matrix I just described) and A is your 3-dimensional data cube, your spectrally-smoothed data cube would be numpy.tensordot(A, S, axes=1), which contracts the band axis of A with the first axis of S.
This would be a good point at which to repeat my warning: I haven't checked any of the details in the few paragraphs above, which are just meant to give an indication of how it all works and why you can do the whole thing in a single linear-algebra operation.
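For what it's worth, the weights do not have to be derived by hand. The sketch below gets them from the pseudo-inverse of a small Vandermonde matrix and applies them as a linear combination of shifted band slices. The helper names, the random data cube and the choice to leave the edge bands as NaN are all assumptions made for the example, not part of the original answer.

```python
import numpy as np

def savgol_weights(half_width=2, degree=2):
    """Weights for the fitted value, 1st and 2nd derivative at the window centre."""
    offsets = np.arange(-half_width, half_width + 1, dtype=float)
    vander = np.vander(offsets, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    fit = np.linalg.pinv(vander)   # maps window samples to polynomial coefficients
    return fit[0], fit[1], 2.0 * fit[2]

def apply_along_bands(cube, weights):
    """Linear combination of shifted band slices; edge bands are left as NaN."""
    width = len(weights)
    half = width // 2
    n_bands = cube.shape[-1]
    out = np.full(cube.shape, np.nan)
    out[..., half:n_bands - half] = sum(
        w * cube[..., k:n_bands - width + 1 + k] for k, w in enumerate(weights))
    return out

smooth_w, _, second_w = savgol_weights()
H = np.random.default_rng(0).normal(size=(4, 3, 20))   # hypothetical data cube
smoothed = apply_along_bands(H, smooth_w)               # smoothed cube
second_derivative = apply_along_bands(H, second_w)      # 2nd-derivative cube
```

With a window of 5 and degree 2 this yields the weights (-3, 12, 17, 12, -3)/35 for smoothing and (2, -1, -2, -1, 2)/7 for the second derivative. The image for one band is then just `smoothed[:, :, bnd]`, with no per-pixel polyfit call.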

Solving for optimal alignment of 3d polygonal mesh

I'm trying to implement a geometry templating engine. One of the parts is taking a prototypical polygonal mesh and aligning an instantiation with some points in the larger object.
So, the problem is this: given 3d point positions for some (perhaps all) of the verts in a polygonal mesh, find a scaled rotation that minimizes the difference between the transformed verts and the given point positions. I also have a centerpoint that can remain fixed, if that helps. The correspondence between the verts and the 3d locations is fixed.
I'm thinking this could be done by solving for the coefficients of a transformation matrix, but I'm a little unsure how to build the system to solve.
An example of this is a cube. The prototype would be the unit cube, centered at the origin, with vert indices:
4----5
|\ \
| 6----7
| | |
0 | 1 |
\| |
2----3
An example of the vert locations to fit:
v0: 1.243,2.163,-3.426
v1: 4.190,-0.408,-0.485
v2: -1.974,-1.525,-3.426
v3: 0.974,-4.096,-0.485
v5: 1.974,1.525,3.426
v7: -1.243,-2.163,3.426
So, given that prototype and those points, how do I find the single scale factor, and the rotation about x, y, and z that will minimize the distance between the verts and those positions? It would be best for the method to be generalizable to an arbitrary mesh, not just a cube.
Assuming you have all points and their correspondences, you can fine-tune your match by solving the least squares problem:
minimize Norm(T*V-M)
where T is the transformation matrix you are looking for, V are the vertices to fit, and M are the vertices of the prototype. Norm refers to the Frobenius norm. M and V are 3xN matrices where each column is a 3-vector of a vertex of the prototype and the corresponding vertex in the fitting vertex set. T is a 3x3 transformation matrix. Then the transformation matrix that minimizes the mean squared error is M*transpose(V)*inverse(V*transpose(V)). The resulting matrix will in general not be orthogonal (you wanted one which has no shear), so you can solve a matrix Procrustes problem to find the nearest orthogonal matrix with the SVD.
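A rough NumPy sketch of this two-step approach is shown below: an unconstrained least-squares fit followed by projection onto a scaled rotation via the SVD. The function name and the synthetic test data are invented for illustration, and taking the mean singular value as the scale is one reasonable choice (it is the Frobenius-nearest scaled rotation to T), not necessarily the exact constrained optimum for noisy data.

```python
import numpy as np

def fit_scaled_rotation(V, M):
    """Fit T*V ~ M in the least-squares sense, then project T onto scale * rotation."""
    T = M @ V.T @ np.linalg.inv(V @ V.T)      # unconstrained 3x3 least-squares solution
    U, sigma, Vt = np.linalg.svd(T)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid a reflection
    R = U @ D @ Vt                            # nearest proper rotation
    scale = np.sum(sigma * np.diag(D)) / 3.0  # Frobenius-optimal uniform scale
    return T, scale, R

# Hypothetical data: a known scaled rotation applied to random vertices.
rng = np.random.default_rng(1)
V = rng.normal(size=(3, 8))
angle = np.radians(40.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
M = 2.5 * R_true @ V

T, scale, R = fit_scaled_rotation(V, M)   # expect scale near 2.5 and R near R_true
```

Only the vertices that actually have target positions go into V and M, so this works with a subset of the mesh as long as the points are not all coplanar with the origin (otherwise V*transpose(V) is singular).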
Now, if you don't know which given points will correspond to which prototype points, the problem you want to solve is called surface registration. This is an active field of research. See for example this paper, which also covers rigid registration, which is what you're after.
If you want to create a mesh on an arbitrary 3D geometry, this is not the way it's typically done.
You should look at octree mesh generation techniques. You'll have better success if you work with a true 3D primitive, which means tetrahedra instead of cubes.
If your geometry is a 3D body, all you'll have is a surface description to start with. Determining "optimal" interior points isn't meaningful, because you don't have any. You'll want them to be arranged in such a way that the tetrahedra inside aren't too distorted, but that's the best you'll be able to do.