Getting 2 values of focal length when finding Intrinsic camera matrix (F not Fx,Fy)? - camera

The following image is the example that was given in my computer vision class. I can't understand why we are getting 2 unique values of f. I can understand mx*f and my*f being different, but shouldn't the focal length 'f' itself be the same?

I believe you have an Fx and an Fy. This is so that the matrix can scale f separately in the x and y directions. IIRC, this is why you get 2 f numbers.

If a single f is really wanted, it has to be built into the camera model used in calibration,
e.g. give mx and my to the camera model as constants and estimate only f.
However, perhaps the calibration process that produced that K did not work that way, and instead treated the two elements K(0,0) and K(1,1) independently.
In other words, mx and my were effectively estimated as well, in the sense that the aspect ratio was left free.
The estimated result is not the same as the values of mx and my calculated from the sensor specifications.
This is why you got 2 values.
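To make that concrete, here is a minimal sketch, assuming the usual pinhole parametrisation K(0,0) = f*mx and K(1,1) = f*my. The function name and all numbers are made up for illustration and are not taken from the class example.

```python
def focal_from_k(k00, k11, mx, my):
    """Focal length implied by each diagonal entry of K.

    Assumes K(0,0) = f * mx and K(1,1) = f * my, where mx and my are
    pixel densities (pixels per unit length) along x and y.
    """
    return k00 / mx, k11 / my


# Hypothetical calibration result where K(0,0) and K(1,1) were estimated independently.
k00, k11 = 1250.0, 1240.0   # pixels
mx = my = 250.0             # pixels per mm, from a made-up sensor spec

f_from_x, f_from_y = focal_from_k(k00, k11, mx, my)
print(f_from_x, f_from_y)   # the two entries imply two different f values

# The ratio of the two entries is the aspect ratio the calibration actually absorbed.
aspect = k00 / k11
```

If mx and my had been fixed during calibration, both divisions would return the same f by construction.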

Related

Mapping a numerical function with two inputs onto one with one input

I'm quite bad at programming, so please bear with me. I'm not even sure what the concept I need right now is called, so I don't know what to google for or write in the title of this post.
My issue is, I numerically integrated a function on Mathematica and have a function F that depends on 2 inputs X and Y. Those inputs form a 2x2 grid. To visualize my solution, I would need a 3D graph.
Now I want to compare this to my analytical solution (/approximation) A, which I know only depends on one input Z, which is the ratio of X/Y. To visualize it, I only need a 2d Graph.
My issue now is that I'm not sure how to effectively filter my numerical solution F so that I only consider the outputs as a function of the ratio X/Y. This way, I could easily compare it to my analytical solution using only a 2D graph.
I hope some of you understand my gibberish. I apologize for not being able to properly explain what I need in the correct language. I would be glad if some of you might be able to help me. Any help is appreciated.
Is my understanding correct? You have a numerically integrated function, F which maps a pair of numbers to a scalar:
F: (x,y) -> (z)
Then, there's another function, A, which takes a scalar and maps it to another scalar:
A: (b) -> (c)
and b is itself the ratio of x and y from before:
b = x/y
And you'd like to compare the outputs of F and A, i.e. compare z to c, as I've defined them here?
One thing you can do is sample the inputs to F that you already have, and then query A with the ratio of those inputs, and compare the output.
To put it another way, you can say, "for this x and this y, I know the output of F is this. Then, when I divide them and put them into A I get this."
Then, you could make a heatmap, say, where one of the axes is the x-value and the other axis is the y-value, and the color corresponds to F(x, y) - A(x/y)
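A rough sketch of that comparison in plain Python is below. `F` and `A` are stand-ins (your real ones come from Mathematica), and the helper names are my own. The second helper is an extra that collapses the grid onto the single variable x/y, which is what a 2D plot against the analytical curve would need.

```python
def residual_grid(F, A, xs, ys):
    """Rows of F(x, y) - A(x / y): one row per y, one column per x (heatmap data)."""
    return [[F(x, y) - A(x / y) for x in xs] for y in ys]


def ratio_points(F, xs, ys):
    """Flatten the grid to (x / y, F(x, y)) pairs sorted by ratio, for a 2D plot."""
    return sorted((x / y, F(x, y)) for x in xs for y in ys)


# Stand-in functions for illustration only.
def F(x, y):
    return (x / y) ** 2


def A(z):
    return z ** 2


xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 4.0]          # must not contain 0, since we divide by y

res = residual_grid(F, A, xs, ys)
max_abs = max(abs(r) for row in res for r in row)
pts = ratio_points(F, xs, ys)
```

If F really depends only on x/y, points in `pts` that share a ratio will share a value, and the residuals stay near zero everywhere.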

Self-Attention Explainability of the Output Score Matrix

I am learning about attention models, and following along with Jay Alammar's amazing blog tutorial on The Illustrated Transformer. He gives a great walkthrough for how the attention scores are calculated, but I get a bit lost at a certain point, and am not seeing how the attention score Z matrix he explains is used to interpret strength of associations between different words within an input sequence.
He mentions that given some input matrix X, with shape N x D, where N is the number of elements in an input sequence, and D is the input dimensionality, we multiply X with three separate weight matrices of shape D x d, where d is some lower dimensionality that represents the projected space of the query, key, and value matrices:
The query and key matrices are dotted, then divided by a scaling factor (usually the square root of the projected dimensionality), and then run through a softmax function. This produces a weight matrix of size N x N, which is multiplied by the value matrix to get an output Z of shape N x d, of which Jay says:
That concludes the self-attention calculation. The resulting vector is
one we can send along to the feed-forward neural network.
The screenshot from his blog for this calculation is below:
However, this is where I'm confused. Z is N x d, and I don't understand what I'm supposed to do with this matrix from an interpretability standpoint. As far as I understand, for a particular sequence element (i.e. the word cats in the sequence I love pets, especially cats), self-attention is supposed to score other parts of the sequence highly when they are relevant to or strongly associated with that word embedding. I'd therefore expect Z to be N x N, so that I could select Z[i,j] and say that, for the i-th word in the sequence, the j-th word relates to or associates with it this or that much.
In fact, wouldn't it make much more sense to use only the softmax output of the weights (without multiplying them by the value matrix), since it already is N x N? In essence, how is Jay determining the strength of these associations in this particular sequence with the word it?
This is an N by 1 relationship he is showing - there are N values that correspond with the strength of association to the word it.
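For what it's worth, the shapes described above can be checked with a tiny sketch. This is plain Python with made-up Q, K and V values (N = 3, d = 2), not code from the blog post. It only shows that the softmax output is N x N, with row i holding N weights for element i, while Z is N x d.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax of one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(Q, K, V):
    """Return (weights, Z): weights is N x N, Z = weights @ V is N x d."""
    d = len(Q[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)                                  # N x N
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    Z = matmul(weights, V)                                   # N x d
    return weights, Z


# Toy projections, chosen arbitrarily.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

weights, Z = self_attention(Q, K, V)
```

Here `weights[i][j]` is the softmax weight that element i puts on element j, and each row sums to 1. Row i of `Z` is the corresponding weighted mix of the value vectors.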

2D DCT coefficients meaning of a gray scale image

What do the DCT coefficients mean? And what is the difference between a positive and a negative DCT coefficient, for example coefficients 5 and -5?
Thanks
The DCT is simply a 1-to-1 transformation of the data.
Suppose you have a set of blueprints on paper. You scan them in. Once scanned, they are crooked. You use Photoshop or something like it to rotate the image so it's aligned to the edges and easier to work with.
The DCT is like a rotation in that it simply makes the image data easier to work with. I have to say that a lot of books make this confusing by adding spectral analysis mumbo-jumbo.
Desirable attributes of the DCT for this purpose are:
That it is a transformation to an orthonormal basis set. If D is the DCT transformation matrix, X is the input and Y is the output so that
X D = Y
Then there is an inverse matrix Q that gives:
Y Q = X
And Q is the transpose of D.
Therefore, it is just as easy to go forwards as it is to go backwards with the DCT.
The DCT transformation tends to concentrate the most important image data in one corner of the output matrix. The data at the opposite corner tends to be discardable without noticeably affecting photographic images.
As to your other question, the JPEG input pixels are level-shifted from 0..255 to the range -128 to 127. Your starting values usually include negative values, so it's no surprise that you get negative output values. Even if you did have all positive input values you could still get negative output values. There is no real significance to the sign itself; 5 and -5 weight the same basis function equally, with opposite polarity.
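Both points can be checked with a small sketch. It assumes the orthonormal DCT-II in one dimension with length 4 (the 2D DCT applies the same transform along rows and columns), laid out so that a row vector x maps to y = x D as in the notation above. The helper names are my own.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix D, arranged so that y = x D for a row vector x."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):              # sample index
        for k in range(n):          # frequency index
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            d[i][k] = scale * math.cos(math.pi * (i + 0.5) * k / n)
    return d


def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][k] for i in range(len(x))) for k in range(len(m[0]))]


def transpose(m):
    return [list(col) for col in zip(*m)]


D = dct_matrix(4)
x = [1.0, 2.0, 3.0, 4.0]            # all-positive input
y = vec_mat(x, D)                   # forward transform: X D = Y
x_back = vec_mat(y, transpose(D))   # inverse: Y Q = X with Q = transpose of D
```

`x_back` matches `x` up to rounding, and `y[1]` comes out negative even though every input sample is positive.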

sampling 2-dimensional surface: how many sample points along X & Y axes?

I have a set of the first 25 Zernike polynomials. A few are shown below in the Cartesian co-ordinate system.
z2 = 2*x
z3 = 2*y
z4 = sqrt(3)*(2*x^2+2*y^2-1)
:
:
z24 = sqrt(14)*(15*(x^2+y^2)^2-20*(x^2+y^2)+6)*(x^2-y^2)
I am not using 1st since it is piston; so I have these 24 two-dim ANALYTICAL functions expressed in X-Y Cartesian co-ordinate system. All are defined over unit circle, as they are orthogonal over unit circle. The problem which I am describing here is relevant to other 2D surfaces also apart from Zernike Polynomials.
Suppose that origin (0,0) of the XY co-ordinate system and the centre of the unit circle are same.
Next, I take linear combination of these 24 polynomials to build a 2D wavefront shape. I use 24 random input coefficients in this combination.
w(x,y) = sum_over_i a_i*z_i (i=2,3,4,....24)
a_i = random coefficients
z_i = zernike polynomials
Up to this point, everything is the analytical part, which can be done on paper.
Now comes the discretization!
I know that when you want to re-construct a signal (1Dim/2Dim), your sampling frequency should be at least twice the maximum frequency present in the signal (Nyquist-Shannon principle).
Here signal is w(x,y) as mentioned above which is nothing but a simple 2Dim
function of x & y. I want to represent it on computer now. Obviously I can not take all infinite points from -1 to +1 along x axis and same for y axis.
I have to take finite no. of data points (which are called sample points or just samples) on this analytical 2Dim surface w(x,y)
I am measuring x & y in metres, and -1 <= x <= +1; -1 <= y <= +1.
e.g. If I divide my x-axis from -1 to 1, in 50 sample points then dx = 2/50= 0.04 metre. Same for y axis. Now my sampling frequency is 1/dx i.e. 25 samples per metre. Same for y axis.
But I took 50 samples arbitrarily; I could have taken 10 samples or 1000 samples. That is the crux of the matter here: how many sample points? How will I determine this number?
There is one theorem (the Nyquist-Shannon theorem) mentioned above which says that if I want to re-construct w(x,y) faithfully, I must sample it on both axes so that my sampling frequency (i.e. no. of samples per metre) is at least twice the maximum frequency present in w(x,y). This amounts to finding the power spectrum of w(x,y). The idea is that any function in the space domain can also be represented in the spatial-frequency domain, which is nothing but taking the Fourier transform of the function! This tells us which (spatial) frequencies are present in the function w(x,y) and what the maximum of these frequencies is.
Now my question is first how to find this maximum frequency (and hence the required sampling frequency) in my case. I can not use MATLAB fft2() or any other tool since that means I already have samples taken across the wavefront!! Obviously the remaining option is to find it analytically! But that is time consuming and difficult since I have 24 polynomials and I will then have to use the continuous Fourier transform, i.e. I will have to go for pen and paper.
Any help will be appreciated.
Thanks
Key Assumptions
You want to use the "Nyquist-Shannon" theorem to determine sampling frequency
Obviously remaining option is find it analytically ! But that is time
consuming and difficult since I have 21 polynomials & I have to use
continuous Fourier transform i.e. done by analytically.
Given the assumption I have made (and noting that consideration of other mathematical techniques is out of scope for StackOverflow), you have no option but to calculate the continuous Fourier Transform.
However, I believe you haven't considered all the options for calculating the transform other than a laborious paper exercise e.g.
Numerical approximation of the continuous F.T. using code
Symbolic Integration e.g. Wolfram Alpha
Surely a numerical approximation of the Fourier Transform will be adequate for your solution?
I am assuming this is for coursework or research, so all you really care about as a physicist is the quickest solution that is accurate within the scope of your problem.
So to conclude, IMHO, don't waste time searching for a more mathematically elegant solution or trick; just solve the problem with one of the above methods.
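As a rough illustration of the first option, here is a minimal sketch that approximates the continuous Fourier transform over the unit disc with a midpoint-rule sum, evaluated at whatever spatial frequencies you choose. It is shown for z4 from the question only; `ft_on_disc`, the grid size `n` and the list of test frequencies are my own choices, not a recommended recipe.

```python
import cmath
import math


def z4(x, y):
    """Zernike term z4 from the question."""
    return math.sqrt(3) * (2 * x * x + 2 * y * y - 1)


def ft_on_disc(f, u, v, n=80):
    """Midpoint-rule estimate of the continuous FT of f over the unit disc.

    Approximates the integral of f(x, y) * exp(-2j*pi*(u*x + v*y)) dx dy,
    with (u, v) in cycles per metre and n cells per axis on [-1, 1].
    """
    h = 2.0 / n
    total = 0j
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y <= 1.0:      # f is only defined on the unit disc
                total += f(x, y) * cmath.exp(-2j * math.pi * (u * x + v * y))
    return total * h * h


# Magnitude of the transform along the u axis at a few arbitrary frequencies.
spectrum = [(u, abs(ft_on_disc(z4, u, 0.0))) for u in (0.0, 0.5, 1.0, 2.0, 4.0)]
```

The same function can be pointed at the full w(x,y). The integration grid is only a quadrature device here, so it can be made as fine as you like independently of the sampling you finally pick, and deciding where the magnitudes become negligible is still a judgment call.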

Fitting curves to a set of points

Basically, I have a set of up to 100 co-ordinates, along with the desired tangents to the curve at the first and last point.
I have looked into various methods of curve-fitting, by which I mean an algorithm which takes the inputted data points and tangents and outputs the equation of the curve, such as the Gaussian method and interpolation, but I really struggled to understand them.
I am not asking for code (if you choose to give it, that's acceptable though :) ), I am simply looking for help with this algorithm. It will eventually be converted to Objective-C for an iPhone app, if that changes anything.
EDIT:
I know the order of all of the points. They are not too close together, so passing through all points is necessary - aka interpolation (unless anyone can suggest something else). And as far as I know, an algebraic curve is what I'm looking for. This is all being done on a 2D plane by the way
I'd recommend to consider cubic splines. There is some explanation and code to calculate them in plain C in Numerical Recipes book (chapter 3.3)
Most interpolation methods originally work with functions: given a set of x and y values, they compute a function which computes a y value for every x value, meeting the specified constraints. As a function can only ever compute a single y value for every x value, such a curve cannot loop back on itself.
To turn this into a real 2D setup, you want two functions which compute x resp. y values based on some parameter that is conventionally called t. So the first step is computing t values for your input data. You can usually get a good approximation by summing over euclidean distances: think about a polyline connecting all your points with straight segments. Then the parameter would be the distance along this line for every input pair.
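The parameter computation described above can be sketched in a few lines. The function name and the sample points are made up for illustration.

```python
import math


def chord_length_params(points):
    """Cumulative distance along the polyline through the points, starting at 0."""
    t = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        t.append(t[-1] + math.hypot(x1 - x0, y1 - y0))
    return t


pts = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (0.0, 14.0)]
t = chord_length_params(pts)   # [0.0, 5.0, 11.0, 16.0]
```

Each point then gets one t value, and x and y are interpolated separately against that t.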
So now you have two interpolation problems: one to compute x from t and the other to compute y from t. You can formulate this as a spline interpolation, e.g. using cubic splines. That gives you a large system of linear equations which you can solve iteratively up to the desired precision.
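Here is a minimal sketch of one such interpolation for a single coordinate, assuming a cubic spline with prescribed end derivatives (a "clamped" spline), which is where the tangents at the first and last point would go. Instead of an iterative solver it solves the tridiagonal system directly with the Thomas algorithm. The function names are my own, and you would run it once for x(t) and once for y(t).

```python
def clamped_spline_derivs(t, p, d_start, d_end):
    """Knot derivatives of the C2 cubic spline through (t[i], p[i]) with fixed end slopes."""
    n = len(t)
    if n == 2:
        return [d_start, d_end]
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    slope = [(p[i + 1] - p[i]) / h[i] for i in range(n - 1)]
    # One equation per interior knot i = 1 .. n-2 (continuity of the second derivative):
    # h[i]*d[i-1] + 2*(h[i-1]+h[i])*d[i] + h[i-1]*d[i+1] = 3*(h[i]*slope[i-1] + h[i-1]*slope[i])
    sub = [h[i] for i in range(1, n - 1)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    sup = [h[i - 1] for i in range(1, n - 1)]
    rhs = [3.0 * (h[i] * slope[i - 1] + h[i - 1] * slope[i]) for i in range(1, n - 1)]
    rhs[0] -= sub[0] * d_start      # move the known end derivatives to the right-hand side
    rhs[-1] -= sup[-1] * d_end
    m = n - 2
    for k in range(1, m):           # forward elimination
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    d = [0.0] * m
    d[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):  # back substitution
        d[k] = (rhs[k] - sup[k] * d[k + 1]) / diag[k]
    return [d_start] + d + [d_end]


def hermite_eval(t, p, d, s):
    """Evaluate the piecewise cubic at parameter s, with t[0] <= s <= t[-1]."""
    i = 0
    while i < len(t) - 2 and s > t[i + 1]:
        i += 1
    h = t[i + 1] - t[i]
    u = (s - t[i]) / h
    h00 = (1 + 2 * u) * (1 - u) ** 2
    h10 = u * (1 - u) ** 2
    h01 = u * u * (3 - 2 * u)
    h11 = u * u * (u - 1)
    return h00 * p[i] + h10 * h * d[i] + h01 * p[i + 1] + h11 * h * d[i + 1]


# Sanity check on samples of t**3 with exact end slopes 0 and 27.
t = [0.0, 1.0, 2.0, 3.0]
p = [0.0, 1.0, 8.0, 27.0]
d = clamped_spline_derivs(t, p, 0.0, 27.0)
mid = hermite_eval(t, p, d, 1.5)
```

With a distance-based parameter, a natural choice for the end derivatives is the x and y components of the unit tangent vectors, but that scaling is an assumption on my part and you may want to tune it.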
The result of a spline interpolation will be a piecewise description of a suitable curve. If you wanted a single equation, then a Lagrange interpolation would fit that bill, but the result might have odd twists and turns for many sets of input data.