Camera Calibration - Results Sufficient?

I'm doing my camera calibration using the Caltech Toolbox
http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/example.html
and I'm not quite sure about the quality of my results. I started with around 40 images and dropped around 10 during the calibration due to large reprojection errors. I mounted the camera on a tripod and placed the checkerboard in a fixed position to avoid motion blur. I fixed the focal length to its maximum.
The reprojection error looks fine, I guess, but the uncertainties in the focal length, principal point and distortion are giving me a headache. Although the given uncertainties should correspond to 3 standard deviations and therefore cover possible deviations with roughly 99.7% probability (assuming a normal distribution), my results vary more than that. Here is what I get from the calibration:
1st attempt:
Focal Length: fc = [ 952.67300 952.58901 ] ± [ 3.18678 3.24121 ]
Principal point: cc = [ 641.33128 339.39042 ] ± [ 2.07428 2.53779 ]
kc = [ 0.16627 -0.28830 -0.00118 -0.00074 0.00000 ] ± [ 0.00554 0.00979 0.00093 0.00076 0.00000 ]
2nd attempt:
Focal Length: fc_left = [ 949.92127 946.43747 ] ± [ 4.75903 4.44547 ]
Principal point: cc_left = [ 642.39817 345.69787 ] ± [ 2.95598 4.19728 ]
kc = [ 0.13925 -0.23895 0.00141 -0.00062 0.00000 ] ± [ 0.00319 0.00490 0.00054 0.00041 0.00000 ]
3rd attempt:
Focal Length: fc = [ 949.55376 948.31960 ] ± [ 1.87647 1.73045 ]
Principal point: cc = [ 644.32264 342.15631 ] ± [ 1.19304 1.89943 ]
kc = [ 0.15587 -0.26060 -0.00010 0.00018 0.00000 ] ± [ 0.00350 0.00612 0.00061 0.00044 0.00000 ]
The pixel error was roughly the same for all three attempts:
err = [ 0.24621 0.18013 ] (unfortunately I didn't save the results)
My questions are:
What can I do to improve my results?
What is in general the best I can expect from the calibration (What should be the maximum uncertainty/reprojection error for a good calibration)?
Thanks!

I think your calibration looks good. It is normal that the focal length varies between runs.
You can try setting the tangential distortion to zero; this is the usual case for contemporary cameras. In OpenCV there is even a calibration flag for that case, CV_CALIB_ZERO_TANGENT_DIST.
If you have a high-end camera and optics, you can also fix the principal point at the very center of the image with the flag CV_CALIB_FIX_PRINCIPAL_POINT. If I remember right, its value is then taken from the intrinsic parameter matrix you pass in.
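To illustrate, here is a minimal OpenCV (Python) sketch of how those flags could be combined. The pattern size, square size and image folder are placeholder assumptions, not from the question:

import glob
import cv2
import numpy as np

pattern = (9, 6)   # inner corners of the checkerboard (assumed)
square = 0.025     # square size in meters (assumed)

# Corner positions in the board's own coordinate system (z = 0 plane):
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

objpoints, imgpoints = [], []
for fname in glob.glob("calib/*.png"):   # hypothetical image folder
    img = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(img, pattern)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

h, w = img.shape
# Initial guess with the principal point at the image center:
K_init = np.array([[950., 0., w / 2.],
                   [0., 950., h / 2.],
                   [0., 0., 1.]])

flags = (cv2.CALIB_USE_INTRINSIC_GUESS
         | cv2.CALIB_ZERO_TANGENT_DIST      # force tangential terms to zero
         | cv2.CALIB_FIX_PRINCIPAL_POINT)   # keep cc where K_init puts it

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, (w, h), K_init, None, flags=flags)
print("RMS reprojection error:", rms)

Note that CALIB_FIX_PRINCIPAL_POINT only honors the passed-in cc when CALIB_USE_INTRINSIC_GUESS is set as well.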
In general, you get more accurate results when using a 3D calibration object instead of the planar checkerboard.

I used the Caltech toolbox during my master's thesis, and I got better results using a good planar calibration grid.
Are you sure that the pattern you are using is really planar?
Is it printed on paper?
I suggest you first attach the calibration grid to a planar surface (a metal one would be the best solution, but a wooden plate also suffices) and then calibrate your system again.
To measure the calibration accuracy, you can measure the size (i.e. length) of a known object (for example, a box whose real length in meters you know) and compare the measured value with the real one.
Be aware also that you have to calibrate the whole space framed by your camera, and that the target object to measure should be positioned within the same volume you just calibrated.
Of course a 3D calibration object would be the best solution, but as far as I know there is no off-the-shelf free code to manage it.


Are the values for "rotation" and "translation" the values for extrinsic camera parameters in .gltf files?

I have exported a 3D scene from Blender as .gltf and I am reading the data in my program.
For the camera I have the following values in the .gltf file:
{
    "camera" : 0,
    "name" : "Camera",
    "rotation" : [
        0.331510511487034,
        -0.018635762442412376,
        0.0052512469701468945,
        0.9450923238951721
    ],
    "translation" : [
        0.25607955169677734,
        1.6810789010681152,
        0.129119189865864
    ]
},
I think the values here for "rotation" and "translation" are the extrinsic camera parameters. The translation vector (x,y,z) makes sense to me, but I don't understand why there are only 4 floats for the camera rotation. In this case, there should be more values for the matrix, or am I missing something here? Thanks in advance!
When rotation is specified by itself, it's a quaternion, not a matrix. That's why you're seeing only 4 values there.
For reference, see: https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#transformations
The glTF camera object looks along -Z in local (node transformed) space, with +Y up.
See: https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#cameras
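If it helps, here is a small numpy/scipy sketch of my own (not part of the original answer) that assembles the node's 4x4 transform from those values. glTF stores the quaternion in (x, y, z, w) order, which happens to match scipy's from_quat convention:

import numpy as np
from scipy.spatial.transform import Rotation

q = [0.331510511487034, -0.018635762442412376,
     0.0052512469701468945, 0.9450923238951721]   # (x, y, z, w)
t = [0.25607955169677734, 1.6810789010681152, 0.129119189865864]

R = Rotation.from_quat(q).as_matrix()   # 3x3 rotation matrix
M = np.eye(4)                           # full 4x4 node transform
M[:3, :3] = R
M[:3, 3] = t

# The camera looks along -Z in local space, so its world-space view
# direction is R applied to (0, 0, -1):
print(R @ np.array([0.0, 0.0, -1.0]))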

Netlogo gis extension: How to set world size to give specific scale to patches

I'm developing an epidemiological model using GIS data of a small town. One of the submodels in my model is about "infections": an infected agent must have a certain probability of infecting the other agents on its patch.
To model this properly, I need my patches to have a specific area, for example 100 square meters. Is there a way to set the world size so that I know the exact area a single patch represents?
Thank you very much.
First of all, you might check the Stack Overflow guide to asking questions. Having a minimal reproducible example also helps. Following the guidelines of Stack Overflow helps our community ;)
The way you define the patch scale with the GIS extensions is indeed not very clear. A good tutorial is available in Chapter 6 of this book.
First, have a raster file (e.g. .asc) with a defined resolution (e.g. 10 x 10 m). You can take a look at how to do this in QGIS and other GIS software. Make sure to export it to .asc and reproject it to your target SRC, otherwise you might run into this problem.
Here's some simple code for you.
extensions [ gis ]      ; the GIS extension must be loaded
globals [ rastermap ]   ; holds the raster dataset

patches-own [ infectability ]

to setup-patches
  ; load the raster file:
  set rastermap gis:load-dataset "C:/folder/yourfile.asc"
  ; load the coordinate system (SRC):
  gis:load-coordinate-system "C:/folder/yourfile.prj"
  ; make each raster cell correspond to one NetLogo patch:
  let width floor (gis:width-of rastermap / 2)
  let height floor (gis:height-of rastermap / 2)
  resize-world (-1 * width) width (-1 * height) height
  ; define your patch size in pixels (makes your world bigger/smaller in the Interface):
  set-patch-size 1
  ; define world boundaries:
  gis:set-world-envelope gis:envelope-of rastermap
  ; apply the raster data to your patches:
  gis:apply-raster rastermap infectability
  ; make your patches look dangerous:
  ask patches with [ infectability > 0.8 ] [ set pcolor red ]
end
After that, you will have to write procedures in which turtles ask their patch for the patch variable infectability. Good luck! ;)

numpy: Efficient lookup of multidimensional result for multidimensional key

To motivate the 'efficient' in the title: I am working with volumetric image data, which can be up to 512x512x1000 pixels, so slow loops etc. are not really an option, particularly if the images need to be viewed in a GUI. Imagine sitting 10 s in front of a viewer waiting for images to load...
From two 3D input volumes x and y I calculate new 3D output volumes, currently up to three at a time, e.g. by solving equation systems for each pixel. Since a lot of x,y combinations are repetitive, and often only a coherent meshgrid range is of interest, I am trying to speed things up by creating a lookup table for this subregion. This works well: in my test case I need only ca. 3000 calculations instead of 30 million.
Now, to the problem: I am utterly failing at efficiently looking up the solutions for the 30 million x,y combinations from the 3000-entry lookup table in a numpythonic way!
Let's try with an example:
#                    x   y   s1   s2
lookup = np.array([[ 4, 11, 23.,  4. ],
                   [ 4, 12, 25., 13. ],
                   [ 5, 11, 21., 19. ],
                   [ 5, 12, 26., 56. ]])
I succeed in getting the index of one x,y pair following this post:
ii = np.where((lookup[:,0] == 4) & (lookup[:,1]==12))[0][0]
s1, s2 = lookup[ii,-2:]
print('At index',ii,':',s1,s2)
>>> At index 1 : 25.0 13.0
Q1: But how do I vectorize this, i.e. get full solution arrays for the 30 million pixels?
s1, s2 = lookup[numpy_magic_with_xy, -2:]
Q2: And actually I'd like to set all solutions to zero for all x,y not within the region of interest. Where do I add that condition?
Q3: And what would really be the fastest way to achieve all this?
PS: I'm fine with using 1D image representations by working with x.ravel() etc. and reshaping at the end. Unless you tell me I don't need to and it's just slowing things down. Just doing it to still understand my own code I guess...
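One possible sketch (mine, not from the thread): spread the lookup table into a dense grid indexed by (x - x_min, y - y_min), then answer all three questions with a single fancy-indexing pass. The toy x and y arrays below stand in for the real volumes:

import numpy as np

lookup = np.array([[ 4, 11, 23.,  4. ],
                   [ 4, 12, 25., 13. ],
                   [ 5, 11, 21., 19. ],
                   [ 5, 12, 26., 56. ]])

x0 = int(lookup[:, 0].min())
y0 = int(lookup[:, 1].min())
nx = int(lookup[:, 0].max()) - x0 + 1
ny = int(lookup[:, 1].max()) - y0 + 1

# Dense grid with one channel per solution; zero by default, so pixels
# outside the region of interest automatically get 0 (Q2).
grid = np.zeros((nx, ny, 2))
grid[lookup[:, 0].astype(int) - x0, lookup[:, 1].astype(int) - y0] = lookup[:, 2:]

# Toy stand-ins for the full 3D volumes; (9, 11) falls outside the table.
x = np.array([[4, 5], [5, 9]])
y = np.array([[12, 11], [12, 11]])

inside = (x >= x0) & (x < x0 + nx) & (y >= y0) & (y < y0 + ny)
xi = np.clip(x - x0, 0, nx - 1)   # clip so out-of-range pixels still index validly
yi = np.clip(y - y0, 0, ny - 1)
sol = np.where(inside[..., None], grid[xi, yi], 0.0)   # one vectorized lookup (Q1)

s1, s2 = sol[..., 0], sol[..., 1]
print(s1)   # [[25. 21.] [26.  0.]]
print(s2)   # [[13. 19.] [56.  0.]]

Since fancy indexing preserves the shape of x and y, the ravel()/reshape round trip should not even be necessary here.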

normal matrix for non uniform scaling

I'm trying to calculate the normal matrix for my GLSL shaders on OpenGL 2.0.
The theory is: the normal matrix is the top-left 3x3 of the ModelView matrix, inverted and transposed.
It seemed to be correct, as I had been rendering my scenes properly, until I imported a model from Maya and found non-uniform scales. Loaded models have weird lighting, while my procedural ones are correct, so my money is on the normal matrix calculation.
How is it computed with non-uniform scale?
You already figured out that you need the transposed inverted matrix for transforming the normals. For a scaling matrix, that's easy to calculate.
A non-uniform 3x3 scaling matrix looks like this:
[ sx 0 0 ]
[ 0 sy 0 ]
[ 0 0 sz ]
with sx, sy and sz being the scaling factors for the 3 coordinate directions.
The inverse of this is:
[ 1 / sx 0 0 ]
[ 0 1 / sy 0 ]
[ 0 0 1 / sz ]
Transposing it changes nothing, so this is already your normal transformation matrix.
Note that, unlike for example a rotation, this transformation matrix will not keep vectors normalized when it is applied to a normalized vector. So after applying this matrix in your shader, you will have to re-normalize the result before using it for lighting calculations.
I would just like to add a practical example to Reto Koradi's answer.
Let's assume you already have a 4x4 model matrix and want to use it to transform normals as well. You can start by deducing the scale in each axis by taking the length of each of the first three columns of that matrix. If you then divide each column by its corresponding scaling factor, the matrix will no longer affect the model's scale, because the basis vectors will have unit length.
As pointed out above, normals have to be scaled by the inverse of the scale in each axis. Fortunately, we have already derived the scale in the first step, so we can simply divide the columns again.
All that effectively means: if you want to derive the transform matrix for normals from your model matrix, all you need to do is divide each of its first three columns by its length squared (which can be rewritten as a dot product). In GLSL you would write:
mat3 mat_n = mat3(mat_model);          // upper-left 3x3 of the model matrix
mat_n[0] /= dot(mat_n[0], mat_n[0]);   // divide each column by its squared length
mat_n[1] /= dot(mat_n[1], mat_n[1]);
mat_n[2] /= dot(mat_n[2], mat_n[2]);
vec3 new_normal = normalize(mat_n * normal);
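As a quick numeric sanity check (a numpy sketch of my own, not part of the answer), you can verify that the column trick equals the transposed inverse whenever the matrix is a rotation times a non-uniform scale:

import numpy as np

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal basis
S = np.diag([2.0, 0.5, 3.0])                   # non-uniform scale
M = R @ S                                      # model matrix (rotation * scale)

N_ref = np.linalg.inv(M).T                     # classic normal matrix
N = M / (M * M).sum(axis=0)                    # divide columns by squared lengths

print(np.allclose(N, N_ref))                   # True

Note the shortcut assumes rotation times scale; a matrix with shear still needs the full inverse transpose.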

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an FFT (Fast Fourier Transform) routine I'm using (stealing) (recycling).
Input is an array of 512 data points which are a sampled waveform.
Test data is generated into this array. The FFT transforms this array into the frequency domain.
I'm trying to understand the relationship between frequency, period, sample rate and position in the FFT array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the FFT array to frequency?
If you ignore the imaginary part, the frequency distribution is linear across bins:
Frequency#i = (Sampling rate/2)*(i/Nbins).
So for your first example, assuming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.7Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
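In code form (my sketch, using the formula from this answer):

def bin_to_freq(i, sample_rate, nbins):
    # nbins bins span 0 .. sample_rate / 2 linearly
    return (sample_rate / 2.0) * i / nbins

print(bin_to_freq(6, 1000, 256))    # ~11.7 Hz (first example)
print(bin_to_freq(29, 8000, 256))   # ~453 Hz (second example)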
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC (frequency 0) suggests that this is the case.
You should always ensure that the average value of your input data is zero; finding the average and subtracting it from each sample point before invoking the FFT is good practice.
Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data points are close to zero, because otherwise the "step" from outside to inside the sampling window injects a whole lot of energy at different frequencies.
The bottom line is that doing an FFT analysis requires more care than simply recycling an FFT routine found somewhere.
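As a sketch of both points (in numpy rather than VB, with values matching the first example; the 0.5 offset is an assumed stand-in for a DC error):

import numpy as np

fs = 1000.0                  # sample rate (first example)
n = 512
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 10.0 * t) + 0.5   # 10 Hz tone plus a DC offset

x = x - x.mean()             # remove the DC offset (kills the huge bin 0)
x = x * np.hanning(n)        # taper the edges to reduce the window "step"

mag = np.abs(np.fft.rfft(x))
peak = mag[1:].argmax() + 1
print("peak bin:", peak, "->", peak * fs / n, "Hz")   # bin 5, ~9.8 Hz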
Here are the first 100 sample points of a 10 Hz signal as described in the question, massaged to avoid sampling artifacts:
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here are the resulting absolute values of the FFT output in the frequency domain:
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty on math and signal processing too, but with the additional info I can give it a shot.
If you want to know the signal energy per bin, you need the magnitude of the complex output. Just looking at the real output is not enough, even when the input is only real numbers. For every bin, the magnitude of the output is sqrt(real^2 + imag^2), just like Pythagoras :-)
Bins 0 to 499 are positive frequencies from 0 Hz to 500 Hz; bins 500 to 999 are negative frequencies and should be the same as the positive ones for a real signal. If you process one buffer every second, frequencies and array indices line up nicely, so the peak at index 6 corresponding to 6 Hz is a bit strange. This might be because you're only looking at the real output data, and the real and imaginary data combine to give an expected peak at index 10. The frequencies should map linearly to the bins.
The peak at 0 indicates a DC offset.
It's been some time since I've done FFTs, but here's what I remember.
An FFT usually takes complex numbers as input and output, so I'm not really sure how the real and imaginary parts of the input and output map to your arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10 Hz for a sample rate of 1000 Hz? So you should have 10 buffers per second with 100 samples each; I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer contains the frequency bins from 0 (the DC offset) up to half the sample rate, and the second half contains the negative frequencies. If your input is real data with 0 for the imaginary part, the positive and negative frequencies are the same. The relationship between the real and imaginary parts of the output contains the phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
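A quick numpy illustration of that symmetry (my addition, with n = 512 as in the question):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=512)         # real-valued input, n = 512
X = np.fft.fft(x)

# Bins n/2+1 .. n-1 are the complex conjugates of bins n/2-1 .. 1:
print(np.allclose(X[257:], np.conj(X[255:0:-1])))   # True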
For example 1, you should get this: [FFT magnitude plot, image formerly at http://home.comcast.net/~kootsoop/images/SINE1.jpg]
For the other example, you should get: [FFT magnitude plot, image formerly at http://home.comcast.net/~kootsoop/images/SINE2.jpg]
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sine wave, the DC component should be close to zero provided you capture enough cycles.
Another avenue is to craft a Goertzel algorithm for each note center frequency you are looking for.
Once you get one implementation of the algorithm working, you can make it take a parameter for its center frequency. With that you could easily run 88 of them, or however many you need, in a collection and scan for the peak value.
The Goertzel algorithm is basically a single-bin FFT. Using this method you can place your bins logarithmically, as musical notes naturally are spaced.
Some pseudo code from Wikipedia:
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n]:
    s = x[n] + coeff*s_prev - s_prev2;
    s_prev2 = s_prev;
    s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;
The two variables holding the previous two samples are maintained for the next iteration, so this can be used in a streaming application. I think the power calculation could perhaps go inside the loop as well. (However, it is not depicted as such in the Wiki article.)
In the tone-detection case there would be 88 different coefficients and 88 pairs of previous samples, and the result would be 88 power outputs indicating the relative level in each frequency bin.
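For concreteness, here is a rough Python translation of that pseudocode (a sketch; names and test values are mine):

import math
import numpy as np

def goertzel_power(samples, freq, fs):
    # Power of `samples` at `freq` Hz via the Goertzel algorithm (one DFT bin).
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev2 * s_prev

# Toy check: a 440 Hz tone sampled at 8000 Hz, as in the second example.
fs = 8000.0
t = np.arange(512) / fs
tone = np.sin(2 * np.pi * 440.0 * t)
print(goertzel_power(tone, 440.0, fs))    # large
print(goertzel_power(tone, 1000.0, fs))   # much smaller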
WaveyDavey says that he's capturing sound from a mic, through the audio hardware of his computer, but that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD be zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +/- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice), the data stream should show a fundamentally sinusoidal wave that goes both positive and negative and averages near zero. If this is not the case, the system has some funk going on!
-Rick