Kinect Fusion volume voxel settings? - kinect

I need some help figuring out the Volume Voxels Per Meter and Volume Voxel Resolution settings in Kinect Fusion, mostly how, if at all, they interact with the Depth Threshold settings in the Kinect Fusion Explorer program. If the depth threshold minimum is increased and the maximum is reduced, does that smaller range increase the overall precision of the scanned volume, or does it stay the same?
Say I set Kinect Fusion's depth threshold minimum to 2m and the maximum to 3m, giving a scanned range of 3m - 2m = 1m. Does a volume voxels per meter setting of, say, 256, together with a resolution of 256, then mean that I would get a voxel depth precision of 1m/256 ≈ 0.0039m ≈ 0.39cm (roughly four tenths of a centimeter)? Or is the resolution applicable only to the complete Kinect depth range instead of the one set via the depth threshold? Also, how are width and height affected by the depth threshold settings, and how do I calculate precision in those two remaining axes?
Thanks in advance
P.S.
If the volume voxel resolution is set to the maximum for all three axes (768x768x768), what is the minimum amount of GPU memory needed to make Kinect Fusion work?

Answering an old topic, because there is no other answer:
A. Simple Answer:
Depth threshold settings simply decide which region of the depth map you are interested in. Any value below the minimum depth threshold or above the maximum depth threshold is simply replaced with 0 during depth map generation.
B. Detailed Answer:
Volume voxels per meter: this determines the depth represented by a single voxel, in mm. So 1000mm / 256 (voxels per meter) ≈ 3.9 mm per voxel.
(See: PCL documentation)
Voxel Resolution: The number of voxels in the volume you are constructing.
So:
Voxel resolution / voxels per meter = side length of the reconstruction volume (in meters)
E.g.: 512 voxels / 256 vpm = 2.0m per side of the reconstruction cube (given that the number of voxels per side of the cube is the same; each axis can be defined independently).
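To make the arithmetic concrete, here is a minimal Python sketch of the two formulas above (the variable names are my own, not the SDK's):

voxels_per_meter = 256                       # Kinect Fusion "volume voxels per meter"
voxels_x = 512                               # voxel resolution along one axis
voxel_size_m = 1.0 / voxels_per_meter        # ~0.0039 m of depth per voxel (~3.9 mm)
side_length_m = voxels_x / voxels_per_meter  # 512 / 256 = 2.0 m per side of the cube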
If you have the Kinect SDK installed; see the descriptions of the following variables:
minDepthClip = FusionDepthProcessor.DefaultMinimumDepth;
maxDepthClip = FusionDepthProcessor.DefaultMaximumDepth;
voxelsPerMeter; voxelsX; voxelsY; voxelsZ;
So these values do not depend on the depth threshold values (or vice versa).
A good example of using the depth threshold values is in the great video by Daniel Shiffman ([Kinect & Processing])

Related

How to count peaks above some specific value on a chart in LabVIEW, i.e. how to count the number of hills (heart rate monitor)

I want to create a simple heart rate monitor in LabVIEW.
I have a sensor which gives me the heart waveform (upper graph): Waveform
The second (lower) graph shows the hills (0 = valley, 1 = hill), and those hills are heart beats (that is the voltage waveform). From this I want to get the number of those hills, then multiply that number by 6 to get the heart rate per minute.
Measuring card I use: NI USB-6009.
Any idea how to do that?
I can send a VI file if anyone is able to help me.
You could use Threshold Peak Detector VI
This VI does not identify the locations or the amplitudes of peaks
with great accuracy, but the VI does give an idea of where and how
often a signal crosses above a certain threshold value.
You could also use Waveform Peak Detection VI
The Waveform Peak Detection VI operates like the array-based Peak
Detector VI. The difference is that this VI's input is a waveform data
type, and the VI has error cluster input and output terminals.
Locations displays the output array of the peaks or valleys, which is
still in terms of the indices of the input waveform. For example, if
one element of Locations is 100, that means that there is a peak or
valley located at index 100 in the data array of the input waveform.
Figure 6 shows you a method for determining the times at which peaks
or valleys occur.
NI has a great tutorial that should answer all your questions; it can be found here:
I had some fun recreating some of your exercise here. I simulated a square wave. For my sample of the square wave, I know how many samples I have and the sampling frequency, so I can calculate how much time my data sample represents. I then count the number of positive edges in the sample, do some division to calculate beats/second, and multiply to get beats/minute. The sampling frequency, Fs, and the number of samples, N (or #s), are required to calculate your beats-per-minute metric; their uses are shown below.
[Figure: The contrived VI]
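LabVIEW VIs are graphical, so as a rough illustration only, here is the same edge-counting logic sketched in Python/numpy (all names and values are assumptions of mine, not taken from the VI):

import numpy as np

fs = 1000.0                                              # assumed sampling frequency Fs, Hz
n = 6000                                                 # assumed number of samples N
t = np.arange(n) / fs
square = (np.sin(2 * np.pi * 1.0 * t) > 0).astype(int)   # simulated 1 Hz square wave

duration_s = n / fs                                      # time the data sample represents
beats = np.sum(np.diff(square) == 1)                     # count the positive (rising) edges
bpm = beats / duration_s * 60.0                          # beats/second -> beats/minute
print(bpm)                                               # 60.0 for this simulated signal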
Does that lead you to a solution for your application?

Adapting Smartphone Camera to derive Blackbody temperature

At first blush this presumably means:
(1) looking only at lower IR frequencies,
(2) selecting an IR frequency cut-off for the low-frequency buckets of the u/v FFT grid,
(3) once we have that, deriving the power distribution (squares of amplitudes) for the IR range of frequency buckets the camera supports,
(4) fitting that distribution against the classical Rayleigh–Jeans blackbody radiation formula:
(https://en.wikipedia.org/wiki/Rayleigh%E2%80%93Jeans_law#Other_forms_of_Rayleigh%E2%80%93Jeans_law)
(5) assigning a temperature of 'best fit'.
The units of B(ν, T) are power per unit frequency per unit surface area at equilibrium temperature T.
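For reference, the Rayleigh–Jeans law in its frequency form (a standard result, quoted here for convenience) is

B(ν, T) = 2 ν^2 k_B T / c^2

where ν is the frequency, T the absolute temperature, k_B Boltzmann's constant, and c the speed of light.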
Of course, this leaves many details out, such as (6) cancelling the background, etc., but one could perhaps use the opposite-facing camera to assist with that. Where buckets do not straddle the temperature of interest, (7) use a one-sided distribution to derive an inferred Gaussian curve to fit the Rayleigh–Jeans curve at that derived central frequency ν, for measured temperature T.
Finally, (8) check whether this procedure can consistently distinguish a high from a low surface temperature, and (9) check whether it can consistently identify a 'fever' temperature (say, 101 Fahrenheit / 38.3 Celsius) when pointed at a forehead.
If all that can be done, (10) voila! A body fever detector.
So can those who are capable fill us in on whether this is possible, for eventual posting at an app store as a free Covid-19 safe body temperature app? I have a strong sense there are quite a few out there who can verify this in a week or two!
It appears that the analog signal assumed in (1) and (2) is not available in the Android digital Camera2 interface.
The Android RAW image stream, that is, uncompressed YUV, is already encoded: Y is the green monochrome channel, and U and V are the blue and red shifts from zero used to convert the green monochrome to color.
The original analog frequency/energy signal is not immediately accessible, so this adaptation is not possible (yet).

Simulate Camera in Numpy

I have the task of simulating a camera with a full well capacity of 10,000 photons per sensor element in numpy. My first idea was to do it like this:
camera = np.random.normal(0.0, 1 / 10000, np.shape(img))
img_with_noise = img + camera
but it hardly shows an effect.
Does someone have an idea how to do it?
From what I interpret from your question: if each physical pixel of the sensor has a 10,000-photon limit, that limit corresponds to the brightest a digital pixel can be in your image. Similarly, 0 incident photons makes for the darkest pixels of the image.
You have to create a map from the physical sensor to the digital image. For the sake of simplicity, let's say we work with a grayscale image.
Your first task is to fix the colour bit-depth of the image. That is to say, is your image an 8-bit colour image? (Which is usually the case.) If so, the brightest pixel has a brightness value of 255 (= 2^8 - 1, for 8 bits). The darkest pixel is always chosen to have the value 0.
So you'd have to map from the range 0 → 10,000 (sensor) to 0 → 255 (image). The most natural idea would be a linear map (i.e. every pixel of the image is obtained by the same multiplicative factor from every pixel of the sensor), but to correctly interpret (according to the human eye) the brightness produced by n incident photons, different transfer functions are often used.
A transfer function, in a simplified view, is just a mathematical function doing this map; logarithmic TFs are quite common.
Also, since it seems like you're generating noise, it is unwise and conceptually wrong to add camera itself to the image img. What you should do is fix a noise threshold first; this can correspond to the maximum number of photons by which noise can affect a pixel reading. Then you generate random numbers (according to some distribution, if so required) in the range 0 → noise_threshold. Finally, you use the map created earlier to add this noise to the image array.
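As a rough sketch of that recipe (the 8-bit depth, the linear transfer function, the 100-photon noise threshold, and the uniform noise distribution are all assumptions for illustration):

import numpy as np

full_well = 10_000                                       # photons at sensor saturation
max_value = 255                                          # brightest 8-bit pixel
noise_threshold = 100                                    # assumed: max noise, in photons

img = np.random.randint(0, full_well + 1, (480, 640))    # stand-in sensor photon counts
noise = np.random.uniform(0, noise_threshold, img.shape) # noise in the photon domain

# Apply the linear transfer function from photon counts to 8-bit pixel values
photons = np.clip(img + noise, 0, full_well)
pixels = (photons / full_well * max_value).astype(np.uint8)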
Hope this helps and is in tune with what you wish to do. Cheers!

Frequency Range from FFT using vDSP

I have an array of values as input which can be plotted as follows…
Using vDSP_zvmagsD I get an array that I can plot as follows…
How do I get the frequency range that I need to label the x-axis?
The size of your frequency bins depends on the sampling rate of your input signal and the size of your FFT window:
sampling rate / number of FFT samples = Hz per frequency bin
Here you can find a more detailed answer by electronics stackexchange user Mark. There you will also find useful information about the tradeoff between frequency resolution (bin size) and temporal resolution (when does which frequency occur).
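For instance (a numpy sketch rather than vDSP, with assumed values):

import numpy as np

fs = 44_100.0                           # assumed sampling rate, Hz
n = 1024                                # assumed FFT window size

bin_width = fs / n                      # Hz per frequency bin (~43 Hz here)
freqs = np.arange(n // 2) * bin_width   # x-axis labels for the first n/2 bins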

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an FFT (Fast Fourier Transform) routine I'm using (stealing) (recycling).
Input is an array of 512 data points which are a sampled waveform.
Test data is generated into this array; the FFT transforms this array into the frequency domain.
I'm trying to understand the relationship between frequency, period, sample rate and position in the FFT array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the fft array to frequency ?
If you ignore the imaginary part, the frequency distribution is linear across the bins:
Frequency of bin i = (sampling rate / 2) * (i / Nbins)
So for your first example, assuming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.8Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
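You can check these numbers directly; here is a small numpy sketch using the question's first example:

import numpy as np

fs, n = 1000.0, 512                  # sample rate and FFT size from the question
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 10.0 * t)     # 10 Hz test tone

spectrum = np.abs(np.fft.rfft(x))
peak = np.argmax(spectrum)
print(peak, peak * fs / n)           # bin 5, ~9.8 Hz (the bin width fs/n is ~2 Hz)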
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC ( frequency 0 ) suggests that this is the case.
You should always ensure that the average value of your input data is zero; finding the average and subtracting it from each sample point before invoking the FFT is good practice.
Along the same lines, you have to be careful about the sampling-window artifact. It is important that the first and last data points are close to zero, because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.
The bottom line is that doing an fft analysis requires more care than simply recycling a fft routine found somewhere.
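In numpy terms, that preprocessing might look like this sketch (the Hann window is one common choice, not the only one):

import numpy as np

fs = 1000.0
samples = np.sin(2 * np.pi * 10 * np.arange(512) / fs) + 0.5   # test tone with a DC offset

x = samples - samples.mean()        # remove the average, killing the huge bin 0
x = x * np.hanning(len(x))          # taper both ends of the window toward zero
spectrum = np.abs(np.fft.rfft(x))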
Here are the first 100 sample points of a 10Hz signal as described in the question, massaged to avoid sampling artifacts:
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here are the resulting absolute values of the FFT frequency domain:
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty on the math and signal processing too, but with the additional info I can give it a shot.
If you want to know the signal energy per bin, you need the magnitude of the complex output; just looking at the real output is not enough, even when the input is only real numbers. For every bin, the magnitude of the output is sqrt(real^2 + imag^2), just like Pythagoras :-)
Bins 0 to 499 are positive frequencies from 0 Hz to 500 Hz, and bins 500 to 999 are negative frequencies, which should mirror the positive ones for a real signal. If you process one buffer every second, frequencies and array indices line up nicely. So the peak at index 6 corresponding to 6Hz is a bit strange; this might be because you're only looking at the real output data, and the real and imaginary data combine to give the expected peak at index 10. The frequencies should map linearly to the bins.
The peak at 0 indicates a DC offset.
It's been some time since I've done FFTs, but here's what I remember.
An FFT usually takes complex numbers as input and output, so I'm not really sure how the real and imaginary parts of the input and output map to the arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10Hz with a sample rate of 1000 Hz? Then you should have 10 buffers per second with 100 samples each, and I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer contains the frequency bins from 0 frequency (= DC offset) to 1/2 the sample rate, and the 2nd half contains the negative frequencies. If your input is only real data, with 0 for the imaginary part, the positive and negative frequencies are the same. The relationship of the real and imaginary signals in the output contains the phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
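A quick numpy check of that conjugate symmetry:

import numpy as np

x = np.random.rand(8)                             # any real signal
X = np.fft.fft(x)

# For real input, bin n-k is the complex conjugate of bin k
print(np.allclose(X[1:][::-1], np.conj(X[1:])))   # True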
For example 1, you should get this:
[Figure: http://home.comcast.net/~kootsoop/images/SINE1.jpg]
For the other example you should get:
[Figure: http://home.comcast.net/~kootsoop/images/SINE2.jpg]
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sine wave, the DC component should be close to zero provided you capture enough cycles.
Another avenue is to craft a Goertzel's algorithm for each note center frequency you are looking for.
Once you get one implementation of the algorithm working, you can make it take a parameter that sets its center frequency. With that you could easily run 88 of them, or however many you need, in a collection and scan for the peak value.
The Goertzel algorithm is basically a single-bin FFT. Using this method you can place your bins logarithmically, as musical notes naturally go.
Some pseudo code from Wikipedia (normalized_frequency is the target frequency divided by the sample rate):
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency)
for each sample, x[n]:
    s = x[n] + coeff*s_prev - s_prev2
    s_prev2 = s_prev
    s_prev = s
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev
The two variables representing the previous two samples are maintained for the next iteration, so this can be used in a streaming application. I think perhaps the power calculation should be inside the loop as well (however, it is not depicted as such in the Wikipedia article).
In the tone detection case there would be 88 different coefficients and 88 pairs of previous samples, and the result would be 88 power output samples indicating the relative level in each frequency bin.
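For what it's worth, here is a runnable Python version of the same algorithm (the sample rate, buffer length and test tone are assumptions of mine):

import numpy as np

def goertzel_power(samples, freq_hz, fs_hz):
    # Single-bin DFT power at freq_hz, per the recurrence above
    coeff = 2.0 * np.cos(2.0 * np.pi * freq_hz / fs_hz)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

notes = 27.5 * 2.0 ** (np.arange(88) / 12.0)   # 88 piano-note frequencies, A0 = 27.5 Hz
fs = 16000.0
t = np.arange(4096) / fs
tone = np.sin(2 * np.pi * notes[48] * t)       # test tone at note 49 (A4, 440 Hz)
powers = [goertzel_power(tone, f, fs) for f in notes]
print(np.argmax(powers))                       # 48, i.e. the A4 bin wins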
WaveyDavey says that he's capturing sound from a mic, through the audio hardware of his computer, but that his results are not zero-centered. This sounds like a problem with the hardware: it SHOULD be zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice) the data stream should show a fundamentally sinusoidal-based wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!
-Rick