Where to start with Fourier Analysis - vb.net

I'm reading data from the microphone and want to perform some analysis on it. I'm attempting to generate a spectrum analyser something like this:
What I have at the moment is this:
My understanding is that I need to perform a Fourier analysis - a Fast Fourier Transform? - to extract the component frequencies and their amplitudes.
Can someone confirm my understanding is correct and exactly what type of Fourier transform I need to apply?
At the moment, I'm getting frames containing 4k samples from the mic (using NAudio). The buffer I've got is 16 bits/sample (signed short). For reference, the above plot shows approximately half a frame.
I'm coding in VB so any .Net libraries/examples (preferably on NuGet) would be of most use. I believe implementations vary considerably so the less I have to massage my data, the better.

The top plot is a spectrogram: each vertical time line is colored based on the magnitudes of the result of an FFT (likely windowed) of a slice in time (possibly overlapped) of the input waveform. The number of vertical points to plot (the frequency resolution) is related to the length of the FFT. Almost any FFT will do. If you use the most common complex-to-complex FFT, set the imaginary portion of each complex input sample to zero, copy a slice in time of your input waveform into the "real" part, FFT it, take the magnitude or log magnitude of each complex result bin, and map these values to colors per your preference.
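To make those steps concrete, here is a minimal numpy sketch of one spectrogram column (the function name, the Hann window, and the 1 kHz test tone are my own illustration, not from the question); the same steps translate to VB.NET with any .NET FFT library (e.g. Math.NET Numerics on NuGet):

import numpy as np

def spectrum_frame(samples, use_window=True):
    # One column of the spectrogram: window a frame of real mic samples,
    # FFT it, and return the magnitude of each frequency bin.
    x = np.asarray(samples, dtype=np.float64)
    if use_window:
        x = x * np.hanning(len(x))          # reduce spectral leakage
    spectrum = np.fft.rfft(x)               # real-input FFT; a complex FFT with the
                                            # imaginary parts set to zero gives the same bins
    return np.abs(spectrum)                 # or 20*np.log10(...) for log magnitude

# Example: a 4096-sample frame containing a 1 kHz tone sampled at 44.1 kHz
fs = 44100
t = np.arange(4096) / fs
frame = (10000 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)
mags = spectrum_frame(frame)
print("peak near", np.argmax(mags) * fs / len(frame), "Hz")   # ~1000 Hz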

Related

numpy difference between fft and rfft

I'm trying to understand the difference between numpy fft and rfft. I've read the doc, and it only says rfft is meant for real inputs.
I've tested their performance on a large real array and found that rfft is faster than fft by about a third. My question is: why is rfft faster? Thanks!
An RFFT has half the degrees of freedom in the input, and half the number of complex outputs, compared to an FFT. Thus the FFT computation tree can be pruned to remove the adds and multiplies not needed for the non-existent inputs, and/or those unnecessary because there are fewer independent output values to compute.
This is because an FFT of a strictly real input (i.e. all imaginary components of the input are zero) produces a complex-conjugate-mirrored result, where each half can be trivially derived from the other half.
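A small numpy check of that symmetry (the example values are mine, not from the question): the full FFT of a real input is conjugate-symmetric, so rfft only returns the non-redundant half:

import numpy as np

x = np.random.rand(8)                 # strictly real input
full = np.fft.fft(x)                  # 8 complex outputs
half = np.fft.rfft(x)                 # 5 complex outputs (n//2 + 1)

print(np.allclose(full[:5], half))                      # True: rfft is the first half
print(np.allclose(full[5:], np.conj(full[1:4])[::-1]))  # True: second half is the mirror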

Interpolation from irregular grid to regular grid

I have some 1D data (time series data) that is sampled irregularly; i.e., non-constant sample rate. I would like transform these data into a regularly sampled (uniform sample rate) time series. I have used linear interpolation in an attempt to accomplish this; however, this is not very effective when there is a large variation in the time between samples. This is no surprise. I have also attempted some ad hoc methods that again are not very effective.
I have looked at several papers on the use of matching pursuit for interpolation over irregular grids; but, how this approach could be used to obtain samples over a regular grid is not clear to me (at least not yet).
I would appreciate any suggestions on algorithms for interpolation from irregular grids to regular grids (1D data).
If you want to fit the data points exactly, run scipy.interpolate.UnivariateSpline with s=0 (and ask further if that's not clear).
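For illustration, a minimal scipy sketch of that suggestion (the sample times and the noisy sine are invented): s=0 forces the spline through every data point, and the resulting spline object can then be evaluated on a uniform grid:

import numpy as np
from scipy.interpolate import UnivariateSpline

# Irregularly sampled 1-D data (sample times must be increasing)
t_irregular = np.sort(np.random.uniform(0.0, 10.0, 200))
y = np.sin(t_irregular) + 0.05 * np.random.randn(200)

spline = UnivariateSpline(t_irregular, y, s=0)   # s=0: exact fit through the points

# Resample onto a regular (uniform) grid
t_regular = np.linspace(t_irregular[0], t_irregular[-1], 500)
y_regular = spline(t_regular)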

LabView cos fitting

I am working on a program that needs to fit numerous cosine waves in order to determine one of the parameters for the function. The equation that I am using is y = y_0 + A*cos((4*pi*L)/x + pi), where L is the value that I am trying to obtain from the best fit line.
I know that it is possible to do this correctly by hand for each set of data, but what is the best way to automate this process? I am currently reading in the data from text files and running a loop with the initial parameters changing until I have an array of parameter values whose amplitude is similar to the data; then I check the percent difference between points on the center peak and the two end peaks to try to pick the best one. It is consistently picking lower values than what I get when fitting by hand (almost exactly one phase off). So is there a way to improve this method, or another method that works better?
Edit: My LabVIEW version has a cosine fitting VI, which is what I am using; the problem is that when I try to automate the fitting by changing the initial parameters in a loop, I can't figure out how to get the program to pick the same best fit line as a human would pick.
Why not just use a Fast Fourier Transform? This should be way faster than fitting a cosine. In the result vector of complex numbers, look for the largest peak in the magnitudes. That gives you the frequency (position in the FFT result vector), amplitude and phase.
You can evaluate the goodness of the fit by computing the difference between the fitting curve and your data; a VI in the "Advanced curve fitting" palette does this. Then all you have to do is pick the best fit.
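As a rough illustration of the FFT approach (a numpy sketch rather than LabVIEW; the helper name and test values are mine): for the model above the natural variable is u = 1/x, so one would resample y onto a uniform grid in u, take the FFT, and read the dominant bin's frequency, amplitude and phase; with the 4*pi*L/x term, L is half the recovered frequency in u.

import numpy as np

def cosine_params_from_fft(y, du):
    # Estimate frequency, amplitude and phase of the dominant cosine in y,
    # assuming y is sampled on a uniform grid with spacing du.
    n = len(y)
    spectrum = np.fft.rfft(y - np.mean(y))      # subtract the y_0 offset (DC)
    k = np.argmax(np.abs(spectrum[1:])) + 1     # largest non-DC bin
    freq = k / (n * du)                         # cycles per unit of the grid variable
    amplitude = 2 * np.abs(spectrum[k]) / n     # single-sided amplitude
    phase = np.angle(spectrum[k])               # phase of the cosine at the grid origin
    return freq, amplitude, phase

# Synthetic check: y = y0 + A*cos(2*pi*f*u + phi), an exact number of periods
du, n = 0.01, 1000
u = np.arange(n) * du
y = 1.5 + 2.0 * np.cos(2 * np.pi * 3.0 * u + np.pi)
print(cosine_params_from_fft(y, du))            # roughly (3.0, 2.0, +/-pi)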

Optimizing interpolation in Mathematica

As part of my work, I often have to visualize complex 3-dimensional densities. One program suite that I work with outputs the radial component of the densities as a set of 781 points on a logarithmic grid, r_i = (Rmax/Rstep)^((i-1)/(pts-1)), times a spherical harmonic. For low-symmetry systems, the number of spherical harmonics can be fairly large to ensure accuracy, e.g. one system requires 49 harmonics, corresponding to lmax = 6. So, to use this data within Mathematica, I would have a sum of up to 49 interpolated functions, each multiplied by a different spherical harmonic. While using v.6 and constructing the interpolated radial functions using Interpolation and setting r = Sqrt(x^2 + y^2 + z^2), I would stop ContourPlot3D after well over an hour without anything displayed. This included reducing both the InterpolationOrder and MaxRecursion to 1.
Several alternatives presented themselves:
Evaluate the density function on a fixed grid, and use ListContourPlot instead.
Or, linearly spline the radial functions and use Piecewise to stitch them together. (This presented itself, as I could use Simplify to help reduce the complexity of the resulting function.)
I ended up using both, as InterpolatingFunction gives a noticeable delay in its evaluation, and with up to 49 interpolated functions to evaluate, any delay can become significant. Also, ContourPlot3D was faster with the spline, but it didn't give me the speed-up I desired.
I'll freely admit that I haven't tried Interpolation on v.7, nor have I tried this on my upgraded hardware (G4 vs. Intel Core i5). However, I'm looking for alternatives to my current scheme; preferably one where I can use ContourPlot3D directly. I could try some other form of spline, such as a B-spline, and possibly combine that with UnitBox instead of using Piecewise.
Edit: Just to clarify, my current implementation involves creating a first-order spline for each radial part, multiplying each one by its respective spherical harmonic, summing and Simplifying the equations on each radial interval, and then using Piecewise to bind them into one function. So, my implementation is semi-analytical in that the spherical harmonics are exact, and only the radial part is numerical. This is part of the reason why I would like to be able to use ContourPlot3D, so that I can take advantage of the semi-analytical nature of the data. As a point of note, the radial grid is fine enough that a good representation of the radial part is generated and can be smoothly interpolated. While this gave me a significant speed-up, when I wrote the code it was still too slow for the hardware I was using at the time.
So, instead of using ContourPlot3D, I would first generate the function, as above, and then evaluate it on an 80^3 Cartesian grid. It is the data from this step that I used in ListContourPlot3D. Since this is not an adaptive grid, in some places it was too coarse, and I was missing features.
If you can do without Mathematica, I would suggest you have a look at Paraview (US government funded FOSS, all platforms) which I have found to be superior to everything when it comes to visualizing massive amounts of data.
The core of the software is the "Visualization Toolkit" VTK, and you can find/write other frontends if need be.
VTK/Paraview can handle almost any data type: scalars and vectors on structured grids or random points, polygons, time-series data, etc. From Mathematica I often just dump grid data into the VTK legacy format, which in the simplest case looks like this:
# vtk DataFile Version 2.0
Generated by mma via vtkGridDump
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 49 25 15
SPACING 0.125 0.125 0.0625
ORIGIN 8.5 5. 0.7124999999999999
POINT_DATA 18375
SCALARS RF_pondpot_1V1MHz1amu double 1
LOOKUP_TABLE default
0.04709501616121583
0.04135197485227461
... <18373 more numbers> ...
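For reference, a minimal Python sketch (the helper name is mine) that dumps a 3-D scalar array in this legacy format; VTK expects the x index to vary fastest, hence the Fortran-order flatten:

import numpy as np

def write_vtk_structured_points(path, name, data, spacing, origin):
    # data: numpy array of shape (nx, ny, nz); spacing and origin: 3-tuples
    nx, ny, nz = data.shape
    with open(path, "w") as f:
        f.write("# vtk DataFile Version 2.0\n")
        f.write("Generated by write_vtk_structured_points\n")
        f.write("ASCII\n")
        f.write("DATASET STRUCTURED_POINTS\n")
        f.write("DIMENSIONS %d %d %d\n" % (nx, ny, nz))
        f.write("SPACING %g %g %g\n" % spacing)
        f.write("ORIGIN %g %g %g\n" % origin)
        f.write("POINT_DATA %d\n" % (nx * ny * nz))
        f.write("SCALARS %s double 1\n" % name)
        f.write("LOOKUP_TABLE default\n")
        for v in data.flatten(order="F"):        # x varies fastest in VTK point order
            f.write("%.17g\n" % v)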
HTH!
If it really is the interpolation of the radial functions that is slowing you down, you could consider hand-coding that part based on your knowledge of the sample points. As demonstrated below, this gives a significant speedup:
I set things up with your notation. lookuprvals is a list of 100000 r values to look up for timing.
First, look at stock interpolation as a baseline:
With[{interp=Interpolation[N@Transpose@{rvals,yvals}]},
Timing[interp[lookuprvals]][[1]]]
Out[259]= 2.28466
Switching to 0th-order interpolation is already an order of magnitude faster (first order is almost the same speed):
With[{interp=Interpolation[N@Transpose@{rvals,yvals},InterpolationOrder->0]},
Timing[interp[lookuprvals]][[1]]]
Out[271]= 0.146486
We can get another 1.5 orders of magnitude by calculating indices directly:
Module[{avg=MovingAverage[yvals,2],idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[res=Part[avg,Ceiling[idxfact Log[lookuprvals]]]][[1]]]
Out[272]= 0.006067
As a middle ground, do a log-linear interpolation by hand. This is slower than the above solution but still much faster than stock interpolation:
Module[{diffs=Differences[yvals],
idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[Block[{idxraw,idxfloor,idxrel},
idxraw=1+idxfact Log[lookuprvals];
idxfloor=Floor[idxraw];
idxrel=idxraw-idxfloor;
res=Part[yvals,idxfloor]+Part[diffs,idxfloor]idxrel
]][[1]]]
Out[276]= 0.026557
If you have the memory for it, I would cache the spherical harmonics and radius (or even radius-index) on the full grid. Then flatten the grid caches so you can do
Sum[ interpolate[yvals[lm],gridrvals] gridylmvals[lm], {lm,lmvals} ]
and recreate your grid as discussed here.

How to analyse 'noisiness' of an array of points

Have done FFT (see earlier posting if you are interested!) and got a result, which helps me. Would like to analyse the noisiness / spikiness of an array (actually a VB.NET collection of Single). Um, how to explain ...
When the signal is good, the FFT power result is 512 data points (frequency buckets) with low values in all but maybe 2 or 3 array entries, and a decent range (i.e. the peak is high relative to the noise value in the nearly empty buckets). So when graphed, we have a nice big spike in the values in those few buckets.
When the signal is poor/noisy, the spread of data values (max to min) is low, and there's proportionally higher noise in many more buckets.
What's a good, computationally non-intensive way of analysing the noisiness of this data set? Would some kind of statistical method, standard deviations or something, help?
The key is defining what is noise and what is signal, for which modelling assumptions must be made. Often an assumption is made of white noise (constant power per frequency band) or noise of some other power spectrum, and that model is fitted to the data. The signal to noise ratio can then be used to measure the amount of noise.
Fitting a noise model depends on the nature of your data: if you know that the real signal will have no power in the high frequency components, you can look there for an indication of the noise level, and use the model to predict what the noise will be at the lower frequency components where there is both signal and noise. Alternatively, if your signal is constant in time, taking multiple FFTs at different points in time and comparing them to get a standard deviation for each frequency band can give the level of noise present.
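A rough numpy sketch of that first idea (the function name and the assumption that the top quarter of the bins are noise-only are mine): estimate a flat noise floor from the high-frequency bins, then express everything above it as an SNR:

import numpy as np

def snr_estimate(power_spectrum, noise_band_fraction=0.25):
    # Assumes the top noise_band_fraction of the bins contain only noise
    # (white-noise model: flat power per bin).
    p = np.asarray(power_spectrum, dtype=np.float64)
    n_noise = max(1, int(len(p) * noise_band_fraction))
    noise_floor = np.mean(p[-n_noise:])
    total_noise = noise_floor * len(p)
    signal_power = np.sum(p) - total_noise
    if signal_power <= 0:
        return float("-inf")                    # nothing above the noise floor
    return 10 * np.log10(signal_power / total_noise)   # SNR in dB

# e.g. snr_estimate(np.abs(np.fft.rfft(frame)) ** 2) on one frame's power spectrum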
I hope I'm not patronising you by mentioning the issues inherent in windowing functions when performing FFTs: these can have the effect of introducing spurious "noise" into the frequency spectrum, which is in fact an artifact of the periodic nature of the FFT. There's a tradeoff between getting sharp peaks and 'sideband' noise - more here: www.ee.iitm.ac.in/~nitin/_media/ee462/fftwindows.pdf
Calculate a standard deviation, and then decide on the threshold that will indicate noise. In practice this is usually easy and allows you to tweak the "noise level" as needed.
There is a nice single-pass stddev algorithm in Knuth. Here is a link that describes an implementation: Standard Deviation
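For reference, a short Python sketch of that single-pass method (Welford's algorithm, the one usually credited to Knuth's TAOCP vol. 2); the function name is mine:

import math

def running_stddev(values):
    # Single-pass mean and sample standard deviation (Welford / Knuth).
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)     # uses the updated mean; numerically stable
    stddev = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return mean, stddev

# e.g. mean, sd = running_stddev(power_buckets); threshold = mean + 3 * sd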
Calculate the signal-to-noise ratio:
http://en.wikipedia.org/wiki/Signal-to-noise_ratio
You could also check the stddev for each point; if it's under some level you choose, then the signal is good, otherwise it's not.
"wouldn't the spike be treated as a noise glitch in SNR, an outlier to be discarded, as it were?"
If it's clear from the time-domain data that there are such spikes, then they will certainly create a lot of noise in the frequency spectrum. Choosing to ignore them is a good idea, but unfortunately the FFT can't accept data with 'holes' in it where the spikes have been removed. There are two techniques to get around this. The 'dirty trick' method is to set the outlier sample to the average of the two samples on either side and compute the FFT with a full set of data.
The harder but more correct method is to use a Lomb normalised periodogram (see the book 'Numerical Recipes' by W. H. Press et al.), which does a similar job to the FFT but can cope with missing data properly.
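If Python/scipy is an option for prototyping, scipy.signal.lombscargle implements this; a minimal sketch with synthetic, irregularly sampled data (the 1.5 Hz tone and the sample count are invented for illustration):

import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 300))            # irregular sample times (with gaps)
y = np.sin(2 * np.pi * 1.5 * t) + 0.2 * rng.standard_normal(300)

freqs_hz = np.linspace(0.1, 5.0, 500)
pgram = lombscargle(t, y - y.mean(), 2 * np.pi * freqs_hz)   # expects angular frequencies

print("dominant frequency ~", freqs_hz[np.argmax(pgram)], "Hz")   # ~1.5 Hz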