Getting frequency and amplitude from an audio file using FFT - so close but missing some vital insights, eli5? - kotlin

tl/dr: I've got two audio recordings of the same song without timestamps, and I'd like to align them. I believe FFT is the way to go, but while I've got a long way, it feels like I'm right on the edge of understanding enough to make it work, and would greatly benefit from a "you got this part wrong" advice on FFT. (My education never got into this area) So I came here seeking ELI5 help.
The journey:
Get two recordings at the same sample rate. (done!)
Transform them into a waveform. (DoubleArray) This doesn't keep any of the meta info like "samples/second" but the FFT math doesn't care until later.
Run a FFT on them using a simplified implementation for beginners
Get a Array<Frame>, each Frame contains Array<Bin>, each Bin has (amplitude, frequency) because the older implementation hid all the details (like frame width, and number of Bins, and ... stuff?) and outputs words I'm familiar with like "amplitude" and "frequency"
Try moving to a more robust FFT (Apache Commons)
Get an output of 'real' and 'imaginary' (uh oh)
Make the totally incorrect assumption that those were the same thing (amplitude and frequency). Surprise, they aren't!
Apache's FFT returns an Array<Complex> which means it... er... is just one frame's worth? And I should be chopping the song into 1 second chunks and passing each one into the FFT and call it multiple times? That seems strange, how does it get lower frequencies?
To the best of my understanding, the complex number is a way to convey the phase shift and amplitude in one neat container (and you need phase shift if you want to do the FFT in reverse). And the frequency is calculated from the index of the array.
Which works out to (pseudocode in Kotlin)
val audioFile = File("Dream_On.pcm")
val (phases, amplitudes) = AudioInputStream(
audioFile.inputStream(),
AudioFormat(
/* encoding = */ AudioFormat.Encoding.PCM_SIGNED,
/* sampleRate = */ 44100f,
/* sampleSizeInBits = */ 16,
/* channels = */ 2,
/* frameSize = */ 4,
/* frameRate = */ 44100f,
/* bigEndian = */ false
),
(audiFile.length() / /* frameSize */ 4)
).use { ais ->
val bytes = ais.readAllBytes()
val shorts = ShortArray(bytes.size / 2)
ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
val allWaveform = DoubleArray(shorts.size)
for (i in shorts.indices) {
allWaveform[i] = shorts[i].toDouble()
}
val halfwayThroughSong = allWaveform.size / 2
val moreThanOneSecond = allWaveform.copyOfRange(halfwayThroughSong, halfwayThroughSong + findNextPowerOf2(44100))
val fft = FastFourierTransformer(DftNormalization.STANDARD)
val fftResult: Array<Complex> = fft.transform(moreThanOneSecond, TransformType.FORWARD)
println("fftResult size: ${fftResult.size}")
val phases = DoubleArray(fftResult.size / 2)
val amplitudes = DoubleArray(fftResult.size / 2)
val frequencies = DoubleArray(fftResult.size / 2)
fftResult.filterIndexed { index, _ -> index < fftResult.size / 2 }.forEachIndexed { idx, complex ->
phases[idx] = atan2(complex.imaginary, complex.real)
frequencies[idx] = idx * 44100.0 / fftResult.size
amplitudes[idx] = hypot(complex.real, complex.imaginary)
}
Triple(phases, frequencies, amplitudes)
}
Is my step #8 at all close to the truth? Why would the FFT result return an array as big as my input number of samples? That makes me think I've got the "window" or "frame" part wrong.
I read up on
FFT real/imaginary/abs parts interpretation
Converting Real and Imaginary FFT output to Frequency and Amplitude
Java - Finding frequency and amplitude of audio signal using FFT

An audio recording in waveform is a series of sound energy levels, basically how much sound energy there should be at any one instant. Based on the bit rate, you can think of the whole recording as a graph of energy versus time.
Sound is made of waves, which have frequencies and amplitudes. Unless your recording is of a pure sine wave, it will have many different waves of sound coming and going, which summed together create the total sound that you experience over time. At any one instant of time, you have energy from many different waves added together. Some of those waves may be at their peaks, and some at their valleys, or anywhere in between.
An FFT is a way to convert energy-vs.-time data into amplitude-vs.-frequency data. The input to an FFT is a block of waveform. You can't just give it a single energy level from a one-dimensional point in time, because then there is no way to determine all the waves that add together to make up the amplitude at that point of time. So, you give it a series of amplitudes over some finite period of time.
The FFT then does its math and returns a range of complex numbers that represent the waves of sound over that chunk of time, that when added together would create the series of energy levels over that block of time. That's why the return value is an array. It represents a bunch of frequency ranges. Together the total data of the array represents the same energy from the input array.
You can calculate from the complex numbers both phase shift and amplitude for each frequency range represented in the return array.
Ultimately, I don’t see why performing an FFT would get you any closer to syncing your recordings. Admittedly it’s not a task I’ve tried before. But I would think waveform data is already the perfect form for comparing the data and finding matching patterns. If you break your songs up into chunks to perform FFTs on, then you can try to find matching FFTs but they will only match perfectly if your chunks are divided exactly along the same division points relative to the beginning of the original recording. And even if you could guarantee that and found matching FFT’s, you will only have as much precision as the size of your chunks.
But when I think of apps like Shazam, I realize they must be doing some sort of manipulation of the audio that breaks it down into something simpler for rapid comparison. That possibly involves some FFT manipulation and filtering.
Maybe you could compare FFTs using some algorithm to just find ones that are pretty similar to narrow down to a time range and then compare wave form data in that range to find the exact point of synchronization.

I would imagine the approach that would work well would to find the offset with the maximum cross-correlation between the two recordings. This means calculate the cross-correlation between the two pieces at various offsets. You would expect the maximum cross-correlation to occur at the offset where the two piece were best aligned.

Related

SuperCollider: automatic phase and frequency alignment of oscillators

Anyone has an idea for automatic phase and frequency alignment?
To explain: assume, you have an Impulse
in = Impulse.ar(Rand(2, 5), Rand(0, 1));
now I'd like to manipulate the frequency of another Impulse such that it adapts its phase and frequency to match the input.
Any suggestions, even for a google search are highly appreciated.
[question asked on behalf of a colleague]
I don't agree that this, as frame is a tough problem. Impulses are simple to track - that's why, for example, old rotary dial phones used pulse trains.
Here's some code that generates an impulse at a random frequency, then resynthesises another impulse at the same frequency. It also outputs a pitch estimate.
(
var left, right, master, slave, periodestimatebus, secretfrequency;
s = Server.default;
left = Bus.new(\audio, 0,1);
right = Bus.new(\audio, 1,1);
periodestimatebus = Bus.control(s,1);
//choose our secret frequency here for later comparison:
secretfrequency = rrand(2.0,5.0);
//generate impulse with secret frequency at some arbitrary phase
master = {Impulse.ar(secretfrequency, Rand(0, 1));}.play(s, left);
slave = {
var masterin, clockcount, clockoffset, syncedclock, periodestimate, tracking;
masterin = In.ar(left);
//This 1 Hz LFSaw is the "clock" against which we measure stuff
clockcount = LFSaw.ar(1, 0, 0.5, 0.5);
clockoffset = Latch.ar(clockcount, Delay1.ar(masterin));
syncedclock = (clockcount - clockoffset).frac;
//syncedclock is a version of the clock hard-reset (one sample after) every impulse trigger
periodestimate = Latch.ar(syncedclock, masterin);
//sanity-check our f impulse
Out.kr(periodestimatebus, periodestimate);
//there is no phase estimate per se - what would we measure it against? -
//but we can resynthesise a new impulse up to a 1 sample delay from the matched clock.
tracking = (Slope.ar(syncedclock)>0);
}.play(master, right, 0, addAction: \addAfter);
//Let's see how we performed
{
periodestimatebus.get({|periodestimate|
["actual/estimated frequency", secretfrequency, periodestimate.reciprocal].postln;
});
}.defer(1);
)
Notes to this code:
The periodestimate is generated by tricksy use of Delay1 to make sure that it samples the value of the clock just before it is reset. As such it is off by one sample.
The current implementation will produce a good period estimate with varying frequencies, down to 1Hz at least. Any lower and you'd need to change the clockcount clock to have a different frequency and tweak the arithmetic.
Many improvements are possible. For example, if you wish to track varying frequencies you might want to tweak it a little bit so that the resynthesized signal does not click too often as it underestimates the signal.
This is a tough problem, as you are dealing with a noisy source with a low frequency. If this were a sine wave I'd recommend a FFT, but FFTs don't do very well with noisy sources and low frequencies. It's still worth a shot. FFTs can match phase too. I believe you can use pitch.ar to help find the frequency.
The Chrip-Z algorithm is something you could use instead of the FFT -http://www.embedded.com/design/configurable-systems/4006427/A-DSP-algorithm-for-frequency-analysis
http://en.wikipedia.org/wiki/Bluestein%27s_FFT_algorithm
Another thing you could try is to use a neural net to try and guess it's way to the right information. You could use active training to help it achieve this goal. There is a very general discussion of this on SO:
Pitch detection using neural networks
One method some folks are coming around to is simulating the neurons of Cochlea to detect pitch.
You may want to read up on Phase-locked loops: http://en.wikipedia.org/wiki/Phase-locked_loop

Where to start with Fourier Analysis

I'm reading data from the microphone and want to perform some analysis on it. I'm attempting to generate a spectrum analyser something like this:
What I have at the moment is this:
My understanding is that I need to perform a Fourier analysis - a Fast Fourier Transform ? - to extract the component frequencies and their amplitudes.
Can someone confirm my understanding is correct and exactly what type of Fourier transform I need to apply?
At the moment, I'm getting frames containing 4k samples from the mic (using NAudio). The buffer I've got is 16bits/sample (Signed Short). For reference, the above plot shows approx half a frame
I'm coding in VB so any .Net libraries/examples (preferably on NuGet) would be of most use. I believe implementations vary considerably so the less I have to massage my data, the better.
The top plot is that of a spectrograph, where each vertical time line is colored based on the magnitudes of the result from an FFT (likely windowed) of a slice in time (possibly overlapped) of the input waveform. The number of vertical points to plot (the frequency resolution) is related to the length of the FFT. Almost any FFT will do. If you use the most common complex-to-complex FFT, just set the imaginary portion of each complex input sample to zero, copy a slice in time of samples of your input waveform to the "real" part, FFT, and take the magnitude or log magnitude of each complex result bin, then map these values to colors per your preference.

Simplification / optimization of GPS track

I've got a GPS track produced by gpxlogger(1) (supplied as a client for gpsd). GPS receiver updates its coordinates every 1 second, gpxlogger's logic is very simple, it writes down location (lat, lon, ele) and a timestamp (time) received from GPS every n seconds (n = 3 in my case).
After writing down a several hours worth of track, gpxlogger saves several megabyte long GPX file that includes several thousands of points. Afterwards, I try to plot this track on a map and use it with OpenLayers. It works, but several thousands of points make using the map a sloppy and slow experience.
I understand that having several thousands of points of suboptimal. There are myriads of points that can be deleted without losing almost anything: when there are several points making up roughly the straight line and we're moving with the same constant speed between them, we can just leave the first and the last point and throw away anything else.
I thought of using gpsbabel for such track simplification / optimization job, but, alas, it's simplification filter works only with routes, i.e. analyzing only geometrical shape of path, without timestamps (i.e. not checking that the speed was roughly constant).
Is there some ready-made utility / library / algorithm available to optimize tracks? Or may be I'm missing some clever option with gpsbabel?
Yes, as mentioned before, the Douglas-Peucker algorithm is a straightforward way to simplify 2D connected paths. But as you have pointed out, you will need to extend it to the 3D case to properly simplify a GPS track with an inherent time dimension associated with every point. I have done so for a web application of my own using a PHP implementation of Douglas-Peucker.
It's easy to extend the algorithm to the 3D case with a little understanding of how the algorithm works. Say you have input path consisting of 26 points labeled A to Z. The simplest version of this path has two points, A and Z, so we start there. Imagine a line segment between A and Z. Now scan through all remaining points B through Y to find the point furthest away from the line segment AZ. Say that point furthest away is J. Then, you scan the points between B and I to find the furthest point from line segment AJ and scan points K through Y to find the point furthest from segment JZ, and so on, until the remaining points all lie within some desired distance threshold.
This will require some simple vector operations. Logically, it's the same process in 3D as in 2D. If you find a Douglas-Peucker algorithm implemented in your language, it might have some 2D vector math implemented, and you'll need to extend those to use 3 dimensions.
You can find a 3D C++ implementation here: 3D Douglas-Peucker in C++
Your x and y coordinates will probably be in degrees of latitude/longitude, and the z (time) coordinate might be in seconds since the unix epoch. You can resolve this discrepancy by deciding on an appropriate spatial-temporal relationship; let's say you want to view one day of activity over a map area of 1 square mile. Imagining this relationship as a cube of 1 mile by 1 mile by 1 day, you must prescale the time variable. Conversion from degrees to surface distance is non-trivial, but for this case we simplify and say one degree is 60 miles; then one mile is .0167 degrees. One day is 86400 seconds; then to make the units equivalent, our prescale factor for your timestamp is .0167/86400, or about 1/5,000,000.
If, say, you want to view the GPS activity within the same 1 square mile map area over 2 days instead, time resolution becomes half as important, so scale it down twice further, to 1/10,000,000. Have fun.
Have a look at Ramer-Douglas-Peucker algorithm for smoothening complex polygons, also Douglas-Peucker line simplification algorithm can help you reduce your points.
OpenSource GeoKarambola java library (no Android dependencies but can be used in Android) that includes a GpxPathManipulator class that does both route & track simplification/reduction (3D/elevation aware).
If the points have timestamp information that will not be discarded.
https://sourceforge.net/projects/geokarambola/
This is the algorith in action, interactively
https://lh3.googleusercontent.com/-hvHFyZfcY58/Vsye7nVrmiI/AAAAAAAAHdg/2-NFVfofbd4ShZcvtyCDpi2vXoYkZVFlQ/w360-h640-no/movie360x640_05_82_05.gif
This algorithm is based on reducing the number of points by eliminating those that have the greatest XTD (cross track distance) error until a tolerated error is satisfied or the maximum number of points is reached (both parameters of the function), wichever comes first.
An alternative algorithm, for on-the-run stream like track simplification (I call it "streamplification") is:
you keep a small buffer of the points the GPS sensor gives you, each time a GPS point is added to the buffer (elevation included) you calculate the max XTD (cross track distance) of all the points in the buffer to the line segment that unites the first point with the (newly added) last point of the buffer. If the point with the greatest XTD violates your max tolerated XTD error (25m has given me great results) then you cut the buffer at that point, register it as a selected point to be appended to the streamplified track, trim the trailing part of the buffer up to that cut point, and keep going. At the end of the track the last point of the buffer is also added/flushed to the solution.
This algorithm is lightweight enough that it runs on an AndroidWear smartwatch and gives optimal output regardless of if you move slow or fast, or stand idle at the same place for a long time. The ONLY thing that maters is the SHAPE of your track. You can go for many minutes/kilometers and, as long as you are moving in a straight line (a corridor within +/- tolerated XTD error deviations) the streamplify algorithm will only output 2 points: those of the exit form last curve and entry on next curve.
I ran in to a similar issue. The rate at which the gps unit takes points is much larger that needed. Many of the points are not geographically far away from each other. The approach that I took is to calculate the distance between the points using the haversine formula. If the distance was not larger than my threshold (0.1 miles in my case) I threw away the point. This quickly gets the number of points down to a manageable size.
I don't know what language you are looking for. Here is a C# project that I was working on. At the bottom you will find the haversine code.
http://blog.bobcravens.com/2010/09/gps-using-the-netduino/
Hope this gets you going.
Bob
This is probably NP-hard. Suppose you have points A, B, C, D, E.
Let's try a simple deterministic algorithm. Suppose you calculate the distance from point B to line A-C and it's smaller than your threshold (1 meter). So you delete B. Then you try the same for C to line A-D, but it's bigger and D for C-E, which is also bigger.
But it turns out that the optimal solution is A, B, E, because point C and D are close to the line B-E, yet on opposite sides.
If you delete 1 point, you cannot be sure that it should be a point that you should keep, unless you try every single possible solution (which can be n^n in size, so on n=80 that's more than the minimum number of atoms in the known universe).
Next step: try a brute force or branch and bound algorithm. Doesn't scale, doesn't work for real-world size. You can safely skip this step :)
Next step: First do a determinstic algorithm and improve upon that with a metaheuristic algorithm (tabu search, simulated annealing, genetic algorithms). In java there are a couple of open source implementations, such as Drools Planner.
All in all, you 'll probably have a workable solution (although not optimal) with the first simple deterministic algorithm, because you only have 1 constraint.
A far cousin of this problem is probably the Traveling Salesman Problem variant in which the salesman cannot visit all cities but has to select a few.
You want to throw away uninteresting points. So you need a function that computes how interesting a point is, then you can compute how interesting all the points are and throw away the N least interesting points, where you choose N to slim the data set sufficiently. It sounds like your definition of interesting corresponds to high acceleration (deviation from straight-line motion), which is easy to compute.
Try this, it's free and opensource online Service:
https://opengeo.tech/maps/gpx-simplify-optimizer/
I guess you need to keep points where you change direction. If you split your track into the set of intervals of constant direction, you can leave only boundary points of these intervals.
And, as Raedwald pointed out, you'll want to leave points where your acceleration is not zero.
Not sure how well this will work, but how about taking your list of points, working out the distance between them and therefore the total distance of the route and then deciding on a resolution distance and then just linear interpolating the position based on each step of x meters. ie for each fix you have a "distance from start" measure and you just interpolate where n*x is for your entire route. (you could decide how many points you want and divide the total distance by this to get your resolution distance). On top of this you could add a windowing function taking maybe the current point +/- z points and applying a weighting like exp(-k* dist^2/accuracy^2) to get the weighted average of a set of points where dist is the distance from the raw interpolated point and accuracy is the supposed accuracy of the gps position.
One really simple method is to repeatedly remove the point that creates the largest angle (in the range of 0° to 180° where 180° means it's on a straight line between its neighbors) between its neighbors until you have few enough points. That will start off removing all points that are perfectly in line with their neighbors and will go from there.
You can do that in Ο(n log(n)) by making a list of each index and its angle, sorting that list in descending order of angle, keeping how many you need from the front of the list, sorting that shorter list in descending order of index, and removing the indexes from the list of points.
def simplify_points(points, how_many_points_to_remove)
angle_map = Array.new
(2..points.length - 1).each { |next_index|
removal_list.add([next_index - 1, angle_between(points[next_index - 2], points[next_index - 1], points[next_index])])
}
removal_list = removal_list.sort_by { |index, angle| angle }.reverse
removal_list = removal_list.first(how_many_points_to_remove)
removal_list = removal_list.sort_by { |index, angle| index }.reverse
removal_list.each { |index| points.delete_at(index) }
return points
end

How to calculate deceleration needed to reach a certain speed over a certain distance?

I've tried the typical physics equations for this but none of them really work because the equations deal with constant acceleration and mine will need to change to work correctly. Basically I have a car that can be going at a large range of speeds and needs to slow down and stop over a given distance and time as it reaches the end of its path.
So, I have:
V0, or the current speed
Vf, or the speed I want to reach (typically 0)
t, or the amount of time I want to take to reach the end of my path
d, or the distance I want to go as I change from V0 to Vf
I want to calculate
a, or the acceleration needed to go from V0 to Vf
The reason this becomes a programming-specific question is because a needs to be recalculated every single timestep as the car keeps stopping. So, V0 constantly is changed to be V0 from last timestep plus the a that was calculated last timestep. So essentially it will start stopping slowly then will eventually stop more abruptly, sort of like a car in real life.
EDITS:
All right, thanks for the great responses. A lot of what I needed was just some help thinking about this. Let me be more specific now that I've got some more ideas from you all:
I have a car c that is 64 pixels from its destination, so d=64. It is driving at 2 pixels per timestep, where a timestep is 1/60 of a second. I want to find the acceleration a that will bring it to a speed of 0.2 pixels per timestep by the time it has traveled d.
d = 64 //distance
V0 = 2 //initial velocity (in ppt)
Vf = 0.2 //final velocity (in ppt)
Also because this happens in a game loop, a variable delta is passed through to each action, which is the multiple of 1/60s that the last timestep took. In other words, if it took 1/60s, then delta is 1.0, if it took 1/30s, then delta is 0.5. Before acceleration is actually applied, it is multiplied by this delta value. Similarly, before the car moves again its velocity is multiplied by the delta value. This is pretty standard stuff, but it might be what is causing problems with my calculations.
Linear acceleration a for a distance d going from a starting speed Vi to a final speed Vf:
a = (Vf*Vf - Vi*Vi)/(2 * d)
EDIT:
After your edit, let me try and gauge what you need...
If you take this formula and insert your numbers, you get a constant acceleration of -0,0309375. Now, let's keep calling this result 'a'.
What you need between timestamps (frames?) is not actually the acceleration, but new location of the vehicle, right? So you use the following formula:
Sd = Vi * t + 0.5 * t * t * a
where Sd is the current distance from the start position at current frame/moment/sum_of_deltas, Vi is the starting speed, and t is the time since the start.
With this, your decceleration is constant, but even if it is linear, your speed will accomodate to your constraints.
If you want a non-linear decceleration, you could find some non-linear interpolation method, and interpolate not acceleration, but simply position between two points.
location = non_linear_function(time);
The four constraints you give are one too many for a linear system (one with constant acceleration), where any three of the variables would suffice to compute the acceleration and thereby determine the fourth variables. However, the system is way under-specified for a completely general nonlinear system -- there may be uncountably infinite ways to change acceleration over time while satisfying all the constraints as given. Can you perhaps specify better along what kind of curve acceleration should change over time?
Using 0 index to mean "at the start", 1 to mean "at the end", and D for Delta to mean "variation", given a linearly changing acceleration
a(t) = a0 + t * (a1-a0)/Dt
where a0 and a1 are the two parameters we want to compute to satisfy all the various constraints, I compute (if there's been no misstep, as I did it all by hand):
DV = Dt * (a0+a1)/2
Ds = Dt * (V0 + ((a1-a0)/6 + a0/2) * Dt)
Given DV, Dt and Ds are all given, this leaves 2 linear equations in the unknowns a0 and a1 so you can solve for these (but I'm leaving things in this form to make it easier to double check on my derivations!!!).
If you're applying the proper formulas at every step to compute changes in space and velocity, it should make no difference whether you compute a0 and a1 once and for all or recompute them at every step based on the remaining Dt, Ds and DV.
If you're trying to simulate a time-dependent acceleration in your equations, it just means that you should assume that. You have to integrate F = ma along with the acceleration equations, that's all. If acceleration isn't constant, you just have to solve a system of equations instead of just one.
So now it's really three vector equations that you have to integrate simultaneously: one for each component of displacement, velocity, and acceleration, or nine equations in total. The force as a function of time will be an input for your problem.
If you're assuming 1D motion you're down to three simultaneous equations. The ones for velocity and displacement are both pretty easy.
In real life, a car's stopping ability depends on the pressure on the brake pedal, any engine braking that's going on, surface conditions, and such: also, there's that "grab" at the end when the car really stops. Modeling that is complicated, and you're unlikely to find good answers on a programming website. Find some automotive engineers.
Aside from that, I don't know what you're asking for. Are you trying to determine a braking schedule? As in there's a certain amount of deceleration while coasting, and then applying the brake? In real driving, the time is not usually considered in these maneuvers, but rather the distance.
As far as I can tell, your problem is that you aren't asking for anything specific, which suggests that you really haven't figured out what you actually want. If you'd provide a sample use for this, we could probably help you. As it is, you've provided the bare bones of a problem that is either overdetermined or way underconstrained, and there's really nothing we can do with that.
if you need to go from 10m/s to 0m/s in 1m with linear acceleration you need 2 equations.
first find the time (t) it takes to stop.
v0 = initial velocity
vf = final velocity
x0 = initial displacement
xf = final displacement
a = constant linear acceleration
(xf-x0)=.5*(v0-vf)*t
t=2*(xf-x0)/(v0-vf)
t=2*(1m-0m)/(10m/s-0m/s)
t=.2seconds
next to calculate the linear acceleration between x0 & xf
(xf-x0)=(v0-vf)*t+.5*a*t^2
(1m-0m)=(10m/s-0m/s)*(.2s)+.5*a*((.2s)^2)
1m=(10m/s)*(.2s)+.5*a*(.04s^2)
1m=2m+a*(.02s^2)
-1m=a*(.02s^2)
a=-1m/(.02s^2)
a=-50m/s^2
in terms of gravity (g's)
a=(-50m/s^2)/(9.8m/s^2)
a=5.1g over the .2 seconds from 0m to 10m
Problem is either overconstrained or underconstrained (a is not constant? is there a maximum a?) or ambiguous.
Simplest formula would be a=(Vf-V0)/t
Edit: if time is not constrained, and distance s is constrained, and acceleration is constant, then the relevant formulae are s = (Vf+V0)/2 * t, t=(Vf-V0)/a which simplifies to a = (Vf2 - V02) / (2s).

VB FFT - stuck understanding relationship of results to frequency

Trying to understand an fft (Fast Fourier Transform) routine I'm using (stealing)(recycling)
Input is an array of 512 data points which are a sample waveform.
Test data is generated into this array. fft transforms this array into frequency domain.
Trying to understand relationship between freq, period, sample rate and position in fft array. I'll illustrate with examples:
========================================
Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.
Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points
peak value in fft array is at index 6 (excluding a huge value at 0)
========================================
Sample rate is 8000 samples/s
Generate set of samples at 440Hz
Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points
peak value in fft array is at index 29 (excluding a huge value at 0)
========================================
How do I relate the index of the peak in the fft array to frequency ?
If you ignore the imaginary part, the frequency distribution is linear across bins:
Frequency#i = (Sampling rate/2)*(i/Nbins).
So for your first example, assumming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10Hz, I'd guess that bin 5 (9.7Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.
Your second example gives 8000/2*29/256 = 453Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6Hz.
It would be helpful if you were to provide your sample dataset.
My guess would be that you have what are called sampling artifacts. The strong signal at DC ( frequency 0 ) suggests that this is the case.
You should always ensure that the average value in your input data is zero - find the average and subtract it from each sample point before invoking the fft is good practice.
Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data point are close to zero because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.
The bottom line is that doing an fft analysis requires more care than simply recycling a fft routine found somewhere.
Here are the first 100 sample points of a 10Hz signal as described in the question, massaged to avoid sampling artifacts
> sinx[1:100]
[1] 0.000000e+00 6.279052e-02 1.253332e-01 1.873813e-01 2.486899e-01 3.090170e-01 3.681246e-01 4.257793e-01 4.817537e-01 5.358268e-01
[11] 5.877853e-01 6.374240e-01 6.845471e-01 7.289686e-01 7.705132e-01 8.090170e-01 8.443279e-01 8.763067e-01 9.048271e-01 9.297765e-01
[21] 9.510565e-01 9.685832e-01 9.822873e-01 9.921147e-01 9.980267e-01 1.000000e+00 9.980267e-01 9.921147e-01 9.822873e-01 9.685832e-01
[31] 9.510565e-01 9.297765e-01 9.048271e-01 8.763067e-01 8.443279e-01 8.090170e-01 7.705132e-01 7.289686e-01 6.845471e-01 6.374240e-01
[41] 5.877853e-01 5.358268e-01 4.817537e-01 4.257793e-01 3.681246e-01 3.090170e-01 2.486899e-01 1.873813e-01 1.253332e-01 6.279052e-02
[51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
[61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
[71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
[81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
[91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02
And here is the resulting absolute values of the fft frequency domain
[1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
I'm a little rusty too on math and signal processing but with the additional info I can give it a shot.
If you want to know the signal energy per bin you need the magnitude of the complex output. So just looking at the real output is not enough. Even when the input is only real numbers. For every bin the magnitude of the output is sqrt(real^2 + imag^2), just like pythagoras :-)
bins 0 to 449 are positive frequencies from 0 Hz to 500 Hz. bins 500 to 1000 are negative frequencies and should be the same as the positive for a real signal. If you process one buffer every second frequencies and array indices line up nicely. So the peak at index 6 corresponds with 6Hz so that's a bit strange. This might be because you're only looking at the real output data and the real and imaginary data combine to give an expected peak at index 10. The frequencies should map linearly to the bins.
The peaks at 0 indicates a DC offset.
It's been some time since I've done FFT's but here's what I remember
FFT usually takes complex numbers as input and output. So I'm not really sure how the real and imaginary part of the input and output map to the arrays.
I don't really understand what you're doing. In the first example you say you process sample buffers at 10Hz for a sample rate of 1000 Hz? So you should have 10 buffers per second with 100 samples each. I don't get how your input array can be at least 228 samples long.
Usually the first half of the output buffer are frequency bins from 0 frequency (=dc offset) to 1/2 sample rate. and the 2nd half are negative frequencies. if your input is only real data with 0 for the imaginary signal positive and negative frequencies are the same. The relationship of real/imaginary signal on the output contains phase information from your input signal.
The frequency for bin i is i * (samplerate / n), where n is the number of samples in the FFT's input window.
If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.
If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).
For example 1, you should get this:
alt text http://home.comcast.net/~kootsoop/images/SINE1.jpg
For the other example you should get
alt text http://home.comcast.net/~kootsoop/images/SINE2.jpg
So your answers both appear to be correct regarding the peak location.
What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sinewave, the DC should be close to zero provided you get enough cycles.
Another avenue is to craft a Goertzel's Algorithm of each note center frequency you are looking for.
Once you get one implementation of the algorithm working you can make it such that it takes parameters to set it's center frequency. With that you could easily run 88 of them or what ever you need in a collection and scan for the peak value.
The Goertzel Algorithm is basically a single bin FFT. Using this method you can place your bins logarithmically as musical notes naturally go.
Some pseudo code from Wikipedia:
s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
s = x[n] + coeff*s_prev - s_prev2;
s_prev2 = s_prev;
s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;
The two variables representing the previous two samples are maintained for the next iteration. This can be then used in a streaming application. I thinks perhaps the power calculation should be inside the loop as well. (However it is not depicted as such in the Wiki article.)
In the tone detection case there would be 88 different coeficients, 88 pairs of previous samples and would result in 88 power output samples indicating the relative level in that frequency bin.
WaveyDavey says that he's capturing sound from a mic, thru the audio hardware of his computer, BUT that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD BE zero-centered.
When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice) the data stream should show a fundamentally sinusoidal-based wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!
-Rick