Does anyone have an idea for automatic phase and frequency alignment?
To explain: assume you have an Impulse
in = Impulse.ar(Rand(2, 5), Rand(0, 1));
Now I'd like to manipulate the frequency of another Impulse so that it adapts its phase and frequency to match the input.
Any suggestions, even for a google search are highly appreciated.
[question asked on behalf of a colleague]
I don't agree that this, as framed, is a tough problem. Impulses are simple to track - that's why, for example, old rotary-dial phones used pulse trains.
Here's some code that generates an impulse at a random frequency, then resynthesises another impulse at the same frequency. It also outputs a pitch estimate.
(
var left, right, master, slave, periodestimatebus, secretfrequency;
s = Server.default;
left = Bus.new(\audio, 0,1);
right = Bus.new(\audio, 1,1);
periodestimatebus = Bus.control(s,1);
//choose our secret frequency here for later comparison:
secretfrequency = rrand(2.0,5.0);
//generate impulse with secret frequency at some arbitrary phase
master = {Impulse.ar(secretfrequency, Rand(0, 1));}.play(s, left);
slave = {
var masterin, clockcount, clockoffset, syncedclock, periodestimate, tracking;
masterin = In.ar(left);
//This 1 Hz LFSaw is the "clock" against which we measure stuff
clockcount = LFSaw.ar(1, 0, 0.5, 0.5);
clockoffset = Latch.ar(clockcount, Delay1.ar(masterin));
syncedclock = (clockcount - clockoffset).frac;
//syncedclock is a version of the clock hard-reset (one sample after) every impulse trigger
periodestimate = Latch.ar(syncedclock, masterin);
//sanity-check our frequency estimate
Out.kr(periodestimatebus, periodestimate);
//there is no phase estimate per se - what would we measure it against? -
//but we can resynthesise a new impulse up to a 1 sample delay from the matched clock.
tracking = (Slope.ar(syncedclock)>0);
}.play(master, right, 0, addAction: \addAfter);
//Let's see how we performed
{
periodestimatebus.get({|periodestimate|
["actual/estimated frequency", secretfrequency, periodestimate.reciprocal].postln;
});
}.defer(1);
)
Notes to this code:
The periodestimate is generated by tricksy use of Delay1 to make sure that it samples the value of the clock just before it is reset. As such it is off by one sample.
The current implementation will produce a good period estimate with varying frequencies, down to 1Hz at least. Any lower and you'd need to change the clockcount clock to have a different frequency and tweak the arithmetic.
Many improvements are possible. For example, if you wish to track varying frequencies you might want to tweak it a little so that the resynthesised signal does not click whenever the period estimate lags behind the input.
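The same latch-the-clock idea is easy to express outside SuperCollider. Here is a minimal sketch (Kotlin; the input buffer and the 0.5 trigger threshold are assumptions for illustration) that estimates the period by counting samples between triggers and resynthesises an impulse train at that period:

// Sketch: latch the sample count at each trigger as the period estimate,
// then emit a resynthesised impulse train that hard-resets on every trigger.
fun trackImpulses(input: DoubleArray): DoubleArray {
    val output = DoubleArray(input.size)
    var samplesSinceTrigger = 0
    var periodInSamples = -1              // no estimate yet
    var phase = 0                         // resynthesis counter
    for (i in input.indices) {
        samplesSinceTrigger++
        if (input[i] > 0.5) {             // trigger detected
            periodInSamples = samplesSinceTrigger   // latch the period
            samplesSinceTrigger = 0
            phase = 0                     // hard reset, like syncedclock
        }
        if (periodInSamples > 0) {
            if (phase == 0) output[i] = 1.0
            phase = (phase + 1) % periodInSamples
        }
    }
    return output    // frequency estimate = sampleRate / periodInSamples
}

As in the UGen version, the estimate only updates on each trigger, so a frequency change shows up one period late.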
This is a tough problem, as you are dealing with a noisy source at a low frequency. If this were a sine wave I'd recommend an FFT, but FFTs don't do very well with noisy sources and low frequencies. It's still worth a shot, though, and FFTs can match phase too. I believe you can use Pitch.ar to help find the frequency.
The Chirp-Z algorithm is something you could use instead of the FFT: http://www.embedded.com/design/configurable-systems/4006427/A-DSP-algorithm-for-frequency-analysis
http://en.wikipedia.org/wiki/Bluestein%27s_FFT_algorithm
Another thing you could try is to use a neural net to guess its way to the right information. You could use active training to help it achieve this goal. There is a very general discussion of this on SO:
Pitch detection using neural networks
One method some folks are coming around to is simulating the neurons of the cochlea to detect pitch.
You may want to read up on Phase-locked loops: http://en.wikipedia.org/wiki/Phase-locked_loop
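A phase-locked loop is also only a few lines in software. A hedged sketch (Kotlin; the loop gains and the 3 Hz initial guess are made-up values you would tune for your source):

// First-order PLL tracking an impulse train. phase runs 0..1; on each input
// impulse we measure how far phase is from the expected trigger point (0)
// and nudge both phase and frequency toward lock.
fun pllTrack(impulses: BooleanArray, sampleRate: Double): DoubleArray {
    var freq = 3.0                        // Hz, initial guess
    var phase = 0.0
    val kPhase = 0.5                      // loop gains, to be tuned
    val kFreq = 0.1
    val freqTrack = DoubleArray(impulses.size)
    for (i in impulses.indices) {
        phase = (phase + freq / sampleRate) % 1.0
        if (impulses[i]) {
            val err = if (phase > 0.5) phase - 1.0 else phase   // signed error, -0.5..0.5
            phase -= kPhase * err
            freq -= kFreq * err * freq    // fired too early => lower the frequency
        }
        freqTrack[i] = freq
    }
    return freqTrack
}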
Related
tl;dr: I've got two audio recordings of the same song without timestamps, and I'd like to align them. I believe FFT is the way to go, but while I've come a long way, it feels like I'm right on the edge of understanding enough to make it work, and would greatly benefit from "you got this part wrong" advice on FFT. (My education never got into this area.) So I came here seeking ELI5 help.
The journey:
1. Get two recordings at the same sample rate. (done!)
2. Transform them into a waveform (DoubleArray). This doesn't keep any of the meta info like "samples/second", but the FFT math doesn't care until later.
3. Run an FFT on them using a simplified implementation for beginners.
4. Get an Array<Frame>, where each Frame contains an Array<Bin> and each Bin has (amplitude, frequency), because the older implementation hid all the details (like frame width, and number of Bins, and ... stuff?) and outputs words I'm familiar with like "amplitude" and "frequency".
5. Try moving to a more robust FFT (Apache Commons).
6. Get an output of 'real' and 'imaginary'. (uh oh)
7. Make the totally incorrect assumption that those were the same thing (amplitude and frequency). Surprise, they aren't!
8. Apache's FFT returns an Array<Complex>, which means it... er... is just one frame's worth? And I should be chopping the song into 1-second chunks and passing each one into the FFT, calling it multiple times? That seems strange; how does it get lower frequencies?

To the best of my understanding, the complex number is a way to convey the phase shift and amplitude in one neat container (and you need phase shift if you want to run the FFT in reverse). The frequency is calculated from the index of the array.
Which works out to (pseudocode in Kotlin)
import org.apache.commons.math3.complex.Complex
import org.apache.commons.math3.transform.DftNormalization
import org.apache.commons.math3.transform.FastFourierTransformer
import org.apache.commons.math3.transform.TransformType
import java.io.File
import java.nio.ByteBuffer
import java.nio.ByteOrder
import javax.sound.sampled.AudioFormat
import javax.sound.sampled.AudioInputStream
import kotlin.math.atan2
import kotlin.math.hypot

val audioFile = File("Dream_On.pcm")
val (phases, frequencies, amplitudes) = AudioInputStream(
    audioFile.inputStream(),
    AudioFormat(
        /* encoding = */ AudioFormat.Encoding.PCM_SIGNED,
        /* sampleRate = */ 44100f,
        /* sampleSizeInBits = */ 16,
        /* channels = */ 2,
        /* frameSize = */ 4,
        /* frameRate = */ 44100f,
        /* bigEndian = */ false
    ),
    (audioFile.length() / /* frameSize */ 4)
).use { ais ->
    val bytes = ais.readAllBytes()
    val shorts = ShortArray(bytes.size / 2)
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    // NB: this treats the interleaved stereo samples as one mono waveform
    val allWaveform = DoubleArray(shorts.size)
    for (i in shorts.indices) {
        allWaveform[i] = shorts[i].toDouble()
    }
    val halfwayThroughSong = allWaveform.size / 2
    // Apache's FFT wants a power-of-2 input length
    val moreThanOneSecond = allWaveform.copyOfRange(halfwayThroughSong, halfwayThroughSong + findNextPowerOf2(44100))
    val fft = FastFourierTransformer(DftNormalization.STANDARD)
    val fftResult: Array<Complex> = fft.transform(moreThanOneSecond, TransformType.FORWARD)
    println("fftResult size: ${fftResult.size}")
    val phases = DoubleArray(fftResult.size / 2)
    val amplitudes = DoubleArray(fftResult.size / 2)
    val frequencies = DoubleArray(fftResult.size / 2)
    // Only the first half of the bins is meaningful for real-valued input
    fftResult.filterIndexed { index, _ -> index < fftResult.size / 2 }.forEachIndexed { idx, complex ->
        phases[idx] = atan2(complex.imaginary, complex.real)
        frequencies[idx] = idx * 44100.0 / fftResult.size
        amplitudes[idx] = hypot(complex.real, complex.imaginary)
    }
    Triple(phases, frequencies, amplitudes)
}
Is my step #8 at all close to the truth? Why would the FFT result return an array as big as my input number of samples? That makes me think I've got the "window" or "frame" part wrong.
I read up on
FFT real/imaginary/abs parts interpretation
Converting Real and Imaginary FFT output to Frequency and Amplitude
Java - Finding frequency and amplitude of audio signal using FFT
An audio recording in waveform is a series of sound energy levels, basically how much sound energy there should be at any one instant. Based on the sample rate, you can think of the whole recording as a graph of energy versus time.
Sound is made of waves, which have frequencies and amplitudes. Unless your recording is of a pure sine wave, it will have many different waves of sound coming and going, which summed together create the total sound that you experience over time. At any one instant of time, you have energy from many different waves added together. Some of those waves may be at their peaks, and some at their valleys, or anywhere in between.
An FFT is a way to convert energy-vs.-time data into amplitude-vs.-frequency data. The input to an FFT is a block of waveform. You can't just give it a single energy level from a one-dimensional point in time, because then there is no way to determine all the waves that add together to make up the amplitude at that point of time. So, you give it a series of amplitudes over some finite period of time.
The FFT then does its math and returns a range of complex numbers that represent the waves of sound over that chunk of time, that when added together would create the series of energy levels over that block of time. That's why the return value is an array. It represents a bunch of frequency ranges. Together the total data of the array represents the same energy from the input array.
You can calculate from the complex numbers both phase shift and amplitude for each frequency range represented in the return array.
Ultimately, I don't see why performing an FFT would get you any closer to syncing your recordings. Admittedly it's not a task I've tried before, but I would think waveform data is already the perfect form for comparing the data and finding matching patterns. If you break your songs up into chunks to perform FFTs on, then you can try to find matching FFTs, but they will only match perfectly if your chunks are divided exactly along the same division points relative to the beginning of the original recording. And even if you could guarantee that and found matching FFTs, you will only have as much precision as the size of your chunks.
But when I think of apps like Shazam, I realize they must be doing some sort of manipulation of the audio that breaks it down into something simpler for rapid comparison. That possibly involves some FFT manipulation and filtering.
Maybe you could compare FFTs using some algorithm to just find ones that are pretty similar to narrow down to a time range and then compare wave form data in that range to find the exact point of synchronization.
I would imagine the approach that would work well is to find the offset with the maximum cross-correlation between the two recordings: calculate the cross-correlation between the two pieces at various offsets, and expect the maximum to occur at the offset where the two pieces are best aligned.
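For what it's worth, the brute-force version of that is short, if slow for long recordings. A sketch (Kotlin, assuming both waveforms are already DoubleArrays at the same sample rate; real code would use an FFT-based correlation or downsampled envelopes first):

// Find the lag of b relative to a that maximises their cross-correlation.
fun bestOffset(a: DoubleArray, b: DoubleArray, maxLag: Int): Int {
    var bestLag = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (lag in -maxLag..maxLag) {
        var score = 0.0
        for (i in a.indices) {
            val j = i + lag
            if (j in b.indices) score += a[i] * b[j]
        }
        if (score > bestScore) {
            bestScore = score
            bestLag = lag
        }
    }
    return bestLag    // sample b[i + bestLag] lines up with sample a[i]
}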
I'm trying to track the distance a user has moved over time in my application using the GPS. I have the basic idea in place, so I store the previous location and when a new GPS location is sent I calculate the distance between them, and add that to the total distance. So far so good.
There are two big issues with this simple implementation:
Since the GPS is inaccurate, when the user moves the GPS points will not form a straight line but more of a "zig-zag" pattern, making it look like the user has moved further than they actually have.
There is also an accuracy problem. If the phone just lies on the table and polls GPS positions, the answer is usually a couple of meters different every time, so you see the meters start accumulating even when the phone is lying still.
Both of these make the tracking useless, of course, since the number I'm providing is nowhere near accurate enough.
But I guess this problem is solvable, since there are a lot of fitness trackers and similar out there that do track distance from GPS. I guess they do some kind of interpolation between the GPS values or something like that? I guess that won't be 100% accurate either, but probably good enough for my usage.
So what I'm after is basically an algorithm where I can put in my GPS positions and get as good an approximation of distance travelled as possible.
Note that I cannot presume that the user will follow roads, so I cannot use the Google Distance Matrix API or similar for this.
This is a common problem with the position data produced by GPS receivers. A typical consumer-grade receiver that I have used has a position accuracy defined as a CEP of 2.5 metres. This means that for a stationary receiver in a "perfect" sky-view environment, over time 50% of the position fixes will lie within a circle with a radius of 2.5 metres. If you look at the position the receiver reports, it appears to wander at random around the true position, sometimes moving a number of metres away from its true location. If you simply integrate the distance moved between samples, then you will get a very large apparent distance travelled for a stationary device.
A simple algorithm that I have used quite successfully for a vehicle odometer function is as follows:
for(;;)
{
    Stored_Position = Current_Position ;    /* latest GPS fix */
    do
    {
        /* Current_Position keeps updating as new fixes arrive */
        Distance_Moved = Distance_Between( Current_Position, Stored_Position ) ;
    } while ( Distance_Moved < MOVEMENT_THRESHOLD ) ;
    Cumulative_Distance += Distance_Moved ;
}
The value of MOVEMENT_THRESHOLD will have an effect on the accuracy of the final result. If the value is too small then some of the random wandering performed by the stationary receiver will be included in the final result. If the value is too large then the path taken will be approximated to a series of straight lines each of which is as long as the threshold value. The extra distance travelled by the receiver as its path deviates from this straight line segment will be missed.
The accuracy of this approach, when compared with the vehicle odometer, was pretty good. How well it works with a pedestrian would have to be tested. The problem with people is that they can make much sharper turns than a vehicle resulting in larger errors from the straight line approximation. There is also the perennial problem with sky view obscuration and signal multipath caused by buildings, vehicles etc. that can induce positional errors of 10s of metres.
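For completeness, here is roughly what that loop looks like as real code (Kotlin; the Fix type, the haversine helper and the 10 m threshold are illustrative assumptions, not a tested implementation):

import kotlin.math.*

data class Fix(val latDeg: Double, val lonDeg: Double)

// Great-circle distance in metres between two fixes (haversine formula).
fun distanceMetres(p: Fix, q: Fix): Double {
    val r = 6_371_000.0
    val dLat = Math.toRadians(q.latDeg - p.latDeg)
    val dLon = Math.toRadians(q.lonDeg - p.lonDeg)
    val h = sin(dLat / 2).pow(2) +
            cos(Math.toRadians(p.latDeg)) * cos(Math.toRadians(q.latDeg)) * sin(dLon / 2).pow(2)
    return 2 * r * asin(sqrt(h))
}

// Accumulate distance only once the receiver has moved past the threshold,
// so a stationary receiver's random wander is mostly ignored.
class ThresholdOdometer(private val thresholdMetres: Double = 10.0) {
    private var anchor: Fix? = null
    var totalMetres = 0.0
        private set

    fun onFix(fix: Fix) {
        val a = anchor ?: run { anchor = fix; return }
        val d = distanceMetres(a, fix)
        if (d >= thresholdMetres) {
            totalMetres += d
            anchor = fix      // re-anchor at the new position
        }
    }
}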
Can anybody spot what is wrong with the code below? It is supposed to average the frame interval (dt) over the previous TIME_STEPS frames.
I'm using Box2D and cocos2d, although I don't think the cocos2d bit is very relevant.
-(void) update: (ccTime) dt
{
float32 timeStep = 0.0f; // must be initialised before accumulating into it
const int32 velocityIterations = 8;
const int32 positionIterations = 3;
// Average the previous TIME_STEPS time steps
for (int i = 0; i < TIME_STEPS; i++)
{
timeStep += previous_time_steps[i];
}
timeStep = timeStep/TIME_STEPS;
// step the world
[GB2Engine sharedInstance].world->Step(timeStep, velocityIterations, positionIterations);
for (int i = 0; i < TIME_STEPS - 1; i++)
{
previous_time_steps[i] = previous_time_steps[i+1];
}
previous_time_steps[TIME_STEPS - 1] = dt;
}
The previous_time_steps array is initially filled with whatever the animation interval is set to.
This doesn't do what I would expect it to. On devices with a low frame rate it speeds up the simulation, and on devices with a high frame rate it slows it down. I'm sure it's something stupid I'm overlooking.
I know Box2D likes to work with fixed time steps, but I really don't have a choice. My game runs at a very variable frame rate on the various devices, so a fixed time step just won't work. The game runs at an average of 40 fps, but on some of the crappier devices, like the first-gen iPad, it runs at barely 30 frames per second. The third-gen iPad runs it at 50/60 frames per second.
I'm open to suggestion on other ways of dealing with this problem too. Any advice would be appreciated.
Something else unusual I should note, which somebody might have some insight into, is that the debug optimisation level has a huge effect on the above. The frame rate doesn't change much between -Os and -O0, but with -Os the physics simulation runs much faster than with -O0 when the above code is active. If I just use dt as the interval instead of the above code, the optimisation level makes no difference.
I'm totally confused by that.
On devices with a low frame rate it speeds up the simulation and on
devices with a high frame rate it slows it down.
That's what using a variable time step is all about. If you only get 10 fps the physics engine will iterate the world faster because the delta time is larger.
PS: If you do any kind of performance tests like these, run them with the release build. That also ensures that (most) logging is disabled and code optimizations are on. It's possible that you simply experience much greater impact on performance from debugging code on older devices.
Also, what value is TIME_STEPS? It shouldn't be more than 10, maybe 20 at most. The alternative to averaging is to use delta time directly, but if delta time is greater than a certain threshold (30 fps) switch to using a fixed delta time (cap it). Because variable time step below 30 fps can get really ugly, it's probably better in such cases to allow the physics engine to slow down with the framerate or else the game will become harder if not unplayable at lower fps.
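The cap is a one-liner. A sketch of the idea (Kotlin; the World interface is a stand-in for whatever physics world object you are stepping, such as Box2D's):

import kotlin.math.min

// `World` stands in for the physics engine's world object.
interface World { fun step(dt: Float, velIters: Int, posIters: Int) }

const val MAX_STEP = 1.0f / 30.0f     // never step by more than 1/30 s

fun update(world: World, dt: Float) {
    // Use the real frame delta, but cap it so slow devices slow the game
    // down instead of feeding the engine huge, unstable steps.
    val timeStep = min(dt, MAX_STEP)
    world.step(timeStep, 8, 3)        // velocity/position iterations as in the question
}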
I am trying to build an iOS application that counts claps. I have been watching the WWDC videos on CoreAudio, and the topic seems so vast that I'm not quite sure where to look.
I have found similar problems here on Stack Overflow. Here is one in C# for detecting a door slam:
Given an audio stream, find when a door slams (sound pressure level calculation?)
It seems that I need to do this:
Divide the samples up into sections
Calculate the energy of each section
Take the ratio of the energies between the previous window and the current window
If the ratio exceeds some threshold, determine that there was a sudden loud noise.
I am not sure how to accomplish this in Objective-C.
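In rough Kotlin-style pseudocode (just so I'm sure I understand the algorithm; the window size and ratio threshold are guesses to be tuned), I picture it like this:

// Windowed energy, ratio against the previous window, threshold on the ratio.
fun countClaps(samples: DoubleArray, windowSize: Int = 1024, ratioThreshold: Double = 8.0): Int {
    var claps = 0
    var previousEnergy = 0.0
    for (start in 0 until samples.size - windowSize step windowSize) {
        var energy = 0.0
        for (i in start until start + windowSize) {
            energy += samples[i] * samples[i]
        }
        if (previousEnergy > 0 && energy / previousEnergy > ratioThreshold) {
            claps++    // sudden jump in energy: count it as a clap
        }
        previousEnergy = energy
    }
    return claps
}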
I have been able to figure out how to sample the audio with SCListener
Here is my attempt:
- (void)levelTimerCallback:(NSTimer *)timer {
[recorder updateMeters];
const double ALPHA = 0.05;
double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;
if ([recorder peakPowerForChannel:0] == 0)
totalClapsLabel.text = [NSString stringWithFormat:@"%d", total++];
SCListener *listener = [SCListener sharedListener];
if (![listener isListening])
return;
AudioQueueLevelMeterState *levels = [listener levels];
Float32 peak = levels[0].mPeakPower;
Float32 average = levels[0].mAveragePower;
lowPassResultsLabel.text = [NSString stringWithFormat:@"%f", lowPassResults];
peakInputLabel.text = [NSString stringWithFormat:@"%f", peak];
averageInputLabel.text = [NSString stringWithFormat:@"%f", average];
}
Though I see the suggested algorithm, I am unclear as to how to implement it in Objective-C.
You didn't mention what sort of detection fidelity you are looking for. Just checking for some kind of sound "pressure" change may be entirely adequate for your needs, honestly.
Keep in mind, however, that bumps to the phone might end up being a very low-frequency, fairly high-powered impulse, such that it will trigger your detector even though it was not an actual clap. Ditto for very high-frequency sound sources that are also not likely to be a clap.
Is this ok for your needs?
If not, and you are hoping for something higher fidelity, I think you'd be better off doing a spectral analysis (FFT) of the input signal and then looking in a much narrower frequency band for a sharp signal spike, similar to the part you already have.
I haven't looked closely at this source, but here's some possible open-source FFT code you could hopefully use as-is for your iPhone app:
Edit:
https://github.com/alexbw/iPhoneFFT
The nice part about graphing the spectral result is that it should make it quite easy to tune which frequency range you actually care about. In my own tests with some laptop software I have, my claps have a very strong spike around 1kHz - 2kHz.
Possibly overkill for your needs, but if you need something higher fidelity, then I suspect you will not be satisfied with simply tracking a signal spike without knowing what frequency range produced it in the first place.
Cheers
I used FFT for my App https://itunes.apple.com/us/app/clapmera/id519363613?mt=8 . Clap in the frequency domain looks like a (not perfect) constant.
Regards
I've tried the typical physics equations for this, but none of them really work because they deal with constant acceleration, and mine will need to change to work correctly. Basically I have a car that can be going at a large range of speeds and needs to slow down and stop over a given distance and time as it reaches the end of its path.
So, I have:
V0, or the current speed
Vf, or the speed I want to reach (typically 0)
t, or the amount of time I want to take to reach the end of my path
d, or the distance I want to go as I change from V0 to Vf
I want to calculate
a, or the acceleration needed to go from V0 to Vf
The reason this becomes a programming-specific question is that a needs to be recalculated every single timestep as the car keeps stopping. So V0 is constantly updated to be the V0 from the last timestep plus the a that was calculated last timestep. Essentially the car will start stopping slowly and then eventually stop more abruptly, sort of like a car in real life.
EDITS:
All right, thanks for the great responses. A lot of what I needed was just some help thinking about this. Let me be more specific now that I've got some more ideas from you all:
I have a car c that is 64 pixels from its destination, so d=64. It is driving at 2 pixels per timestep, where a timestep is 1/60 of a second. I want to find the acceleration a that will bring it to a speed of 0.2 pixels per timestep by the time it has traveled d.
d = 64 //distance
V0 = 2 //initial velocity (in ppt)
Vf = 0.2 //final velocity (in ppt)
Also because this happens in a game loop, a variable delta is passed through to each action, which is the multiple of 1/60s that the last timestep took. In other words, if it took 1/60s, then delta is 1.0, if it took 1/30s, then delta is 0.5. Before acceleration is actually applied, it is multiplied by this delta value. Similarly, before the car moves again its velocity is multiplied by the delta value. This is pretty standard stuff, but it might be what is causing problems with my calculations.
Linear acceleration a for a distance d going from a starting speed Vi to a final speed Vf:
a = (Vf*Vf - Vi*Vi)/(2 * d)
EDIT:
After your edit, let me try and gauge what you need...
If you take this formula and insert your numbers, you get a constant acceleration of -0.0309375. Now, let's keep calling this result 'a'.
What you need between timesteps (frames?) is not actually the acceleration, but the new location of the vehicle, right? So you use the following formula:
Sd = Vi * t + 0.5 * t * t * a
where Sd is the current distance from the start position at current frame/moment/sum_of_deltas, Vi is the starting speed, and t is the time since the start.
With this, your deceleration is constant, but even though it is linear, your speed will accommodate your constraints.
If you want a non-linear deceleration, you could find some non-linear interpolation method and interpolate not acceleration but simply position between two points.
location = non_linear_function(time);
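Going back to the constant-acceleration case, here is the closed-form 'a' and a per-frame update put together, using the numbers from the question's edit (a sketch in Kotlin; pixels and timesteps as units):

// a from the closed-form formula, then a simple per-timestep update.
fun main() {
    val d = 64.0      // distance to the destination, pixels
    val v0 = 2.0      // initial speed, pixels per timestep
    val vf = 0.2      // target speed at arrival
    val a = (vf * vf - v0 * v0) / (2 * d)    // = -0.0309375

    var v = v0
    var travelled = 0.0
    var frames = 0
    while (travelled < d && v > vf) {
        travelled += v    // move one timestep at the current speed
        v += a            // then apply the constant acceleration
        frames++
    }
    println("arrived after $frames frames, v = $v, travelled = $travelled")
}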
The four constraints you give are one too many for a linear system (one with constant acceleration), where any three of the variables would suffice to compute the acceleration and thereby determine the fourth variable. However, the system is way under-specified for a completely general nonlinear system: there may be uncountably many ways to change acceleration over time while satisfying all the constraints as given. Can you perhaps specify better along what kind of curve acceleration should change over time?
Using 0 index to mean "at the start", 1 to mean "at the end", and D for Delta to mean "variation", given a linearly changing acceleration
a(t) = a0 + t * (a1-a0)/Dt
where a0 and a1 are the two parameters we want to compute to satisfy all the various constraints, I compute (if there's been no misstep, as I did it all by hand):
DV = Dt * (a0+a1)/2
Ds = Dt * (V0 + ((a1-a0)/6 + a0/2) * Dt)
Since DV, Dt and Ds are all given, this leaves two linear equations in the unknowns a0 and a1, so you can solve for them (but I'm leaving things in this form to make it easier to double-check my derivations!).
If you're applying the proper formulas at every step to compute changes in space and velocity, it should make no difference whether you compute a0 and a1 once and for all or recompute them at every step based on the remaining Dt, Ds and DV.
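Solving that pair of linear equations by hand gives closed forms for a0 and a1. A sketch (Kotlin) under the linear-acceleration assumption above; worth double-checking the algebra before relying on it:

// From DV = Dt*(a0+a1)/2 and Ds = Dt*(V0 + (2*a0 + a1)*Dt/6):
//   a0 + a1   = 2*DV/Dt
//   2*a0 + a1 = 6*(Ds - V0*Dt)/Dt^2
// Subtracting the first from the second isolates a0.
fun linearAccelEndpoints(v0: Double, dV: Double, dS: Double, dT: Double): Pair<Double, Double> {
    val a0 = 6.0 * (dS - v0 * dT) / (dT * dT) - 2.0 * dV / dT
    val a1 = 2.0 * dV / dT - a0
    return a0 to a1
}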
If you're trying to simulate a time-dependent acceleration, you have to integrate F = ma along with the kinematic equations, that's all. If acceleration isn't constant, you just have to solve a system of equations instead of just one.
So now it's really three vector equations that you have to integrate simultaneously: one for each component of displacement, velocity, and acceleration, or nine equations in total. The force as a function of time will be an input for your problem.
If you're assuming 1D motion you're down to three simultaneous equations. The ones for velocity and displacement are both pretty easy.
In real life, a car's stopping ability depends on the pressure on the brake pedal, any engine braking that's going on, surface conditions, and such: also, there's that "grab" at the end when the car really stops. Modeling that is complicated, and you're unlikely to find good answers on a programming website. Find some automotive engineers.
Aside from that, I don't know what you're asking for. Are you trying to determine a braking schedule? As in there's a certain amount of deceleration while coasting, and then applying the brake? In real driving, the time is not usually considered in these maneuvers, but rather the distance.
As far as I can tell, your problem is that you aren't asking for anything specific, which suggests that you really haven't figured out what you actually want. If you'd provide a sample use for this, we could probably help you. As it is, you've provided the bare bones of a problem that is either overdetermined or way underconstrained, and there's really nothing we can do with that.
If you need to go from 10 m/s to 0 m/s over 1 m with constant linear acceleration, you need two equations.
First, find the time (t) it takes to stop.
v0 = initial velocity
vf = final velocity
x0 = initial displacement
xf = final displacement
a = constant linear acceleration
(xf-x0) = 0.5*(v0+vf)*t
t = 2*(xf-x0)/(v0+vf)
t = 2*(1m-0m)/(10m/s+0m/s)
t = 0.2 seconds
Next, calculate the linear acceleration between x0 and xf:
(xf-x0) = v0*t + 0.5*a*t^2
(1m-0m) = (10m/s)*(0.2s) + 0.5*a*(0.2s)^2
1m = 2m + a*(0.02s^2)
-1m = a*(0.02s^2)
a = -1m/(0.02s^2)
a = -50m/s^2
In terms of gravity (g):
a = (-50m/s^2)/(9.8m/s^2)
a is about -5.1g over the 0.2 seconds it takes to go from 10 m/s to 0 m/s.
The problem is either overconstrained or underconstrained (is a not constant? is there a maximum a?) or ambiguous.
The simplest formula would be a = (Vf - V0)/t.
Edit: if time is not constrained, distance s is constrained, and acceleration is constant, then the relevant formulae are s = (Vf + V0)/2 * t and t = (Vf - V0)/a, which simplify to a = (Vf^2 - V0^2) / (2s).
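Since the asker recomputes every timestep anyway, that last formula can simply be re-evaluated each frame against the remaining distance, which makes it self-correcting (a sketch in Kotlin):

// Recompute the constant-acceleration answer each frame from the current
// speed and the distance still to cover.
fun decelerationThisFrame(v: Double, vf: Double, remainingDistance: Double): Double =
    if (remainingDistance <= 0.0) 0.0
    else (vf * vf - v * v) / (2.0 * remainingDistance)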