What is a better way in iOS to detect sound level? [duplicate] - objective-c

I am trying to build an iOS application that counts claps. I have been watching the WWDC videos on Core Audio, but the topic seems so vast that I'm not quite sure where to look.
I have found similar problems here on Stack Overflow. Here is one in C# for detecting a door slam:
Given an audio stream, find when a door slams (sound pressure level calculation?)
It seems that I need to do this:
Divide the samples up into sections
Calculate the energy of each section
Take the ratio of the energies between the previous window and the current window
If the ratio exceeds some threshold, determine that there was a sudden loud noise (a rough sketch of this follows the list).
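A rough sketch of that idea (in Kotlin rather than Objective-C; the samples buffer, window size, and threshold are assumptions to tune):

// Sketch: count sudden loud noises in a buffer of normalized samples (-1.0..1.0)
// using the windowed energy-ratio idea described above.
fun countSuddenLoudNoises(samples: FloatArray, windowSize: Int = 1024, ratioThreshold: Double = 8.0): Int {
    var detections = 0
    var previousEnergy = 0.0
    var start = 0
    while (start + windowSize <= samples.size) {
        // Steps 1-2: energy of this section = sum of squared samples.
        var energy = 0.0
        for (i in start until start + windowSize) {
            energy += samples[i].toDouble() * samples[i]
        }
        // Steps 3-4: compare against the previous window's energy.
        if (previousEnergy > 0.0 && energy / previousEnergy > ratioThreshold) {
            detections++   // sudden jump in energy, e.g. a clap
        }
        previousEnergy = energy
        start += windowSize
    }
    return detections
}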
I am not sure how to accomplish this in Objective-C.
I have been able to figure out how to sample the audio with SCListener.
Here is my attempt:
- (void)levelTimerCallback:(NSTimer *)timer {
    [recorder updateMeters];

    const double ALPHA = 0.05;
    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
    lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;

    if ([recorder peakPowerForChannel:0] == 0) {
        totalClapsLabel.text = [NSString stringWithFormat:@"%d", total++];
    }

    SCListener *listener = [SCListener sharedListener];
    if (![listener isListening]) {
        return;
    }

    AudioQueueLevelMeterState *levels = [listener levels];
    Float32 peak = levels[0].mPeakPower;
    Float32 average = levels[0].mAveragePower;

    lowPassResultsLabel.text = [NSString stringWithFormat:@"%f", lowPassResults];
    peakInputLabel.text = [NSString stringWithFormat:@"%f", peak];
    averageInputLabel.text = [NSString stringWithFormat:@"%f", average];
}
Though I see the suggested algorithm, I am unclear as to how to implement it in Objective-C.

You didn't mention what sort of detection fidelity you are looking for. Just checking for some kind of sound "pressure" change may be entirely adequate for your needs, honestly.
Keep in mind, however, that bumps to the phone might end up being a very low-frequency, fairly high-powered impulse that will trigger your detector even though it was not an actual clap. Ditto for very high-frequency sound sources that are also not likely to be a clap.
Is this OK for your needs?
If not, and you are hoping for something higher fidelity, I think you'd be better off doing a spectral analysis (FFT) of the input signal and then looking in a much narrower frequency band for a sharp signal spike, similar to the part you already have.
I haven't looked closely at this source, but here's some possible open-source FFT code you could hopefully use as-is for your iPhone app:
Edit:
https://github.com/alexbw/iPhoneFFT
The nice part about graphing the spectral result is that it should make it quite easy to tune which frequency range you actually care about. In my own tests with some laptop software I have, my claps have a very strong spike around 1kHz - 2kHz.
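For illustration, here is a hypothetical Kotlin sketch of measuring the energy in that 1-2 kHz band with the Apache Commons Math FFT (the window length is assumed to be a power of two, and the band edges and sample rate are assumptions):

import org.apache.commons.math3.complex.Complex
import org.apache.commons.math3.transform.DftNormalization
import org.apache.commons.math3.transform.FastFourierTransformer
import org.apache.commons.math3.transform.TransformType
import kotlin.math.hypot

// Energy of one window of samples inside a frequency band of interest.
fun bandEnergy(window: DoubleArray, sampleRate: Double, lowHz: Double = 1000.0, highHz: Double = 2000.0): Double {
    val fft = FastFourierTransformer(DftNormalization.STANDARD)
    val spectrum: Array<Complex> = fft.transform(window, TransformType.FORWARD)
    val binWidth = sampleRate / window.size            // Hz per FFT bin
    val lowBin = (lowHz / binWidth).toInt()
    val highBin = (highHz / binWidth).toInt()
    var energy = 0.0
    for (k in lowBin..highBin) {
        val magnitude = hypot(spectrum[k].real, spectrum[k].imaginary)
        energy += magnitude * magnitude
    }
    return energy
}

A clap could then be flagged when bandEnergy(...) for the current window jumps well above a running average of the previous windows.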
Possibly overkill for your needs, but if you need something higher fidelity, then I suspect you will not be satisfied with simply tracking a signal spike without knowing what frequency range led to the signal spike in the first place.
Cheers

I used an FFT for my app https://itunes.apple.com/us/app/clapmera/id519363613?mt=8 . A clap in the frequency domain looks like a (not perfect) constant.
Regards
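One way to quantify that "roughly constant" spectrum is a spectral-flatness measure. A hypothetical Kotlin sketch (the magnitudes array is assumed to come from an FFT of one analysis window):

import kotlin.math.exp
import kotlin.math.ln

// Spectral flatness = geometric mean / arithmetic mean of the magnitude spectrum.
// Values near 1.0 indicate a flat, impulse- or noise-like spectrum (clap-like);
// values near 0.0 indicate a tonal spectrum.
fun spectralFlatness(magnitudes: DoubleArray): Double {
    val eps = 1e-12                                   // avoid log(0)
    val logMean = magnitudes.sumOf { ln(it + eps) } / magnitudes.size
    val mean = magnitudes.average() + eps
    return exp(logMean) / mean
}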

Related

Getting frequency and amplitude from an audio file using FFT - so close but missing some vital insights, eli5?

tl/dr: I've got two audio recordings of the same song without timestamps, and I'd like to align them. I believe FFT is the way to go, but while I've come a long way, it feels like I'm right on the edge of understanding enough to make it work, and would greatly benefit from some "you got this part wrong" advice on FFT. (My education never got into this area.) So I came here seeking ELI5 help.
The journey:
1. Get two recordings at the same sample rate. (done!)
2. Transform them into a waveform (DoubleArray). This doesn't keep any of the meta info like "samples/second", but the FFT math doesn't care until later.
3. Run an FFT on them using a simplified implementation for beginners.
4. Get an Array<Frame>, where each Frame contains an Array<Bin>, and each Bin has (amplitude, frequency), because the older implementation hid all the details (like frame width, and number of Bins, and ... stuff?) and outputs words I'm familiar with like "amplitude" and "frequency".
5. Try moving to a more robust FFT (Apache Commons).
6. Get an output of 'real' and 'imaginary' (uh oh).
7. Make the totally incorrect assumption that those were the same thing (amplitude and frequency). Surprise, they aren't!
8. Apache's FFT returns an Array<Complex>, which means it... er... is just one frame's worth? And I should be chopping the song into 1 second chunks and passing each one into the FFT and calling it multiple times? That seems strange; how does it get lower frequencies?
To the best of my understanding, the complex number is a way to convey the phase shift and amplitude in one neat container (and you need phase shift if you want to do the FFT in reverse). And the frequency is calculated from the index of the array.
Which works out to (pseudocode in Kotlin):
val audioFile = File("Dream_On.pcm")
val (phases, frequencies, amplitudes) = AudioInputStream(
    audioFile.inputStream(),
    AudioFormat(
        /* encoding = */ AudioFormat.Encoding.PCM_SIGNED,
        /* sampleRate = */ 44100f,
        /* sampleSizeInBits = */ 16,
        /* channels = */ 2,
        /* frameSize = */ 4,
        /* frameRate = */ 44100f,
        /* bigEndian = */ false
    ),
    (audioFile.length() / /* frameSize */ 4)
).use { ais ->
    val bytes = ais.readAllBytes()
    val shorts = ShortArray(bytes.size / 2)
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)

    val allWaveform = DoubleArray(shorts.size)
    for (i in shorts.indices) {
        allWaveform[i] = shorts[i].toDouble()
    }

    val halfwayThroughSong = allWaveform.size / 2
    val moreThanOneSecond = allWaveform.copyOfRange(halfwayThroughSong, halfwayThroughSong + findNextPowerOf2(44100))

    val fft = FastFourierTransformer(DftNormalization.STANDARD)
    val fftResult: Array<Complex> = fft.transform(moreThanOneSecond, TransformType.FORWARD)
    println("fftResult size: ${fftResult.size}")

    val phases = DoubleArray(fftResult.size / 2)
    val amplitudes = DoubleArray(fftResult.size / 2)
    val frequencies = DoubleArray(fftResult.size / 2)
    fftResult.filterIndexed { index, _ -> index < fftResult.size / 2 }.forEachIndexed { idx, complex ->
        phases[idx] = atan2(complex.imaginary, complex.real)
        frequencies[idx] = idx * 44100.0 / fftResult.size
        amplitudes[idx] = hypot(complex.real, complex.imaginary)
    }
    Triple(phases, frequencies, amplitudes)
}
Is my step #8 at all close to the truth? Why would the FFT result return an array as big as my input number of samples? That makes me think I've got the "window" or "frame" part wrong.
I read up on
FFT real/imaginary/abs parts interpretation
Converting Real and Imaginary FFT output to Frequency and Amplitude
Java - Finding frequency and amplitude of audio signal using FFT
An audio recording in waveform is a series of sound energy levels, basically how much sound energy there is at any one instant. Based on the sample rate, you can think of the whole recording as a graph of energy versus time.
Sound is made of waves, which have frequencies and amplitudes. Unless your recording is of a pure sine wave, it will have many different waves of sound coming and going, which summed together create the total sound that you experience over time. At any one instant of time, you have energy from many different waves added together. Some of those waves may be at their peaks, and some at their valleys, or anywhere in between.
An FFT is a way to convert energy-vs.-time data into amplitude-vs.-frequency data. The input to an FFT is a block of waveform. You can't just give it a single energy level from a one-dimensional point in time, because then there is no way to determine all the waves that add together to make up the amplitude at that point of time. So, you give it a series of amplitudes over some finite period of time.
The FFT then does its math and returns a range of complex numbers that represent the waves of sound over that chunk of time, that when added together would create the series of energy levels over that block of time. That's why the return value is an array. It represents a bunch of frequency ranges. Together the total data of the array represents the same energy from the input array.
You can calculate from the complex numbers both phase shift and amplitude for each frequency range represented in the return array.
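As a quick sanity check of that interpretation, here is a small hypothetical Kotlin example using the same Apache Commons Math FFT as the question: a 1 kHz sine sampled at 44.1 kHz should produce one dominant bin in the first half of the output, at index roughly 1000 / (44100 / N):

import org.apache.commons.math3.transform.DftNormalization
import org.apache.commons.math3.transform.FastFourierTransformer
import org.apache.commons.math3.transform.TransformType
import kotlin.math.PI
import kotlin.math.hypot
import kotlin.math.sin

fun main() {
    val sampleRate = 44100.0
    val n = 4096                                            // power of two, ~93 ms of audio
    val signal = DoubleArray(n) { i -> sin(2 * PI * 1000.0 * i / sampleRate) }

    val fft = FastFourierTransformer(DftNormalization.STANDARD)
    val spectrum = fft.transform(signal, TransformType.FORWARD)

    // Only the first n/2 bins are unique for a real-valued input signal.
    val peakBin = (0 until n / 2).maxByOrNull { k -> hypot(spectrum[k].real, spectrum[k].imaginary) }!!
    println("peak at bin $peakBin ~ ${peakBin * sampleRate / n} Hz")  // prints a bin near 1000 Hz
}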
Ultimately, I don't see why performing an FFT would get you any closer to syncing your recordings. Admittedly it's not a task I've tried before. But I would think waveform data is already the perfect form for comparing the data and finding matching patterns. If you break your songs up into chunks to perform FFTs on, then you can try to find matching FFTs, but they will only match perfectly if your chunks are divided exactly along the same division points relative to the beginning of the original recording. And even if you could guarantee that and found matching FFTs, you will only have as much precision as the size of your chunks.
But when I think of apps like Shazam, I realize they must be doing some sort of manipulation of the audio that breaks it down into something simpler for rapid comparison. That possibly involves some FFT manipulation and filtering.
Maybe you could compare FFTs using some algorithm to just find ones that are pretty similar to narrow down to a time range and then compare wave form data in that range to find the exact point of synchronization.
I would imagine the approach that would work well would be to find the offset with the maximum cross-correlation between the two recordings. This means calculating the cross-correlation between the two pieces at various offsets. You would expect the maximum cross-correlation to occur at the offset where the two pieces were best aligned.
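A brute-force sketch of that idea in Kotlin (hypothetical; for long recordings you would normally correlate downsampled envelopes or use FFT-based correlation instead of this O(n * maxLag) loop):

// Returns the lag (in samples) at which `b` best lines up with `a`.
fun bestAlignmentOffset(a: DoubleArray, b: DoubleArray, maxLag: Int): Int {
    var bestLag = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (lag in -maxLag..maxLag) {
        var score = 0.0
        for (i in a.indices) {
            val j = i + lag
            if (j in b.indices) score += a[i] * b[j]    // cross-correlation at this offset
        }
        if (score > bestScore) {
            bestScore = score
            bestLag = lag
        }
    }
    return bestLag   // a[i] best matches b[i + bestLag]
}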

Calculating walking distance for user over time

I'm trying to track the distance a user has moved over time in my application using the GPS. I have the basic idea in place, so I store the previous location and when a new GPS location is sent I calculate the distance between them, and add that to the total distance. So far so good.
There are two big issues with this simple implementation:
Since the GPS is inaccurate, when the user moves, the GPS points will not form a straight line but more of a "zig-zag" pattern, making it look like the user has moved farther than they actually have.
There is also an accuracy problem. If the phone just lies on the table and polls GPS positions, the answer is usually a couple of meters different every time, so you see the meters start accumulating even when the phone is lying still.
Both of these make the tracking useless of course, since the number I'm providing is nowhere near accurate enough.
But I guess that this problem is solvable, since there are a lot of fitness trackers and similar out there that do track distance from GPS. I guess they do some kind of interpolation between the GPS values or something like that? I guess that won't be 100% accurate either, but probably good enough for my usage.
So what I'm after is basically an algorithm where I can put in my GPS positions and get as good an approximation of the distance travelled as possible.
Note that I cannot presume that the user will follow roads, so I cannot use the Google Distance Matrix API or similar for this.
This is a common problem with the position data that is produced by GPS receivers. A typical consumer-grade receiver that I have used has a position accuracy defined as a CEP of 2.5 metres. This means that for a stationary receiver in a "perfect" sky-view environment, over time 50% of the position fixes will lie within a circle with a radius of 2.5 metres. If you look at the position that the receiver reports, it appears to wander at random around the true position, sometimes moving a number of metres away from its true location. If you simply integrate the distance moved between samples, then you will get a very large apparent distance travelled for a stationary device.
A simple algorithm that I have used quite successfully for a vehicle odometer function is as follows:
for (;;)
{
    Stored_Position = Current_Position ;
    do
    {
        /* Current_Position is assumed to be refreshed continuously by the GPS receiver */
        Distance_Moved = Distance_Between( Current_Position, Stored_Position ) ;
    } while ( Distance_Moved < MOVEMENT_THRESHOLD ) ;
    Cumulative_Distance += Distance_Moved ;
}
The value of MOVEMENT_THRESHOLD will have an effect on the accuracy of the final result. If the value is too small then some of the random wandering performed by the stationary receiver will be included in the final result. If the value is too large then the path taken will be approximated to a series of straight lines each of which is as long as the threshold value. The extra distance travelled by the receiver as its path deviates from this straight line segment will be missed.
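A minimal Kotlin sketch of the same threshold idea applied to a stream of position fixes (the Position type, distanceBetween function, and threshold value are all assumptions):

// Position is a hypothetical lat/lon fix; distanceBetween() returns metres.
data class Position(val latitude: Double, val longitude: Double)

class Odometer(private val movementThresholdMetres: Double = 10.0) {
    var cumulativeDistanceMetres = 0.0
        private set
    private var anchor: Position? = null

    // Call this for every new GPS fix.
    fun onNewFix(fix: Position, distanceBetween: (Position, Position) -> Double) {
        val from = anchor
        if (from == null) {
            anchor = fix
            return
        }
        val moved = distanceBetween(from, fix)
        // Ignore movement below the threshold so a stationary, wandering
        // receiver does not accumulate distance.
        if (moved >= movementThresholdMetres) {
            cumulativeDistanceMetres += moved
            anchor = fix
        }
    }
}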
The accuracy of this approach, when compared with the vehicle odometer, was pretty good. How well it works for a pedestrian would have to be tested. The problem with people is that they can make much sharper turns than a vehicle, resulting in larger errors from the straight-line approximation. There is also the perennial problem of sky-view obscuration and signal multipath caused by buildings, vehicles, etc., which can induce positional errors of tens of metres.

SuperCollider: automatic phase and frequency alignment of oscillators

Anyone has an idea for automatic phase and frequency alignment?
To explain: assume you have an Impulse
in = Impulse.ar(Rand(2, 5), Rand(0, 1));
now I'd like to manipulate the frequency of another Impulse such that it adapts its phase and frequency to match the input.
Any suggestions, even for a google search are highly appreciated.
[question asked on behalf of a colleague]
I don't agree that this, as framed, is a tough problem. Impulses are simple to track; that's why, for example, old rotary dial phones used pulse trains.
Here's some code that generates an impulse at a random frequency, then resynthesises another impulse at the same frequency. It also outputs a pitch estimate.
(
var left, right, master, slave, periodestimatebus, secretfrequency;
s = Server.default;
left = Bus.new(\audio, 0, 1);
right = Bus.new(\audio, 1, 1);
periodestimatebus = Bus.control(s, 1);

// choose our secret frequency here for later comparison:
secretfrequency = rrand(2.0, 5.0);

// generate impulse with secret frequency at some arbitrary phase
master = { Impulse.ar(secretfrequency, Rand(0, 1)); }.play(s, left);

slave = {
    var masterin, clockcount, clockoffset, syncedclock, periodestimate, tracking;
    masterin = In.ar(left);

    // This 1 Hz LFSaw is the "clock" against which we measure stuff
    clockcount = LFSaw.ar(1, 0, 0.5, 0.5);
    clockoffset = Latch.ar(clockcount, Delay1.ar(masterin));
    syncedclock = (clockcount - clockoffset).frac;
    // syncedclock is a version of the clock hard-reset (one sample after) every impulse trigger
    periodestimate = Latch.ar(syncedclock, masterin);

    // sanity-check our f impulse
    Out.kr(periodestimatebus, periodestimate);

    // there is no phase estimate per se - what would we measure it against? -
    // but we can resynthesise a new impulse up to a 1 sample delay from the matched clock.
    tracking = (Slope.ar(syncedclock) > 0);
}.play(master, right, 0, addAction: \addAfter);

// Let's see how we performed
{
    periodestimatebus.get({ |periodestimate|
        ["actual/estimated frequency", secretfrequency, periodestimate.reciprocal].postln;
    });
}.defer(1);
)
Notes to this code:
The periodestimate is generated by tricksy use of Delay1 to make sure that it samples the value of the clock just before it is reset. As such it is off by one sample.
The current implementation will produce a good period estimate with varying frequencies, down to 1Hz at least. Any lower and you'd need to change the clockcount clock to have a different frequency and tweak the arithmetic.
Many improvements are possible. For example, if you wish to track varying frequencies you might want to tweak it a little bit so that the resynthesized signal does not click too often as it underestimates the signal.
This is a tough problem, as you are dealing with a noisy source with a low frequency. If this were a sine wave I'd recommend an FFT, but FFTs don't do very well with noisy sources and low frequencies. It's still worth a shot. FFTs can match phase too. I believe you can use the Pitch UGen to help find the frequency.
The Chirp-Z algorithm is something you could use instead of the FFT: http://www.embedded.com/design/configurable-systems/4006427/A-DSP-algorithm-for-frequency-analysis
http://en.wikipedia.org/wiki/Bluestein%27s_FFT_algorithm
Another thing you could try is to use a neural net to try to guess its way to the right information. You could use active training to help it achieve this goal. There is a very general discussion of this on SO:
Pitch detection using neural networks
One method some folks are coming around to is simulating the neurons of the cochlea to detect pitch.
You may want to read up on Phase-locked loops: http://en.wikipedia.org/wiki/Phase-locked_loop
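For intuition, a crude PLL-style tracker outside SuperCollider could look like the following Kotlin sketch (the ImpulseTracker class, its gain, and the assumption that impulse timestamps are detected elsewhere are all mine):

// Locks onto an impulse train by predicting the next impulse time and nudging
// the period estimate by a fraction of the observed phase error.
class ImpulseTracker(initialPeriodSec: Double = 0.5, private val gain: Double = 0.3) {
    var periodSec = initialPeriodSec
        private set
    private var nextExpectedTimeSec = Double.NaN

    // Call with the timestamp (in seconds) of each detected input impulse.
    fun onImpulse(timeSec: Double) {
        if (nextExpectedTimeSec.isNaN()) {
            nextExpectedTimeSec = timeSec + periodSec
            return
        }
        val error = timeSec - nextExpectedTimeSec     // how early/late the impulse arrived
        periodSec += gain * error                     // loop "filter": nudge the period estimate
        nextExpectedTimeSec = timeSec + periodSec     // re-anchor the phase to the real impulse
    }

    val estimatedFrequencyHz: Double
        get() = 1.0 / periodSec
}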

Variable time step bug with Box2D

Can anybody spot what is wrong with the code below? It is supposed to average the frame interval (dt) over the previous TIME_STEPS number of frames.
I'm using Box2D and cocos2d, although I don't think the cocos2d bit is very relevant.
-(void) update: (ccTime) dt
{
    float32 timeStep;
    const int32 velocityIterations = 8;
    const int32 positionIterations = 3;

    // Average the previous TIME_STEPS time steps
    for (int i = 0; i < TIME_STEPS; i++)
    {
        timeStep += previous_time_steps[i];
    }
    timeStep = timeStep/TIME_STEPS;

    // step the world
    [GB2Engine sharedInstance].world->Step(timeStep, velocityIterations, positionIterations);

    for (int i = 0; i < TIME_STEPS - 1; i++)
    {
        previous_time_steps[i] = previous_time_steps[i+1];
    }

    previous_time_steps[TIME_STEPS - 1] = dt;
}
The previous_time_steps array is initially filled with whatever the animation interval is set to.
This doesn't do what I would expect it to. On devices with a low frame rate it speeds up the simulation, and on devices with a high frame rate it slows it down. I'm sure it's something stupid I'm overlooking.
I know Box2D likes to work with fixed time steps, but I really don't have a choice. My game runs at a very variable frame rate on the various devices, so a fixed time step just won't work. The game runs at an average of 40 fps, but on some of the crappier devices like the first-gen iPad it runs at barely 30 frames per second. The third-gen iPad runs it at 50/60 frames per second.
I'm open to suggestion on other ways of dealing with this problem too. Any advice would be appreciated.
Something else unusual I should note, which somebody might have some insight into, is that the debug optimisation setting for the build has a huge effect on the above. The frame rate isn't changed much when debug optimisations are set to -Os vs -O0. But when the debug optimisations are set to -Os, the physics simulation runs much faster than -O0 when the above code is active. If I just use dt as the interval instead of the above code, then the debug optimisations make no difference.
I'm totally confused by that.
"On devices with a low frame rate it speeds up the simulation and on devices with a high frame rate it slows it down."
That's what using a variable time step is all about. If you only get 10 fps the physics engine will iterate the world faster because the delta time is larger.
PS: If you do any kind of performance tests like these, run them with the release build. That also ensures that (most) logging is disabled and code optimizations are on. It's possible that you simply experience a much greater performance impact from debugging code on older devices.
Also, what value is TIME_STEPS? It shouldn't be more than 10, maybe 20 at most. The alternative to averaging is to use the delta time directly, but if the delta time is greater than a certain threshold (the 1/30 s step of 30 fps), switch to using a fixed delta time (cap it). Variable time steps below 30 fps can get really ugly, so it's probably better in such cases to allow the physics engine to slow down with the framerate, or else the game will become harder if not unplayable at lower fps.
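A minimal Kotlin sketch of that averaged-and-capped time step (the TimeStepFilter helper, window size, and 1/30 s cap are assumptions, not Box2D API):

// Smooths the frame delta over a small window and never lets the physics
// step exceed 1/30 s, so low frame rates slow the simulation down rather
// than making it jump.
class TimeStepFilter(private val maxStepSec: Float = 1f / 30f, private val windowSize: Int = 10) {
    private val recent = ArrayDeque<Float>()

    fun nextStep(dtSec: Float): Float {
        recent.addLast(dtSec)
        if (recent.size > windowSize) recent.removeFirst()
        val averaged = recent.sum() / recent.size     // average out frame-to-frame jitter
        return minOf(averaged, maxStepSec)            // cap the step
    }
}

// usage: world.step(filter.nextStep(dt), velocityIterations, positionIterations)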

iphone - distanceFromLocation accuracy

I found an issue with the distanceFromLocation function where the return value is not accurate. Please confirm whether I did it wrong or whether the function is buggy.
CLLocation *locA = [[CLLocation alloc] initWithLatitude:5.321008 longitude:100.290131];
CLLocation *locB = [[CLLocation alloc] initWithLatitude:5.321008 longitude:100.290138];
CLLocationDistance distance = [locA distanceFromLocation:locB];
NSLog(@"distance: %f, locA: %f,%f, locB: %f,%f", distance, locA.coordinate.latitude, locA.coordinate.longitude, locB.coordinate.latitude, locB.coordinate.longitude);
The output is:
distance: 0.775644, locA: 5.321008,100.290131, locB: 5.321008,100.290138
Both locations are near each other and should be less than 10 meters apart. However, the function returns a larger distance. Checked with a site such as http://jan.ucc.nau.edu/~cvm/latlongdist.html, the distance should be:
Distance between 5.321008N 100.290131E and 5.321008N 100.290138E is 0.0008 km
Your answer is in meters; if you convert it to kilometers you get 0.775644 m = 0.000775644 km, which is roughly the same as the 0.0008 km above. See the reference documentation from Apple for more.
This is a GPS issue with iPhones, not a coding problem.
Since iPhones are meant primarily to do other things, the maker does not put in heavy-duty GPS chips, like the SiRF Star III, for example. This would kill battery lifetime. Instead, the phone relies partially on another protocol mandated for cell phones since 9/11, which calculates distance from cell towers and does trilateration to get an approximate value. Trilateration works in conjunction with the weaker GPS chipset, if the chipset can find 4 satellites. The result is that iPhones are often off by more than 40 feet when using the GPS function. BTW, normal hand-held GPS units like Garmin (not surveyors' differential GPS units) are often off by as much as 2-3 meters.
Look up differential GPS on Wikipedia to get more on this.