Why does the AudioUnit collect 940 bytes per frame with a Lightning-plug earphone? - objective-c

I am an iOS developer from China, and I developed a recording application based on AudioUnit. When I tested it on my iPhone 6s with a 3.5 mm earphone, it worked well and collected 1024 bytes per frame. But when I tested it on iPhones that don't have a 3.5 mm jack, the AudioUnit collected 940 bytes per frame and reported an error.
I also tested my app on my iPhone 6s with a Lightning-plug earphone, and it worked well too.

The iOS RemoteIO Audio Unit can change the amount of data delivered in each callback from what the requested duration implies, depending on OS state and the audio hardware in use.
A difference between 1024 and 940 samples in the callback buffer usually means the Audio Unit is resampling the data from a sample rate of 48000 to a sample rate of 44100 sps (1024 x 44100 / 48000 is about 941). That can happen because the hardware sample rate is different on different iPhone models (48000 on newer ones, 44100 on early models), and your app might be requesting a sample rate different from the device's hardware sample rate.
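One way to avoid the implicit resampling is to request the hardware rate and then size your buffers from the rate and buffer duration you were actually granted. A minimal sketch, assuming an AVAudioSession-based recording app (the requested 48000 is only a preference; the logged values are illustrative):

#import <AVFoundation/AVFoundation.h>

AVAudioSession *session = [AVAudioSession sharedInstance];
NSError *error = nil;
[session setCategory:AVAudioSessionCategoryPlayAndRecord error:&error];
[session setPreferredSampleRate:48000.0 error:&error];   // a request only; the OS decides
[session setActive:YES error:&error];

double hwRate = session.sampleRate;                      // e.g. 48000 on newer phones, 44100 on older ones
NSTimeInterval ioDuration = session.IOBufferDuration;    // actual buffer duration granted
UInt32 framesPerCallback = (UInt32)(hwRate * ioDuration + 0.5);
NSLog(@"hardware rate %.0f Hz, about %u frames per callback", hwRate, (unsigned)framesPerCallback);

If the AudioUnit's stream format is then set to that same rate, there is no resampling step and the odd 940-sample buffers should go away.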

Related

How to increase WebRTC AECM Filter length to cater longer echoes?

I have been working on echo cancellation across a cluster of cross-platform smartphones (iOS and Android) in a room. The phones in the cluster don't have a fixed orientation with respect to a common layout, which would usually be a circular or linear array; they are placed randomly in the room.
All phones play the audio in synchronization, accurate to within 10 ms. But the problem is there is an echo (some kind of screeching sound) that doesn't get canceled. I have looked at the AEC dump files, and it seems the echoes are longer than the typical delay window that the WebRTC AECM handles with its default filter length of 128 ms, i.e.:
For Android:
Low latency phones: 50 ms -> [50 ms to 170 ms]
High latency phones: 150 ms -> [150 ms to 270 ms]
For iOS:
Recording latency: 50ms
Playing latency: 50ms
I have modified the WebRTC code to use the hardware AEC in combination with WebRTC AECM, but the problem is that the echoes fall outside these windows.
So how do I increase the filter length?

WebRTC: Bad Encoding Performance for Screensharing via CGDisplayStream (h264/vp8/vp9)

I am using the Objective-C framework for WebRTC to build a screen-sharing app. The video is captured using CGDisplayStream. I have a working demo, but at 2580x1080 I get only 3-4 fps. My googAvgEncodeMs is around 100-300 ms (ideally it should be under 10 ms), which explains why the screen sharing is far from fluid (30 fps+). I also switched between codecs (h264/vp8/vp9), but with all of them I get the same slow experience. The contentType in WebRTC is set to screen (values: [screen, realtime]).
The cpu usage of my mac is then between 80-100%. My guess is that there is some major optimisation (qpMax, hardware-acceleration etc...) in the c++ code of the codecs that I have missed. Unfortunately my knowledge on codecs is limited.
Also interesting: Even when I lower the resolution to 320x240 the googAvgEncodeMs is still in the range of 30-60ms.
I am running this on a MacBook Pro 15 inch from 2018. When running a random WebRTC session inside Chrome/Firefox etc., I get smoother results than with the vanilla WebRTC framework.
WebRTC uses software encoding, and that is the real culprit. Encoding 2580x1080 in software is not going to be practical. Try reducing the horizontal and vertical resolution by half; it will improve performance with some loss in quality. Also, if you are doing screen sharing and video support is not critical, you can drop the frame rate to 10 frames per second. The logical long-term solution is to figure out how to incorporate hardware acceleration.
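Since the capture side here is CGDisplayStream, both suggestions can be applied before frames ever reach the encoder. A rough sketch under that assumption (the hand-off to the WebRTC video source is deliberately left as a comment):

#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

CGDirectDisplayID display = CGMainDisplayID();
size_t fullWidth  = CGDisplayPixelsWide(display);
size_t fullHeight = CGDisplayPixelsHigh(display);

// Cap capture at 10 fps via the minimum-frame-time option.
NSDictionary *options = @{ (__bridge NSString *)kCGDisplayStreamMinimumFrameTime : @(1.0 / 10.0) };

CGDisplayStreamRef stream = CGDisplayStreamCreateWithDispatchQueue(
    display,
    fullWidth / 2, fullHeight / 2,              // let the capture step downscale to half resolution
    'BGRA',                                     // kCVPixelFormatType_32BGRA
    (__bridge CFDictionaryRef)options,
    dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0),
    ^(CGDisplayStreamFrameStatus status, uint64_t displayTime,
      IOSurfaceRef frameSurface, CGDisplayStreamUpdateRef updateRef) {
        if (status == kCGDisplayStreamFrameStatusFrameComplete && frameSurface != NULL) {
            // Wrap frameSurface in a CVPixelBuffer and hand it to the WebRTC capturer here.
        }
    });
CGDisplayStreamStart(stream);

Halving each dimension cuts the pixel count to a quarter, and the 10 fps cap reduces the encoder load by another factor of three or so compared with 30 fps, which is usually enough to bring googAvgEncodeMs back into a usable range for software encoding.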

USB performance issues

I am trying to transfer data over USB. These are packets of at most 64 bytes, sent at a frequency of 4 kHz. That gives a bit rate of about 2 Mbit/s.
The software task that picks up this data runs at 2.5 kHz.
Ideally we never want packets to arrive more slowly than 2.5 kHz (so 2 kHz isn't good enough).
Is anyone aware of any generic limits on what USB can achieve?
We are running on a main board with a 1.33 GHz processor running QNX and a daughter board which is a TWR-K60F120M tower system running MQX.
Apart from the details of the system, is USB meant to be used for this kind of data transfer, i.e., high frequency and small packet sizes?
Many Thanks for your help
MG
USB, even at its slowest spec (1.1), can transfer data at up to 12 Mbit/s, provided you use the proper transfer mode. USB processes 1000 "frames" per second. The frames contain control and data information, and various portions of each frame are used for various purposes, so the total information content is "multiplexed" amongst these competing requirements.
Low-speed devices use just a few bytes in a frame to send or receive their data. Examples are modems, mice, keyboards, etc. The so-called Full Speed devices (in USB 1.1) can get close to that 12 Mbit/s by using isochronous mode transfers, meaning they get carved out a nice big chunk of each frame and can send that much data (a fixed size) each time a frame comes along. This is the mode audio devices use to stream relatively data-intensive music to USB speakers, for example.
If you are able to do a little bit of local buffering, you may be able to use isochronous mode to get your 64-byte packets sent at a 1 kHz frame rate, but with 2 or 3 periods' (at 2.5 kHz) worth of data in each USB frame transfer. You'd want to reserve 64 x 3 = 192 bytes of data (plus maybe a few extra bytes for control info, such as how many chunks are present: 2 or 3?). Then, as the USB frames come by, you'd put your 2 chunks or 3 chunks of data onto the wire, and the receiving end would get that data, albeit in a more bursty way than smoothly at a precise 2.5 kHz rate. However, this way of transferring the data would more than keep up, even with USB 1.1, and still only use a fraction of the total available USB bandwidth.
The problem, as I see it, is whether your system design can tolerate a data delivery rate that is "bursty"... in other words, instead of getting 64 bytes at a rate of 2.5 kHz, you'll be getting (on average) 160 bytes at a 1 kHz rate. In practice that looks like 3 packets in one frame, 2 in the next, 3 in the one after that, and so on, averaging 2.5 packets per millisecond.
So, I think that is the best you can do with USB: a somewhat bursty delivery of either 2 or 3 of your device's data packets per 1 ms USB frame.
I am not an expert in USB, but I have done some work with it, including debugging a device-to-host tunneling protocol which used USB "interrupts", so I have seen this kind of implementation on other systems, to solve the problem of matching the USB frame rate to the device's data rate.
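To make the buffering idea concrete, here is a rough sketch in plain C, not tied to any particular USB stack and glossing over real concurrency concerns: the 2.5 kHz producer drops 64-byte packets into a small FIFO, and once per 1 ms frame the USB side drains up to 3 of them into one isochronous payload, prefixed with a count byte as suggested above. All names here are illustrative.

#include <stdint.h>
#include <string.h>

#define PACKET_SIZE    64
#define MAX_PER_FRAME  3
#define QUEUE_DEPTH    16                      /* small FIFO between producer and frame timer */

static uint8_t  queue[QUEUE_DEPTH][PACKET_SIZE];
static volatile unsigned head, tail;           /* running packet counters */

/* Called at 2.5 kHz by whatever produces the 64-byte packets. */
void enqueue_packet(const uint8_t *pkt)
{
    memcpy(queue[head % QUEUE_DEPTH], pkt, PACKET_SIZE);
    head++;
}

/* Called once per 1 ms USB frame: fills `payload` with a count byte plus up to 3 packets
 * and returns the number of bytes to put on the wire (193 bytes at most). */
size_t fill_iso_payload(uint8_t *payload)
{
    unsigned count = 0;
    while (tail != head && count < MAX_PER_FRAME) {
        memcpy(payload + 1 + count * PACKET_SIZE, queue[tail % QUEUE_DEPTH], PACKET_SIZE);
        tail++;
        count++;
    }
    payload[0] = (uint8_t)count;               /* 2 or 3 in steady state */
    return (size_t)(1 + count * PACKET_SIZE);
}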

audio-unit sample rate and buffer size

I am facing a real misunderstanding when sampling the iPhone audio with RemoteIO.
On one side, I can do this math: a 44 kHz sample rate means 44 samples per 1 ms, which means that if I set the buffer size to 0.005 with:
float bufferLength = 0.005; // 5 ms, to match the math above
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration, sizeof(bufferLength), &bufferLength);
then I have a 5 ms buffer size, which means 44 * 5 = 220 samples in the buffer on each callback.
BUT I get 512 samples from inNumberFrames on each callback, and it stays fixed even when I change the buffer length.
Another thing: my callbacks come every 11 ms and that is not changing! I need faster callbacks.
So, what is going on here? Who sets what?
I need to pass digital information using FSK modulation, and I have to know exactly the buffer size in samples and how much time of the signal it holds, in order to know how to FFT it correctly.
Any explanation of this?
Thanks a lot.
There is no way on all current iOS 10 devices to get RemoteIO audio recording buffer callbacks at a faster rate than every 5 to 6 milliseconds. The OS may even decide to switch to sending even larger buffers at a lower callback rate at runtime. The rate you request is merely a request, the OS then decides on the actual rates that are possible for the hardware, device driver, and device state. This rate may or may not stay fixed, so your app will just have to deal with different buffer sizes and rates.
One of your options might be to concatenate each callback buffer onto your own buffer, and chop up this second buffer however you like outside the audio callback. But this won't reduce actual latency.
Added: some newer iOS devices allow returning audio unit buffers that are shorter than 5.x ms in duration, usually a power of 2 in size at a 48000 sample rate.
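A minimal sketch of that concatenate-and-chop approach (names and block size are illustrative, and a real implementation would also need to deal with thread safety and overflow): the audio callback only appends its samples to a FIFO, and the rest of the app pulls fixed-size blocks, e.g. 220 samples, whenever enough have accumulated.

#include <stdint.h>

#define FIFO_CAPACITY 8192                      // must comfortably exceed the largest callback size

static float    fifo[FIFO_CAPACITY];
static uint32_t fifoWrite, fifoRead;            // running sample counters

// Called from inside the AudioUnit input callback, after AudioUnitRender() has filled `samples`.
static void fifo_push(const float *samples, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        fifo[(fifoWrite + i) % FIFO_CAPACITY] = samples[i];
    fifoWrite += count;
}

// Called outside the callback: fills `block` and returns 1 once `blockSize` samples are ready.
static int fifo_pop_block(float *block, uint32_t blockSize)
{
    if (fifoWrite - fifoRead < blockSize)
        return 0;
    for (uint32_t i = 0; i < blockSize; i++)
        block[i] = fifo[(fifoRead + i) % FIFO_CAPACITY];
    fifoRead += blockSize;
    return 1;
}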
I need to pass digital information using FSK modulation, and have to know exactly the buffer size in samples and how much time of the signal it holds, in order to know how to FFT it correctly.
It doesn't work that way - you don't get to mandate that various hosts or hardware operate in exactly the manner that is optimal for your processing. You can request reduced latency - to a point. Audio systems generally pass streaming PCM data in blocks of samples sized by a power of two for efficient real-time I/O.
You would create your own buffer for your processor, and report latency (where applicable). You can attempt to reduce wall-clock latency by choosing another sample rate, or by using a smaller N.
The audio session property is a suggested value. You can put in a really tiny number, but it will just go to the lowest value it can. The fastest that I have seen on an iOS device when using 16-bit stereo was 0.002902 seconds (~3 ms).
That is 128 samples (LR stereo frames) per callback. Thus, 512 bytes per callback.
So 128/44100 = 0.002902 seconds.
You can check it with:
Float32 bufferDuration = 0;
UInt32 size = sizeof(bufferDuration);
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration, &size, &bufferDuration);
Could the value 512 in the original post have meant bytes instead of samples?
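For completeness: the AudioSession C API quoted above has since been deprecated. On current iOS the same request-and-verify dance looks roughly like this with AVAudioSession (the requested duration is still only a preference, as the answers above explain):

#import <AVFoundation/AVFoundation.h>

AVAudioSession *session = [AVAudioSession sharedInstance];
NSError *error = nil;
[session setPreferredIOBufferDuration:0.005 error:&error];   // ask for ~5 ms
[session setActive:YES error:&error];
NSLog(@"granted %.6f s, i.e. about %.0f frames at %.0f Hz",
      session.IOBufferDuration,
      session.IOBufferDuration * session.sampleRate,
      session.sampleRate);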

Recognition of a short high frequency sound in low frequency noise (objc/c)

I am currently creating an application which signals readiness to other devices using a high-frequency sound.
(Transmitter): a device will produce a short burst of sound at around 20 kHz.
(Receiver): another device will be listening for a sound at this frequency at a small distance from the transmitter (approx. 10 m). The device receives audio data from a microphone.
The background noise will be fairly loud, varying from around 0 to 10 kHz (about the human speech range), and would be produced by a small crowd of people.
I need the receiving device to be able to detect the 20 kHz sound, separated from the noise, and to know the time at which it was received.
Any help with an appropriate algorithm, a library, or even better, code in C or Objective-C to detect this high-frequency sound would be greatly appreciated.
20 kHz may be pushing it, as (a) most sound cards have low-pass (anti-aliasing) filters at 18-20 kHz and (b) most speakers and microphones tend to have a poor response at 20 kHz. You might want to consider, say, 15 kHz?
The actual detection part should be easy - just implement a narrow band-pass filter at the tone frequency, rectify the output, and low-pass filter the result (e.g. at 10 Hz).
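A sketch of that detector in plain C, using the RBJ "cookbook" band-pass coefficients; the sample rate, tone frequency and Q below are assumptions you would tune for your setup. The returned envelope is compared against a threshold to decide that the tone is present (and hence to timestamp it).

#include <math.h>

#define FS  44100.0          // assumed sample rate
#define F0  15000.0          // tone frequency (see the note about 20 kHz above)
#define Q   30.0             // narrowness of the band-pass

static double b0, b1, b2, a1, a2;       // band-pass biquad coefficients
static double x1z, x2z, y1z, y2z;       // biquad state (z^-1, z^-2)
static double env;                      // smoothed detection envelope

void detector_init(void)
{
    double w0 = 2.0 * M_PI * F0 / FS;
    double alpha = sin(w0) / (2.0 * Q);
    double a0 = 1.0 + alpha;
    b0 =  alpha / a0;
    b1 =  0.0;
    b2 = -alpha / a0;
    a1 = -2.0 * cos(w0) / a0;
    a2 = (1.0 - alpha) / a0;
}

// Feed one input sample; the envelope rises while the tone is present.
double detector_process(double x)
{
    double y = b0 * x + b1 * x1z + b2 * x2z - a1 * y1z - a2 * y2z;
    x2z = x1z; x1z = x;
    y2z = y1z; y1z = y;

    double k = 1.0 - exp(-2.0 * M_PI * 10.0 / FS);     // ~10 Hz one-pole low-pass
    env += k * (fabs(y) - env);                        // rectify, then smooth
    return env;
}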
You may want to look into FFT (Fast Fourier Transform). This algorithm will allow you to analyse a waveform and convert it to the frequency spectrum for further analysis.
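On iOS/macOS the FFT route usually means the Accelerate framework. A hedged sketch of getting the energy near the tone frequency from one block of samples with vDSP (block size and sample rate are assumptions, and the function name is illustrative):

#include <stdlib.h>
#include <Accelerate/Accelerate.h>

// Computes the squared-magnitude spectrum of `samples` (n = 2^log2n samples)
// and returns the energy in the bin closest to `toneHz`.
float tone_bin_energy(const float *samples, int log2n, double sampleRate, double toneHz)
{
    const int n = 1 << log2n;
    FFTSetup setup = vDSP_create_fftsetup(log2n, kFFTRadix2);

    float *windowed = malloc(n * sizeof(float));
    float *window   = malloc(n * sizeof(float));
    float *realp    = malloc((n / 2) * sizeof(float));
    float *imagp    = malloc((n / 2) * sizeof(float));
    float *mags     = malloc((n / 2) * sizeof(float));

    vDSP_hann_window(window, n, 0);                               // reduce spectral leakage
    vDSP_vmul(samples, 1, window, 1, windowed, 1, n);

    DSPSplitComplex split = { realp, imagp };
    vDSP_ctoz((const DSPComplex *)windowed, 2, &split, 1, n / 2); // pack real input
    vDSP_fft_zrip(setup, &split, 1, log2n, FFT_FORWARD);
    vDSP_zvmags(&split, 1, mags, 1, n / 2);                       // squared magnitude per bin

    int bin = (int)(toneHz * n / sampleRate + 0.5);               // e.g. 20 kHz -> bin ~464 for n=1024 at 44.1 kHz
    float energy = mags[bin];

    free(windowed); free(window); free(realp); free(imagp); free(mags);
    vDSP_destroy_fftsetup(setup);
    return energy;
}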
If this is for Mac OS or iOS, I'd start looking into Core Audio's Audio Units.
1. Apple's Core Audio Overview
2. Some Audio Units for Mac OS
3. Or, for iOS, AudioUnit Hosting
A sound at that high a frequency will not travel at all from the iPhone speaker.