Solving a producer-consumer problem with NSData (for audio streaming) - objective-c

I am using AVAssetReader to copy PCM data from an iPod track to a buffer, which is then played with a RemoteIO audio unit. I am trying to create a separate thread for loading sound data, so that I can access and play data from the buffer while it is still being loaded.
I currently have a large NSMutableData object that eventually holds the entire song's data. Currently, I load audio data in a separate thread using NSOperation like so:
AVAssetReaderOutput copies, at most, 8192 bytes at a time to a CMBlockBuffer
Copy these bytes to an NSData object
Append this NSData object to a larger NSMutableData object (which eventually holds the entire song)
When finished, play the song by accessing each packet in the NSMutableData object
I'm trying to be able to play the song WHILE copying these bytes. I am unsure of a good way to write to and read from a file at the same time.
A short idea I had:
Create and fill 3 NSData objects, each 8192 bytes in length, as buffers.
Start playing. When I have finished playing the first buffer, load new data into the first buffer.
When I have finished playing the second buffer, load new data into the second. Same for the third
Start playing from the first buffer again, fill the third. And so on.
Or, create one NSData object that holds 3 * 8192 PCM units, and somehow write to and read from it at the same time with two different threads.
I have my code running on two different threads right now. I append data to the array until I press play, at which point the loading stops (probably because the thread is blocked, but I don't know for sure) and playback continues until it reaches the end of whatever was loaded, then crashes with an EXC_BAD_ACCESS exception.
In short, I want to find the right way to play PCM data while it is being copied, say, 8192 bytes at a time. I will probably have to do so with another thread (I am using NSOperation right now), but am unclear on how to write to and read from a buffer at the same time, preferably using some higher level Objective-C methods.

I'm doing this exact thing. You will definitely need to play your audio on a different thread (I am doing this with RemoteIO). You will also need to use a circular buffer. You probably want to look up this data structure if you aren't familiar with it as you will be using it a lot for this type of operation. My general setup is as follows:
LoadTrackThread starts up and starts loading data from AVAssetReader and storing it in a file as PCM.
LoadPCMThread starts up once enough data is loaded into my PCM file and essentially loads that file into local memory for my RemoteIO thread on demand. It does this by feeding this data into a circular buffer whenever my RemoteIO thread gets even remotely close to running out of samples.
RemoteIO playback callback thread consumes the circular buffer frames and feeds them to the RemoteIO interface. It also informs LoadPCMThread to wake up when it needs to start loading more samples.
This should be about all you need as far as threads. You will need to have some sort of mutex or semaphore between the two threads to ensure you aren't trying to read your file while you are writing into it at the same time (this is bad form and will cause you to crash). I just have both my threads set a boolean and sleep for a while until it is unset. There is probably a more sophisticated way of doing this but it works for my purposes.
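For reference, a bare-bones single-producer/single-consumer circular buffer in C might look something like this (illustrative names and size, not the code from my project):

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Single-producer/single-consumer ring buffer for raw PCM bytes.
       The loader thread only ever writes, the audio callback only ever
       reads, so two atomic indices are enough and the callback never
       has to take a lock. RING_SIZE must be a power of two. */
    #define RING_SIZE (8192 * 4)

    typedef struct {
        uint8_t        data[RING_SIZE];
        _Atomic size_t head;   /* next byte to write (producer only) */
        _Atomic size_t tail;   /* next byte to read  (consumer only) */
    } RingBuffer;

    /* Returns how many bytes were actually written (may be < len if full). */
    static size_t ring_write(RingBuffer *rb, const uint8_t *src, size_t len) {
        size_t head  = atomic_load(&rb->head);
        size_t tail  = atomic_load(&rb->tail);
        size_t space = RING_SIZE - (head - tail);
        if (len > space) len = space;
        for (size_t i = 0; i < len; i++)
            rb->data[(head + i) & (RING_SIZE - 1)] = src[i];
        atomic_store(&rb->head, head + len);
        return len;
    }

    /* Returns how many bytes were actually read (may be < len if empty). */
    static size_t ring_read(RingBuffer *rb, uint8_t *dst, size_t len) {
        size_t head  = atomic_load(&rb->head);
        size_t tail  = atomic_load(&rb->tail);
        size_t avail = head - tail;
        if (len > avail) len = avail;
        for (size_t i = 0; i < len; i++)
            dst[i] = rb->data[(tail + i) & (RING_SIZE - 1)];
        atomic_store(&rb->tail, tail + len);
        return len;
    }

The loader thread calls ring_write() with each chunk it pulls from AVAssetReader, and the playback callback calls ring_read(), zero-filling its output when it gets back fewer bytes than it asked for.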
Hope that helps!

Related

How do I know when Vulkan isn't using memory anymore so I can overwrite it / reuse it?

When working with Vulkan it's common, when creating a buffer such as a uniform buffer, to create multiple 'versions' of it, because if you have double buffering, for example, you don't know whether the graphics API is still drawing the last frame (using the memory you bound and instructed it to use on the previous loop). I've seen this done with uniform buffers but not with vertex or index buffers or image/texture buffers. Is this because uniform buffers are updated regularly and vertex buffers or images are not?
If you wanted to update an image or a vertex buffer, how would you go about it, given that you don't know whether the graphics API is still using it? Do you simply allocate new memory for that image/buffer and start anew? Even if you just want to update a section of it? And if you do allocate a new buffer, when would you know it is safe to release the old one? Would, say, 5 frames into the future be OK? Or 2 seconds? After all, it could still be in use. How is this done?
given that you don't know whether the graphics API is still using it?
But you do know.
Vulkan doesn't arbitrarily use resources. It uses them exactly and only how your code tells it to use the resource. You created and submitted the commands that use those resources, so if you need to know when a resource is in use, it is you who must keep track of it and manage this.
You have to use API synchronization functions to follow the GPU's execution of commands.
If an action command uses some set of resources, then those resources are in use while that command is being executed. You have tools like events which can be used to stop subsequent commands from executing until some prior commands have finished. And events can tell when a particular command has finished, so that you'll know when those resources are no longer in use.
Semaphores have similar powers, but at the level of a batch of work. If a semaphore is signaled, then all of the commands in the batch that signaled it have completed and are no longer using the resources they use. Fences can be used for extremely coarse synchronization, at the level of a submit command.
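As a rough sketch (handles such as device, queue, cmdBuf and oldBuffer are assumed to already exist; error handling is omitted), a fence attached to the submission tells you when everything that submission touched is done:

    /* Create a fence and attach it to the submission. */
    VkFenceCreateInfo fenceInfo = { .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO };
    VkFence fence;
    vkCreateFence(device, &fenceInfo, NULL, &fence);

    VkSubmitInfo submit = {
        .sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers    = &cmdBuf,
    };
    vkQueueSubmit(queue, 1, &submit, fence);

    /* ...do other CPU work, record the next frame, etc... */

    /* Once the fence has signaled, nothing in that batch is still executing,
       so the resources it used can be overwritten or destroyed. */
    vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    vkDestroyBuffer(device, oldBuffer, NULL);
    vkDestroyFence(device, fence, NULL);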
You multi-buffer uniform data because the nature of uniform data is such that it typically needs to change every frame. If you have vertex buffers or images to change every frame, then you'll need to do the same thing with those.
For infrequent changes, you may want to have extra memory available so that you can just create new images or buffers, then delete the old ones when the memory is no longer in use. Or you may have to stall the CPU until the GPU has finished using those resources.

A rarely mentioned Vulkan function "vkCmdUpdateBuffer()", what is it used for?

This seems like a simple Vulkan API question, but I really cannot find an answer after searching the Internet.
I noticed there is a Vulkan function:
void vkCmdUpdateBuffer(
    VkCommandBuffer  commandBuffer,
    VkBuffer         dstBuffer,
    VkDeviceSize     dstOffset,
    VkDeviceSize     dataSize,
    const void*      pData);
At first glance, I thought it could be used to record a command into a command buffer, since it has the vkCmd prefix in its name, but the documentation says that
vkCmdUpdateBuffer is only allowed outside of a render pass. This command is treated as a “transfer” operation, for the purposes of synchronization barriers.
So I started thinking that it is a convenience function that wraps the buffer data transfer operation, something like using memcpy() to copy the data from host to device.
Then my question is: why is there not a single Vulkan sample or tutorial (I have searched all of them) using vkCmdUpdateBuffer() instead of manually copying data with memcpy()? Did I misunderstand it?
All vkCmd* functions generate commands into a command buffer. This one is no exception. It is a transfer command, and like most transfer commands, you don't get to do them within a render pass. But there are plenty of command buffer generating commands that don't work in render passes.
Normally Vulkan memory transfer operations only happen between device memory. The typical mechanism for the host to put something in device memory is to write to a mapped pointer. But by definition, that requires that the destination memory be mappable. So if you want to write something to non-mappable memory, you have to copy it to mappable memory, then do a transfer operation between the mappable memory to the non-mappable memory via vkCmdCopy* functions.
And that's fine if you're doing a bunch of transfers all at once. You can copy a bunch of stuff into mapped memory, then submit a batch containing all of the copy operations to copy the data into the appropriate locations.
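A minimal sketch of that staging pattern, assuming the staging buffer and its host-visible memory have already been created:

    /* 'stagingMemory' is host-visible memory bound to 'stagingBuffer';
       'deviceLocalBuffer' is the non-mappable destination. All handles are
       assumed to exist already and error handling is omitted. */
    void *mapped = NULL;
    vkMapMemory(device, stagingMemory, 0, dataSize, 0, &mapped);
    memcpy(mapped, srcData, dataSize);
    vkUnmapMemory(device, stagingMemory);

    VkBufferCopy region = { .srcOffset = 0, .dstOffset = 0, .size = dataSize };
    vkCmdCopyBuffer(cmdBuf, stagingBuffer, deviceLocalBuffer, 1, &region);
    /* The data only reaches deviceLocalBuffer once the GPU executes the
       submitted command buffer. */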
But sometimes, you're just updating a small piece of device memory. If it's not mappable, then that's a lot of work to do just to get a few kilobytes of data to the GPU. In that case, vkCmdUpdateBuffer may be the better choice, since it can "directly" copy from CPU memory to any device memory.
I say "directly" because that's obviously not what it's doing. It's really doing the same thing you would have done, except it's doing it within the command buffer. You would have copied your CPU data into GPU mappable memory, then created a command that copies from that mappable memory into non-mappable memory.
vkCmdUpdateBuffer does the exact same thing. It copies the data from the pointer/size you give it into mappable memory (which is provided by the command buffer itself. This is why it has an upper limit of 64KB). This copy happens immediately, just as it would have if you did a memcpy, so when this function returns, you can do whatever you want with the pointer you gave it. Then it creates a command in the command buffer that copies from the mappable memory in the command buffer to the destination memory location.
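Recording it might look like this (a sketch; 'params' stands in for a hypothetical small struct of shader data, and the barrier illustrates the "transfer" synchronization the quoted documentation mentions):

    /* Push a small struct into a device-local buffer. The data size must be
       a multiple of 4 and at most 65536 bytes. */
    vkCmdUpdateBuffer(cmdBuf, uniformBuffer, 0, sizeof(params), &params);

    /* Because it counts as a transfer, a barrier is still needed before any
       shader reads the updated range. */
    VkBufferMemoryBarrier barrier = {
        .sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER,
        .srcAccessMask       = VK_ACCESS_TRANSFER_WRITE_BIT,
        .dstAccessMask       = VK_ACCESS_UNIFORM_READ_BIT,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .buffer              = uniformBuffer,
        .offset              = 0,
        .size                = sizeof(params),
    };
    vkCmdPipelineBarrier(cmdBuf,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
                         0, 0, NULL, 1, &barrier, 0, NULL);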
The documentation for this function explicitly gives warnings about using it for larger transfers. That is, it tells you not to do that. This is for quick, small, one-shot updates of unmappable memory. Nothing more.
That's one reason why tutorials don't talk about it: it's a highly special-case function that many novice users will try to use because it's easier than the explicit code. But in most cases, they should not be using it.

WasapiLoopbackCapture to WaveOut

I'm using WasapiLoopbackCapture to capture sound coming from my speakers and then using OnDataAvailable to send it to another device, where I'm attempting to play it with the WaveOut class and a BufferedWaveProvider, adding samples to the provider every time data arrives from my client via OnDataAvailable. I'm having problems getting the sound to play. The closest I've gotten it to working is:
Not syncing the wave format of the client and the server, just sending data and adding it to the provider. The problem is that this stutters very badly, even though I checked the amount buffered and it holds 51 seconds. I even have to increase the buffer size, which eventually overflows anyway.
I tried syncing the wave format, and then I just get clicks, but have no problem with the buffer size. I also tried making sure that at least a second was stored in the buffer, but that had zero effect.
If anyone could point me in the right direction that would be great.
Uncompressed audio takes up a lot of space on a network. On my machine the WasapiLoopbackCapture object produces 32-bit (IeeeFloat) stereo samples at 44100 samples per second, for around 2.7Mbit/sec total raw bandwidth. Once you factor in TCP packet overheads and so on, that's quite a lot of data you're transferring.
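(That works out as 44,100 samples/s × 2 channels × 4 bytes per sample = 352,800 bytes/s of raw audio, which is where that figure comes from, before any packet framing is added.)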
The first thing I would suggest, though, is that you plug in some profiling code at each step in the process to get an idea of where your bottlenecks are happening. How fast is data arriving from the capture device? How big are your packets? How long does it take to service each call to your OnDataAvailable event handler? How much data are you sending per second across the network? How fast is the data arriving at the client? Figure out where the data is piling up or being starved and you'll have a much better idea of what needs fixing.
Try building a simulated server that reads data from a wave file in various WaveFormats (channels, bits per sample and sample rate) and simulates sending that data across the network to the client. You might find that the problem goes away at lower bandwidth. And if bandwidth is the issue, compression might be the solution.
If you're using a single-threaded model, and servicing each OnDataAvailable event takes longer than the recording frequency (ie: number of expected calls to OnDataAvailable per second) then there's going to be a data loss issue. Multiple threads can help with this - one to get the data from the audio system, another to process and send the data. But you can end up in the same position: losing data because you're not dealing with it quickly enough. When that happens it's handy to know about it, because it indicates a problem in the program. Find out when and where it happens - overflow in input, processing or output buffers all have different potential reasons and need different attention.

Where are buffers located?

I hear a lot about flushing buffers, sending to a buffer, etc., but I don't have a mental picture of where buffers reside and what they look like.
Are buffers part of the OS' kernel or part of each process? If the case is the first, can the same buffers be used by multiple processes?
A buffer is a generic term for a collection of bytes, typically used in the context of either sending, receiving or storing information where the internal data-structure of the information isn't important.
In the case of "flushing" buffers, this typically is used in the context of sending data either to a file or network; the buffer in this case being used to coalesce multiple small writes to the file or network into one larger and more-efficient-to-transmit buffer. After the final write has been performed (or after some "commit" point), the buffer must be "flushed" to ensure that any data left waiting to coalesce with a future write is committed immediately to the underlying file sent over the network rather than left waiting for a future write that might never come.
In both the network and file IO cases, buffers are usually used in multiple places. File IO may well be buffered by a buffer in the application, in a library (for instance, an implementation of fwrite may buffer the output), in the kernel, and even on the device itself. Network writes may well be buffered by the device whilst waiting for bandwidth on the wire, and hard-disk drives will buffer output from the OS to ensure that data isn't lost as the physical platters spin to the correct position for the write.
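To make the application/library layer concrete, here is a small C sketch (illustrative only) of the stdio buffering and flushing described above:

    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("out.log", "w");
        if (!f) return 1;

        /* fprintf writes into a buffer that stdio keeps inside this process;
           nothing has necessarily reached the kernel yet. */
        fprintf(f, "some data\n");

        /* fflush pushes the library's buffer down into the kernel's buffers;
           the kernel and the device below it may buffer again before the
           bytes hit the platter. */
        fflush(f);

        fclose(f);  /* closing the stream also flushes it */
        return 0;
    }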

Unexpected behavior with AudioQueueServices callback while recording audio

I'm recording a continuous stream of data using AudioQueueServices. It is my understanding that the callback will only be called when the buffer fills with data. In practice, the first callback has a full buffer, the 2nd callback is 3/4 full, the 3rd callback is full, the 4th is 3/4 full, and so on. These buffers are 8000 packets (recording 8 kHz audio), so I should be getting back 1 second of audio in each callback. I've verified that my audio queue buffer size is correct (which the behavior somewhat confirms). What am I doing wrong? Should I be passing a different run loop to AudioQueueNewInput? I tried, but it didn't seem to make a difference...
By the way, if I run in the debugger, each callback is full with 8000 samples - making me think this is a threading / timing thing.
Apparently, judging from discussions with others (and the lack of responses), this behavior is as designed (or broken but not likely to be fixed), even if it is poorly documented. The workaround is to buffer your samples appropriately in the callback if necessary and not to expect the buffer to be full. This isn't an issue at all if you are just writing the data to a file, but if you expect to operate on consistently sized blocks of audio data in the callback, you will have to ensure this consistency yourself.
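As a rough illustration (not the asker's code; it assumes 16-bit mono at 8 kHz, and ProcessBlock stands in for whatever fixed-size work you do), the input callback can accumulate whatever arrives and only consume it in fixed-size blocks:

    #include <AudioToolbox/AudioToolbox.h>
    #include <string.h>

    /* With 16-bit mono at 8 kHz, 8000 packets = 16000 bytes = 1 second. */
    #define BLOCK_BYTES 16000

    typedef struct {
        char   pending[BLOCK_BYTES * 2];
        size_t pendingBytes;
    } RecorderState;

    static void HandleInputBuffer(void *inUserData,
                                  AudioQueueRef inQueue,
                                  AudioQueueBufferRef inBuffer,
                                  const AudioTimeStamp *inStartTime,
                                  UInt32 inNumPackets,
                                  const AudioStreamPacketDescription *inPacketDesc)
    {
        RecorderState *state = (RecorderState *)inUserData;

        /* mAudioDataByteSize is the amount of valid data in this buffer; it
           can be less than the capacity the buffer was allocated with. */
        memcpy(state->pending + state->pendingBytes,
               inBuffer->mAudioData, inBuffer->mAudioDataByteSize);
        state->pendingBytes += inBuffer->mAudioDataByteSize;

        while (state->pendingBytes >= BLOCK_BYTES) {
            /* ProcessBlock(state->pending, BLOCK_BYTES);  <- fixed-size work */
            memmove(state->pending, state->pending + BLOCK_BYTES,
                    state->pendingBytes - BLOCK_BYTES);
            state->pendingBytes -= BLOCK_BYTES;
        }

        /* Hand the buffer back so recording continues. */
        AudioQueueEnqueueBuffer(inQueue, inBuffer, 0, NULL);
    }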