WasapiLoopbackCapture to WaveOut - NAudio

I'm using WasapiLoopbackCapture to capture the sound coming from my speakers and sending the captured data to another device from the DataAvailable event. On the receiving side I play it back with the WaveOut class and a BufferedWaveProvider, adding samples to the provider every time data arrives from the client. I'm having problems getting clean sound. The closest to working I've managed to get it is:
Not syncing the wave format between the client and the server, just sending the data and adding it to the provider. The problem is that this stutters badly, even though I checked the amount buffered and it holds about 51 seconds of audio. I even have to keep increasing the buffer size, which eventually overflows anyway.
I tried syncing the wave format and then I just get clicks, but have no problem with the buffer size. I also tried making sure that at least a second of audio was stored in the buffer, but that had zero effect.
If anyone could point me in the right direction that would be great.

Uncompressed audio takes up a lot of space on a network. On my machine the WasapiLoopbackCapture object produces 32-bit (IeeeFloat) stereo samples at 44100 samples per second, for around 2.7Mbit/sec total raw bandwidth. Once you factor in TCP packet overheads and so on, that's quite a lot of data you're transferring.
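As a quick sanity check on that figure, here is the raw-bandwidth arithmetic as a minimal sketch (in C, since none of the question's code is shown); the format values are the ones quoted above.

```c
#include <stdio.h>

int main(void)
{
    /* Format reported by WasapiLoopbackCapture in the answer above. */
    const double sample_rate     = 44100.0; /* samples per second */
    const double channels        = 2.0;     /* stereo             */
    const double bits_per_sample = 32.0;    /* IEEE float         */

    double bits_per_second  = sample_rate * channels * bits_per_sample;
    double bytes_per_second = bits_per_second / 8.0;

    /* 2,822,400 bit/s, i.e. roughly 2.7 Mbit/s (dividing by 1024*1024),
       before any TCP/IP framing overhead is added. */
    printf("raw: %.0f bit/s (%.2f Mbit/s), %.0f bytes/s\n",
           bits_per_second, bits_per_second / (1024.0 * 1024.0),
           bytes_per_second);
    return 0;
}
```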
The first thing I would suggest, though, is that you plug in some profiling code at each step in the process to get an idea of where your bottlenecks are. How fast is data arriving from the capture device? How big are your packets? How long does it take to service each call to your OnDataAvailable event handler? How much data are you sending per second across the network? How fast is the data arriving at the client? Figure out where the bottlenecks are and you'll have a much better idea of how to attack them.
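One cheap way to get those numbers is a per-stage counter that tallies calls and bytes once a second. A rough illustration in C (all names here are invented; in the real program the equivalent would live in the OnDataAvailable handler, around the network send, and in the client's receive loop):

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-stage throughput counter: call tally() with the number of
 * bytes handled; once a second it prints calls/s and bytes/s for that stage. */
struct stage_stats {
    const char *name;
    time_t      window_start;
    long        calls;
    long long   bytes;
};

static void tally(struct stage_stats *s, long long nbytes)
{
    time_t now = time(NULL);
    if (s->window_start == 0)
        s->window_start = now;

    s->calls++;
    s->bytes += nbytes;

    if (now - s->window_start >= 1) {
        printf("%s: %ld calls/s, %lld bytes/s\n", s->name, s->calls, s->bytes);
        s->window_start = now;
        s->calls = 0;
        s->bytes = 0;
    }
}

int main(void)
{
    struct stage_stats capture = { "capture", 0, 0, 0 };
    /* In the real program you'd call tally(&capture, bytesRecorded) from the
     * capture callback, keep a second counter around the network send, and a
     * third in the client's receive path, then compare the three rates. */
    tally(&capture, 3528);
    return 0;
}
```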
Try building a simulated server that reads data from a wave file in various WaveFormats (channels, bits per sample and sample rate) and simulates sending that data across the network to the client. You might find that the problem goes away at lower bandwidth. And if bandwidth is the issue, compression might be the solution.
If you're using a single-threaded model, and servicing each OnDataAvailable event takes longer than the interval between events (i.e. one second divided by the expected number of calls per second), then you're going to lose data. Multiple threads can help with this: one to get the data from the audio system, another to process and send it. But you can still end up in the same position, losing data because you're not dealing with it quickly enough. When that happens it's handy to know about it, because it indicates a problem in the program. Find out when and where it happens; overflows in the input, processing or output buffers all have different likely causes and need different attention.
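One common shape for that two-thread arrangement is a ring buffer with an explicit overflow counter, so you find out the moment the capture side outruns the sender instead of silently losing audio. A minimal pthreads sketch, purely illustrative (the sizes and names are made up, and the actual capture and network calls are stubbed out):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define RING_SIZE (1 << 20)              /* 1 MiB of audio bytes */

static unsigned char ring[RING_SIZE];
static size_t head, tail;                /* head: write index, tail: read index   */
static long overflows;                   /* incremented when capture data is lost */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static size_t ring_used(void) { return (head + RING_SIZE - tail) % RING_SIZE; }

/* Called from the capture thread (the DataAvailable handler, in NAudio terms). */
static void ring_write(const unsigned char *data, size_t len)
{
    pthread_mutex_lock(&lock);
    if (ring_used() + len >= RING_SIZE) {
        overflows++;                     /* record the loss instead of hiding it */
    } else {
        for (size_t i = 0; i < len; i++)
            ring[(head + i) % RING_SIZE] = data[i];
        head = (head + len) % RING_SIZE;
    }
    pthread_mutex_unlock(&lock);
}

/* Sender thread: drains the ring and would push the bytes onto the network. */
static void *sender(void *arg)
{
    (void)arg;
    unsigned char chunk[4096];
    for (;;) {
        pthread_mutex_lock(&lock);
        size_t avail = ring_used();
        size_t n = avail < sizeof chunk ? avail : sizeof chunk;
        for (size_t i = 0; i < n; i++)
            chunk[i] = ring[(tail + i) % RING_SIZE];
        tail = (tail + n) % RING_SIZE;
        pthread_mutex_unlock(&lock);

        if (n > 0) {
            /* send(sock, chunk, n, 0) would go here */
        } else {
            usleep(1000);                /* nothing buffered: back off briefly */
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, sender, NULL);

    /* Simulate the capture side delivering ~3528 bytes every 10 ms for 1 s. */
    unsigned char fake[3528] = {0};
    for (int i = 0; i < 100; i++) {
        ring_write(fake, sizeof fake);
        usleep(10 * 1000);
    }
    printf("overflows: %ld\n", overflows);
    return 0;
}
```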

Related

How to prevent CPU usage from changing timing in LabVIEW?

I'm trying to write code in which, every 1 ms, a number is incremented by one and replaces the old number (something like a chronometer!).
The problem is that whenever CPU usage increases because of other programs running on the PC, this 1 millisecond interval also stretches and the timing in my program changes.
Is there any way to prevent CPU load changes from affecting the timing in my program?
It sounds as though you are trying to generate an analogue output waveform with a digital-to-analogue converter card using software timing, where your software is responsible for determining what value should be output at any given time and updating the output accordingly.
This is OK for stationary or low-speed signals but you are trying to do it at 1 ms intervals, in other words to output 1000 samples per second or 1 ks/s. You cannot do this reliably on a desktop operating system - there are too many other processes going on which can use CPU time and block your program from running for many milliseconds (or even seconds, e.g. for network access).
Here are a few ways you could solve this:
Use buffered, hardware-clocked output if your analogue output device supports it. Instead of writing one sample at a time, you send the device a waveform or array of samples and it outputs them at regular intervals using a timing signal generated in hardware. Unfortunately, low-end DAQ devices often don't support hardware-clocked output.
Instead of expecting the loop that writes your samples to the AO to run every millisecond, read LabVIEW's Tick Count (ms) value in the loop and use that as an index into your array of samples: rather than trying to output every sample, your code now asks 'what time is it now, and therefore what should the output be?' (see the sketch after this list of options). That won't give you a perfect signal out, but at least it should keep the correct frequency rather than being 'slowed down'; instead you will see glitches imposed on the signal whenever the loop can't keep up. This is easy to test and may well be adequate for your needs.
Use a real-time operating system instead of a desktop OS. In the case of LabVIEW this would mean using the Real-Time software module and either a National Instruments hardware device that supports RT, such as the CompactRIO series, or installing the RT OS on a dedicated PC if the hardware is compatible. This is not a cheap option, obviously (unless it's strictly for personal, home use). In any case you would need to have an RT-compatible driver for your output device.
Use your computer's sound output as the output device. LabVIEW has functions for buffered sound output and you should be able to get reliable results. You'll need to upsample your signal to one of the sound output's available sample rates, probably 44.1 ks/s. The drawbacks are that the output level is limited in range and is not calibrated, and will probably be AC-coupled so you can't output a DC or very low-frequency signal. However if the level is OK for what you want to connect it to, or you can add suitable signal conditioning, this could be a neat solution. If you need the output level to be calibrated you could simultaneously measure it with your DAQ card and scale the sound waveform you're outputting to keep it correct.
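Since LabVIEW code is graphical, here is the second option above written out as a plain C sketch instead: the loop asks for the elapsed time and uses it to index the sample array, rather than assuming it runs exactly once per millisecond (the AO write is a hypothetical placeholder):

```c
#include <stdio.h>
#include <time.h>

/* Millisecond tick count, standing in for LabVIEW's Tick Count (ms). */
static long tick_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

int main(void)
{
    /* Hypothetical waveform: 1000 samples meant to play out at 1 kS/s. */
    double samples[1000];
    for (int i = 0; i < 1000; i++)
        samples[i] = i / 999.0;              /* a simple ramp */

    long start = tick_ms();
    for (;;) {
        long elapsed = tick_ms() - start;    /* "what time is it now?"             */
        if (elapsed >= 1000)
            break;
        double out = samples[elapsed];       /* "...so what should the output be?" */
        /* write_analog_output(out); -- hypothetical AO write would go here */
        (void)out;
        /* A short wait here would reduce CPU use; omitted to keep the idea bare. */
    }
    printf("done after %ld ms\n", tick_ms() - start);
    return 0;
}
```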
The answer to your question is "not on a desktop computer." This is why products like LabVIEW Real-Time and dedicated deterministic hardware exist: you need a system dedicated to a particular process in order to serve that process consistently. Every application on a regular Windows/Mac/Linux desktop has the problem you are seeing of potentially being interrupted by other system processes, particularly in its UI layer.
There is no way to prevent CPU load changes from affecting the timing in your program unless the computer has a real-time clock.
If it doesn't have a real-time clock, there is no reason to expect it to behave deterministically. Do you really need your program to run at that pace?

Optimal Sizes of data for sends and receives in MPI

I am writing a parallel application with MPI in which the master process has data approximately as large as the cache (4MB on the platform I am working on) to send to each process. As 4MB might be too large for the master to send at once, it needs to break the entire data into smaller chunks of a size suitable for sending and receiving.
My question is: is there any guidance on the optimal size for sending and receiving each smaller chunk, given the size of the entire data?
Thanks.
4MB won't be any problem for any MPI implementation out there; I'm not sure what you mean by "too large" though.
A rule of thumb is that, if you can easily send the data all in one message, that is usually faster. The reason is that there is some finite amount of time required to send and receive any one message (the latency), which comes from the function calls, calls to the transport layer, and so on. On top of that, there is a roughly fixed amount of time it takes to send each additional byte of data (which is one over the bandwidth). That's only a very crude approximation to the real complexity of sending messages (especially large ones) between processors, but it's a very useful one. Within that model, the fewer messages you send, the better, because you incur the latency overhead fewer times.
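In symbols that model is just T(n) = latency + n / bandwidth for an n-byte message. A tiny sketch of the comparison discussed next, with made-up ballpark numbers rather than measured ones:

```c
#include <stdio.h>

/* Simple latency/bandwidth model: time to send one n-byte message. */
static double msg_time(double n_bytes, double latency_s, double bandwidth_bytes_per_s)
{
    return latency_s + n_bytes / bandwidth_bytes_per_s;
}

int main(void)
{
    const double latency   = 2e-6;             /* 2 us per message (assumed)   */
    const double bandwidth = 1e9;              /* 1 GB/s link (assumed)        */
    const double total     = 4.0 * 1024 * 1024;/* 4 MB payload                 */

    double one_big    = msg_time(total, latency, bandwidth);
    double four_small = 4.0 * msg_time(total / 4.0, latency, bandwidth);

    /* The transfer term dominates, so the three extra latencies barely matter;
       with thousands of tiny messages the latency term would dominate instead. */
    printf("1 x 4MB : %.6f s\n4 x 1MB : %.6f s\n", one_big, four_small);
    return 0;
}
```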
That rule of thumb holds almost without exception if you are contemplating sending many little messages; however, if you're talking about sending (say) four 1MB messages vs one 4MB message, even under that model the difference may be small, and may be overwhelmed by other effects specific to your transport. If you want a more accurate assessment of how long things take on your platform, there's really no substitute for empirical measurement. The best way would be to try it in your code a few ways and see what is best; that's really the only definitive answer. A second method would be to take a look at MPI "microbenchmarks":
The Intel MPI Benchmarks (IMB)
The Ohio State University MPI Benchmarks (OSU)
both of the above include benchmarks of how long it takes to send and receive messages of various sizes; you compile the above with your MPI and you can simply read off how long it takes to send/receive (say) a 4MB message vs 4x 1MB messages and that may give you some clues as to how to proceed.
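If you'd rather measure it directly in your own code, here is a rough sketch of that experiment with MPI in C (two ranks, timing one 4MB transfer against four 1MB chunks; the chunk size and tag are arbitrary):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TOTAL (4 * 1024 * 1024)          /* 4 MB payload           */
#define CHUNK (1 * 1024 * 1024)          /* 1 MB per smaller chunk */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(TOTAL);

    /* One 4 MB message. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    if (rank == 0)
        MPI_Send(buf, TOTAL, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, TOTAL, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t_one = MPI_Wtime() - t0;

    /* Four 1 MB messages. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < TOTAL / CHUNK; i++) {
        if (rank == 0)
            MPI_Send(buf + i * CHUNK, CHUNK, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf + i * CHUNK, CHUNK, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    double t_chunks = MPI_Wtime() - t0;

    if (rank == 1)
        printf("1 x 4MB: %f s, 4 x 1MB: %f s\n", t_one, t_chunks);

    free(buf);
    MPI_Finalize();
    return 0;
}
```
Build it with mpicc, run it with mpirun -np 2, and repeat the measurement several times, since the first exchange often pays one-off connection-setup costs.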

Does restricting the frames/s of a game engine (via vertical sync or other throttling) also inflict latency on the audio and input subsystems?

I was contemplating the fact that imposing frames-per-second restrictions is not ideal with regard to latency and performance, since a monitor still has a chance to show something sooner (assuming no vertical sync, which is mainly for stability).
However, it occurred to me that this might have ramifications for the audio subsystem (and also input devices), assuming the frame rate also governs the global loop of the engine itself (the case in almost all 3D-accelerated applications).
Do audio cards/audio devices on an operating system have a concept of "Hertz" that might be related? Should we assume that the faster the global loop of the application runs, the better the latency for the audio subsystem?
Audio is not typically affected at all. The audio has a separate buffer which you fill up in advance, maybe a hundred milliseconds or more, and the sound card plays back from that regardless of what your game loop is doing.
It is possible, if you fill that buffer in your game loop, that taking too long to get back to the game loop will result in the sound buffer being empty and a looping sound being heard. To avoid this, developers will either use a big buffer or fill the buffer from a background thread.
As you might guess, audio is already running at some sort of latency, proportional to the amount of data in the buffer that the game attempts to keep in there at all times. This is usually not so noticeable since sound takes a non-negligible time to travel in real life anyway. Pro audio applications have to keep this buffer small for low latency and responsiveness, but they don't have graphical frames to worry about...
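To put a number on "proportional": the latency the buffer contributes is simply the buffered frame count divided by the sample rate. A one-line check with example values (not taken from any particular game):

```c
#include <stdio.h>

int main(void)
{
    /* Example format: 44.1 kHz output, a game keeping ~4410 frames buffered. */
    const double sample_rate     = 44100.0;
    const double buffered_frames = 4410.0;

    /* latency = frames / rate = 0.1 s, i.e. ~100 ms of audio always "in flight" */
    printf("buffer latency: %.0f ms\n", 1000.0 * buffered_frames / sample_rate);
    return 0;
}
```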
As far as input, yes, it is often affected. If a game does not decouple the rendering rate from the input handling rate, then there will be some additional latency on the way in. There will always be some additional perceived latency on the way out too, if you consider input latency as the length of time between performing an action and seeing it take effect. But that perceived latency may well be larger than the simulation latency, since the affected entity may have been altered at an earlier timestep than the one displayed in the next frame.
Typically the game processing and the display aren't coupled, so reducing the FPS of the display won't affect the central game processing, which would include things like input (and presumably audio).
This article explains it pretty well, including different options for having FPS linked to game speed or not.
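The usual decoupling is a fixed-timestep loop: input sampling and simulation advance on their own clock, while rendering happens as often as vsync or the frame-rate cap allows. A bare-bones C sketch of the idea (the poll/update/render functions are empty placeholders):

```c
#include <stdio.h>
#include <time.h>

#define TICK_MS 10                       /* simulation step: 100 updates/s */

static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Placeholders for the real engine calls. */
static void poll_input(void)        {}
static void update_simulation(void) {}
static void render_frame(void)      {}

int main(void)
{
    long previous = now_ms();
    long lag = 0;
    long end = previous + 2000;          /* run the demo loop for 2 seconds */

    while (now_ms() < end) {
        long current = now_ms();
        lag += current - previous;
        previous = current;

        /* Input and simulation catch up in fixed steps, independent of FPS. */
        while (lag >= TICK_MS) {
            poll_input();
            update_simulation();
            lag -= TICK_MS;
        }

        /* Rendering runs as often as vsync/throttling allows; a slow or capped
           frame rate only delays what you see, not how the game advances. */
        render_frame();
    }
    printf("done\n");
    return 0;
}
```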

How to dynamically allocate buffer for receiving UDP socket (VB.Net)

A friend and I are working on a project where we're required to build a reliable UDP client/server using VB.Net. We have things working well, but one thing that still eludes us is how to dynamically allocate a (byte) buffer for the incoming data. Right now we have to hard-code a maximum value/MTU (or use a really large buffer and resize it once we've finished receiving). Does anyone know of a way this can be done without needing to specify the receive buffer size up front?
Basically, before calling the receive function on the socket with a buffer of size x, we want to know x so we can allocate an appropriately sized buffer. Perhaps this is a problem in all socket programming that you just have to deal with?
This is one of the burdens you'll have to take on when you use UDP. You'll have to consider Path MTU discovery. Then again, since you are building reliable UDP, you should be able to auto-detect this and dynamically switch to a smaller packet size. That will solve PMTUD problems as well.
Hopefully this doesn't sound too much like "those who don't use TCP are doomed to reinvent it." Check out the RFCs linked in that article for ideas.
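On the narrower question of sizing the receive buffer: with raw sockets you can ask how big the waiting datagram is before allocating. A Linux-flavoured C sketch of that idea (MSG_PEEK | MSG_TRUNC is Linux-specific, and this is not what the answer above describes, just one way to learn the size; in .NET the rough equivalents are checking Socket.Available or simply using a 64 KB buffer, since a UDP payload can never exceed 65,507 bytes):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Receive one UDP datagram into a buffer sized exactly for it.
 * Returns the malloc'd buffer and stores its length in *len. */
static char *recv_exact(int sock, ssize_t *len)
{
    /* Peek without consuming the payload: with MSG_TRUNC, recv() reports the
     * full datagram size even though we only pass a 1-byte buffer (Linux). */
    char probe;
    ssize_t size = recv(sock, &probe, 1, MSG_PEEK | MSG_TRUNC);
    if (size < 0)
        return NULL;

    char *buf = malloc(size > 0 ? (size_t)size : 1);
    *len = recv(sock, buf, (size_t)size, 0);   /* now actually consume it */
    return buf;
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(9000);        /* arbitrary example port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sock, (struct sockaddr *)&addr, sizeof addr);

    ssize_t len;
    char *data = recv_exact(sock, &len);       /* blocks until a packet arrives */
    printf("received %zd bytes\n", len);
    free(data);
    return 0;
}
```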

Is there an ideal number of network operations for iPhone OS?

I'm using NSOperation and NSOperationQueue to handle all of my networking threads so my interface can remain responsive while handling data transfer over the internet. Currently, I've got my operation queue set to a maximum concurrent operation count of 5, and it seems to work well.
I'm wondering, though, if there is a more ideal number of concurrent network operations that would best maximize the available resources without choking the hardware. Are there any recommendations, or steps I might take to measure and find out for myself?
Given the iPhone (currently) runs a single core, I would guess 5 is around the right number.
But the only way to be sure would be to instrument it and find out what the usage looks like (CPU, memory and network). Network usage you can estimate from the data transferred, though it's hard to know what a reasonable figure would be. I'm not sure whether it's possible to get CPU/memory statistics from the iPhone.
If you are doing large transfers, then more connections probably won't help much. If you are doing lots of small transfers, then more connections will help work around the back-and-forth of setting up and tearing down each connection.