I'm creating a kind of live video streaming application and using a number of different libraries. I'm using
NAudio for unpacking the audio stream as it comes in. I found a thread on their discussion boards which I utilised like so:
AudioClient _audioClient;
BufferedWaveProvider mybufferwp = null;
WaveOut wo = new WaveOut();
WaveFormat wf = new WaveFormat(16000, 1);

void MainWindow()
{
    // Connect to the Kinect audio stream and play whatever arrives.
    _audioClient = new AudioClient();
    _audioClient.AudioFrameReady += _audioClient_AudioFrameReady;
    _audioClient.Connect(parent.TempIp, parent.AudioPort);

    mybufferwp = new BufferedWaveProvider(wf);
    mybufferwp.BufferDuration = TimeSpan.FromMinutes(5);

    wo.Init(mybufferwp);
    wo.Play();
}

void _audioClient_AudioFrameReady(object sender, AudioFrameReadyEventArgs e)
{
    // Queue each incoming audio frame for playback.
    if (mybufferwp != null)
    {
        mybufferwp.AddSamples(e.AudioFrame.AudioData, 0, e.AudioFrame.AudioData.Length);
    }
}
My problem is that the audio is slightly delayed. Not by much, granted, but it's noticeable, and I was hoping there might be something I could do to get it more in sync with my video feed, which is nearly perfectly live.
Extra Info
AudioClient is from the Kinect Service, which allows me to send and receive Kinect camera data.
The problem you are dealing with is called latency. There are two sources of latency in this system. The first is the size of the record buffer. The Kinect will be filling a buffer of audio and then raising its AudioFrameReady event. The larger the buffer size (in milliseconds), the longer the delay will be. I don't know if Kinect gives you an option to minimise the size of this buffer.
Then there is more buffering on the playback side. NAudio's default setup for WaveOut is two buffers of 100ms each: one is played back while the other is filled. These values are chosen for smooth playback: make the buffers too small and playback might stutter. However, they are fully configurable, so I'd suggest reducing the WaveOut buffer sizes until playback starts to break up, then backing off slightly.
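For example, here is a minimal sketch of that (reusing the question's mybufferwp; 50ms split over two buffers is just a starting value to experiment with, not a recommendation):

// Sketch: configure WaveOut for lower latency before calling Init.
// DesiredLatency is the total buffered audio in milliseconds and is split
// across NumberOfBuffers, so this gives roughly two 25ms buffers.
WaveOut wo = new WaveOut();
wo.DesiredLatency = 50;
wo.NumberOfBuffers = 2;
wo.Init(mybufferwp);
wo.Play();

If playback starts to stutter, increase DesiredLatency again until it is smooth.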
I'm using the WebUSB API in Chrome. I can pair with the device, claim an interface, and begin listening to an inbound interrupt endpoint that transfers three bytes every time I press a button, and three more when the button is released (it's a vendor-specific MIDI implementation).
The USBTransferInResult.data.buffer contains all of the bytes it should, except they are not grouped per transfer. The bytes are being transferred one at a time, unless I do something to generate a bunch of data at the same time, in which case there may be as many as three or four bytes in the same USBTransferInResult.
Note: The maximum packet size for this endpoint is 8. I've tried setting it to stuff like 1 and 256 with no effect.
If I concatenated all of the result buffers, I'd have the exact data I'm expecting, but surely the API should make each transfer (seemingly) atomic.
This could be the result of something funky that the vendor (Focusrite - it's a Novation product) does with their non-compliant MIDI implementation. I just assumed that the vendor would prefer to transfer each MIDI message as an atomic interrupt transfer (not three one-byte transfers in rapid succession), as it would simplify the driver and make it more robust. I cannot see the advantage of breaking these messages up.
Note: If I enable the experimental-usb-backend, my USB device stops appearing in the dialog (when requestDevice is invoked).
This is the code I'm testing it with:
let DEVICE = undefined;

const connect = async function() {
    /* Initialize the device, assign it to the global variable,
       claim Interface 1, then invoke `listen`. */
    const filters = [{vendorId: 0x1235, productId: 0x0018}];
    DEVICE = await navigator.usb.requestDevice({filters});
    await DEVICE.open();
    await DEVICE.selectConfiguration(1);
    await DEVICE.claimInterface(1);
    listen();
};

const listen = async function() {
    /* Recursively listen for each interrupt transfer from
       Endpoint 4, asking for up to 8 bytes each time, and then
       log each transfer (as a regular array of numbers). */
    const result = await DEVICE.transferIn(4, 8);
    const data = new Uint8Array(result.data.buffer);
    console.log(Array.from(data));
    listen();
};

// Note: There are a few lines of UI code here that provide a
// button for invoking the `connect` function above, and
// another button that invokes the `close` method of
// the USB device.
Given this issue is not reproducible without the USB device, I don't want to report it as a bug, unless I'm sure that it is one. I was hoping somebody here could help me.
Have I misunderstood the way the WebUSB API works?
Is it reasonable to assume that the vendor may have intended to break MIDI messages into individual bytes?
On reflection, the way this works may be intentional.
The USB MIDI spec is very complicated, as it seeks to accommodate complex MIDI setups, which can constitute entire networks in their own right. The device I'm hacking (the Novation Twitch DJ controller) has no MIDI connectivity, so it would have been much easier for the Novation engineers to just pass the raw MIDI bytes over plain USB interrupt transfers than to implement the full USB MIDI class.
As for the way it streams the MIDI bytes as soon as they're ready, I'm assuming this simplified the hardware; the stream is intended to be interpreted like bytecode. Each MIDI message begins with a status byte that indicates the number of data bytes that will follow it (analogous to an opcode, followed by some immediates).
Note: Status bytes also have a leading 1, while data bytes have a leading 0, so they are easy to tell apart (and SysEx messages use specific start and end bytes).
In the end, it was simple enough to use the status bytes to indicate when to instantiate a new message, and what type it should be. I then implemented a set of MIDI message classes (NoteOn, Control, SysEx etc) that each know when they have the right number of bytes (to simplify the logic for each individual message).
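For what it's worth, here is a minimal sketch of that accumulation logic (written in C# purely for illustration; the class and method names are hypothetical, and SysEx/system messages are deliberately left out):

using System.Collections.Generic;

// Sketch: accumulate a raw byte stream into complete MIDI channel messages.
// A status byte (MSB set) starts a new message and determines how many data
// bytes to expect; data bytes (MSB clear) are appended until it is complete.
class MidiAssembler
{
    private readonly List<byte> _current = new List<byte>();
    private int _expectedDataBytes;

    // Feed every incoming byte to this method; it returns a complete message
    // (status byte + data bytes) or null if more bytes are still needed.
    public byte[] Push(byte b)
    {
        if ((b & 0x80) != 0)                // status byte: start a new message
        {
            _current.Clear();
            _current.Add(b);
            _expectedDataBytes = DataBytesFor(b);
        }
        else if (_current.Count > 0)        // data byte for the current message
        {
            _current.Add(b);
        }

        if (_current.Count > 0 && _current.Count - 1 == _expectedDataBytes)
        {
            byte[] message = _current.ToArray();
            _current.Clear();
            return message;
        }
        return null;                        // not complete yet
    }

    private static int DataBytesFor(byte status)
    {
        // Channel messages only; system messages (0xF0-0xFF, including SysEx)
        // have varying lengths and are not handled in this sketch.
        switch (status & 0xF0)
        {
            case 0xC0:                      // program change
            case 0xD0:                      // channel pressure
                return 1;
            default:                        // note on/off, control change, pitch bend, ...
                return 2;
        }
    }
}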
I have an issue using NAudio for a project of mine. Most likely I have just overlooked a tiny error, but I can't spot it, so may I kindly ask for some help.
I am working on a project to receive (and further work with) an audio stream I receive over a network. The stream is encoded with G.711 a-law using 8kHz and 8bit and sent in tiny pieces of 20ms (or 50 packets per second).
The following code receives the stream via UDP (basically, whenever a UDP packet is received, it is read from the socket and added to the NAudio BufferedWaveProvider):
Private Provider As New NAudio.Wave.BufferedWaveProvider(NAudio.Wave.WaveFormat.CreateALawFormat(8000, 1))

Private Sub FT636VOIP_U_Auslesen(ByVal ar As IAsyncResult)
    'Read the received datagram and queue it for playback.
    sample = FT636VOIPUSocket.EndReceive(ar, New Net.IPEndPoint(Net.IPAddress.Parse("10.48.11.43"), 60001))
    Provider.AddSamples(sample, 0, sample.Length)

    'Listen for the next packet.
    FT636VOIPUSocket.BeginReceive(New AsyncCallback(AddressOf FT636VOIP_U_Auslesen), FT636VOIPUSocket)
End Sub
Being started in another thread (to avoid blocking the main application), a WaveOutEvent is linked with the BufferedWaveProvider for playback.
Private Sub Audio()
    Dim wo As New NAudio.Wave.WaveOutEvent
    wo.DesiredLatency = 1000
    wo.Init(Provider)
    wo.Play()

    Do While wo.PlaybackState = NAudio.Wave.PlaybackState.Playing
        Threading.Thread.Sleep(500)
    Loop
End Sub
Well, the network connection is up and quickly fills the buffer, and playback starts after the desired latency, but it only produces a 'choppy sound', though essentially there should only be silence...
Do I have to decode the stream at some stage (even though the BufferedWaveProvider is initialized with the correct codec)? Or am I missing something else?
You will get best results if you decode the audio as it arrives and put it into the BufferedWaveProvider as 16-bit audio. Also, are you sure that there is no surrounding metadata in the network packets being received? If so, that needs to be stripped out or it will result in noise.
The NAudio demo project contains an example of this exact scenario, so you can use that as a reference if you need further help.
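Here is a rough sketch of that approach in C# (the handler and variable names are assumptions; NAudio.Codecs.ALawDecoder does the per-sample conversion):

// Sketch: decode each incoming a-law byte to a 16-bit PCM sample before buffering,
// and create the BufferedWaveProvider with a 16-bit PCM format instead of A-law.
BufferedWaveProvider provider = new BufferedWaveProvider(new WaveFormat(8000, 16, 1));

// Hypothetical handler: call this with the raw bytes of each received UDP packet.
void OnPacketReceived(byte[] alawPacket)
{
    byte[] pcm = new byte[alawPacket.Length * 2];        // 2 bytes of PCM per a-law byte
    for (int i = 0; i < alawPacket.Length; i++)
    {
        short sample = NAudio.Codecs.ALawDecoder.ALawToLinearSample(alawPacket[i]);
        pcm[2 * i] = (byte)(sample & 0xFF);              // little-endian: low byte first
        pcm[2 * i + 1] = (byte)(sample >> 8);
    }
    provider.AddSamples(pcm, 0, pcm.Length);
}

The provider then contains plain 16-bit PCM, which WaveOutEvent can play back directly.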
So, I have what I think is a fairly interesting and, hopefully, not intractable problem. I have an audio/video getUserMedia stream that I am recording in Chrome. Individually, the tracks record perfectly well. However, when attempting to record both, one blocks the main thread, hosing the other. I know that there is a way to resolve this. Muaz Khan has a few demos that seem to work without blocking.
Audio is recorded via the Web Audio API. I am piping the audio track into a processor node which converts it to 16-bit mono and streams it to a node.js server.
Video is recorded via the usual canvas hack and Whammy.js. When recording, video frames are drawn to a canvas and then the resulting image data is pushed into a frames array which is later encoded into a webm container by Whammy, subsequently uploaded to the node.js server.
The two are then muxed together via ffmpeg server-side and the result stored.
The ideas I've had so far are:
Delegate one to a worker thread. Unfortunately both the canvas and the stream are members of the DOM as far as I know.
Install headless browser in node.js and establish an rtc connection with the client, thereby exposing the entire stream server-side
The entire situation will eventually be solved by the Audio Worker implementation. The working group seems to have stalled public progress updates on that while things are shuffled around a bit, though.
Any suggestions for resolving the thread blocking?
Web Audio Connections:
var context = new AudioContext();
var source = context.createMediaStreamSource(stream);
var node = context.createScriptProcessor(2048, 1, 1);
node.onaudioprocess = audioProcess;
source.connect(node);
node.connect(context.destination);
Web Audio Processing:
function audioProcess(e) {
    if (!recording.audio) return;
    var leftChannel = e.inputBuffer.getChannelData(0);
    Socket.emit('record-audio', convertFloat32ToInt16(leftChannel));
}
Video Frame Buffering:
function drawFrame() {
    if (recording.video) {
        players.canvas.context.fillRect(0, 0, players.video.width, players.video.height);
        players.canvas.context.drawImage(players.video.element, 0, 0, players.video.width, players.video.height);
        frames.push({
            duration: 100,
            image: players.canvas.element.toDataURL('image/webp')
        });
        lastTime = new Date().getTime();
        requestAnimationFrame(drawFrame);
    } else {
        requestAnimationFrame(getBlob);
    }
}
Update: I've since managed to stop the two operations from completely blocking one another, but it's still doing it enough to distort my audio.
There are a few key things that allow for successful getUserMedia recording in Chrome at the moment, gathered from the helpful comments on the original question and from my own experience.
When harvesting data from the recording canvas, encode as JPEG. I had been attempting WebP to satisfy the requirements of Whammy.js, but generating a WebP data URI is apparently a cycle hog.
Delegate as many of the non-DOM operations as possible to worker threads. This is especially true of any streaming/upload operations (e.g., audio sample streaming via websockets).
Avoid requestAnimationFrame as a means of driving the recording canvas drawing. It is resource intensive, and as Aldel has pointed out, it stops firing if the user switches tabs. Using setInterval is much more efficient/reliable. It also allows for better framerate control.
For Chrome at least, avoid client-side AV encoding for the time being. Stream audio samples and video frames server-side for processing. While client-side AV encoding libraries are very cool, they simply don't seem efficient enough for production quite yet.
Also, for Node.js ffmpeg automation, I highly recommend fluent-ffmpeg. Special thanks to Benjamin Trent for some practical examples.
#aldel is right. Increasing the bufferSize value fixes it, e.g. bufferSize = 16384.
Try this demo in Chrome and record audio+video. You'll hear clear recorded WAV audio in parallel with 720p video frames.
BTW, I agree with jesup that MediaRecorder solutions should be preferred.
The Chromium team is very close; hopefully M47/48 will bring a MediaRecorder implementation, at least for video (VP8) recording.
There is a Chrome-based alternative to Whammy.js as well:
https://github.com/streamproc/MediaStreamRecorder/issues/43
I'm working with a USB audio device (it's a HID with multiple channels) that constantly outputs data.
What I'm hoping to achieve is to ignore the audio until a signal comes in from the device. At that point I would start monitoring the feed. A second signal from the device would indicate that I can go back to ignoring the data. I have opened said device in non-blocking mode so it won't interfere with other USB signals coming in.
This works fine, except that when I start reading the data (via snd_pcm_readi) I get an EPIPE error indicating a buffer overrun. This can be fixed by calling snd_pcm_prepare every time, but I'm hoping there is a way to let the buffer empty while I'm ignoring it.
I've looked at snd_pcm_drain and snd_pcm_drop but these stop the PCM and I'd rather keep it open.
Suggestions?
To ignore buffer overruns, change the PCM device's software parameters: set the stop threshold to the same value as the boundary.
With that configuration, overruns will not cause the device to stop, but will let it continue to fill the buffer.
(Other errors will still stop the device; it would be hard to continue when a USB device is unplugged ...)
When an overrun has happened, the buffer will contain more data than actually can fit into it, i.e., snd_pcm_avail will report more available frames than the buffer size.
When you want to actually start recording, you should call snd_pcm_forward to throw away all those invalid frames.
Per Apple’s “Polling Versus Run-Loop Scheduling”:
[hasSpace/BytesAvailable] can mean that there is available bytes or space or that the only way to find out is to attempt a read or a write operation (which could lead to a momentary block).
The doc does not explicitly state that the hasSpace/BytesAvailable events behave the same way, only, obscurely, that they have “identical semantics.”
Am I to conclude that a write/read streamError or a bytes read/written return of less than the amount expected could be due to a “momentary block”?
If so, should I attempt the transmission again? Should I use some sort of timer mechanism to give the blockage a chance to clear? This would be a lot of work to implement, so I’d rather not if it’s unlikely to help.
(It’s tempting to initiate a limited polling loop in such a case, say a while loop that makes 10 attempts, but I don’t know if it’s safe to do that at the same time as the stream is scheduled in the run loop, and I have no way to test it.)
Here is a good wrapper for sockets: https://github.com/robbiehanson/CocoaAsyncSocket
It will queue reads and writes if the connection is not available. You don't mention whether you're using UDP or TCP, but I suspect you're using TCP, in which case it will handle any interruptions on its own, provided the connection doesn't get torn down.
It’s been a long haul. Here’s some followup on this issue:
Early on, I threw out the idea of maintaining and checking a leftover cache, because that would have worked only for the output stream; further reflection suggested that the input stream could also become blocked.
Instead, I set up idling while-loops:
- (void)stream:(NSStream *)theStream handleEvent:(NSStreamEvent)eventCode {
    switch (eventCode) {

        // RECEIVING
        case NSStreamEventHasBytesAvailable: {
            if (self.receiveStage == kNothingToReceive)
                return;
            // Get the data from the stream. (This method returns NO if bytesRead < 1.)
            if (![self receiveDataViaStream:(NSInputStream *)theStream]) {
                // If nothing was actually read, consider the stream to be idling.
                self.bStreamIn_isIdling = YES;
                // Repeatedly retry the read, until (1) the read is successful, or
                // (2) stopNetwork is called, which will clear the idler.
                // (Just in case, add a nil stream property as a loop breaker.)
                while (self.bStreamIn_isIdling && self.streamIn) {
                    if ([self receiveDataViaStream:(NSInputStream *)theStream]) {
                        self.bStreamIn_isIdling = NO;
                        // The stream will have started up again; prepare for the next event call.
                        [self assessTransmissionStage_uponReadSuccess];
                    }
                }
            }
            else {
                // Prepare for what happens next.
                [self assessTransmissionStage_uponReadSuccess];
            }
            break;
        }

        // SENDING
        case NSStreamEventHasSpaceAvailable: {
            if (self.sendStage == kNothingToSend)
                return;
            if (![self sendDataViaStream:(NSOutputStream *)theStream]) {
                self.bStreamOut_isIdling = YES;
                while (self.bStreamOut_isIdling && self.streamOut) {
                    if ([self sendDataViaStream:(NSOutputStream *)theStream]) {
                        self.bStreamOut_isIdling = NO;
                        [self assessTransmissionStage_uponWriteSuccess];
                    }
                }
            }
            else {
                [self assessTransmissionStage_uponWriteSuccess];
            }
            break;
        }

        // other event cases…
    }
}
Then it came time to test a user-initiated cancellation, via a “cancel” button. Midway through the sync, there’s a pause on the Cocoa side, awaiting user input. If the user cancels at this point, the Cocoa app closes the streams and removes them from the runloop, so I expected that the streams on the other side of the connection would generate NSStreamEventEndEncountered events, or perhaps NSStreamEventErrorOccurred. But, no, only one event came through, an NSStreamEventHasBytesAvailable! Go figure.
Of course, there weren’t really any “bytes available,” as the stream had been closed on the Cocoa side, not written to — so the stream handler on the iOS side went into an infinite loop. Not so good.
Next I tested what would happen if one of the devices went to sleep. During the pause for user input, I let the iPhone sleep via auto-lock*, and then supplied the user input on the Cocoa side. Surprise again: the Cocoa app continued without perturbation to the end of the sync, and when I woke up the iPhone, the iOS app proved to have completed its side of the sync too.
Could there have been a hiccup on the iPhone side that was fixed by my idle loop? I threw in a stop-network routine to check:
if (![self receiveDataViaStream:(NSInputStream *)theStream])
    [self stopNetwork]; // closes the streams, etc.
The sync still ran through to completion. There was no hiccup.
Finally, I tested what happened if the Mac (the Cocoa side) went to sleep during that pause for input. This produced a sort of backward belch: two NSStreamEventErrorOccurred events were received on the Mac side, after which it was no longer possible to write to the output stream. No events at all were received on the iPhone side, but if I tested the iPhone's stream status, it would return 5, NSStreamStatusAtEnd.
CONCLUSIONS & PLAN:
The "temporary block" is something of a unicorn. Either the network runs smoothly or it disconnects altogether.
If there is truly such a thing as a temporary block, there is no way to distinguish it from a complete disconnection. The only stream-status constants that seem logical for a temporary block are NSStreamStatusAtEnd and NSStreamStatusError. But per the above experiments, these indicate disconnection.
As a result, I'm discarding the while-loops and detecting disconnection solely by checking for bytesRead/Written < 1.
*The iPhone won’t ever sleep if it’s slaved to Xcode. You have to run it straight from the iPhone.
You can anticipate "disconnection" whenever you attempt to write 0 bytes to the output stream, or when you receive 0 bytes on the input stream. If you want to keep the streams alive, make sure you check the length of bytes you're writing to the output stream. That way, the input stream never receives 0 bytes, which triggers the event handler for closed streams.
There's no such thing as an "idling" output stream. Only an idling provider of bytes to the output stream, which doesn't need to indicate its idleness.
If you're getting disconnected from your network connection by the sleep timer, you can disable that timer when you open your streams, and then re-enable it when you close them:
- (void)stream:(NSStream *)aStream handleEvent:(NSStreamEvent)eventCode {
    switch (eventCode) {
        case NSStreamEventOpenCompleted:
        {
            [UIApplication sharedApplication].idleTimerDisabled = YES;
            break;
        }
        case NSStreamEventEndEncountered:
        {
            [UIApplication sharedApplication].idleTimerDisabled = NO;
            break;
        }
    }
}
I wouldn't delve any further into the specifics of your situation, because I can tell right off the bat that you aren't completely clear on what streams are, exactly. I understand that the documentation on streams is really poor at priming newbies, and scant to boot; but they model the same streams that have been around for 30 years, so any documentation on streams for any operating system (except Windows) will work perfectly well at bringing you up to speed.
By the way, the other, inextricable part of streams is your network connection code, which you did not supply. I suggest that, if you're not already using NSNetService and NSNetServiceBrowser to find peers, connect to them, and acquire your streams accordingly, you should. Doing so allows you to easily monitor the state of your network connection, and quickly and easily reopen your streams should they close unexpectedly.
I have very thorough, yet easy-to-follow sample code for this, which would require no customization on your end at all to use, if anyone would like it.