Streaming music over WebRTC cutting in and out

We would like to be able to play music in another tab (say YouTube, Spotify, SoundCloud, etc.) and then stream it over a WebRTC connection to other peers.
We are doing this through screen sharing and it mostly works, but the music sometimes cuts in and out for the listeners, giving it a choppy sound. In other words, it sounds smooth to the person sending it (i.e. sharing it from the originating URL), but choppy to those on the receiving side of the WebRTC connection.
Any thoughts on what might be causing this? Is this a buffering issue? If so, is it more likely buffering on the sending or the receiving side?
Thanks so much for any help!

WebRTC favors low latency over quality, with the goal of ensuring you can have normal speech communication. To do this, a lot of things happen to your audio:
The playback rate is constantly adjusted. If playback falls behind, the rate speeds up. If it gets too far ahead, it slows down.
The playout buffer is very small, creating more opportunities for it to run dry.
If packets are lost, the audio for their time is simply discarded (skipped over). Playback won't pause to rebuffer and then continue.
When audio is lost, a short trail-off is synthesized to cover the gap. This is fine for speech, but sounds bad for music.
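To make the first point concrete, here is a toy model of rate-adaptive playout. It is a simplified illustration under stated assumptions, not WebRTC's actual jitter buffer logic: the function name `playoutRate`, the 40 ms target, and the 10% clamp are all made up for the example. The receiver keeps nudging the tempo to hold a tiny buffer, which is easy to ignore in speech but may be audible in music.

```typescript
// Toy rate-adaptive playout (illustrative only; not WebRTC's real algorithm).
// bufferedMs: audio currently waiting in the receiver's playout buffer.
// targetMs:   the small buffer level the receiver tries to hold (assumed).
// maxAdjust:  largest allowed tempo change as a fraction (assumed 10%).
function playoutRate(bufferedMs: number, targetMs: number, maxAdjust = 0.1): number {
  // Positive error: too much audio queued, playback is behind -> speed up.
  // Negative error: buffer is draining, playback is ahead -> slow down.
  const error = (bufferedMs - targetMs) / targetMs;
  const clamped = Math.max(-maxAdjust, Math.min(maxAdjust, error));
  return 1 + clamped;
}

console.log(playoutRate(100, 40)); // 1.1: backlog, play faster
console.log(playoutRate(20, 40)); // 0.9: nearly dry, stretch the audio
console.log(playoutRate(40, 40)); // 1: on target
```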
On the media capture end, there are also audio "enhancements" designed to deal with bad webcam microphones, which can sometimes be applied to other MediaStreams if configured incorrectly. These include:
Echo cancellation
Noise reduction
Automatic gain control
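A minimal sketch of turning those enhancements off for music, assuming a browser that honors these audio constraints for the captured track (support for tab and screen audio varies, so check what you actually got). `echoCancellation`, `noiseSuppression`, `autoGainControl`, and `channelCount` are standard MediaTrackConstraints properties; the helper `activeEnhancements` is hypothetical and just inspects the object returned by `track.getSettings()`.

```typescript
// Constraints for music: disable the speech-oriented processing.
const musicAudioConstraints = {
  echoCancellation: false,
  noiseSuppression: false,
  autoGainControl: false,
  channelCount: 2, // request stereo; the browser may still deliver mono
};

// Minimal shape of the relevant fields from track.getSettings().
interface AudioSettingsLike {
  echoCancellation?: boolean;
  noiseSuppression?: boolean;
  autoGainControl?: boolean;
}

// Hypothetical helper: lists any enhancements still reported as enabled.
function activeEnhancements(settings: AudioSettingsLike): string[] {
  const names: (keyof AudioSettingsLike)[] = [
    "echoCancellation",
    "noiseSuppression",
    "autoGainControl",
  ];
  return names.filter((name) => settings[name] === true);
}

// In a browser (not runnable in Node):
//   const stream = await navigator.mediaDevices.getDisplayMedia({
//     video: true,
//     audio: musicAudioConstraints,
//   });
//   const [track] = stream.getAudioTracks();
//   console.log(activeEnhancements(track.getSettings())); // ideally []
```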
Finally, audio bitrates are typically quite low by default, and you'll usually have to munge the SDP if you want high-quality stereo audio.
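As an example of that munging, the sketch below edits the Opus `a=fmtp` line to request stereo and a higher average bitrate. `stereo`, `sprop-stereo`, and `maxaveragebitrate` (in bits per second, at most 510000) are Opus SDP parameters from RFC 7587; the function name and the default of 510000 are assumptions for illustration. Which description you munge depends on which side should send stereo (`stereo=1` tells the remote sender what the SDP's author prefers to receive), and browsers may still cap or ignore these values.

```typescript
// Adds stereo and a higher average bitrate to every Opus fmtp line in an SDP blob.
function enableStereoOpus(sdp: string, maxAverageBitrate = 510000): string {
  // SDP uses CRLF line endings (RFC 4566); tolerate bare LF as well.
  const eol = sdp.includes("\r\n") ? "\r\n" : "\n";
  const lines = sdp.split(eol);

  // Collect Opus payload types from lines like "a=rtpmap:111 opus/48000/2".
  const opusPayloads = new Set<string>();
  for (const line of lines) {
    const m = /^a=rtpmap:(\d+) opus\/48000/i.exec(line);
    if (m) opusPayloads.add(m[1]);
  }

  const extra = ["stereo=1", "sprop-stereo=1", `maxaveragebitrate=${maxAverageBitrate}`];
  return lines
    .map((line) => {
      const m = /^a=fmtp:(\d+) (.*)$/.exec(line);
      if (!m || !opusPayloads.has(m[1])) return line; // not an Opus fmtp line
      // Keep existing params, minus any old values for the keys we set.
      const kept = m[2]
        .split(";")
        .map((p) => p.trim())
        .filter((p) => p !== "" && !/^(stereo|sprop-stereo|maxaveragebitrate)=/.test(p));
      return `a=fmtp:${m[1]} ${[...kept, ...extra].join(";")}`;
    })
    .join(eol);
}
// Note: an Opus payload with no a=fmtp line at all is left unchanged.

// One possible browser usage (sender side), munging the listener's answer:
//   await pc.setRemoteDescription({ type: "answer", sdp: enableStereoOpus(answerSdp) });
```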
All this to say, WebRTC might not be the right choice for you if you are concerned with quality. I often resort to the MediaRecorder API.

Related

Media Source Extensions JavaScript API vis-à-vis WebRTC: some questions

The closest I came across is this question on SO, but that is just for basic understanding.
My question is: when Media Source Extensions (MSE) is used with media fetched from a remote endpoint, for example through AJAX, the Fetch API, or even a WebSocket, the media is sent over TCP.
That will handle packet loss and sequencing, so a protocol like RTP with RTCP is not used. Is that correct?
But this will result in delay so it cannot be truly used for real-time communication. Yes?
There is no security/encryption requirement for MSE like in WebRTC (DTLS/SRTP). Yes?
One cannot, for example, mix a remote audio source from MSE with an audio MediaStreamTrack from an RTCPeerConnection, as they do not share any common parameter (like the RTCP CNAME) and are not part of the same MediaStream. In other words, the worlds of MSE and WebRTC cannot mix unless synchronization is unimportant. Correct?

That will handle packet loss and sequencing, so a protocol like RTP with RTCP is not used. Is that correct?
AJAX and Fetch are just JavaScript APIs for making HTTP requests. WebSocket is just an API and a protocol upgraded from an initial HTTP request. HTTP (prior to HTTP/3) runs over TCP, and TCP takes care of ensuring that packets arrive, and arrive in order. So, yes, you won't need to worry about packet loss and such, but not because of MSE.
But this will result in delay so it cannot be truly used for real-time communication. Yes?
That depends entirely on your goals. It's a myth that TCP isn't fast, or that TCP adds latency to every packet. What is true is that the initial three-way handshake costs an extra round trip before any data flows (more if TLS is layered on top). It's also true that if a packet actually does get dropped, the application sees a sudden, sharp increase in latency: data that arrived after the lost packet is held back until the lost packet is retransmitted and received.
If your goals are something like a telephony application, where the loss of a packet or two is meaningless overall, then UDP is more appropriate. (In voice communication, we talk slowly enough that if a few milliseconds of sound go missing, we can still decipher what was said. Spoken language is robust enough that even if entire words get garbled or drop out, we can usually figure out the gist from context.) It's also important to keep immediate continuity in voice communication. The tradeoff is that being real-time matters more than being accurate at any particular instant or packet.
However, if you're doing something like a one-way stream, you might choose a protocol over TCP. In this case, it may be important to be as close to real time as possible, but it's more important that the audio/video doesn't glitch out. Consider the Super Bowl, or some other large sporting event. It's a live event, and it's important that it stays real-time. However, if the viewer is only 3-5 seconds behind live, it's still "live" enough. The viewer would be far angrier if the video glitched out and they missed something happening in the game than if they were just a few seconds behind. Since it's one-way streaming with no conversational feedback loop, trading extreme low latency for reliability and quality makes sense.
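A small model can show why one dropped packet looks like a sudden latency spike over TCP (head-of-line blocking), while UDP simply loses it. All parameters are hypothetical: packets are sent every `intervalMs`, take `delayMs` one way, and over TCP the dropped packet arrives `retransmitDelayMs` later than it otherwise would, standing in for the retransmission delay.

```typescript
// Time (ms) at which each packet is handed to the application over TCP.
// TCP delivers in order, so packets behind a lost one wait for its retransmission.
function tcpDeliveryTimes(
  count: number,
  intervalMs: number,
  delayMs: number,
  lostIndex: number,
  retransmitDelayMs: number,
): number[] {
  const times: number[] = [];
  let released = -Infinity;
  for (let i = 0; i < count; i++) {
    const arrived = i * intervalMs + delayMs + (i === lostIndex ? retransmitDelayMs : 0);
    released = Math.max(released, arrived); // can't release before earlier packets
    times.push(released);
  }
  return times;
}

// Over UDP (as RTP typically uses it), the lost packet is simply missing (null);
// everything else is delivered as soon as it arrives.
function udpDeliveryTimes(count: number, intervalMs: number, delayMs: number, lostIndex: number): (number | null)[] {
  return Array.from({ length: count }, (_, i) => (i === lostIndex ? null : i * intervalMs + delayMs));
}

console.log(tcpDeliveryTimes(5, 20, 50, 1, 200)); // [ 50, 270, 270, 270, 270 ]
console.log(udpDeliveryTimes(5, 20, 50, 1)); // [ 50, null, 90, 110, 130 ]
```

With these numbers, every packet after the lost one is delayed over TCP even though packets 2 through 4 had already arrived by 130 ms; UDP delivers them on time but leaves a gap that has to be concealed or skipped.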
There is no security/encryption requirement for MSE like in WebRTC (DTLS/SRTP). Yes?
Right. MSE doesn't know or care how you get your data, so any encryption comes from whatever transport you use to fetch it (for example, HTTPS or a secure WebSocket).
One cannot, for example, mix a remote audio source from MSE with an audio MediaStreamTrack from an RTCPeerConnection, as they do not share any common parameter (like the RTCP CNAME) and are not part of the same MediaStream. In other words, the worlds of MSE and WebRTC cannot mix unless synchronization is unimportant. Correct?
Mix where? Synchronize where? No matter what you do, if you have streams coming from different places, or even from different devices without sync/genlock, they're out of sync. However, if you can define a point of reference where you consider things "synchronized", then it's all good. You could, for example, have independent streams going into a server, and the server uses its current timestamps to line everything up and distribute it together via WebRTC.
How you do this, or what you do, depends on the specifics of your application.
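One way to picture the server example: the server uses its own clock as the point of reference and maps each source's media timestamps onto it. The sketch below is a hypothetical illustration (the `ReferenceClock` class and its method are made up). It fixes each source's offset when that source's first chunk arrives, so "synchronized" here means "aligned as first received at the server", and it ignores clock drift between sources.

```typescript
// Hypothetical: maps per-source media timestamps onto the server's clock.
class ReferenceClock {
  private offsets = new Map<string, number>();

  // mediaTimeMs: timestamp carried by the source's chunk.
  // serverNowMs: server clock reading when the chunk arrived.
  toServerTime(sourceId: string, mediaTimeMs: number, serverNowMs: number): number {
    let offset = this.offsets.get(sourceId);
    if (offset === undefined) {
      // The first chunk from this source defines its alignment.
      offset = serverNowMs - mediaTimeMs;
      this.offsets.set(sourceId, offset);
    }
    return mediaTimeMs + offset;
  }
}

const clock = new ReferenceClock();
clock.toServerTime("music", 1000, 5000); // first chunk: offset 4000
clock.toServerTime("voice", 0, 5030); // first chunk: offset 5030
console.log(clock.toServerTime("music", 1100, 5110)); // 5100
console.log(clock.toServerTime("voice", 100, 5140)); // 5130
```

Later arrival jitter does not move a source's mapping, which is the point: the server could then schedule every chunk at its mapped time plus a fixed delay before sending the combined result out.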

WebRTC: Video black screen bitrate

Is the bitrate of the black screen shown when video is muted the same as the original video's bitrate, or is it significantly less because it is just a black screen?

It is significantly less, since essentially no video information is being sent to the remote party. How much less depends on a lot of factors (connection quality, etc.).
I just did a quick test: the outgoing bitrate at 640x480 @ 27 fps was around 900 kbps to 1 Mbps. Disabling the video track of the stream resulted in an outgoing bitrate of 30 kbps.
Please keep in mind that this was only a simple test. You can gather this kind of information yourself by evaluating the reports returned by peerConnection.getStats().
Some documentation and libraries for getStats:
https://www.w3.org/TR/webrtc-stats
https://github.com/muaz-khan/getStats
https://www.callstats.io/2015/07/06/basics-webrtc-getstats-api/
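For example, a bitrate like the ones quoted above can be computed from two getStats() snapshots taken a second or so apart: the change in `bytesSent` of an `outbound-rtp` entry, times 8, divided by the elapsed time. `bytesSent` and `timestamp` (in milliseconds) are fields from the webrtc-stats spec linked above; the function name and the one-second polling interval are assumptions.

```typescript
// Minimal shape of the fields used from an "outbound-rtp" stats entry.
interface OutboundRtpSample {
  timestamp: number; // milliseconds
  bytesSent: number; // cumulative bytes sent on this RTP stream
}

// Average send bitrate in kbps between two snapshots of the same stream.
function bitrateKbps(prev: OutboundRtpSample, curr: OutboundRtpSample): number {
  const bits = (curr.bytesSent - prev.bytesSent) * 8;
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  return seconds > 0 ? bits / seconds / 1000 : 0;
}

// Browser usage sketch (not runnable in Node):
//   const report = await pc.getStats();
//   report.forEach((s) => {
//     if (s.type === "outbound-rtp" && s.kind === "video") {
//       // keep s, then compare with the same stream's entry one second later
//     }
//   });

console.log(bitrateKbps({ timestamp: 0, bytesSent: 0 }, { timestamp: 1000, bytesSent: 125000 })); // 1000
console.log(bitrateKbps({ timestamp: 0, bytesSent: 0 }, { timestamp: 2000, bytesSent: 7500 })); // 30
```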

I came across chrome://webrtc-internals, which has built-in tracking of bitrates along with other useful features.
In its bitrate graph, the bitrate before the video was muted was ~150 kbps, dropping to ~30 kbps after muting.

Audio or Video Priority Constraints in vLine (WebRTC)

As I am developing a video chat application based on the vLine API (fantastic so far), I deal with a lot of high-latency, low-bandwidth connections.
I know that much of this is abstracted away, with the browser doing the work behind the scenes, but I am trying to find out whether one can prioritize audio over video with regard to quality and bandwidth.
It's always better to be able to hear someone even if the video becomes poor. Is there any way to do this in WebRTC, and in vLine in particular? Ideally, I would like to implement a slider control or checkbox with predefined constraints.

Currently there is no way to prioritize audio over video via constraints in a video call. The browser will try to do the 'right thing' in this scenario.

Is OpenAL the right audio library to use for cross platform audio processing?

I am making an application that will do things like pitch shifting and time stretching to audio files, and play them back in real time. Is OpenAL the right library for this? Or is there something that could do this better, and would be easy to reuse for different platforms?

OpenAL can't do pitch shifting or time stretching. For that, you'll need a third-party library such as SoundTouch.
OpenAL also doesn't support real-time audio processing. You can sort of fake it using buffer queues, but it's a bit hokey: you have to keep polling to see when a buffer has finished playing and then queue the next processed buffer, and you have to keep your buffers very small or risk laggy audio response. However, small queued buffers can in turn lead to performance, timing, and clicking issues.
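The buffer-size tradeoff is simple arithmetic: with n buffers of f sample frames queued at sample rate r, roughly n × f / r seconds of audio sit between your processing and the speaker, and each buffer must be refilled within f / r seconds. The sketch below just does that math; the helpers are hypothetical (not part of OpenAL) and the buffer counts and sizes in the examples are arbitrary.

```typescript
// Hypothetical helper: audio queued ahead of the listener, in milliseconds.
// Roughly how long a change (e.g. a new pitch setting) takes to be heard.
function queueLatencyMs(queuedBuffers: number, framesPerBuffer: number, sampleRate: number): number {
  return (queuedBuffers * framesPerBuffer * 1000) / sampleRate;
}

// Time available to process and queue the next buffer before playback starves.
function refillDeadlineMs(framesPerBuffer: number, sampleRate: number): number {
  return queueLatencyMs(1, framesPerBuffer, sampleRate);
}

console.log(queueLatencyMs(4, 4096, 44100).toFixed(1)); // 371.5 ms: sluggish response
console.log(queueLatencyMs(4, 512, 44100).toFixed(1)); // 46.4 ms: responsive
console.log(refillDeadlineMs(512, 44100).toFixed(1)); // 11.6 ms: but a tight polling deadline
```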

Is there a way to stream audio from a mic and play that stream in Silverlight?

I want to stream audio from a mic using NAudio and then pass that stream to WCF, which a Silverlight app can consume to broadcast the live audio. I want the latency to be as low as possible.
Any suggestions? If someone has already done this, please point me to the source. Thanks in advance.

What you are asking is certainly possible, but it will be a fair amount of work.
NAudio can handle capturing the microphone audio.
At the Silverlight end you can play custom audio formats (in this case PCM) using a custom media element streaming source. See this one: http://code.msdn.microsoft.com/wavmss
I suspect latency would not be very good. You can reduce it by keeping the buffer sizes small. Also bear in mind that WAV is not a very efficient format to be sending over the network.
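To put "not very efficient" in numbers: uncompressed PCM, which is what a WAV file usually carries, needs sample rate × bits per sample × channels bits per second. A hypothetical helper:

```typescript
// Bitrate of uncompressed PCM audio, in kbps.
function pcmBitrateKbps(sampleRate: number, bitsPerSample: number, channels: number): number {
  return (sampleRate * bitsPerSample * channels) / 1000;
}

console.log(pcmBitrateKbps(44100, 16, 2)); // 1411.2 (CD-quality stereo)
console.log(pcmBitrateKbps(16000, 16, 1)); // 256 (16 kHz mono)
```

Lowering the sample rate, bit depth, or channel count reduces this proportionally; a compressed codec reduces it much further.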

To keep latency as low as possible, you should use netTcpBinding and stream your audio in binary format. I would use a MemoryStream for this and experiment with the buffer size to find the best performance. Also, try different audio formats for best performance. This also depends on the audio quality you expect.