In my WebRTC application, the Opus codec is used to compress the audio stream, and I was wondering: what is the minimum viable bandwidth that should be allocated to the audio stream to avoid jitter?
For Opus voice encoding, mono, 16 kHz sample rate:
6 kbps is the minimum, at which voice is still recognizable
16 kbps is a medium, good-enough setting
32 kbps is the maximum; you won't see a big difference if you encode at a higher bitrate (higher than 32)
From what I have tested, a few hundred kbps (bits, not bytes), approximately 300-400 kbps, should be enough for good audio quality, not only voice but music too. But more important is the network latency, which should be under 20-25 ms.
For decent voice-only audio, a tenth of that (30-40 kbps) should be enough. But this is for one peer only. The latency can be much higher, but you'll hear small skips now and then, which should be acceptable for conversations.
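If you want to enforce such a cap in your own application, one common approach is to set Opus's maxaveragebitrate fmtp parameter (RFC 7587) by munging the SDP before sending it. A minimal sketch, assuming a standard RTCPeerConnection; the helper name and the 16 kbps default are illustrative, not a standard API:

```js
// A sketch of capping the Opus audio bitrate by munging the SDP before
// sending it. `pc` is an RTCPeerConnection you already created; the
// helper name and the 16 kbps default are assumptions.
async function createOfferWithOpusCap(pc, maxAverageBitrate = 16000) {
  const offer = await pc.createOffer();
  // Find Opus's payload type, e.g. "a=rtpmap:111 opus/48000/2".
  const match = offer.sdp.match(/a=rtpmap:(\d+) opus\/48000/i);
  if (match) {
    const pt = match[1];
    // Prepend maxaveragebitrate (bits/s, RFC 7587) to the existing fmtp line.
    offer.sdp = offer.sdp.replace(
      new RegExp(`a=fmtp:${pt} `),
      `a=fmtp:${pt} maxaveragebitrate=${maxAverageBitrate};`
    );
  }
  await pc.setLocalDescription(offer);
  return offer;
}
```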
I want to know why, when I transmit a 320x240 resolution video, my uplink traffic defaults to 1.5 Mbps;
when I modify the SDP bandwidth limit, for example to 500 kbps, my video width and height are still 320x240 and the frame rate is not reduced;
So what exactly is this reduced upstream traffic?
WebRTC uses so-called "lossy perceptual video compression." That is, the video is capable of being compressed into bit streams of various bandwidths... in your case 1.5 Mbps and 0.5 Mbps. It's like JPEG's quality parameter: in JPEG, adjusting that parameter changes the size of the image file. In video compression, instead of a quality parameter you request a bitrate.
When a lower-bitrate video stream is decompressed, it's a less faithful representation of the original. If you know what to look for, you can see various compression artifacts in the decompressed imagery: blockiness, "mosquitoes" around objects, and so forth.
Streaming video and DVD video programs (cinema) use high bandwidth to minimize these effects below the threshold of perception at 1080p or 4K resolution.
In your SIF (320x240) resolution case, your decoded 0.5 Mbps video has more artifacts in it than your 1.5 Mbps video. But, because the resolution is relatively low, it will take some looking to find those artifacts. If they don't annoy you or your users, you can conclude that 0.5 Mbps is fine for your application. Long experience suggests that you should succeed just fine with that bitrate and resolution. You can even try 250 kbps.
Reducing the frame rate doesn't proportionally save bandwidth; most compressed video frames represent differences from the previous frame.
Lower bitrates are better for mobile devices; they save power and your users' data plans.
If you want to see exaggerated compression artifacts and what they look like, set the bitrate down to 125 kbps or lower.
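As for how to request a given bitrate: besides the SDP bandwidth limit you already tried, modern browsers let you cap the send bitrate through the standard RTCRtpSender.setParameters() API. A minimal sketch; the helper name and the 500 kbps default (mirroring the question) are illustrative:

```js
// A sketch of capping the outgoing video bitrate with the standard
// RTCRtpSender.setParameters() API, without touching the SDP.
// `pc` is your RTCPeerConnection; the helper name and default are assumptions.
async function capVideoBitrate(pc, maxBitrateBps = 500000) {
  const sender = pc.getSenders().find(s => s.track && s.track.kind === 'video');
  if (!sender) return;
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) {
    params.encodings = [{}]; // some browsers return an empty list before negotiation
  }
  params.encodings[0].maxBitrate = maxBitrateBps; // bits per second
  await sender.setParameters(params);
}
```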
Is the bit rate of the black screen shown when the video is muted the same as the original video's bit rate, or is it significantly less because it is just a black screen?
It is significantly less, since there is essentially no video information being sent to the remote party. How much less depends on a lot of factors (connection quality, etc.).
I just did a quick test, and the outgoing bit rate at 640x480 @ 27 fps was around 900 kbps to 1 Mbps. Disabling the video track of the stream resulted in an outgoing bitrate of 30 kbps.
Please keep in mind that this was only a simple test. You can get this kind of information yourself by evaluating the reports from peerConnection.getStats.
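As a rough sketch of how such a measurement can be done: the stats reports expose cumulative byte counters, so you sample twice and take the difference. The outbound-rtp fields are from the webrtc-stats spec; the helper name and the 1-second interval are assumptions:

```js
// A sketch of measuring the outgoing video bitrate with getStats().
async function sampleOutgoingVideoBitrate(pc, intervalMs = 1000) {
  const bytesSent = async () => {
    let total = 0;
    const stats = await pc.getStats();
    stats.forEach(report => {
      if (report.type === 'outbound-rtp' && report.kind === 'video') {
        total += report.bytesSent; // cumulative since the connection started
      }
    });
    return total;
  };
  // The counters are cumulative, so sample twice and difference them.
  const before = await bytesSent();
  await new Promise(resolve => setTimeout(resolve, intervalMs));
  const after = await bytesSent();
  return ((after - before) * 8) / (intervalMs / 1000); // bits per second
}
```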
Some documentation and libraries for getStats:
https://www.w3.org/TR/webrtc-stats
https://github.com/muaz-khan/getStats
https://www.callstats.io/2015/07/06/basics-webrtc-getstats-api/
I came across chrome://webrtc-internals, which has built-in tracking for bit rates and other good features.
As seen in its graphs, the bit rate before the video was muted was ~150 kbps, which dropped to ~30 kbps on muting the video.
I am using WebRTC to develop one of my applications.
There is no clarity on whether WebRTC natively supports adaptive bitrate streaming of video packets. Does VP8 / VP9 have adaptive bitrate encoding support? Is bitrate_controller WebRTC's implementation of ABR?
Can anyone please shed more light on this? I find no conclusive evidence that WebRTC natively supports adaptive streaming for video.
Based on the WebRTC documentation found at https://hpbn.co/webrtc/#audio-opus-and-video-vp8-bitrates, I found this:
When requesting audio and video from the browser, pay careful attention to the size and quality of the streams. While the hardware may be capable of capturing HD quality streams, the CPU and bandwidth must be able to keep up! Current WebRTC implementations use Opus and VP8 codecs:
The Opus codec is used for audio and supports constant and variable bitrate encoding and requires 6–510 Kbit/s of bandwidth. The good news is that the codec can switch seamlessly and adapt to variable bandwidth.
The VP8 codec used for video encoding also requires 100–2,000+ Kbit/s of bandwidth, and the bitrate depends on the quality of the streams:
720p at 30 FPS: 1.0~2.0 Mbps
360p at 30 FPS: 0.5~1.0 Mbps
180p at 30 FPS: 0.1~0.5 Mbps
As a result, a single-party HD call can require up to 2.5+ Mbps of network bandwidth. Add a few more peers, and the quality must drop to account for the extra bandwidth and CPU, GPU, and memory processing requirements.
So as far as I understand it, both codecs will adapt the audio and video streams to the available bandwidth. Hope this helps.
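As a practical aside, the "pay careful attention to the size and quality of the streams" advice above translates into constraining capture in getUserMedia. A minimal sketch; the 640x360 @ 30 FPS numbers are illustrative, matching the 360p row quoted above:

```js
// A sketch of constraining capture so the encoder and network can keep up.
async function captureModestStream() {
  return navigator.mediaDevices.getUserMedia({
    audio: true,
    video: {
      width: { ideal: 640 },
      height: { ideal: 360 },           // 360p: roughly 0.5-1.0 Mbps at 30 FPS
      frameRate: { ideal: 30, max: 30 } // don't capture more than the encoder needs
    }
  });
}
```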
I'm trying to generate high quality voice-over using the Microsoft Speech API. What kind of values should I pass to this constructor to guarantee high quality audio?
The .wav file will later be used to feed FFmpeg, so the audio will be re-encoded into a more compact form. My main goal is to keep the voice as clear as I can, but I really don't know which values guarantee the best quality as perceived by humans.
First of all, just to let you know, I haven't used this Speech API; I'll give you an answer based on my audio processing work...
You can choose EncodingFormat.Pcm for Pulse Code Modulation
samplesPerSecond is the sampling frequency. Because it is voice, you can cover it with 16000 Hz for sure. If you are a real perfectionist you can go with 22050, for example. The higher the value, the larger the audio file will be. If file size isn't a problem you can even go with 32000 or 44100, but there won't be much noticeable difference...
bitsPerSample - go with 16 if possible
channels: 1 or 2, mono or stereo ..... it won't affect the quality of the sound
averageBytesPerSecond ..... this would be samplesPerSecond * bytesPerSample * channels (for example 22050 * 2 * 1 = 44100 for 16-bit mono)
blockAlign ..... this would be bytesPerSample * numberOfChannels (for example, if you have 16-bit PCM mono audio, 16 bits are 2 bytes and mono is 1 channel, so blockAlign is 2 * 1 = 2)
As for that last one, the byte array doesn't speak much for itself; I'm not sure what it's for. I believe the first six arguments are enough for the audio to be generated.
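To make the relationships between those arguments concrete, here is a small sketch deriving blockAlign and averageBytesPerSecond from the first three values; the function name is illustrative and not part of the Speech API:

```js
// A sketch of how these fields relate for PCM audio.
function pcmFormat(samplesPerSecond, bitsPerSample, channels) {
  const bytesPerSample = bitsPerSample / 8;
  const blockAlign = bytesPerSample * channels;               // bytes per sample frame
  const averageBytesPerSecond = samplesPerSecond * blockAlign;
  return { samplesPerSecond, bitsPerSample, channels, blockAlign, averageBytesPerSecond };
}

// 16-bit mono at 22050 Hz: blockAlign = 2, averageBytesPerSecond = 44100.
console.log(pcmFormat(22050, 16, 1));
```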
I hope this was helpful
Cheers
Which algorithm is used for voice compression and decompression?
One of the best voice codecs out there is G.729, which Digium (http://www.digium.com), makers of Asterisk, license commercially. It is an extremely effective audio codec intended for vocal frequencies, with little perceptible loss (hence why it's used for VoIP telephony). It also handles jitter very well (the variation in latency over time), and uses only 8 kbit/s.
These days everybody from Skype to Google to the phone companies does it with MP3. The minimum bitrate is a bit high, but you get excellent audio for that bitrate. And the tech is proven and solid.
But apart from MP3 there are other, typically old-school, low-bandwidth codecs which have been used to encode the human voice. The standard in telephony used to be u-law (that's mu-law) and A-law. A fairly recent speech-specific codec that looks interesting is Speex.
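For a feel of how u-law works, here is a sketch of the continuous mu-law companding curve (ITU-T G.711 uses mu = 255). The real codec quantizes to 8 bits using a segmented approximation of this curve, but the formula shows the idea: fine resolution for quiet samples, coarse resolution for loud ones.

```js
// A sketch of the continuous mu-law companding curve (G.711 uses mu = 255).
const MU = 255;

// Compress a linear sample in [-1, 1] to a companded value in [-1, 1].
function muLawCompress(x) {
  return Math.sign(x) * Math.log(1 + MU * Math.abs(x)) / Math.log(1 + MU);
}

// Expand a companded value back to an (approximately) linear sample.
function muLawExpand(y) {
  return Math.sign(y) * (Math.pow(1 + MU, Math.abs(y)) - 1) / MU;
}

console.log(muLawCompress(0.01).toFixed(3));             // ~0.228: quiet samples get boosted
console.log(muLawExpand(muLawCompress(0.5)).toFixed(3)); // ~0.500: round-trips cleanly
```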