Which algorithm is used for voice compression and decompression?
One of the best-known voice codecs out there is G.729, which is patented and licensed commercially; Digium (http://www.digium.com), makers of Asterisk, sell licenses for using it with Asterisk. It is an extremely effective low-bitrate lossy codec designed for speech (hence its use in VoIP telephony). It also copes with jitter very well (the variation in latency over time), and uses only 8 kbit/s.
These days everyone from Skype to Google to the phone companies does it with MP3. The minimum bitrate is a bit high, but you get excellent audio for that bitrate, and the tech is proven and solid.
But apart from MP3 there are other, typically old-school, low-bandwidth codecs which have been used to encode the human voice. The standard in telephony used to be u-law (that's mu-law) and A-law. A fairly recent speech-specific codec that looks interesting is Speex.
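To make the companding idea concrete, here is a sketch of the standard G.711 mu-law encoding step in Python. This is the textbook algorithm (bias, clip, exponent search, complement), not any particular library's implementation:

```python
BIAS = 0x84   # 132, added before finding the segment
CLIP = 32635  # clip level for 16-bit linear input

def linear_to_ulaw(sample):
    """Compress one signed 16-bit linear PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    if sample < 0:
        sample = -sample
    if sample > CLIP:
        sample = CLIP
    sample += BIAS
    # Find the segment (exponent): position of the highest set bit above bit 7
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (sample & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (sample >> (exponent + 3)) & 0x0F
    # Mu-law bytes are transmitted inverted
    return ~(sign | (exponent << 4) | mantissa) & 0xFF
```

Silence (sample 0) encodes to 0xFF and full-scale positive to 0x80, which matches the usual G.711 tables.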
Related
I'm looking for a multiplatform video format that allows a variable bit rate for streaming over high- or low-quality links, and also a seek function that accepts a byte-range offset (as in HTTP Range requests) to fetch a missing or lower-than-desired-quality portion.
I think it is worth separating encoding and streaming to set up the background.
Most streaming protocols will stream video contained in 'containers' e.g. MP4, WebM etc. The containers will include video tracks which can be encoded with different encoders, e.g. H.264, H.265, VP9 etc.
The term variable bit rate usually describes how the encoding is done - i.e. the encoder may either hold the bit rate fixed and let the quality vary, or try to maintain a given quality level while letting the bit rate vary.
I suspect what you may be more interested in is what is called Adaptive Bit Rate streaming - here the video is 'transcoded' into multiple copies, each with a different bit rate. The copies are all segmented at the same points, for example every two seconds.
A client can choose which bit rate to request for the next segment of the video, i.e. the next 2 second chunk, depending on its capabilities and on the network conditions at that time. Hence, the bit rate actually being streamed to the device can vary over time. See this answer for how you can view this in action in live examples: https://stackoverflow.com/a/42365034/334402
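The per-segment selection just described can be sketched in a few lines. The rendition ladder and throughput numbers below are hypothetical, and real players (hls.js, dash.js, Shaka) use more elaborate throughput and buffer heuristics:

```python
# Hypothetical bitrate ladder, lowest to highest rendition (bits per second)
RENDITIONS_BPS = [400_000, 1_200_000, 3_500_000, 8_000_000]

def pick_rendition(measured_bps, safety=0.8):
    """Pick the highest rendition that fits under a safety fraction of
    the recently measured network throughput; fall back to the lowest."""
    affordable = [r for r in RENDITIONS_BPS if r <= measured_bps * safety]
    return max(affordable) if affordable else min(RENDITIONS_BPS)
```

A client measuring 5 Mbit/s would request the 3.5 Mbit/s copy for the next segment; one measuring 100 kbit/s would drop to the lowest rung and hope.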
Assuming this meets your needs, then the two dominant ABR streaming formats at the moment are HLS and MPEG-DASH.
Traditionally HLS uses TS segments while DASH uses fragmented MP4. Both are now converging on CMAF, which means that the bulk of the media will be a single Multiplatform format, as you are looking for. There is a good overview of CMAF here at the time of writing: https://developer.apple.com/documentation/http_live_streaming/about_the_common_media_application_format_with_http_live_streaming
One caveat is that if your content is encrypted then, at the moment, different devices support different encryption modes so you may need to have separate HLS and DASH encrypted media for the medium term, until device support evolves over time.
In my WebRTC application, the Opus codec is used to compress the audio stream, and I was wondering: what is the minimum viable bandwidth that should be allocated to the audio stream to avoid jitter?
For Opus voice encoding at a mono 16 kHz sample rate:
6 kbit/s is the minimum, at which voice is still recognizable
16 kbit/s is a medium, good-enough setting
32 kbit/s is effectively the maximum - you won't hear a big difference if you encode at a higher bitrate
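To put those numbers in perspective: at Opus's default 20 ms frame duration, the bitrates above translate into tiny per-frame payloads. A back-of-the-envelope sketch (assuming constant bitrate and the default frame length):

```python
def opus_payload_bytes(bitrate_bps, frame_ms=20):
    """Bytes carried in each codec frame at a given constant bitrate."""
    # bits per frame, divided by 8 bits per byte
    return bitrate_bps * frame_ms // 1000 // 8

for bps in (6_000, 16_000, 32_000):
    print(bps, "bit/s ->", opus_payload_bytes(bps), "bytes per 20 ms frame")
```

So even the "maximum" 32 kbit/s setting fits a frame into 80 bytes.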
From what I tested, a few hundred kbit/s (bits, not bytes), approximately 300-400 kbit/s, should be enough for good audio quality - not only voice but music too. More important is the network latency, which should stay under 20-25 ms.
For decent voice-only audio, a tenth of that (30-40 kbit/s) should be enough. But this is for one peer only. The latency can be much higher, though you'll hear small skips now and then, which should be acceptable for conversations.
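Note that the codec bitrate is not the whole story on the wire: each packet also carries RTP, UDP, and IP headers (SRTP and any TURN relaying in WebRTC add more; those are ignored in this sketch). A rough estimate of the header cost, assuming one 20 ms frame per packet over IPv4:

```python
RTP_HDR, UDP_HDR, IPV4_HDR = 12, 8, 20  # header sizes in bytes

def wire_bitrate_bps(codec_bps, frame_ms=20):
    """Approximate on-the-wire bitrate: codec payload plus per-packet headers."""
    packets_per_sec = 1000 // frame_ms  # 50 packets/s at 20 ms frames
    return codec_bps + (RTP_HDR + UDP_HDR + IPV4_HDR) * 8 * packets_per_sec
```

At 50 packets per second the 40 bytes of headers add 16 kbit/s, so a 16 kbit/s Opus stream actually needs roughly 32 kbit/s of bandwidth.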
I would like to know if there is any controller, such as an Arduino or any other microcontroller, that can be programmed to run VLC player embedded in its system. It is probably the best open-source player. It would be nice if it could run on a standalone controller, so you could just plug your USB drive into the controller and play videos.
The barebones mini systems are way too expensive, around 200 to 400 dollars; that would be an easy approach, but not cost-effective. Thanks for reading.
Generally speaking, no, as most "microcontrollers" lack the memory (or external memory bus) and horsepower needed to do software video decoding.
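A quick back-of-the-envelope calculation shows the size of the gap. The numbers are illustrative: one standard-definition RGB565 frame versus the total SRAM of an ATmega328-class Arduino:

```python
def framebuffer_bytes(width, height, bytes_per_pixel=2):
    """Memory for one decoded RGB565 frame - before reference frames,
    bitstream buffers, or anything else a real decoder needs."""
    return width * height * bytes_per_pixel

frame = framebuffer_bytes(640, 480)  # 614,400 bytes for a single SD frame
ARDUINO_SRAM = 2 * 1024              # ATmega328 total SRAM: 2 KB
print(frame // ARDUINO_SRAM)         # one frame is ~300x the chip's entire RAM
```

And that is just storage; decoding H.264 in software on an 8-bit, 16 MHz core is off by further orders of magnitude.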
That generally is a task which falls more to "system on a chip" (SOC) designs, which today are increasingly packaged with hundreds of megabytes of memory stacked on top of a several hundred MHz processor, which may have additional special function hardware acceleration. Things like the beaglebone family, raspberry pi, and recent set top boxes and smart phones, and of course pocket cameras would be examples.
Note that some of the SOC based boards are not really any more expensive than an Arduino, especially by the time you add I/O shields to the latter. That's because they are able to leverage modern high density integration and the economies of scale of the consumer-device chip marketplace, to inexpensively put a lot of functionality on one or two chips, which would be far, far more expensive to crudely duplicate using a lot of physically discrete parts in the manner of an Arduino + accessories solution. And an Arduino is so many orders of magnitude too slow that the first accessory you would have to add to it would be a stand-alone hardware video decoding IC.
I agree with Chris.
Microcontrollers don't have enough memory to decode video. You need to select a microprocessor with video processing available. On the other hand, you can get some cheap Android-capable processors; they are available for $35-40 and give smooth HDMI output. (Not sure about plugging in via USB.)
The barebone mini systems are way too expensive around 200 to 400
dollars, and that would be an easy approach, but not cost effective.
Raspberry Pi, about $30 give or take. Beaglebone black $45, white $89. pcDuino Lite, $39, pcDuino Dev $59, I could do this all day...
As everyone has already said, you won't port such a heavily operating-system-dependent program to a microcontroller, for a number of reasons: memory, processor requirements, video, and so on.
Even if you took, say, an STM32F4 or something else at the high end of microcontrollers and created a video player for a specific format or formats, the man-hours involved would take a fair number of sales to recover. Why spend a few months on a project when you can have a Raspberry Pi or BeagleBone shipped in a couple of days (or a Roku or Apple TV from a local store)?
My game is based on Flash and uses RTMP to deliver live video to players. Video should be streamed from single location to many clients, not between clients.
It's an essential requirement that the end-to-end video stream have very low latency, less than 0.5 s.
Using many tweaks on server and client, I was able to achieve approx. 0.2s latency with RTMP and Adobe Live Media Encoder in case of loopback network interface.
Now the problem is to port the project to a Windows 8 Store app. Natively, Windows 8 offers Smooth Streaming extensions for IIS, plus http://playerframework.codeplex.com/ for the player, plus a video encoder compatible with Live Smooth Streaming. As for the encoder, so far I have only tested Microsoft Expression Encoder 4, which supports Live Smooth Streaming.
Despite using the msRealTime property on the player side, the latency is huge, and I was unable to get it below 6-10 seconds by tweaking the encoder. Different sources state that [Live] Smooth Streaming is not a choice for low-latency video streaming scenarios, and it seems that with Expression Encoder 4 it's impossible to achieve low latency with any combination of settings. There are hardware video encoders which support Smooth Streaming, like the ones from Envivio or Digital Rapids, however:
They are expensive
I'm not at all sure whether they can significantly improve latency on the encoder side, compared to Expression Encoder
Even if they can eliminate the encoder's delay, can the rest of the Smooth Streaming pipeline (the IIS side) sustain the required speed?
Questions:
What technology could be used to stream to Win8 clients with subsecond latency, if any?
Do you know of players compatible with Win8, or easily portable to Win8, which support RTMP?
Addendum: the live stream of Build 2012 used RTMP and Smooth Streaming in desktop mode, and RTMP with Flash Player for Metro in Metro mode.
I can confirm that Smooth Streaming will not be your technology of choice here. Under the very best scenario with perfect conditions, the best you're going to get is a few seconds (the absolute minimum latency is the chunk length itself, even if everything else added zero latency).
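The arithmetic behind that latency floor is simple. A sketch, using a typical 2-second chunk length and a three-chunk startup buffer as illustrative defaults rather than measured values:

```python
def min_glass_to_glass(chunk_seconds, buffered_chunks, other_delays=0.0):
    """Lower bound on chunked-streaming latency: a chunk can't be requested
    until it is fully encoded and published, and players typically buffer
    several chunks before starting playback."""
    return chunk_seconds * buffered_chunks + other_delays

print(min_glass_to_glass(2, 3))  # 6.0 seconds, before encode/network delays
```

With 2 s chunks and a three-chunk buffer you are at 6 seconds before counting encoder and network delay, which lines up with the 6-10 s the question reports.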
I think RTSP/RTMP or something similar using UDP is most likely your best bet. I would look at video-conferencing technologies rather than wide-audience streaming technologies. If I remember correctly, there are a few .NET components out there that handle RTSP H.264 for video conferencing - if I can find them later I will post them here.
I have a ZyXEL USB Omni56K Duo modem and want to send and receive voice streams on it, but to reach adequate quality I probably need to implement some "ZyXEL ADPCM" encoding, because plain PCM provides too low a sampling rate to transmit even medium-quality voice, and it doesn't work over USB either (probably because even this bitrate is too high for the USB-serial converter in it).
This mysterious codec appears in all Microsoft WAV-related libraries as one of many codecs theoretically supported, but I have found no implementations.
Can someone offer an implementation in any language or maybe some documentation? Writing a custom mu-law decoding algorithm won't be a problem for me.
Thanks.
I'm not sure how ZyXEL ADPCM varies from other flavors of ADPCM, but various ADPCM implementations can be found with some Google searches.
However, the real reason for my post is the choice of ADPCM. ADPCM is adaptive differential pulse-code modulation. This means that the data being passed is the difference between samples, not the current value (which is also why you see such great compression). In a clean environment with no bit loss (i.e. a disk drive), this is fine. However, in a streaming environment it's generally assumed that bits may be periodically mangled. With any bit damage to the data, you'll be hearing static or other audio artifacts very quickly and, usually, fairly badly.
ADPCM's reset mechanism isn't frame-based, which means the audio problems can go on for an extended period of time, depending on the encoder. The reset code is usually a run of 0s (16 comes to mind, but it's been years since I wrote my own ports).
ADPCM in the telephony environment usually converts a 12-bit PCM sample to a 4-bit ADPCM sample (not bad). As for audio quality... not bad for phone conversations and the spoken word, but most people, in a blind test, can easily detect the quality drop.
In your last sentence, you throw a curveball into the question: you start mentioning mu-law. Mu-law is a PCM companding scheme that takes a 14-bit linear sample and transforms it, on a logarithmic scale, into an 8-bit sample. This is the typical compression mechanism for TDM (phone) networks in North America (most of the rest of the world uses a similar algorithm called A-law).
So, I'm confused what you are actually trying to find.
You also mentioned Microsoft and WAV implementations. You probably know this, but just in case: WAV is just a wrapper around the audio data that provides the format, sampling rate, channel count, size, and other useful information. Without WAV, AU, or other wrappers involved, mu-law and ADPCM are usually presented as raw data.
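As a sketch of how thin that wrapper is, here is raw mu-law data wrapped into a minimal WAV file with Python's struct module. Format tag 7 is the registered mu-law code in the WAVEFORMAT registry, and non-PCM formats are also supposed to carry a 'fact' chunk:

```python
import struct

def wrap_ulaw_in_wav(ulaw_bytes, sample_rate=8000, channels=1):
    """Wrap raw 8-bit mu-law samples in a minimal WAV container."""
    # WAVEFORMAT: tag 7 (mu-law), 1 byte per sample, so byte rate = rate * channels
    fmt = struct.pack('<HHIIHH', 7, channels, sample_rate,
                      sample_rate * channels, channels, 8)
    # 'fact' chunk: total samples per channel (required for non-PCM formats)
    fact = struct.pack('<I', len(ulaw_bytes) // channels)
    chunks = (b'fmt ' + struct.pack('<I', len(fmt)) + fmt +
              b'fact' + struct.pack('<I', len(fact)) + fact +
              b'data' + struct.pack('<I', len(ulaw_bytes)) + ulaw_bytes)
    return b'RIFF' + struct.pack('<I', 4 + len(chunks)) + b'WAVE' + chunks
```

Strip the header back off and you are holding exactly the raw mu-law stream again.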
One other tip if you are implementing ADPCM. As I indicated, it uses 4 bits to represent a 12-bit sample. It gets away with this by both sides having a multiplier (step-size) table. Your position in the table changes based on the 4-bit value (in other words, the value is both multiplied against a step size and used to figure out the new step size). I've seen a variety of algorithms use slightly different tables (no idea why, but you typically see the sent and received signals slowly stray off the bias). One of the older, popular sound packages used a table different from what I typically saw from the telephony hardware vendors.
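To make the step-size-table mechanism concrete, here is a one-nibble decoder using the standard IMA ADPCM tables. This is the common IMA variant, shown for illustration only; the ZyXEL flavor may well use different tables:

```python
# Standard IMA ADPCM step-size table (89 entries) and index adjustments
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by nibble magnitude

def decode_nibble(nibble, predictor, index):
    """Decode one 4-bit ADPCM code; returns the new (predictor, index) state."""
    step = STEP_TABLE[index]
    # Reconstruct the difference from the three magnitude bits
    diff = step >> 3
    if nibble & 1: diff += step >> 2
    if nibble & 2: diff += step >> 1
    if nibble & 4: diff += step
    # Bit 8 is the sign; apply and clamp to 16-bit range
    predictor += -diff if nibble & 8 else diff
    predictor = max(-32768, min(32767, predictor))
    # The same nibble also moves our position in the step table
    index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
    return predictor, index
```

Note how the nibble is used twice: once multiplied against the current step size to rebuild the sample difference, and once to pick the next step size, exactly the dual role described above.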
And, for more useless trivia, there are multiple flavors of ADPCM. The variances involve the table, source sample size and destination sample size, but I've never had a need to work with them. Just documented flavors that I've found when I did my internet search for specifications for the various audio formats used in telephony.
Piping your PCM through ffmpeg -f u16le -i - -f wav -acodec adpcm_ms - will likely work.
http://ffmpeg.org/