What video encoding format supports variable bit rate streaming (quality) and a SEEK function to skip segments?

I'm looking for a multiplatform video format that allows a variable bit rate for streaming over high- or low-quality links, and also a seek function that allows a byte[] range offset (as in HTTP range requests) to fetch a missing or lower-than-desired-quality segment.

I think it is worth separating encoding from streaming to give some background.
Most streaming protocols stream video packaged in 'containers', e.g. MP4, WebM etc. The containers include video tracks which can be encoded with different encoders, e.g. H.264, H.265, VP9 etc.
The term 'variable bit rate' usually describes how the encoding is done - i.e. the encoder may compress the video so it has variable quality at a fixed bit rate, or try to maintain a given quality level at the cost of a variable bit rate.
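For example, with ffmpeg and x264 the two approaches might look like this (a sketch only; the file names are placeholders and the numbers should be tuned for your content):

    # constant quality, variable bit rate (CRF mode)
    ffmpeg -i input.mp4 -c:v libx264 -crf 23 output_vbr.mp4

    # capped bit rate, variable quality
    ffmpeg -i input.mp4 -c:v libx264 -b:v 2M -maxrate 2M -bufsize 4M output_cbr.mp4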
I suspect what you may be more interested in is what is called Adaptive Bit Rate (ABR) streaming - here the video is 'transcoded' into multiple copies, each with a different bit rate. The copies are all segmented at the same points, for example every two seconds.
A client can choose which bit rate to request for the next segment of the video, i.e. the next two-second chunk, depending on its capabilities and on the network conditions at that time. Hence, the bit rate actually being streamed to the device can vary over time. See this answer for how you can view this in action in live examples: https://stackoverflow.com/a/42365034/334402
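The selection logic on the client is conceptually simple. A minimal sketch, with a hypothetical bit-rate ladder and a made-up throughput figure:

    # pick the highest variant whose bit rate fits the measured throughput,
    # keeping some headroom for network variation
    VARIANTS_KBPS = [400, 1200, 2500, 5000]   # hypothetical bit-rate ladder

    def pick_variant(throughput_kbps, headroom=0.8):
        usable = throughput_kbps * headroom
        fitting = [v for v in VARIANTS_KBPS if v <= usable]
        return fitting[-1] if fitting else VARIANTS_KBPS[0]

    print(pick_variant(3500))   # on a ~3.5 Mbps link -> 2500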
Assuming this meets your needs, the two dominant ABR streaming formats at the moment are HLS and MPEG-DASH.
Traditionally HLS uses TS segments while DASH uses fragmented MP4. Both are now converging on CMAF, which means that the bulk of the media will be in a single multiplatform format, as you are looking for. There is a good overview of CMAF, at the time of writing, here: https://developer.apple.com/documentation/http_live_streaming/about_the_common_media_application_format_with_http_live_streaming
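For illustration, an HLS master playlist advertising the same content at three bit rates might look like this (the paths and numbers are made up):

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
    low/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    mid/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    high/index.m3u8

The client picks a variant playlist from this list and can switch between variants at segment boundaries.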
One caveat is that if your content is encrypted then, at the moment, different devices support different encryption modes so you may need to have separate HLS and DASH encrypted media for the medium term, until device support evolves over time.

Related

WebRTC bandwidth / resolution / frame rate control

I want to know why, when I transmit 320x240 resolution video, my uplink traffic is about 1.5 Mbps by default;
when I modify the SDP bandwidth limit, for example to 500 kbps, my video width and height are still 320x240 and the frame rate is not reduced;
so what exactly is being reduced in the uplink traffic?
WebRTC uses so-called "lossy perceptual video compression." That is, the video is capable of being compressed into bit streams of various bandwidths... in your case 1.5 Mbps and 0.5 Mbps. It's like JPEG's quality parameter: in JPEG, adjusting that parameter changes the size of the image file. In video compression, instead of a quality parameter you request a bit rate.
When a lower-bitrate video stream is decompressed, it's a less faithful representation of the original. If you know what to look for, you can see various compression artifacts in the decompressed imagery: blockiness, "mosquitoes" around objects, and so forth.
Streaming video and DVD video programs (cinema) use high bandwidth to keep these effects below the threshold of perception at 1080p or 4K resolution.
In your SIF (320x240) resolution case, your decoded 0.5 Mbps video has more artifacts in it than your 1.5 Mbps video. But because the resolution is relatively low, it will take some looking to find those artifacts. If they don't annoy you or your users, you can conclude that 0.5 Mbps is fine for your application. Long experience suggests that you should succeed just fine with that bit rate and resolution. You can even try 250 kbps.
Reducing the frame rate doesn't proportionally save bandwidth; most compressed video frames represent differences from the previous frame.
Lower bitrates are better for mobile devices; they save power and your users' data plans.
If you want to see exaggerated compression artifacts and what they look like, set the bitrate down to 125kbps or lower.
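As for how the SDP limit itself is commonly applied: one approach is to add a b=AS line (application-specific maximum, in kbps) to the video media section before setting the local description. Browsers differ here (Chrome honours b=AS in kbps, while Firefox prefers b=TIAS in bps), so treat this as a minimal sketch of the munging, assuming you have the SDP as a string:

    # minimal sketch: insert a b=AS (application-specific maximum, kbps)
    # line into the video section of an SDP string
    def cap_video_bandwidth(sdp, kbps=500):
        out, in_video = [], False
        for line in sdp.splitlines():
            out.append(line)
            if line.startswith("m="):
                in_video = line.startswith("m=video")
            elif in_video and line.startswith("c="):
                out.append("b=AS:%d" % kbps)  # cap the video send rate
                in_video = False              # insert only once
        return "\r\n".join(out) + "\r\n"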

What characteristics should a .wav file produced by a TTS engine have to sound high quality?

I'm trying to generate high-quality voice-over using the Microsoft Speech API. What values should I pass to this constructor to guarantee high-quality audio?
The .wav file will later be fed to FFmpeg, so the audio will be re-encoded to a more compact form. My main goal is to keep the voice as clear as I can, but I really don't know which values give the best quality as perceived by humans.
First of all, just to let you know, I haven't used this Speech API; I'll give you an answer based on my audio processing work.
You can choose EncodingFormat.Pcm for Pulse Code Modulation
samplesPerSecond is the sampling frequency. Because it is voice, you can cover it with 16000 Hz for sure. If you are a real perfectionist you can go with 22050, for example. The higher the value, the larger the audio file. If file size isn't a problem you can even go with 32000 or 44100, but there won't be much noticeable difference.
bitsPerSample - go with 16 if possible
The number of channels - 1 or 2, mono or stereo - won't affect the quality of the sound.
averageBytesPerSecond - this would be samplesPerSecond * bytesPerSample * numberOfChannels (for example 22050 * 2 for 16-bit mono).
blockAlign - this would be bytesPerSample * numberOfChannels (for example, if you have 16-bit PCM mono audio, 16 bits are 2 bytes and mono is 1 channel, so blockAlign is 2 * 1). See the sketch after this list for how the values fit together.
As for the last one, the byte array doesn't speak much for itself; I'm not sure what it's for. I believe the first six arguments are enough for the audio to be generated.
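To make those numbers concrete, here is a minimal sketch in Python (not the Speech API itself) that writes a 16-bit, 22050 Hz mono WAV and shows how blockAlign and averageBytesPerSecond fall out of the other values; the test tone just stands in for TTS output:

    import math, struct, wave

    SAMPLE_RATE = 22050      # samplesPerSecond
    BITS = 16                # bitsPerSample
    CHANNELS = 1             # mono

    block_align = (BITS // 8) * CHANNELS              # 2 bytes * 1 channel = 2
    avg_bytes_per_second = SAMPLE_RATE * block_align  # 22050 * 2 = 44100

    # one second of a 440 Hz test tone standing in for TTS output
    frames = b"".join(
        struct.pack("<h", int(16000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
        for n in range(SAMPLE_RATE))

    with wave.open("tone.wav", "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(BITS // 8)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)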
I hope this was helpful
Cheers

h264 high profile video: playback from specified point

What is the proper and fast way to start streaming/playback of h264 high profile HDTV video dump from the specific point?
Huge sample of the real life stream: this file.
According to 'ffprobe -show_frames', this 10 GB, 105-minute video dump has only 28 video frames marked 'key_frame=1' and 10 I-frames.
The application I am trying to improve uses such frames as a kind of index, allowing it to rewind and play from any key-frame or I-frame.
It works perfectly with other streams, but not in this case, as you can easily understand: only 28 starting points for playback in 100+ minutes of show is far too few.
I've checked for packets with the 'random-access-indicator' flag set, but such packets in this stream aren't on frame boundaries - they don't have the 'frame begin' bit set - so I can't rely on them.
Is there a way at all to implement 'rewind/pause/play from the specified time point' feature for this codec?
Solved by treating as index frames the ones which contain the NAL unit sequences 'nal slice idr' and 'nal slice pps'.
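A minimal sketch of that approach, assuming the transport stream has already been demuxed into a raw Annex B H.264 elementary stream (a real TS file needs TS/PES parsing first):

    # scan for NAL start codes and record the byte offsets of IDR slices
    # (NAL type 5); SPS (7) and PPS (8) units usually sit just before them
    def find_idr_offsets(es):
        offsets, i = [], 0
        while True:
            i = es.find(b"\x00\x00\x01", i)
            if i == -1 or i + 3 >= len(es):
                return offsets
            nal_type = es[i + 3] & 0x1F
            if nal_type == 5:          # coded slice of an IDR picture
                offsets.append(i)
            i += 3

Those offsets can then serve as the seek index that the 28 marked key frames failed to provide.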

core audio: is zero equivalent to silence only for PCM audio?

I'm trying to create a basic algorithm that does packet loss concealment for Core Audio. I simply want to replace the missing data with silence. In the book Learning Core Audio, the author says that in lossless PCM, zeros mean silence. I was wondering: if I'm playing VBR (i.e. compressed) data, would putting in zeros suffice for silence as well?
In my existing code, when I plug zeros into the audio queue, it suddenly jams (i.e. it no longer frees up consumed data in the audio queue callback), and I'm wondering why.
PCM is the raw encoded sample. All zeros (when using signed data for samples) is indeed silence. (In fact, a constant run of any single value is silence, but such a DC offset has the potential to damage your amplifier and/or speakers if it isn't filtered out.)
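For PCM, then, a concealment packet is just a run of zero samples. A minimal sketch for 16-bit signed mono at 44.1 kHz (the 20 ms packet duration is an assumption):

    SAMPLE_RATE = 44100
    PACKET_MS = 20                          # assumed packet duration

    samples = SAMPLE_RATE * PACKET_MS // 1000
    silence = b"\x00\x00" * samples         # each 16-bit sample is two zero bytes
    # hand `silence` to the audio queue in place of the missing packet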
When you compress with a lossy codec, you enter a digital format where it is not trivial to just add silence. Think of trying to append null bytes to a file stored inside a ZIP archive: it isn't as simple as inserting them arbitrarily into the ZIP data.
If you want to add silence to a compressed file, you must encode that silence with the appropriate codec. Then you have to fit it into the bitstream, which is also not trivial. Usually the stream is broken up into frames, but in some formats you can't even split on those frames. MP3 and AAC use a bit reservoir, where unused space in prior frames can be used to encode more complicated frames later on, making splitting the file very difficult.

ZyXEL ADPCM codec

I have a ZyXEL USB Omni56K Duo modem and want to send and receive voice streams on it. To reach adequate quality I probably need to implement some "ZyXEL ADPCM" encoding, because plain PCM provides too low a sampling rate to transmit even medium-quality voice, and it doesn't work through USB either (probably because even this bit rate is too high for the USB-serial converter in it).
This mysterious codec appears in all Microsoft WAV-related libraries as one of many codecs theoretically supported, but I have found no implementations.
Can someone offer an implementation in any language, or maybe some documentation? Writing a custom mu-law decoding algorithm won't be a problem for me.
Thanks.
I'm not sure how ZyXEL ADPCM varies from other flavors of ADPCM, but various ADPCM implementations can be found with some Google searches.
However, the real reason for my post is the choice of ADPCM. ADPCM is adaptive differential pulse-code modulation. This means that the data being passed is the difference between samples, not the current value (which is also why you see such great compression). In a clean environment with no bit loss (i.e. a disk drive), this is fine. However, in a streaming environment, it's generally assumed that bits may be periodically mangled. Any bit damage to the data and you'll be hearing static or other audio artifacts very quickly and, usually, fairly badly.
ADPCM's reset mechanism isn't frame-based, which means the audio problems can go on for an extended period of time, depending on the encoder. The reset code is usually a run of 0s (16 comes to mind, but it's been years since I wrote my own ports).
ADPCM in the telephony environment usually converts a 12-bit PCM sample to a 4-bit ADPCM sample (not bad). As for audio quality... not bad for phone conversations and the spoken word, but most people, in a blind test, can easily detect the quality drop.
In your last sentence, you throw a curve ball into the question: you start mentioning mu-law. Mu-law is a PCM implementation that takes a 12-bit sample and transforms it, using a logarithmic scale, to an 8-bit sample. This is the typical compression mechanism for TDM (phone) networks in North America (most of the rest of the world uses a similar algorithm called A-law).
So I'm confused about what you are actually trying to find.
You also mentioned Microsoft and WAV implementations. You probably know, but just in case: WAV is just a wrapper around the audio data that provides format, sampling information, channel count, size and other useful information. Without WAV, AU or other wrappers involved, mu-law and ADPCM are usually presented as raw data.
One other tip if you are implementing ADPCM: as I indicated, it uses 4 bits to represent a 12-bit sample. Implementations get away with this by both sides having a multiplier table. Your position in the table changes based on the 4-bit value (in other words, the value is both multiplied against a step size and used to figure out the new step size). I've seen a variety of algorithms use slightly different tables (no idea why, but you typically see the sent and received signals slowly stray off the bias). One of the older, popular sound packages was different from what I typically saw from the telephony hardware vendors. A sketch of the standard step-size mechanics follows below.
And, for more useless trivia, there are multiple flavors of ADPCM. The variances involve the table, source sample size and destination sample size, but I've never had a need to work with them - just documented flavors that I found when I searched for specifications for the various audio formats used in telephony.
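To make the step-size table mechanics concrete, here is a minimal sketch of the widely documented IMA ADPCM decode step (ZyXEL's exact tables and framing may well differ; this is just the common flavor):

    # standard IMA ADPCM tables: the 4-bit code indexes INDEX_TABLE to move
    # through STEP_TABLE, adapting the step size as the signal changes
    INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]
    STEP_TABLE = [
        7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
        34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
        157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
        598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
        1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
        5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
        15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767]

    def decode_nibble(nibble, predictor, index):
        # reconstruct the difference from the 4-bit code and the current step
        step = STEP_TABLE[index]
        diff = step >> 3
        if nibble & 1: diff += step >> 2
        if nibble & 2: diff += step >> 1
        if nibble & 4: diff += step
        if nibble & 8: diff = -diff               # high bit is the sign
        predictor = max(-32768, min(32767, predictor + diff))
        index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
        return predictor, index                   # new sample and table position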
Piping your PCM through

    ffmpeg -f u16le -i - -f wav -acodec adpcm_ms -

will likely work. See http://ffmpeg.org/