h264 high profile video: playback from specified point - indexing

What is the proper and fast way to start streaming/playback of h264 high profile HDTV video dump from the specific point?
Huge sample of the real life stream: this file.
According to 'ffprobe -show_frames' this sample 10Gb 105 minutes video dump has only 28 video frames marked as 'key_frame=1' and 10 I-frames.
Application which I am trying to improve uses such frames as some kind of index, allowing to rewind and play from any key-frame or I-frame.
It works perfectly with other streams. But not in this case, as you can easily understand. Only 28 starting points of playback in 100+ minutes of show is far too low.
I've checked the presence of packets with 'random-access-indicator' enabled - but such packets in this stream aren't on frame boundaries, they don't have 'frame begin' bit enabled, so I can't rely on them.
Is there a way at all to implement 'rewind/pause/play from the specified time point' feature for this codec?

Solved by interpretation as index frames the ones which contain NAL sequences 'nal slice idr' and 'nal slice pps'.

Related

What video encoding format supports variable bit rate streaming (quality) and a SEEK function to skip segments?

I'm looking for a Multiplatform video format that allows for a variable bit rate for streaming over high or low quality links, and also a seek function that allows a byte[] range offset (as in HTTP commands) to fetch a missing or lower than desired quality.
I think it is worth separating encoding and streaming to help the background.
Most streaming protocols will stream video contained in 'containers' e.g. MP4, WebM etc. The containers will include video tracks which can be encoded with different encoders, e.g. H.264, H.265, VP9 etc.
The term Variable bit rate is usually used to describe how the encoding is done - i.e. the encoder may compress or encode the video so it has variable quality and fixed bit rate or try to maintain a given quality level but with a variable bit rate.
I suspect what you may be more interested in is what is called Adaptive Bit Rate streaming - here the video is 'transcoded' into multiple copies, each with a different bit rate. The copies are all segmented at the same points, for example every two seconds.
A client can choose which bit rate to request for the next segment of the video, i.e. the next 2 second chunk, depending on its capabilities and on the network conditions at that time. Hence, the bit rate actually being streamed to the device can vary over time. See this answer for how you can view this in action in live examples: https://stackoverflow.com/a/42365034/334402
Assuming this meets your needs, then the two dominant ABR streaming formats at the moment are HLS and MPEG-DASH.
Traditionally HLS uses TS segments while DASH uses fragmented MP4. Both are now converging on CMAF, which means that the bulk of the media will be a single Multiplatform format, as you are looking for. There is a good overview of CMAF here at the time of writing: https://developer.apple.com/documentation/http_live_streaming/about_the_common_media_application_format_with_http_live_streaming
One caveat is that if your content is encrypted then, at the moment, different devices support different encryption modes so you may need to have separate HLS and DASH encrypted media for the medium term, until device support evolves over time.

WebRtc Bandwidth / resolution / frame rate control

I want to know why I normally transmit a 320240 resolution video and default my uplink traffic is at 1.5MB;
when I modify the SDP bandwidth limit, for example at 500kbps/s my video width and height are still 320240 and the frame rate is not reduced;
So what exactly is this reduced upside traffic?
= =
WebRTC uses so-called "lossy perceptual video compression." That is, the video is capable of being compressed into bit streams of various bandwidths... in your case 1.5mbps and 0.5mbps. It's like JPEG's quality parameter: in JPEG, adjusting that parameter changes the size of the image file. In video compression, instead of a quality parameter you request a bitrate.
When a lower-bitrate video stream is decompressed, it's a less faithful representation of the original. If you know what to look for you can see various compression artifacts in the decompressed imagery" blockiness, "mosquitoes" around objects, and so forth.
Streaming video and DVD video programs (cinema) use high bandwidth to minimize these effects below the threshold of perception at 1080p or 4K resolution.
In your SIF (320x240) resolution case, your decoded 0.5mbps video has more artifacts in it than your 1.5mbps video. But, because the resolution is relatively low, it will take some looking to find those artifacts. If they don't annoy you or your users, you can conclude that 0.5mbps is fine for your application. Long experience suggests that you should succeed just fine with that bitrate and resolution. You can even try 250kbps.
Reducing the frame rate doesn't proportionally save bandwidth; most compressed video frames represent differences from the previous frame.
Lower bitrates are better for mobile devices; they save power and your users' data plans.
If you want to see exaggerated compression artifacts and what they look like, set the bitrate down to 125kbps or lower.

WebRTC : Video black screen bitrate

Is the bit rate of black screen shown when video is muted same as the original video's bit rate or is it significantly less because it is just a black screen?
It is significantly less. Since there is essentially no video information being sent to the remote party. How much depends on a lot of factors (connection quality etc).
I just did a quick test and the outgoing bit rate at 640x480 # 27fps was around 900 kbps to 1 mbps. Disabling the video track of the stream resulted in an outgoing bitrate of 30 kbps.
Please keep in mind that this was only a simple test I did. You can get this kind of information yourself by evaluating the reports in peerConnection.getStats
Some documentation and libraries for getStats
https://www.w3.org/TR/webrtc-stats
https://github.com/muaz-khan/getStats
https://www.callstats.io/2015/07/06/basics-webrtc-getstats-api/
Came across chrome://webrtc-internals, which has inbuilt tracking for bit rates and has other good features.
As seen in graph, bit rate before video was muted was ~150k which reduces to ~30k on muting the video.

Frame by frame decode using Media Source Extension

I've been digging through the Media Source Extension examples on the internet and haven't quite figured out a way to adapt them to my needs.
I'm looking to take a locally cached MP4/WebM video (w/ 100% keyframes and 1:1 ratio of clusters/atoms to keyframes) and decode/display them non-sequentially (ie. frame 10, 400, 2, 100, etc.) and to be able to render these non-sequential frames on demand at rates from 0-60fps. The simple non-MSE approach using the currentTime property fails due to the latency in setting this property and getting a frame displayed.
I realize this is totally outside normal expectations for video playback, but my application requires this type of non-sequential high speed playback. Ideally I can do this with h264 for GPU acceleration but I realize there could be some platform specific GPU buffers to contend with, though it seems that a zero frame buffer should be possible (see here). I am hoping that MSE can accomplish this non-sequential high framerate low latency playback, but I know I'm asking for a lot.
Questions:
Will appendBuffer accept a single WebM cluster / MP4 Atom made up of a single keyframe, and also be able to decode at a high frequency (60fps)?
Do you think what I'm trying to do is possible in the browser?
Any help, insight, or code suggestions/examples would be much appreciated.
Thanks!
Update 4/5/16
I was able to get MSE mostly working with single frame MP4 fragments in Firefox, Edge, and Chrome. However, Chrome seems to be running into the frame buffer issue linked above and I haven't found a way to pre-process a MP4 to invoke this "low delay" mode. Anyone have any clues if it's possible to create such a file with an existing tool like MP4Box?
Firefox and Edge decode/display the individual frames immediately with very little latency, but of course something breaks once I load this video into a Three.js WebGL project (no video output, no errors). I'm ignoring this for now as I'd much rather have things working on Chrome as I'll be targeting Android as well.
I was able to get this working pretty well. The key was getting Chrome to enter its "low delay" mode by muxing a specially crafted MP4 file using modified mp4box sources. I added one line in movie_fragments.c so it read:
if (movie->moov->mvex->mehd && movie->moov->mvex->mehd->fragment_duration) {
trex->track->Header->duration = 0;
Media_SetDuration(trex->track);
movie->moov->mvex->mehd->fragment_duration = 0;
}
Now every MP4 created will have the MEHD fragment duration set to 0 which causes Chrome to process it as a live stream.
I still have one remaining issue related to the timestampOffset property which in combination with the FPS set in the media fragments control the playback speed. Since I'm looking to control the FPS directly I don't want any added delay from the MSE playback engine. I'll post a separate question here to address that.
Thanks,
Dustin

What characteristics should have a .wav file as result of TTS engine to be be listened with high quality?

I'm trying to generate high quality voice-over using Microsoft Speech API. What kind of values I should pass in to this constructor to guarantee high quality audio?
The .wav file will be used latter to feed FFmpeg, so audio will be re-encoded latter to a more compact form. My main goal is keep the voice as clear as I can, but I really don't know which values guarantee the best quality perceived by humans.
First of all, just to let you know I haven't used this Speech API, I'll give you an answer based on my Audio processing work.....
You can choose EncodingFormat.Pcm for Pulse Code Modulation
samplesPerSecond is sampling frequency. Because it is voice you can cover it with 16000hz for sure. If you are really perfectionist you can go with 22050 for example. Higher the value is, the audio file size will be larger. If file size isn't a problem you can even go with 32000 or 44100 but there won't be much noticable difference....
bitsPerSample - go with 16 if possible
1 or 2, mono or stereo ..... it won't affect the quality of the sound
averageBytesPerSecond ..... this would be samplesPerSecond*bytesPerSample (for example 22050*2)
blockAlign ..... this would be Bytes Per Sample*numberOfChanels (for example if you have 16bit PCM Mono audio, 16bits are 2 bytes, Mono is 1, so blockAlign is 2*1)
That last one, the byte array doesn't speaks much for itself, I'm not sure what it serves for, I believe the first 6 arguments are enough for audio to be generated.
I hope this was helpful
Cheers