WebRTC: Bad Encoding Performance for Screensharing via CGDisplayStream (h264/vp8/vp9) - objective-c

I am using the Objective-C framework for WebRTC to build a screensharing app. The video is captured using CGDisplayStream. I have a working demo, but at 2580x1080 I get only 3-4 fps. My googAvgEncodeMs is around 100-300 ms (ideally it should be under 10 ms), which explains why the screensharing is far from fluid (30 fps+). I also switched between codecs (h264/vp8/vp9), but with all of them I get the same slow experience. The contentType in WebRTC is set to screen (values: [screen, realtime]).
The CPU usage of my Mac is then between 80-100%. My guess is that there is some major optimisation (qpMax, hardware acceleration, etc.) in the C++ code of the codecs that I have missed. Unfortunately, my knowledge of codecs is limited.
Also interesting: Even when I lower the resolution to 320x240 the googAvgEncodeMs is still in the range of 30-60ms.
I am running this on a 15-inch MacBook Pro from 2018. When running a random WebRTC demo inside Chrome/Firefox etc., I get smoother results than with the vanilla WebRTC framework.

WebRTC uses software encoding, and that is the real culprit. Encoding 2580x1080 in software is not going to be practical. Try halving the horizontal and vertical resolution; it will improve performance with some loss in quality. Also, if you are doing screen sharing and a high frame rate is not critical, you can drop the frame rate to 10 frames per second. The logical solution is to figure out how to incorporate hardware acceleration.
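For reference, here is a minimal browser-side sketch of the same trade-off (TypeScript against the web API, not the Objective-C framework), assuming an existing RTCPeerConnection with the screen-capture track already attached: it halves the encoded resolution and caps the frame rate at the RTCRtpSender without touching the capture. In the native framework, the closest equivalent knob is the video source's output adaptation (adaptOutputFormatToWidth:height:fps: on RTCVideoSource in recent revisions of the Objective-C API), but verify against the WebRTC revision you link.
// Hedged sketch: cap what the sender encodes, not what is captured.
// `pc` is assumed to be an RTCPeerConnection that already has a screen track added.
async function capScreenShareSender(pc: RTCPeerConnection): Promise<void> {
  const sender = pc.getSenders().find(s => s.track?.kind === 'video');
  if (!sender) return;
  const params = sender.getParameters();
  for (const encoding of params.encodings ?? []) {
    encoding.scaleResolutionDownBy = 2; // halve H and V resolution
    encoding.maxFramerate = 10;         // 10 fps is usually enough for screen content
  }
  await sender.setParameters(params);
}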

Related

Decreasing speed decreases sound quality

Decreasing the playback speed of AudioPlayer severely decreases the quality of the audio being played; the audio becomes very "noisy".
Is there any way to fix this or is it an issue with the just_audio implementation?
Reproduce:
import 'package:just_audio/just_audio.dart';

Future<void> reproduce() async {
  final AudioPlayer player = AudioPlayer(); // Create audio player
  await player.setAsset("..."); // Load audio file
  await player.setSpeed(0.5); // Halve speed
  player.play(); // Start playing
}
Just to preface this answer, time stretching is a difficult thing to do in real-time because it has to stretch time without stretching the sound waves (stretching the sound waves would lower the frequency and hence the pitch, so it has to stretch time while filling the gaps with fabricated extensions to the existing sound waves). As a result, the very best real time algorithm will still introduce artifacts and distortions.
Now to answer your question, just_audio doesn't provide any options to change the time stretching algorithm, but it does use the best available algorithms for each platform, for general purpose usage. The Android implementation uses Sonic which is better quality than Android's own built-in algorithm. On iOS/macOS, AVAudioTimePitchAlgorithmTimeDomain is used which seems to produce the least distortion at speeds below 1.0 out of the different algorithms Apple provides, although newer iPhones/iOS versions may produce higher quality output. On web browsers, it uses whatever algorithm that web browser provides.
If you need to try out alternatives, you would need to make a copy of just_audio and edit the code that selects the algorithm. You are unlikely to find better options for Android and web, but you might like to experiment with the different iOS/macOS algorithms by searching for AVAudioTimePitchAlgorithmTimeDomain in the code and changing it to one of the other options listed in Apple's documentation. You may find one of the other algorithms works better if you have a specialised use case.
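For the web case mentioned above (where just_audio simply defers to the browser), the only knobs are the media element's own. A minimal TypeScript sketch, using a plain audio element rather than just_audio's internals and a placeholder file name:
// Sketch of the browser-level behaviour just_audio relies on for web.
// 'audio.mp3' is a placeholder path, not a file from the question.
const audio = new Audio('audio.mp3');
audio.playbackRate = 0.5; // half speed
// true = use the browser's built-in time-stretcher; false = plain resample,
// which drops the pitch an octave at 0.5x but avoids time-stretching artifacts.
// (Older browsers exposed this as webkitPreservesPitch / mozPreservesPitch.)
audio.preservesPitch = true;
void audio.play();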

Frame by frame decode using Media Source Extension

I've been digging through the Media Source Extension examples on the internet and haven't quite figured out a way to adapt them to my needs.
I'm looking to take a locally cached MP4/WebM video (with 100% keyframes and a 1:1 ratio of clusters/atoms to keyframes) and decode/display its frames non-sequentially (i.e. frame 10, 400, 2, 100, etc.), and to be able to render these non-sequential frames on demand at rates from 0-60 fps. The simple non-MSE approach using the currentTime property fails due to the latency in setting this property and getting a frame displayed.
I realize this is totally outside normal expectations for video playback, but my application requires this type of non-sequential high speed playback. Ideally I can do this with h264 for GPU acceleration but I realize there could be some platform specific GPU buffers to contend with, though it seems that a zero frame buffer should be possible (see here). I am hoping that MSE can accomplish this non-sequential high framerate low latency playback, but I know I'm asking for a lot.
Questions:
Will appendBuffer accept a single WebM cluster / MP4 Atom made up of a single keyframe, and also be able to decode at a high frequency (60fps)?
Do you think what I'm trying to do is possible in the browser?
Any help, insight, or code suggestions/examples would be much appreciated.
Thanks!
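For what it's worth, the setup being asked about has roughly this shape. A hedged TypeScript sketch, assuming the MP4 initialization segment and the per-frame fragments are already available as ArrayBuffers and that the codec string matches your file (both are assumptions, not details from the question):
// Minimal MSE sketch: append one init segment, then single-keyframe fragments on demand.
declare const initSegment: ArrayBuffer;      // ftyp + moov (assumed to be pre-fetched)
declare const frameFragments: ArrayBuffer[]; // one moof/mdat pair per keyframe (assumed)

const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  sb.addEventListener('updateend', () => {
    // The init segment is in; single-keyframe fragments can now be appended on demand.
    if (!sb.updating) sb.appendBuffer(frameFragments[10]); // e.g. jump straight to frame 10
  }, { once: true });
  sb.appendBuffer(initSegment);
});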
Update 4/5/16
I was able to get MSE mostly working with single frame MP4 fragments in Firefox, Edge, and Chrome. However, Chrome seems to be running into the frame buffer issue linked above and I haven't found a way to pre-process a MP4 to invoke this "low delay" mode. Anyone have any clues if it's possible to create such a file with an existing tool like MP4Box?
Firefox and Edge decode/display the individual frames immediately with very little latency, but of course something breaks once I load this video into a Three.js WebGL project (no video output, no errors). I'm ignoring this for now as I'd much rather have things working on Chrome as I'll be targeting Android as well.
I was able to get this working pretty well. The key was getting Chrome to enter its "low delay" mode by muxing a specially crafted MP4 file using modified mp4box sources. I added one line in movie_fragments.c so it read:
if (movie->moov->mvex->mehd && movie->moov->mvex->mehd->fragment_duration) {
trex->track->Header->duration = 0;
Media_SetDuration(trex->track);
movie->moov->mvex->mehd->fragment_duration = 0; /* the added line: zero out the MEHD fragment duration */
}
Now every MP4 created will have the MEHD fragment duration set to 0, which causes Chrome to process it as a live stream.
I still have one remaining issue related to the timestampOffset property, which, in combination with the FPS set in the media fragments, controls the playback speed (see the sketch at the end of this answer). Since I'm looking to control the FPS directly, I don't want any added delay from the MSE playback engine. I'll post a separate question here to address that.
Thanks,
Dustin
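Regarding the timestampOffset point above: one way to place each single-frame fragment at an explicit position on the timeline (and so control the effective frame rate yourself) is to set the offset before each append. A hedged TypeScript sketch, assuming each fragment's own timestamps start near zero:
// `sb` is an open SourceBuffer, `fragments` the per-frame moof/mdat buffers (both assumed).
declare const sb: SourceBuffer;
declare const fragments: ArrayBuffer[];

async function appendFrameAt(frameIndex: number, slot: number, fps: number): Promise<void> {
  if (sb.updating) {
    await new Promise(resolve => sb.addEventListener('updateend', resolve, { once: true }));
  }
  sb.timestampOffset = slot / fps;        // where this frame should land on the timeline
  sb.appendBuffer(fragments[frameIndex]); // fragment timestamps are assumed to start at ~0
  await new Promise(resolve => sb.addEventListener('updateend', resolve, { once: true }));
}

// e.g. show source frame 400 in the third slot of a 60 fps timeline:
// await appendFrameAt(400, 2, 60);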

Can VLC Player be embedded in a microcontroller to play videos?

I would like to know if there's any controller, such as an Arduino or any other microcontroller, that can be programmed to run VLC player embedded in its system. It is probably the best open-source player. It would be nice if it could run on a standalone controller, so you could just plug a USB drive into the controller and play videos.
The barebone mini systems, at around 200 to 400 dollars, are way too expensive; that would be an easy approach, but it is not cost-effective. Thanks for reading.
Generally speaking, no, as most "microcontrollers" lack the memory (or external memory bus) and horsepower needed to do software video decoding.
That is generally a task which falls more to "system on a chip" (SoC) designs, which today are increasingly packaged with hundreds of megabytes of memory stacked on top of a several-hundred-MHz processor, which may have additional special-function hardware acceleration. Things like the BeagleBone family, the Raspberry Pi, recent set-top boxes and smartphones, and of course pocket cameras would be examples.
Note that some of the SOC based boards are not really any more expensive than an Arduino, especially by the time you add I/O shields to the latter. That's because they are able to leverage modern high density integration and the economies of scale of the consumer-device chip marketplace, to inexpensively put a lot of functionality on one or two chips, which would be far, far more expensive to crudely duplicate using a lot of physically discrete parts in the manner of an Arduino + accessories solution. And an Arduino is so many orders of magnitude too slow that the first accessory you would have to add to it would be a stand-alone hardware video decoding IC.
I agree with Chris.
Microcontrollers don't have enough memory to decode videos. You need to select a microprocessor with video processing available. On the other hand, you can get some cheap processors with Android capability. They are available for $35-40 and give smooth HDMI output. (Not sure about plugging in via USB.)
"The barebone mini systems are way too expensive around 200 to 400 dollars, and that would be an easy approach, but not cost effective."
Raspberry Pi, about $30 give or take. Beaglebone black $45, white $89. pcDuino Lite, $39, pcDuino Dev $59, I could do this all day...
As everyone has already said, you won't port such a heavily operating-system-dependent program to a microcontroller, for a number of reasons: memory, processor requirements, video, and so on.
If you could, say, take an STM32F4 or something at the high end of microcontrollers and create a video player for a specific format or formats, the man-hours involved would take a fair number of sales to recoup the cost. Why spend a few months on a project when you can have a Raspberry Pi or BeagleBone shipped in a couple of days? (Or a Roku or Apple TV from a local store.)

WebRTC - PeerConnection constraints

I've been working on a WebRTC videoconferencing app which is working great, taking into account the current state of WebRTC.
However, I have been exploring the possibilities of adding constraints to the video and audio streams being sent over by PeerConnection.
More specifically, in improving the performance of the video.
When videoconferencing on old (slow) laptops, we noticed that the quality of the image is really high but the frames per second are low. The stream is choppy.
About the audio quality, we give it an 8.5 for Chrome but only a 5.5 to 6 for Firefox.
I am not really interested in applying constraints to getUserMedia, since this stream is shown to the user as well and we don't want to change anything about this local output. (Unless there isn't another way.)
I have found a lot of information in the W3C drafts about MediaStreams and WebRTC itself.
These define certain constraints like default fps, minfps, minwidth and minheight of the image. On webrtc.org a lot of information is also available, like choosing a codec, etc.
But these settings can only be made "under the hood". It seems these settings cannot be addressed from the RTCPeerConnection API level?
Certain examples on the net manipulate the SDP strings in the Offer/Answer part of the WebRTC handshake; is this the way to go?
TL;DR: How to apply - and what is the best way to apply - constraints on WebRTC like minfps, maxfps, default fps, minwidth, maxwidth, DPI of the image, bandwidth of video and audio, audio kHz, and any other way to improve the performance or quality of the stream(s).
Big thanks in advance!
Right now, most of those can't be set in Firefox or Chrome. A few can be adjusted (with care/pain) in the SDP, but even if there's an SDP option defined for something it doesn't mean that the browsers look at it.
Both Mozilla and Google are looking to improve CPU overload detection and reaction (reduce frame size dynamically, etc). Right now, this effectively isn't being done. Upcoming releases of FF (FF24) will adapt to the capture resolution (as a maximum), but we don't have constraints for that yet, just about:config prefs (see media.*). That would allow you to set a different default resolution for Firefox.
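To illustrate the "adjusted (with care/pain) in the SDP" route: the classic example from that era is a bandwidth cap, done by inserting a b=AS line into the video m-section before the description is applied. A hedged TypeScript sketch - the 500 kbps figure is arbitrary, Chrome honours b=AS while Firefox has historically preferred b=TIAS, and none of this is a substitute for real constraint support:
// Insert "b=AS:<kbps>" after the c= line of the video m-section.
function capVideoBandwidth(sdp: string, kbps: number): string {
  const lines = sdp.split('\r\n');
  const mIndex = lines.findIndex(l => l.startsWith('m=video'));
  if (mIndex === -1) return sdp;
  // Per SDP grammar, b= lines follow the c= line of the section they apply to.
  for (let i = mIndex + 1; i < lines.length && !lines[i].startsWith('m='); i++) {
    if (lines[i].startsWith('c=')) {
      lines.splice(i + 1, 0, `b=AS:${kbps}`);
      break;
    }
  }
  return lines.join('\r\n');
}

// Hypothetical usage: munge the answer before applying it.
// await pc.setRemoteDescription({ type: 'answer', sdp: capVideoBandwidth(answer.sdp, 500) });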

Windows 8 low latency video streaming

My game is based on Flash and uses RTMP to deliver live video to players. Video should be streamed from a single location to many clients, not between clients.
It's an essential requirement that the end-to-end video stream have very low latency, less than 0.5 s.
Using many tweaks on the server and client, I was able to achieve approx. 0.2 s latency with RTMP and Adobe Live Media Encoder over the loopback network interface.
Now the problem is to port the project to a Windows 8 Store app. Natively, Windows 8 offers Smooth Streaming extensions for IIS + http://playerframework.codeplex.com/ for the player + a video encoder compatible with Live Smooth Streaming. As for the encoder, so far I have tested only Microsoft Expression Encoder 4, which supports Live Smooth Streaming.
Despite using the msRealTime property on the player side, the latency is huge, and I was unable to make it less than 6-10 seconds by tweaking the encoder. Different sources state that Smooth [Live] Streaming is not a choice for low-latency video streaming scenarios, and it seems that with Expression Encoder 4 it's impossible to achieve low latency with any combination of settings. There are hardware video encoders which support Smooth Streaming, like the ones from Envivio or Digital Rapids, however:
They are expensive
I'm not sure at all if they can significantly improve latency on the encoder side, compared to Expression Encoder
Even if they can eliminate the encoder's time, can the rest of the Smooth Streaming pipeline (IIS side) support the required speed?
Questions:
What technology could be used to stream to Win8 clients with subsecond latency, if any?
Do you know of players compatible with Win8, or easily portable to Win8, which support RTMP?
Addition: the live stream of Build 2012 uses RTMP and Smooth Streaming in Desktop mode. In Metro mode, it uses RTMP and Flash Player for Metro.
I can confirm that Smooth Streaming will not be your technology of choice here. Under the very best scenario with perfect conditions, the best you're going to get is a few seconds (the absolute minimum latency would be the chunk length itself, even if everything else had zero latency).
I think RTSP/RTMP or something similar using UDP is most likely your best bet. I would be looking at videoconferencing technologies more than wide-audience streaming technologies. If I remember correctly, there are a few .NET components out there that handle RTSP/H.264 meant for video conferencing - if I can find them later, I will post them here.