Does Google WebRTC native implementation have support for SFU?

Does the Google WebRTC native implementation have support for SFU?
Does the Google WebRTC native implementation support integrating a custom/hardware encoder/decoder?

Not without alteration.
Internally, WebRTC's audio/video pipelines are tied directly to the encoders/decoders.
PeerConnectionFactory allows you to provide a video encoder/decoder factory, so you can short-circuit the logic there: grab the encoded frames, mock up a stream, and feed them directly into a new PeerConnection as a relay by setting those streams onto it.
The audio side is more difficult. There is no codec factory, so you will probably have to short-circuit the logic by altering libwebrtc itself.
The final question is RTCP termination: how to override the quality/bandwidth-control mechanisms so you don't create a "one goes out, they all go out" situation.
Since libwebrtc will be the SFU, it will receive RTCP feedback from its remote peer for the content it is proxying, and vice versa.
For a 1-1 situation, it needs to be able to forward the RTCP feedback to the remote peer.
For multipoint, it needs some logic to determine whether one of the peers is problematic and then stop sending it video, switch off its video feed, or attempt to switch it to a lower-bitrate video stream. Basically it needs to act as a conduit that tries to predict why/how packet loss is occurring and keeps as many audio/video feeds as possible operating normally, at the highest possible quality for each peer.
Exactly how to hijack the RTCP feedback mechanisms in libwebrtc, I think, will again likely require some customization/hooks into libwebrtc.

I think it will be easier to try with the GStreamer implementation of WebRTC. Although it still lives in the "bad" plugins set, it is much easier to get or provide encoded audio and video there. It was actually implemented with that in mind: to make building an MCU or SFU easier.

Related

ApiRTC - Media always sent to the cloud, even with meshOnlyEnabled

As a follow-up to my previous post (ApiRTC - Behaviour with meshModeEnabled and meshOnlyEnabled)
Hello,
You say that an SFU is necessary for any activity that requires centralizing all the streams (recording, bandwidth optimization, ...). However, in MESH mode, the files/media exchanged still end up recorded on the Apizee media server even though I don't go through the SFU. How is this possible?
Can this behaviour be disabled so that the exchanged documents never leave the MESH stream?
I have not found anything about this in the documentation.
By the way, the documentation often mentions the term "MCU"; does this mean that ApiRTC also uses an MCU server in addition to the SFU?
Thanks in advance.
Can this behaviour be disabled so that the exchanged documents never leave the MESH stream?
Concerning a recording of all the streams in the conversation (via the startRecording method of the Conversation object, see https://apirtc.github.io/references/apirtc-js/Conversation.html#startRecording__anchor):
--> The composition of multiple streams into one video file is done server-side by the SFU (v4.4.8).
Concerning the files (through the conversation.pushData method):
--> We manage the file transfer by uploading the file to storage and sharing the URI with all parties of the conversation. P2P transfer is not available (v4.4.8).
To exchange data in a P2P mode, you can use the Conversation.sendData method to send raw data across all participants.
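For reference, here is a minimal sketch of that P2P data path. It assumes you already hold a joined Conversation object named conversation; sendData is the method named above, but the 'data' event name used on the receiving side is an assumption to verify against the ApiRTC Conversation reference:

    // Hedged sketch: "conversation" is assumed to be an already-joined ApiRTC
    // Conversation object; sendData is the method documented above. The 'data'
    // event name is an assumption -- check the Conversation reference for the
    // exact event and payload shape.
    conversation.sendData({ kind: 'note', text: 'hello everyone' });

    conversation.on('data', (dataEvent) => {
      // Payload shape depends on ApiRTC; just log whatever arrives here.
      console.log('received data from another participant:', dataEvent);
    });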
Regarding your question about the MCU: no, ApiRTC doesn't use any MCU server to date (v4.4.8). The documentation refers to an MCU only for a very specific on-premise deployment, which is not supported for ApiRTC users.
Cheers,
Romain

Stream html5 camera output

Does anyone know how to stream HTML5 camera output to other users?
If that's possible, should I use sockets and images and stream them to the users, or some other technology?
Is there any video tutorial where I can take a look at this?
Many thanks.
The two most common approaches now are most likely:
Stream from the source to a server, and allow users to connect to the server to stream to their devices, typically using some form of adaptive bitrate (ABR) streaming protocol. ABR basically creates multiple bitrate versions of your content and chunks them, so the client can choose each next chunk from the best bitrate for the device and current network conditions.
Stream peer to peer, or via a conferencing hub, using WebRTC.
In general, the latter is more focused on real time: any delay should stay below the threshold that would interfere with audio and video conferencing, usually less than 200 ms for audio, for example. To achieve this it may sometimes have to sacrifice quality, especially video quality.
There are some good WebRTC samples available online (here at the time of writing): https://webrtc.github.io/samples/
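For the WebRTC option, a minimal browser-side sketch of capturing the HTML5 camera and handing it to an RTCPeerConnection might look like the following; the signaling object is a placeholder (not a real API) standing in for however you exchange the SDP and ICE candidates, e.g. over a WebSocket:

    // Inside an async function. Capture the camera, add its tracks to a peer
    // connection, create an offer and push everything over your own signalling
    // channel ("signaling" below is a placeholder, not a real API).
    const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });

    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    stream.getTracks().forEach(track => pc.addTrack(track, stream));

    pc.onicecandidate = (e) => {
      if (e.candidate) signaling.send({ candidate: e.candidate });  // placeholder transport
    };

    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    signaling.send({ sdp: pc.localDescription });                   // placeholder transport

    // When the remote answer comes back over the signalling channel:
    //   await pc.setRemoteDescription(remoteAnswer);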

Server side real time analysis of video streamed from client

I'm trying to build a system for real-time analysis on server for video streamed from the client using WebRTC.
Here is what I currently have in mind. I would capture the webcam video stream from the client and send it (compressed using H.264?) to my server.
On my server, I would receive the stream and pass every raw frame to my C++ library for analysis.
The output of the analysis (box coordinates to draw) would then be sent back to the client via WebRTC or a separate WebSocket connection.
I've been looking online and found open-source media servers like Kurento and Mediasoup but, since I only need to read the stream (no dispatch to other clients), do I really need to use an existing server? Or could I build it myself, and if so, where do I start?
I'm fairly new to the WebRTC and video streaming world in general so I was wondering, does this whole thing sound right to you?
That depends on how real-time your requirements are. If you want 30-60 fps and near real time, getting the images to the server via RTP is the best solution. You'll then need things like a jitter buffer, depacketization, video decoders, and so on.
If you require only one image per second, grabbing it from the canvas and sending it via WebSockets or HTTP POST is easier. https://webrtchacks.com/webrtc-cv-tensorflow/ shows how to do that in Python.
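If the one-image-per-second route is enough, a rough browser-side sketch could look like this (the ws://example.com/frames endpoint, the JPEG quality and the 1000 ms interval are all placeholder choices):

    // Draw the current video frame onto a canvas and ship it to the server as
    // a JPEG blob over a WebSocket. "videoElement" is assumed to be a <video>
    // element already playing the getUserMedia stream.
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    const ws = new WebSocket('ws://example.com/frames');  // placeholder endpoint

    setInterval(() => {
      if (!ctx || ws.readyState !== WebSocket.OPEN) return;
      canvas.width = videoElement.videoWidth;
      canvas.height = videoElement.videoHeight;
      ctx.drawImage(videoElement, 0, 0);
      canvas.toBlob(blob => { if (blob) ws.send(blob); }, 'image/jpeg', 0.8);
    }, 1000);  // roughly one frame per second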

What are the disadvantages of having two PeerConnections for one call?

I am thinking of changing my application from using a single PeerConnection for transferring media both ways to one PeerConnection for upstream and one for downstream for a single call between two peers.
The advantages I foresee:
Less worry about the PeerConnection's signalling state when changing the offered media from video+audio to audio-only and vice versa.
Might be easier to plug a media server like Kurento into the application (in a multi-user call, less upload bandwidth is required from each user).
(Not sure of this one) Single responsibility principle: each PeerConnection has a single role.
The major reason I want to make this change is that I am noticing that if one peer (peer1) offers only audio but the other peer (peer2) answers with both video+audio, peer1 receives only the audio for some reason; yet if peer1 had been the answerer, it is able to receive both MediaTracks without any problem. I'm not sure if it is a bug in my app or the browser (I got the same result in Firefox and Chrome). I was able to work around it by maintaining states and changing the offerer based on state, but I have problems when both peers change state (nearly) simultaneously. I thought the above proposal would be a simpler solution and would let me get rid of maintaining states.
Other than the obvious disadvantages of the extra overhead of more ICE candidate requests (n STUN, n TURN) and maintaining extra PeerConnections, are there any other possible issues with this design?
Nothing prevents you from doing that, but I suspect there's a simpler solution to your problem which you kind of buried:
the major reason I want to do this change is, I am noticing that if peer (peer1) offers only audio but the other peer (peer2) answers with both video+audio, peer1 receives only the audio for some reason,
Don't ask me why, but the default spec behavior when peer1 only offers audio is to only request audio from the other side. To override this and leave yourself open to receiving video as well if the other side has it, use RTCOfferOptions:
peer1.createOffer({ offerToReceiveVideo: true }).then( ... )
(or if you're using the legacy non-promise API it's the third argument.)
The nice thing about this is that it is intent-based, so you don't need to track any state; e.g. always using { offerToReceiveVideo: true, offerToReceiveAudio: true } may be right for you.
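For context, here is a rough sketch of that intent-based offer on peer1's side, using the RTCOfferOptions described above (signalling omitted; assumes the promise API inside an async function):

    // peer1 adds only an audio track, but declares via RTCOfferOptions that it
    // still wants to receive video (and audio) from the other side.
    const pc = new RTCPeerConnection();
    const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
    mic.getTracks().forEach(track => pc.addTrack(track, mic));

    const offer = await pc.createOffer({ offerToReceiveAudio: true, offerToReceiveVideo: true });
    await pc.setLocalDescription(offer);
    // ...send pc.localDescription to peer2, then apply its answer:
    //   await pc.setRemoteDescription(answerFromPeer2);

    pc.ontrack = (e) => {
      // Fires for peer2's audio *and* video tracks, since the offer asked to receive both.
    };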
A resource issue would be that you would be utilizing more ports, as both sides of each connection have to complete the DTLS handshake (which is done peer-to-peer and not through the signalling server).
A design challenge is keeping track of two connections orthogonally. It could get hairy and will more readily surface errors in the underlying WebRTC implementation if the state is not handled properly (browser state errors, etc.).

Play audio stream using WebAudio API

I have a client/server audio synthesizer where the server (java) dynamically generates an audio stream (Ogg/Vorbis) to be rendered by the client using an HTML5 audio element. Users can tweak various parameters and the server immediately alters the output accordingly. Unfortunately the audio element buffers (prefetches) very aggressively so changes made by the user won't be heard until minutes later, literally.
Trying to disable preload has no effect, and apparently this setting is only 'advisory', so there's no guarantee that its behavior would be consistent across browsers.
I've been reading everything that I can find on WebRTC and the evolving WebAudio API and it seems like all of the pieces I need are there but I don't know if it's possible to connect them up the way I'd like to.
I looked at RTCPeerConnection; it does provide low latency, but it brings in a lot of baggage that I don't want or need (STUN, ICE, offer/answer, etc.) and currently it seems to support only a limited set of codecs, mostly geared towards voice. Also, since the server side is in Java, I think I'd have to do a lot of work to teach it to 'speak' the various protocols and formats involved.
AudioContext.decodeAudioData works great for a static sample, but not for a stream since it doesn't process the incoming data until it's consumed the entire stream.
What I want is the exact functionality of the audio tag (i.e. HTMLAudioElement) without any buffering. If I could somehow create a MediaStream object that uses the server URL for its input then I could create a MediaStreamAudioSourceNode and send that output to context.destination. This is not very different than what AudioContext.decodeAudioData already does, except that method creates a static buffer, not a stream.
I would like to keep the Ogg/Vorbis compression and eventually use other codecs, but one thing that I may try next is to send raw PCM and build audio buffers on the fly, just as if they were being generated programmatically by JavaScript code. But again, I think all of the parts already exist, and if there's any way to leverage that I would be most thrilled to know about it!
Thanks in advance,
Joe
How are you getting on? Did you resolve this question? I am solving a similar challenge. On the browser side I'm using the Web Audio API, which has nice ways to render streaming input audio data, and Node.js on the server side, using WebSockets as the middleware to send the browser streaming PCM buffers.
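In case it helps, here is a rough sketch of that WebSocket + Web Audio approach on the browser side. It assumes the server pushes raw 32-bit float mono PCM at 44100 Hz over ws://example.com/audio; all of those are assumptions to adjust for your actual stream:

    // Receive raw PCM chunks over a WebSocket and schedule them back-to-back
    // with the Web Audio API. Assumed format: mono, Float32, 44100 Hz.
    const audioCtx = new AudioContext();
    const sampleRate = 44100;                            // assumed to match the server
    let playTime = audioCtx.currentTime;

    const ws = new WebSocket('ws://example.com/audio');  // placeholder endpoint
    ws.binaryType = 'arraybuffer';

    ws.onmessage = (msg) => {
      const samples = new Float32Array(msg.data);
      const buffer = audioCtx.createBuffer(1, samples.length, sampleRate);
      buffer.copyToChannel(samples, 0);

      const source = audioCtx.createBufferSource();
      source.buffer = buffer;
      source.connect(audioCtx.destination);

      // Start each chunk right after the previous one to avoid gaps.
      playTime = Math.max(playTime, audioCtx.currentTime);
      source.start(playTime);
      playTime += buffer.duration;
    };

Sending raw PCM sidesteps the problem of decoding Ogg/Vorbis outside of an audio element, at the cost of bandwidth.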