Transfer per-frame metadata from WebRTC client to browser - webrtc

I know this question has been asked a number of times long ago, but it always remained unanswered.
I have a WebRTC client which transmits a stream through a server (Flashphoner) to a browser. I need a way to mark specific frames with a 4-byte label on the client side and parse this label in the browser using JS code.
Another theoretical option is to add textual/QR-code watermarks and parse them on the browser side using an OCR or QR-parser library. The problem is that I don't know how to access decoded frame data on the browser side. Any suggestions?

While something like this wasn't possible in the past, the WebRTC Insertable Streams / Encoded Transform API (specification) allows it, but browser support varies.
https://webrtc.github.io/samples/src/content/insertable-streams/endtoend-encryption/ shows a sample that applies a trivial XOR encryption and, more importantly for your use case, adds a four-byte checksum.
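A minimal sketch of that approach: append a 4-byte label to each encoded frame on the sender and strip it off on the receiver. The `appendLabel`/`extractLabel` helpers are our own; the wiring uses Chrome's `createEncodedStreams()` API, whose exact shape varies across spec versions, so treat it as an assumption.

```javascript
// Pure helpers: append a 4-byte label to an encoded frame's payload,
// and split it back out on the receiving side.
function appendLabel(payload, label) {
  // payload: Uint8Array, label: Uint8Array of length 4
  const out = new Uint8Array(payload.length + 4);
  out.set(payload, 0);
  out.set(label, payload.length);
  return out;
}

function extractLabel(data) {
  // Returns { payload, label } by stripping the last 4 bytes.
  const payload = data.slice(0, data.length - 4);
  const label = data.slice(data.length - 4);
  return { payload, label };
}

// Hypothetical wiring on the sender (Chrome's createEncodedStreams();
// the standardized spelling is RTCRtpScriptTransform):
function attachSenderTransform(sender, label) {
  const { readable, writable } = sender.createEncodedStreams();
  const transform = new TransformStream({
    transform(frame, controller) {
      frame.data = appendLabel(new Uint8Array(frame.data), label).buffer;
      controller.enqueue(frame);
    },
  });
  readable.pipeThrough(transform).pipeTo(writable);
}
```

The receiver applies the mirror-image transform, calling `extractLabel` before handing the frame to the decoder.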

Related

WebRTC - receiving H264 key frames

I've been playing with WebRTC using libdatachannel, experimenting and learning.
I wrote some code to parse RTP packets into NALUs, and tested connecting to a "known good" server which sends H.264 video.
Problem:
I'm only seeing NALUs with type = 1 (fragmented into multiple FU-As) and sometimes type = 24 (STAP-A aggregation packets, which contain embedded SPS and PPS NALUs).
So I don't understand how to decode / render this stream - I would expect the server to send a NALU with a key frame (NALU type 5) automatically to a newly connected client, but it does not.
What am I missing to be able to decode the stream? What should I do to receive a key frame quickly? If my understanding is correct, I need a key frame to start decoding / rendering.
I tried requesting a key frame from code - it does arrive (type 5), but after some delay, which is undesirable.
And yet the stream plays perfectly fine with a web browser client (Chrome, JavaScript) and starts up quickly.
Am I maybe overthinking this, and the browser also has a delay but I'm just perceiving it as instant?
In any case, what's the situation with key frames? Is a client supposed to request them (and without that, a server should not be expected to send them)?
If so what's a good interval? One second, two, three?
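For reference, classifying the payload types described above is a small bit-twiddling exercise over the first byte(s) of the RTP payload, per RFC 6184. A sketch (the return-object shape is our own convention):

```javascript
// Classify an H.264 RTP payload (RFC 6184): the low 5 bits of the
// first payload byte are the NAL unit type. Types 1-23 are single
// NAL units, 24 is STAP-A (aggregation), 28 is FU-A (fragmentation).
// For FU-A, the original NAL type is in the low 5 bits of the second
// byte (the FU header), and bit 7 of that byte marks the first fragment.
function classifyH264Payload(bytes) {
  const type = bytes[0] & 0x1f;
  if (type === 28) {
    return {
      kind: "FU-A",
      nalType: bytes[1] & 0x1f,          // e.g. 5 = IDR (key frame)
      start: (bytes[1] & 0x80) !== 0,    // first fragment of the NALU
    };
  }
  if (type === 24) return { kind: "STAP-A", nalType: type };
  return { kind: "single", nalType: type };
}
```

So a key frame arriving fragmented shows up as FU-A packets whose FU header carries type 5, not as a packet whose first byte is 5 - worth checking before concluding no key frame was sent.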

Does Karate support WebSocket continuation frames?

We just found out that our WebAPI returns the message in chunks through the WebSocket protocol. These are continuation frames as per the RFC 6455 specification. While we listen, only the first frame is retrieved by Karate and the others are lost. This is always a string with a length of 4082 characters and around 16 KB in size. Is there a way to make Karate listen until the final frame is received so the whole message can be evaluated?
Here is a visualization from Fiddler showing how the frames are received:
Dev of Karate here. We are interested in closing gaps if any, so would you be able to help us get a sample WebSocket endpoint, maybe public, or some sample code for us to replicate?
Karate uses Netty as the underlying library. A quick search suggests that it would be easy to support continuation frames if we don't already. Feel free to open a feature request to discuss further.

Server side real time analysis of video streamed from client

I'm trying to build a system for real-time analysis on server for video streamed from the client using WebRTC.
Here is what I currently have in mind. I would capture the webcam video stream from the client and send it (compressed using H.264?) to my server.
On my server, I would receive the stream and feed every raw frame to my C++ library for analysis.
The output of the analysis (box coordinates to draw) would then be sent back to the client via WebRTC or a separate WebSocket connection.
I've been looking online and found open-source media servers like Kurento and Mediasoup, but since I only need to read the stream (no dispatch to other clients), do I really need to use an existing server? Or could I build it myself, and if so, where do I start?
I'm fairly new to the WebRTC and video streaming world in general so I was wondering, does this whole thing sound right to you?
That depends on how real-time your requirements are. If you want 30-60 fps and near-real-time latency, getting the images to the server via RTP is the best solution. You'll then need things like a jitter buffer, depacketization, and video decoders.
If you require only one image per second, grabbing it from the canvas and sending it via Websockets or HTTP POST is easier. https://webrtchacks.com/webrtc-cv-tensorflow/ shows how to do that in Python.
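A sketch of the low-rate approach: draw the webcam `<video>` element to a canvas once per second and send the JPEG over a WebSocket. The message envelope and the metadata-then-blob framing are our own assumptions, not a standard protocol.

```javascript
// Small metadata header sent alongside each frame so the server can
// match analysis results (e.g. box coordinates) back to frames.
function buildEnvelope(width, height, timestampMs) {
  return JSON.stringify({ w: width, h: height, ts: timestampMs });
}

// Grab one frame per second from a <video> element playing the webcam
// stream and ship it as a JPEG blob over an open WebSocket.
function startCapture(video, ws, intervalMs = 1000) {
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  return setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    canvas.toBlob((blob) => {
      ws.send(buildEnvelope(canvas.width, canvas.height, Date.now()));
      ws.send(blob); // binary frame follows its metadata
    }, "image/jpeg", 0.8);
  }, intervalMs);
}
```

The server then decodes each JPEG, runs the analysis, and sends box coordinates back on the same socket keyed by the `ts` field.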

How to change dynamic video resolution during a call (in WebRTC)

I have been using SimpleWebRTC lib for my project.
How do I change the remote video resolution dynamically during a call (like Google Hangouts does when resizing the browser)?
Resizing the browser in Hangouts changes the remote video resolution (.videoWidth / .videoHeight).
Is this associated with WebRTC Plan B?
I would also like to know how it is implemented for many peer connections.
Tell the sending end (say, via DataChannels) to change resolution to NxM. At the sending end, until new APIs are available to change a getUserMedia/MediaStream capture size on the fly, you can request a second camera/mic stream and replace the existing streams with it. (Note: this will cause onnegotiationneeded, i.e. renegotiation, and the far side will see a new output stream.)
Smoother (but only in Firefox thus far; it's in the standardization process) would be to use RTPSender.replaceTrack() to change the video track without touching audio or renegotiating.
Another option that will exist (though doesn't yet in either browser) is to use RTPSender.width/height (or whatever syntax gets agreed) to scale outgoing video before encoding.
Plan B for multistream/BUNDLE (which Chrome implements) was not adopted; Firefox has now (in Fx38 which goes out in a few days) implemented the Unified Plan; expect to see a blog post soon from someone on how to force the two to work together (until Chrome gets to implementing Unified Plan)
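In today's browsers the replaceTrack approach described above has landed everywhere as `RTCRtpSender.replaceTrack()`. A sketch, assuming a mono video connection; the constraint-building helper is our own convention:

```javascript
// Turn a requested NxM into getUserMedia constraints ("ideal" lets the
// camera fall back to the closest supported mode).
function resolutionConstraints(width, height) {
  return { video: { width: { ideal: width }, height: { ideal: height } } };
}

// Swap the outgoing video track for one captured at the new resolution.
// replaceTrack() does not trigger renegotiation.
async function changeResolution(pc, width, height) {
  const stream = await navigator.mediaDevices.getUserMedia(
    resolutionConstraints(width, height)
  );
  const [newTrack] = stream.getVideoTracks();
  const sender = pc
    .getSenders()
    .find((s) => s.track && s.track.kind === "video");
  await sender.replaceTrack(newTrack);
}
```

The receiving side can drive this by sending its desired NxM over a DataChannel, as suggested above.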

Play audio stream using WebAudio API

I have a client/server audio synthesizer where the server (java) dynamically generates an audio stream (Ogg/Vorbis) to be rendered by the client using an HTML5 audio element. Users can tweak various parameters and the server immediately alters the output accordingly. Unfortunately the audio element buffers (prefetches) very aggressively so changes made by the user won't be heard until minutes later, literally.
Trying to disable preload has no effect, and apparently this setting is only 'advisory', so there's no guarantee that its behavior would be consistent across browsers.
I've been reading everything that I can find on WebRTC and the evolving WebAudio API and it seems like all of the pieces I need are there but I don't know if it's possible to connect them up the way I'd like to.
I looked at RTCPeerConnection; it does provide low latency, but it brings in a lot of baggage that I don't want or need (STUN, ICE, offer/answer, etc.), and currently it seems to support only a limited set of codecs, mostly geared towards voice. Also, since the server side is in Java, I think I'd have to do a lot of work to teach it to 'speak' the various protocols and formats involved.
AudioContext.decodeAudioData works great for a static sample, but not for a stream since it doesn't process the incoming data until it's consumed the entire stream.
What I want is the exact functionality of the audio tag (i.e. HTMLAudioElement) without any buffering. If I could somehow create a MediaStream object that uses the server URL for its input then I could create a MediaStreamAudioSourceNode and send that output to context.destination. This is not very different than what AudioContext.decodeAudioData already does, except that method creates a static buffer, not a stream.
I would like to keep the Ogg/Vorbis compression and eventually use other codecs, but one thing that I may try next is to send raw PCM and build audio buffers on the fly, just as if they were being generated programmatically by JavaScript code. But again, I think all of the parts already exist, and if there's any way to leverage that I would be most thrilled to know about it!
Thanks in advance,
Joe
How are you getting on? Did you resolve this question? I am solving a similar challenge. On the browser side I'm using the Web Audio API, which has nice ways to render streaming input audio data, and Node.js on the server side, using WebSockets as the middleware to send the browser streaming PCM buffers.
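The raw-PCM route mentioned above can be sketched like this: convert incoming 16-bit PCM chunks from a WebSocket into Float32 samples and schedule them back-to-back as AudioBuffers. The 44.1 kHz mono layout is an assumption about the server's stream format.

```javascript
// Convert signed 16-bit PCM samples to Float32 in [-1, 1), the format
// the Web Audio API expects.
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] / 32768;
  }
  return out;
}

// Returns a callback that plays each incoming ArrayBuffer of PCM.
// Chunks are scheduled end-to-end so playback is gapless.
function makePlayer(audioCtx, sampleRate = 44100) {
  let playTime = audioCtx.currentTime;
  return (arrayBuffer) => {
    const samples = int16ToFloat32(new Int16Array(arrayBuffer));
    const buffer = audioCtx.createBuffer(1, samples.length, sampleRate);
    buffer.copyToChannel(samples, 0);
    const src = audioCtx.createBufferSource();
    src.buffer = buffer;
    src.connect(audioCtx.destination);
    playTime = Math.max(playTime, audioCtx.currentTime);
    src.start(playTime);
    playTime += buffer.duration;
  };
}
```

Wiring it up is then just `ws.binaryType = "arraybuffer"; ws.onmessage = (e) => play(e.data);` with `const play = makePlayer(new AudioContext());`.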