Continuous device and connection issues with routed Tokbox session - webrtc

We’ve been using the Tokbox platform for several months now with a JavaScript web-client as well as an Android phone client, where sessions and connections are managed by a Python server. While integration and bring-up went well on both ends (client and server), we continue to encounter problems with the in-session audio and video experience.
Sessions are always routed and always between two participants only, with much use of a collaborative editor.
The in-session experience is like a coin toss: we never know how it’s going to go, and that’s becoming a business threat.
Web-Client: A/V Resources
The most common problem is the acquisition of audio and/or video: at the beginning of a session, one or the other participants may have problems hearing or seeing the other. Allocating a new connection to establish new streams does not fix that, nor does restarting the browser.
Question: What’s the recommended way to detect possible resource locks (e.g. does another application hog the camera/microphone)?
Web-Client: Network
Bandwidth and packet loss are a challenge, for example this inspector graph:
Audio and video for both participants are all over the place, and while we cannot control the network connections, the web-client should be able to reliably give useful information.
Question: Other than continuous connection monitoring with getStats() and maybe the experimental navigator.connection property, how can the web-client monitor network connectivity?
Pre-Call Test
We recommend that customers run a pre-call test, and we have implemented one on our site as well. However, the results of that test often do not reflect the in-session connectivity. Worse, a pre-call test may detect low (no video) bandwidth while Skype works just fine.
Question: How can that be?

I'm a member of the TokBox development team. I remember you reported an issue with the Python SDK, thanks for that!
Web-Client: A/V Resources
Most acquisition issues are detected by the JS SDK and if they aren't then we'd really like to hear about it! Please report reproduction steps or affected session IDs to TokBox support (referencing this StackOverflow question): https://support.tokbox.com/hc/en-us/requests/new
Most acquisition errors appear as OT_HARDWARE_UNAVAILABLE or OT_MEDIA_ERR_ABORTED errors. Are you detecting and surfacing these errors to your users? There is also the special OT_CHROME_MICROPHONE_ACQUISITION_ERROR error which is due to a known issue with Chrome that has been mostly fixed since Chrome 63 (see https://bugs.chromium.org/p/webrtc/issues/detail?id=4799).
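As a rough illustration of surfacing these errors, here is a minimal sketch of a lookup from error name to a hint for the user. It is written in Python on the assumption that the web-client reports the error name to your server for logging; the same table would work client-side. Only the three error names come from the text above. The function name, the table, and the message wording are hypothetical.

```python
# Illustrative hints only: the error names are the ones mentioned above,
# the message wording is an assumption and should be adapted to your product.
ACQUISITION_HINTS = {
    "OT_HARDWARE_UNAVAILABLE": (
        "The camera or microphone could not be acquired. "
        "Close other applications that may be using it and try again."
    ),
    "OT_MEDIA_ERR_ABORTED": (
        "Media acquisition was aborted. Please try again."
    ),
    "OT_CHROME_MICROPHONE_ACQUISITION_ERROR": (
        "Chrome could not acquire the microphone. "
        "Updating to Chrome 63 or later may help."
    ),
}


def describe_acquisition_error(name: str) -> str:
    """Return a user-facing hint for a reported acquisition error name."""
    fallback = f"Unexpected media error ({name}). Please report it to support."
    return ACQUISITION_HINTS.get(name, fallback)
```

Anything that falls through to the fallback message is a good candidate for a support report with the affected session ID.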
Web-Client: Network
Unfortunately this is one of the more difficult issues to troubleshoot. Yes, Subscriber#getStats() is the best tool we have at our disposal and is a wrapper around the native RTCPeerConnection#getStats() function. Unfortunately we don't have much control over the values returned by the native function and if you think our SDK is returning incorrect values when compared with values from RTCPeerConnection#getStats() then please let us know!
It would be worthwhile confirming whether the issue is reproducible in all browsers or only a particular one. If you have detailed data regarding the inaccuracy of the native RTCPeerConnection#getStats() function then we could work together to report it to the browser vendor(s).
Fortunately we have just released the new Publisher#getStats() function which lets you get the publisher side of the stats. This should help you narrow down the cause of a connectivity issue to either a publisher or subscriber side. Please let us know if this helps with tracking down these issues.
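For continuous monitoring, the useful numbers are usually deltas between two consecutive stats samples rather than the raw counters. The following is a minimal sketch of that arithmetic, assuming the client forwards periodic samples to the Python server. The `Snapshot` type and its field names are hypothetical and only loosely mirror the counters a getStats() call reports.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Snapshot:
    """One stats sample. Field names are hypothetical, not an SDK type."""
    timestamp_ms: float
    bytes_received: int
    packets_received: int
    packets_lost: int


def summarize(prev: Snapshot, curr: Snapshot) -> Tuple[float, float]:
    """Return (bitrate in kbit/s, packet loss ratio) between two samples."""
    elapsed_s = (curr.timestamp_ms - prev.timestamp_ms) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")

    received = curr.packets_received - prev.packets_received
    lost = curr.packets_lost - prev.packets_lost
    total = received + lost
    # With no packets in the interval there is nothing to lose.
    loss_ratio = lost / total if total > 0 else 0.0

    kbps = (curr.bytes_received - prev.bytes_received) * 8 / elapsed_s / 1000.0
    return kbps, loss_ratio
```

Running the same calculation on the publisher-side and subscriber-side samples makes it easier to see which end of the call is degrading.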
Pre-Call Test
Again, these tests are based on Subscriber#getStats() which in turn are based on RTCPeerConnection#getStats(), the accuracy of which is out of our hands, but we'd love any reproduction steps to either fix a bug in our client SDK or report a bug to the browser vendors.
Just to confirm though, when you say you've implemented a pre-call test in your site, did you use the official JavaScript network test module? https://github.com/opentok/opentok-network-test-js This is actually what's used by the TokBox pre-call test.

@Aiham, thanks for responding. I've been looking at the new Publisher#getStats() you linked to (thank you!), so that we too can give our users some way of seeing the network conditions that might be affecting the quality of their call (and who's causing it). However, it seems as though bytes/packets sent go up sharply as the number of subscribers increases, even though we're in a routed session.
Am I wrong to expect the Publisher#getStats() statistics to stay fairly stable regardless of the number of subscribers then receiving that stream in a routed session? I expected the nature of a routed call to mean it's sent once to the OpenTok Media Servers, and the statistics would end there.

ICE connection state: completed vs connected

Can someone please clarify the difference between iceConnectionState: completed and iceConnectionState: connected?
When I connect two browsers with WebRTC I am able to exchange data using a data channel, but for some reason the iceConnectionState on the browser that made the offer remains completed, whereas on the browser that accepted the offer it changes to connected.
Any idea if this is normal?
In short:
connected: Found a working candidate pair, but still performing connectivity checks to find a better one.
completed: Found a working candidate pair and done performing connectivity checks.
For most purposes, you can probably treat the connected/completed states as the same thing.
Note that, as mentioned by Ajay, there are some notable differences between how the standard defines the states and how they're implemented in Chrome. The main ones that come to mind:
There's no "end-of-candidates" signaling, so none of those parts of the candidate state definitions are implemented. This means if a remote candidate arrives late, it's possible to go from "completed" back to "connected" without an ICE restart. Though I assume this is rare in practice.
The ICE state is actually a combination ICE+DTLS state (see: https://bugs.chromium.org/p/webrtc/issues/detail?id=6145). This is because it was implemented before there was such thing as "RTCPeerConnectionState". This can lead to confusion if there's actually a DTLS-level issue, since the only way to really notice is to look in a native Chrome log.
We definitely plan on fixing all the discrepancies. But for a while we held off on it because the standard was still in flux. And right now our priority is more on implementing unified plan SDP and the RtpSender/RtpReceiver APIs.
ICE connection state transitions are a bit tricky; the flow diagram below gives a clear idea of the possible transitions.
In simple words:
new/checking: Not yet connected
connected/completed: Media path is available
disconnected/failed: Media path is not available (Whatever data you are sending on data channel won't reach other end)
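The three groups above can be sketched as a small lookup. This is only an illustration of the summary, written in Python for brevity; in a browser the same check would sit in an iceconnectionstatechange handler. The function name and the returned labels are hypothetical, and any state not covered by the summary (for example "closed") is reported as "unknown".

```python
# State groups taken from the summary above; the labels are illustrative.
PENDING_STATES = {"new", "checking"}
UP_STATES = {"connected", "completed"}
DOWN_STATES = {"disconnected", "failed"}


def media_path_status(ice_connection_state: str) -> str:
    """Classify an ICE connection state as pending, available or unavailable."""
    if ice_connection_state in UP_STATES:
        return "available"
    if ice_connection_state in DOWN_STATES:
        return "unavailable"
    if ice_connection_state in PENDING_STATES:
        return "pending"
    # States outside the summary above are not classified here.
    return "unknown"
```

Treating connected and completed identically here matches the advice that, for most purposes, the two can be handled as the same thing.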
Read full summary here
The WebRTC team is still working hard to make it stable and spec compliant.
The current Chrome behavior is confusing, so I filed a bug; you can star it to get notified.

Is there any internal timeout in Microsoft UIAutomation?

I am using the UI Automation COM-to-.NET Adapter to read the contents of a target Google Chrome browser that plays Flash content on Windows 7. It works.
I am able to get the content and elements. Everything works fine for some time, but after a few hours the elements become inaccessible.
The (AutomationElement).FindAll() returns 0 children.
Is there any internal, undocumented timeout used by UIAutomation?
According to the IUIAutomation2 interface documentation, there are two timeouts, but they are not accessible from the IUIAutomation interface.
IUIAutomation2 is supported only on Windows 8 (desktop apps only).
So I believe there is some timeout.
I made a workaround that restarts the searching and monitoring of elements from the beginning of the desktop tree but the elements are still not available.
After some time (not sure how much) the elements are available again.
My requirement is to read the values all the time, as fast as possible, but this behavior undermines the whole architecture.
I read somewhere that there is some timeout of 3 minutes but not sure.
If there is a timeout, is it possible to change it?
Is it possible to restart something or release/dispose something?
I can't find anything on MSDN.
Does anybody have any idea what is happening and how to resolve it?
Thanks for this nicely put question. I have a similar issue with a much different setup. I'm on Win7, using UIAutomationCore.dll directly from C# to test our application-under-development. After running my sequence of actions & event subscriptions and all the other things, I intermittently observe that the UIA interface stops working (about 8-10min in my case, but I'm heavily using the UIA interface).
Many different attempts, including dispatching the COM interface and sleeping at different places, failed. The funny revelation was that I managed to use AccEvent.exe (part of the SDK, like inspect.exe) during the test and saw that events stopped flowing to AccEvent, too. So it wasn't my client's interface that stopped; rather, it was the COM server (or whatever UIAutomationCore does) that stopped responding.
As a solution (which seems to work most of the time, or at least improves the situation a lot), I decided to give the application-under-test some breathing room, since using UIA puts additional load on it. This could be done with well-placed sleep points in your client, but instead of sleeping for a set time, I'm monitoring the processor load of the application and waiting until it settles down.
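The wait-until-settled idea can be sketched roughly as follows. This is not my actual C# code, just a minimal Python illustration of the loop. The `sample_load` callable is a hypothetical hook that you would back with whatever CPU measurement you have for the application-under-test, and the threshold, sample count and timings are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_settled(
    sample_load: Callable[[], float],
    threshold: float = 10.0,
    stable_samples: int = 3,
    interval_s: float = 0.5,
    timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Block until the load stays at or below `threshold` for
    `stable_samples` consecutive samples. Returns False on timeout."""
    deadline = clock() + timeout_s
    calm = 0
    while True:
        if sample_load() <= threshold:
            calm += 1
            if calm >= stable_samples:
                return True
        else:
            calm = 0  # a busy sample resets the streak
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling this before each batch of UIA requests replaces fixed sleeps with a wait that adapts to how busy the target application actually is.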
One of the intermittent errors I receive when the problem manifests itself is "... was unable to call any of the subscribers..", and my search turned up an MSDN page saying they have improved things in the CUIAutomation8 interface, but as this is Windows 8 specific, I haven't had the chance to try it yet.
I should also add that I reduced the number of calls to UIA by incorporating more UI caching (FindAllBuildCache), since the less back-and-forth there is, the better it is for UIA. Thanks to Guy's answer to another question: UI Automation events stop being received after a while monitoring an application and then restart after some time

Why is sending and receiving JSON data from the server on iPhone so slow?

I'm making a mobile client for a web site. Information exchange between my app and the server is in JSON (searching users and data on the server, sending messages, conversation threading, etc.), but all these features are too slow. I tap the "send" button and then wait for several seconds before the message is sent; the same goes for searching, authorization, etc. So I have these questions:
1. Why is there such a performance overhead?
2. Could the trouble be on the server side, in the JSON parser, or maybe somewhere else?
3. How can I fix/optimize this? All solutions, advice, etc. will be helpful!
I would use Xcode to debug the app to see whether the majority of time is spent loading the data from the server or parsing the JSON once the data is received.
If it is the first, try loading the data from a PC over the same wireless connection and see if it is slow on that too. If so, clearly your server side code needs optimising.
If it is the second and the parsing is slow, you may want to look into using JSONKit instead of the native JSON parser as testing shows it is faster. You may also want to review the structure of your JSON.
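To make the split between the two cases concrete, here is a minimal sketch of the measurement, written in Python so that it can also be run from a PC against the same server. The `fetch` callable is a hypothetical stand-in for whatever performs the request and returns the raw response body; on the device you would take the equivalent timestamps around the download and the parse.

```python
import json
import time
from typing import Any, Callable, Dict


def time_fetch_and_parse(fetch: Callable[[], bytes]) -> Dict[str, Any]:
    """Time the download and the JSON parse separately."""
    t0 = time.perf_counter()
    body = fetch()            # network + server time
    t1 = time.perf_counter()
    data = json.loads(body)   # parsing time only
    t2 = time.perf_counter()
    return {"fetch_s": t1 - t0, "parse_s": t2 - t1, "data": data}
```

If fetch_s dominates, look at the server and the network; if parse_s dominates, look at the parser and the structure of the JSON.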
One thing I have noticed however is that connections are slower on my iPad than on other machines. I've noticed this when comparing apps I've developed in the simulator to on the device on the same network and when conducting speedtests. As for why this happens, I am not sure - some form of additional overhead in iOS perhaps.
I can save you some time: it has nothing to do with JSON. It has to do with how your app's requests are handled in general, which most likely needs optimization on the server.
EDIT:
I suppose it could also be that you might be experiencing high-latency on your phone, but again, that has nothing to do with your app.
Debug it using a regular browser and chrome dev tools (in the network tab) - you'll see that the requests take long even on a desktop at which point you'll have to start fishing around in the server-side code to see what's making it go slow (hint: unoptimized database queries are a big bottleneck....but then again, so is crappy hardware).
Sorry that I couldn't be of more help, but without seeing the entire setup of the server and the code that's going slow (not the client requests, but the server code), that's the best I can do.
Best of luck.

Does the Reachability class keep sending / receiving data in iOS development?

I have been working with the Reachability class for a while and have tried both the one from the Apple sample and the one from ddg. I wonder whether the Reachability class keeps sending / receiving data after the notifier is started.
As I'm developing an app which connects to different hosts quite often, I decided to write a singleton and attach the Reachability instances I need to it. The Reachability instances are created and start their notifiers once the app starts. I use the singleton approach because I want this class to be portable and applicable to other apps without much rewriting. I am not sure whether implementing it like this is a good idea, but it has worked quite well.
However, one user reported that his device's battery drains significantly faster after using the app, and another reported increased data usage. My app does not send / receive data in the background, so I started wondering whether it is related to Reachability.
I tried profiling the energy usage with Instruments and noticed a continuous trickle of small data (a few hundred bytes on average) coming in via the network interfaces even when my app is idle. However, almost no data is sent out.
I know that Reachability uses data when it is initialized (resolving DNS etc.), but I am not sure whether it keeps using data after the notifier is started. Can anyone tell?
I am not familiar with low-level programming, so it would be nice if someone could explain how Reachability works.
I use Reachability, and while I haven't monitored the connections, I have browsed the code, and I can't see any reason why it would keep sending (or receiving).
If you have an Ethernet connection to your Mac, it is quite easy to check. Enable sharing of your Ethernet connection over Wi-Fi. Install Little Snitch; it will run in demo mode for three hours after every boot. Turn off the data connection on the test device and connect it to your Mac over Wi-Fi.
This will allow you to see any network access your test device is making.
If this isn't possible, you can also run your app in the simulator as the network side should be the same, so you should be able to check.
There are also a ton of other tools to track network activity, but I think Little Snitch is the easiest to use.

Sample Code for Remote Process/Application Monitoring

All
I am looking for a bit of inspiration here. A client has asked me to build a simple remote process monitoring application with SMTP notification when monitored processes go down or come back online. Can anyone point me in the direction of some sample code to get me started? I have briefly looked at .NET Remoting as a potential development path. From my understanding of how it works, the remoting architecture would require both a client and a remote server component (though I am not 100% sure on this one). If that is the case, it will not fit the bill, as one of the fundamental requirements is that no additional software can be deployed on the servers to be monitored.
Alternatively, if you know of a finished 'lightweight' product out there, which would deliver this sort of functionality then this would probably work as well.
Kind Regards
Paul J.
Net-SNMP actually has the ability to monitor processes and send out notifications now. Look in the snmpd.conf manual page (http://www.net-snmp.org/docs/man/snmpd.conf.html) under "process monitoring" (http://www.net-snmp.org/docs/man/snmpd.conf.html#lbAR) and "disman event-mib" support (http://www.net-snmp.org/docs/man/snmpd.conf.html#lbAX) to get the two features you need. See also the notification sending support (http://www.net-snmp.org/docs/man/snmpd.conf.html#lbAW).
I don't remember how well the Windows support for it works, though (it looks like you're aiming for Windows).