Is Google Colab training speed affected by our internet connection?

I've looked through some questions, and most of them discuss dataset uploading, but one stated that Google Colab only uses our internet connection to run the code. This confuses me: does it mean our internet speed also affects the training time of our model? Or is it that once we run our code, the Google server takes care of it and does not need our connection?

I think yes, Google Colab's speed is affected by our Internet connection. I'm not absolutely sure why, but you can check in this link. When the model is being trained, Internet data usage rises considerably. My guess is that our computer, as a client, needs to save some hidden internal information relevant to the running state. If so, the Colab server has to send this information every time a line of code is executed, and the next line can only be executed once this information reaches the client. A slow Internet connection would then mean the client takes longer to receive the information, and the whole process slows down.
There is another piece of evidence that the Internet connection affects Google Colab's speed. On a coffee shop's Internet, which is significantly faster than my home connection, the same block of code runs more than twice as fast as on my home Wi-Fi.
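One way to check this guess is to time the same cell from inside the notebook on each connection, so the number reflects time spent where the code actually runs. This is only a minimal sketch: `timed` and `train_step` are hypothetical names, and `train_step` is a stand-in for real training code.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds on the machine executing the code)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def train_step(n):
    # Hypothetical stand-in for a training step: pure computation, no output.
    return sum(i * i for i in range(n))

result, elapsed = timed(train_step, 100_000)
print(f"train_step took {elapsed:.4f} s")
```

If the elapsed time is about the same on both connections while the cell still feels slower on one of them, the difference is more likely in transferring output or data than in the computation itself.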

Related

Is there a way to keep colab busy even if the tab is closed?

This may be a stupid question, but I can only ask here since I don't know a better place.
Google Colab is amazing and wonderful, but there currently seems to be no way to keep the server itself running without interaction.
Is there a way to do this for a long period of time without any trouble? If there is, is it also possible to close the tab or shut down your computer and still keep it running? Yes, there is a time limit of about 12 hours, but I just want to know whether there is a way to do this without having your computer on all the time. I'd love to use my phone for it, although it's a really old phone from around 2012 that doesn't even load half of the sites correctly.
Any answers? Thank you so much and have a very nice day!
The runtime session outlives closed tabs. Even if you sign out of your Google account and come back and log in again minutes later, your notebook stays at the same point, since the VM holding the kernel is still running.
Two years ago someone here on Stack Overflow said that it would remain for 90 minutes if you close your tab and 12 hours if you don't (see "Google Colab session timeout").
I don't know if that still holds. At least in their FAQ, Google does not guarantee anything. In https://research.google.com/colaboratory/faq.html they only say "Virtual machines are deleted when idle for a while, and have a maximum lifetime enforced by the Colab service."

How can I test an internet connection during a certain period of time?

I would like to test the internet connection at a certain location. To do that, I want to write my own script, triggered at random intervals of a few minutes, that runs some checks.
I was thinking about:
Measuring the latency by pinging some big site.
Downloading a big file from somewhere (bandwidth).
But I think this is not reliable, as the file I try to get is probably cached on some provider's server.
Are more checks necessary to measure the quality of the internet connection (availability, latency and bandwidth)?
How can I avoid having my requests cached?
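The checks listed above could be sketched roughly as follows, using only the Python standard library. Everything here is an assumption for illustration: the function names are hypothetical, latency is approximated by the time to open a TCP connection rather than an ICMP ping, and the unique `nocache` query parameter plus no-cache request headers is a common way to discourage cached responses that may not defeat every cache.

```python
import http.client
import random
import socket
import statistics
import time
import urllib.request
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def cache_busting_url(url):
    """Append a unique query parameter so caches see each request as a new URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", uuid.uuid4().hex))  # parameter name is arbitrary
    return urlunsplit(parts._replace(query=urlencode(query)))

def tcp_latency(host, port=443, timeout=5.0):
    """Seconds needed to open a TCP connection, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None

def download_rate(url, timeout=30.0):
    """Download url once and return bytes per second, or None on failure."""
    request = urllib.request.Request(
        cache_busting_url(url),
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            size = len(response.read())
    except (OSError, http.client.HTTPException):
        return None
    elapsed = time.perf_counter() - start
    return size / elapsed if elapsed > 0 else None

def summarize(samples):
    """Availability is the share of successful probes; stats use successes only."""
    ok = [s for s in samples if s is not None]
    return {
        "availability": len(ok) / len(samples) if samples else 0.0,
        "median": statistics.median(ok) if ok else None,
        "worst": max(ok) if ok else None,
    }

def next_delay(min_minutes=1, max_minutes=10):
    """Random wait in seconds before the next probe."""
    return random.uniform(min_minutes * 60, max_minutes * 60)
```

A driver loop would call `tcp_latency` and `download_rate`, append the results to a list, sleep for `next_delay()` seconds, and pass the collected samples to `summarize` at the end of the test period. Failed probes are kept as `None` so that availability can be reported alongside latency and bandwidth.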

Data not showing up intermittently on the OpenTSDB UI

We are running some high-volume tests by pushing metrics to OpenTSDB (2.3.0) with BigTable, and a curious problem surfaces from time to time. For some metrics, an hour of data stops showing up on the web UI when we run a query. The span of "missing" data is very clear-cut and aligns with hour boundaries (UTC). After a while, when we rerun the same query, the data shows up. There does not seem to be any pattern that we can deduce here, other than the hour span. Any pointers on what to look for to debug this?
How long do you have to wait before the data shows up? Is it always the most recent hour that is missing?
Have you tried using OpenTSDB CLI when this is happening and issuing a scan to see if the data is available that way?
http://opentsdb.net/docs/build/html/user_guide/cli/scan.html
You could also check via an HBase shell scan to see if you can get the raw data that way (here's information on how it's stored in HBase):
http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
If you can verify the data is there then it seems likely to be a web UI problem. If not, the next likely culprit is something getting backed up in the write pipeline.
I am not aware of any particular issue in the Google Cloud Bigtable backend layer that would cause this behavior, but I believe some folks have encountered issues with OpenTSDB compactions during periods of high load that result in degraded performance.
It's worth checking in the Google Cloud Console to see if there are any outliers in the latency, CPU or throughput graphs that correlate with the times at which you experience the issue.
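To correlate those graphs with the symptom, it may help to record exactly which UTC hours come back empty each time the query is run. A minimal sketch, assuming the datapoint timestamps have already been pulled out of a query result as integer Unix seconds; `missing_utc_hours` is a hypothetical helper, not part of OpenTSDB:

```python
from datetime import datetime, timezone

def missing_utc_hours(timestamps, start, end):
    """Return the UTC hour starts in [start, end) that contain no datapoint.

    timestamps, start and end are integer Unix timestamps in seconds.
    """
    hour = 3600
    # Round every datapoint down to the start of its UTC hour.
    seen = {ts - ts % hour for ts in timestamps}
    first = start - start % hour
    return [
        datetime.fromtimestamp(h, tz=timezone.utc)
        for h in range(first, end, hour)
        if h not in seen
    ]
```

Logging this list with the time of each query would show whether the gap is always the most recent hour and how long it takes to fill in.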

Microsoft Access Testing

I have inherited an Access database that has linked SQL tables. I need to test the network traffic caused by running the database, and to ascertain which parts of the system cause the most network traffic and are therefore the slowest.
I am not an Access guru, so I've struggled with what was suggested, which is: have Task Manager open at the Networking tab,
then step through the app and look at where there is a significant rise in network traffic. But this seems rather unreliable and time-consuming.
Does anyone have any ideas how I can achieve my goal in Access?
If you really need to analyze the network traffic, then you should probably get to know Wireshark well enough to do a capture that is filtered on the traffic between the client and the SQL server.

Best use of NSURLConnection when getting multiple JSON objects that depend on the previous one

I am querying an API to search for articles in various databases. There are multiple steps involved, and each returns a JSON object. Each step involves an NSURLConnection with a different query string to the API.
Step 1: returns a JSON object indicating the status of the query and a record set ID.
Step 2: takes the record set ID from step 1 and returns a list of databases that are valid for querying.
Step 3: queries each database that was ready from step 2 and gets a JSON data array that has the results.
I am confused as to the best way of going about this. Is it better to use one NSURLConnection and reopen that connection in connectionDidFinishLoading based on which step I am in? Or is it better to open a new connection at the end of each preceding connection?
A couple of observations
Network latency:
The key phenomenon that we need to be sensitive to here (and it sounds like you are) is network latency. Too often we test our apps in an ideal scenario (on the simulator with high-speed internet access, or on a device connected to Wi-Fi). But when you use an app in a real-world scenario, network latency can seriously impact performance, and you'll want to architect a solution that minimizes this.
Simulating sub-optimal, real-world network situations:
By the way, if you're not doing it already, I'd suggest you install the "Network Link Conditioner" which is part of the "Hardware IO Tools" (available from the "Xcode" menu, choose "Open Developer Tool" - "More Developer Tools"). If you install the "Network Link Conditioner", you can then have your simulator simulate a variety of network experiences (e.g. Good 3G connection, Poor Edge connection, etc.).
Minimize network requests:
Anyway, I'd try to figure out how to minimize separate requests that are dependent upon the previous one. For example, I see step 1 and step 2 and wonder if you could merge those two into a single JSON request. Perhaps that's not possible, but hopefully you get the idea. You want to reduce the number of separate requests that have to happen sequentially.
I'd also look at step 3, and those look like they have to be dependent upon step 2, but perhaps you can run a couple of those step 3 requests concurrently, reducing the latency effect there.
Implementation:
In terms of how this would be implemented, I personally use a concurrent NSOperationQueue with some reasonable maxConcurrentOperationCount setting (e.g. 4 or 5, enough to enjoy concurrency and reduce latency, but not so many as to tax either the device or the server) and submit network operations. In this case, you'll probably submit step 1, with a completion operation that will submit step 2, with a completion operation that will submit a series of step 3 requests, and these step 3 requests might run concurrently.
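The shape of that flow (step 1, then step 2, then several step 3 requests with a cap on how many run at once) can be illustrated with a short sketch. It is written in Python with asyncio rather than NSOperationQueue, purely to show the structure; `fetch_json` is a hypothetical stand-in that fakes the API responses, and the semaphore plays the role of maxConcurrentOperationCount.

```python
import asyncio

MAX_CONCURRENT = 4  # analogous to maxConcurrentOperationCount

async def fetch_json(query):
    # Hypothetical stand-in for one API request; a real client would do network I/O.
    await asyncio.sleep(0.01)
    if query == "search":
        return {"status": "ok", "record_set_id": 42}
    if query.startswith("databases:"):
        return {"ready": ["db_a", "db_b", "db_c"]}
    return {"database": query.split(":", 1)[1], "results": []}

async def run_search():
    # Step 1: status and record set ID.
    step1 = await fetch_json("search")
    # Step 2: depends on the ID from step 1, so it must wait for it.
    step2 = await fetch_json(f"databases:{step1['record_set_id']}")

    limit = asyncio.Semaphore(MAX_CONCURRENT)

    async def query_database(name):
        async with limit:  # at most MAX_CONCURRENT requests in flight
            return await fetch_json(f"results:{name}")

    # Step 3: the per-database requests do not depend on each other,
    # so they can overlap; gather returns results in submission order.
    return await asyncio.gather(*(query_database(n) for n in step2["ready"]))

results = asyncio.run(run_search())
print([r["database"] for r in results])
```

Only the dependent steps are serialized; the step 3 requests overlap, so their latencies are not added together.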
In terms of how to make a good network operation object, I might suggest using something like AFNetworking, which already has a decent network operation object (including one that parses JSON), so maybe you can start there.
In terms of re-using an NSURLConnection, generally it's one connection per request. When I have had an app that needed a lengthy exchange of messages with a server (e.g. a chat-like service where the server must be able to send a message to the client whenever it wants), I've done a sockets implementation, but that doesn't seem like the right architecture here.
I would dismiss the first connection and create a new one for each request.
Just don't ask me why.
BTW, I would understand the question if this were about reusing vs. creating new objects in some performance-sensitive context, like scrolling through a table or animations, or if it happened across tens of thousands of iterations. But you are talking about 3 objects to either create anew or reuse. What is the gain of even thinking about it?