Can we use Dask or other Python libraries for distributed computing to connect multiple free instances of Google Colab?

Sometimes, we need more GPU power than what Google provides on Colab for computation. Is it feasible to utilize distributed computing libraries like Ray or Dask to overcome this limitation? I understand that using free resources for this purpose might not be ideal, but I would still like to know if it's achievable. I attempted this before, but the address provided by the Dask client on Colab was not publicly accessible by other instances and seemed to be a local address.
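For reference, the kind of attempt described above usually looks something like the sketch below (hedged: it assumes dask.distributed is installed on the Colab VM, and the worker/thread counts are illustrative). The printed scheduler address sits on Google's private network, which is why other Colab instances cannot reach it.

# Minimal sketch: start a Dask scheduler and workers inside a single Colab VM.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# The scheduler binds to the VM's private interface, so this address
# (e.g. a tcp://172.x.x.x:port form) is not reachable from other Colab instances.
print(client.scheduler_info()["address"])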

Related

Can multiple Colab notebooks share the same Runtime?

In Q1 2019, I ran some experiments and I noticed that Colab notebooks with the same Runtime type (None/GPU/TPU) would always share the same Runtime (i.e., the same VM). For example, I could write a file to disk in one Colab notebook and read it in another Colab notebook, as long as both notebooks had the same Runtime type.
However, I tried again today (October 2019) and it now seems that each Colab notebook gets its own dedicated Runtime.
My questions are:
When did this change happen? Was this change announced anywhere?
Is this always true now? Will Runtimes sometimes be shared and sometimes not?
What is the recommended way to communicate between two Colab notebooks? I'm guessing Google Drive?
Thanks
Distinct notebooks are indeed isolated from one another. Isolation isn't configurable.
For file sharing, I think you're right that Drive is the best bet as described in the docs:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
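A minimal sketch of that Drive-based approach (the file path is illustrative; drive.mount will prompt for authorization):

# In notebook A: mount Drive and write a file.
from google.colab import drive
drive.mount('/content/drive')

with open('/content/drive/My Drive/shared_data.txt', 'w') as f:
    f.write('hello from notebook A')

# In notebook B: mount Drive the same way, then read the file back.
with open('/content/drive/My Drive/shared_data.txt') as f:
    print(f.read())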
I have found no easy way of running multiple notebooks within the same runtime. That said, I have no idea how this affects the quota. On my own machine, I'd limit GPU memory per script and run multiple Python threads; Colab doesn't let you do this. In my view, if you don't use the full GPU memory, it shouldn't count the same as occupying the whole GPU for 12 or 24 hours, since Google could pool your tasks with those of other users.
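For context, "limiting GPU memory per script" on a machine you control typically looks something like the following (a hedged sketch assuming TensorFlow 2.4+; the 2048 MB cap is just an illustrative value):

import tensorflow as tf

# Cap this process at ~2 GB of GPU memory so several scripts can share one card.
# This must run before any operation initializes the GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])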

Is there any way to do federated learning with multiple real machines using the tensorflow-federated API?

I am studying the tensorflow-federated API in order to do federated learning across real, separate machines.
However, I found an answer on this site saying that the API does not support federated learning across multiple real machines.
Is there really no way to do federated learning with multiple physical machines?
Even if I set up a network for federated learning with two client PCs and one server PC, is it impossible to build that system using the tensorflow-federated API?
Or, even if I adapt the code, can I not build the system I want?
If the code can be modified to make this configuration work, can you give me a tip? If not, when will there be an example of configuring it on real machines?
In case you are still looking for something: if you're not bound to TensorFlow, you could have a look at PySyft, which uses PyTorch. Here is a practical example of an FL system built with one server and two Raspberry Pis as clients.
TFF is really about expressing the federated computations you wish to execute. In terms of physical deployments, TFF includes two distinct runtimes: one "reference executor" which simply interprets the syntactic artifact that TFF generates, serially, all in Python and without any fancy constructs or optimizations; another still under development, but demonstrated in the tutorials, which uses asyncio and hierarchies of executors to allow for flexible executor architectures. Both of these are really about simulation and FL research, and not about deploying to devices.
In principle, this may address your question (in particular, see tff.framework.RemoteExecutor). But I assume that you are asking more about deployment to "real" FL systems, e.g. data coming from sources that you don't control. This is really out of scope for TFF. From the FAQ:
Although we designed TFF with deployment to real devices in mind, at this stage we do not currently provide any tools for this purpose. The current release is intended for experimentation uses, such as expressing novel federated algorithms, or trying out federated learning with your own datasets, using the included simulation runtime.
We anticipate that over time the open source ecosystem around TFF will evolve to include runtimes targeting physical deployment platforms.
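For a sense of what "expressing federated computations" looks like in practice, here is a minimal sketch along the lines of the TFF tutorials (hedged: it assumes tensorflow_federated is installed and uses the default local simulation runtime, with the client values supplied as a plain Python list):

import tensorflow as tf
import tensorflow_federated as tff

# A trivial federated computation: average a float placed at the clients.
@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(client_temperatures):
    return tff.federated_mean(client_temperatures)

# In simulation, each list element stands in for a value held by one client.
print(get_average_temperature([68.5, 70.3, 69.8]))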

What does it do if I choose "None" in Hardware Accelerator?

Pretty straightforward question. I was just wondering what performs the computation when this option is chosen. Does it run on Google's CPU or on my hardware?
I have looked on Google, Stack Overflow, and Colab's Help without finding a precise answer.
Thanks :)
PS: When running a fully-connected (Dense) network "without" an accelerator, it is approximately as fast as with the TPU and a lot faster than with the GPU.
Your guess is correct: None means CPU only, but on a Colab-managed cloud VM rather than your local machine (unless you've connected to a local Jupyter instance).
Also keep in mind that you'll need to adjust your code in order to take advantage of hardware accelerators like GPUs and TPUs.
Speedup on a GPU is often a bit magical since many frameworks automatically detect and take advantage of GPUs. Built-in support for TPUs is rare, and obtaining a speedup from TPUs will require adjusting your code.
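As an illustration, here is a hedged TensorFlow 2.x sketch of that difference: GPUs are picked up without code changes, while TPUs need explicit initialization (the exact TPU setup calls have moved between TF releases, so treat these names as assumptions):

import tensorflow as tf

# GPU: frameworks usually detect and use it automatically; this just reports it.
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))

# TPU: requires explicit setup before any speedup is possible.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    print("TPU replicas:", strategy.num_replicas_in_sync)
    # Model building/compiling would then go inside `with strategy.scope():`.
except ValueError:
    print("No TPU runtime attached")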

Google Colab : Local Runtime use

I am currently using google-colab, and on the getting started pages we see:
Local runtime support Colab supports connecting to a Jupyter runtime
on your local machine. For more information, see our documentation.
So, after reading the documentation, I connected my Colab notebook to the local runtime (after the installations, etc.) using the Connect tab.
And when I access the memory info:
!cat /proc/meminfo
The output is as follows:
MemTotal: 3924628 kB
MemFree: 245948 kB
MemAvailable: 1473096 kB
Buffers: 168560 kB
Cached: 1280300 kB
SwapCached: 20736 kB
Active: 2135932 kB
Inactive: 991300 kB
Active(anon): 1397156 kB
Inactive(anon): 560124 kB
Active(file): 738776 kB
Inactive(file): 431176 kB
Unevictable: 528 kB
Mlocked: 528 kB
This is the memory info for my PC, so the notebook is clearly accessing my PC? Then how is it any different from my local Jupyter notebook? Now I can't use the high-memory environment of about 13 GB, nor can I get GPU access.
Would be great if someone can explain!
The main advantages to using Colab with a local backend stem from Drive-based notebook storage: Drive commenting, ACLs, and easy link-based sharing of the finished notebook.
When using Jupyter, sharing notebooks requires sharing files. And, accessing your notebooks from a distinct machine requires installing Jupyter rather than loading a website.
The only benefit is keeping your notebooks in Google Drive:
- you can share them easily
- you have automatic history/versioning
- people can comment on your notebooks
You also get headings with a collapsible outline, and probably a cleaner UI (if you prefer Colab styling).
TLDR - the short answer is that it's not any different
But, here's an analogy that might help better explain what the point of that is:
Let's pretend Google Colab were something like a video game streaming service that lets users with low-end equipment play graphically demanding games by hosting the games on its own systems. If we don't have a high-end gaming PC or a powerful laptop and we want to play a new game with very high system requirements (which our hardware barely meets, if at all), then naturally we might want to use this streaming service, let's call it Stadia for fun, because it lets us play at 30 FPS in 720p, whereas our own computer might manage barely 15 FPS at 480p. Those users represent people like you and me, who want the benefit of the game running on another system, which in this case is equivalent to wanting Google Colab to run our iterations on its hardware. So for us, it wouldn't make sense to have Stadia run locally and use our own system resources, because there's no benefit in that, even if our saved games were stored locally.
But then there are others who have a high-end PC and graphics card installed, with much better components and resources available to them, and let's say they also want to play the same game. They could use the same streaming service and play at 720p, but since their computer is more powerful and can handle the game at, say, 60 FPS in 4K, they may want to run it off their own system resources instead of through a streaming service like Stadia. Normally that would mean getting a copy of the game and installing it locally on their system. For the sake of the example, let's pretend it is download-only and requires 2 terabytes to install.
So then, if we pretend Stadia had a way to spare them from downloading and installing the game while still using their own system's resources for better graphics, that is roughly how and why connecting Colab to a local runtime is a desirable feature for someone. Sharing Colab notebooks would be like sharing a game in our theoretical version of Stadia: users wouldn't have to download or install anything, so any time there is an update or change, they could immediately use the new version without downloading anything, because the actual code (or the game install, in our metaphor) is run remotely.
Sometimes it's hard to understand things that weren't designed for our use when they contradict the values that drove our decision to use them. Hopefully that helps someone who stumbles across this understand the purpose of it, at least in principle.

Using google compute engine for tensorflow project

Google is offering $300 of free-trial credit for Google Cloud registration. I want to use this opportunity to pursue a few projects using TensorFlow. But unlike with AWS, I am not able to find much information on the web about how to configure a Google Compute Engine instance. Can anyone make suggestions or point me to resources that will help?
I have already looked into the Google Cloud documentation; while it is clear, it doesn't give any suggestions as to what kind of CPUs to use, and for that matter I couldn't see any GPU instances when I tried to create a VM. I want to use something along the lines of an AWS g2.2xlarge instance.
GPUs on Google Cloud are in alpha:
https://cloud.google.com/gpu/
The timeline given for public availability is 2017:
https://cloudplatform.googleblog.com/2016/11/announcing-GPUs-for-Google-Cloud-Platform.html
I would suggest that you think carefully about whether you want to "scale up" (getting a single very powerful machine to do your training) or "scale out" (distributing your training). In many cases, scaling out works out better and cheaper, and TensorFlow/Cloud ML are set up to help you do that.
Here are directions on how to get Tensorflow going in a Jupyter notebook on a Google Compute Engine VM:
https://codelabs.developers.google.com/codelabs/cpb102-cloudml/#0
The first few steps cover TensorFlow; the last steps cover Cloud ML.
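Once a VM is up, a quick sanity check that TensorFlow actually sees the hardware might look like this (hedged: it assumes TensorFlow is already installed on the instance, and a GPU will only appear if the machine type and drivers are configured for one):

import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.__version__)
# Lists the CPU and, if the instance and drivers are set up correctly, /device:GPU:0.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)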