Using Google Compute Engine for a TensorFlow project

Google is offering $300 in free trial credit for Google Cloud. I want to use this opportunity to pursue a few projects using TensorFlow. But unlike with AWS, I am not able to find much information on the web about how to configure a Google Compute Engine instance for this. Can anyone make suggestions or point me to resources that will help?
I already looked into the Google Cloud documentation; while it is clear, it doesn't give any suggestions as to what kind of CPUs to use, and for that matter I cannot see any GPU instances when I try to create a VM instance. I want something along the lines of an AWS g2.2xlarge instance.

GPUs on Google Cloud are in alpha:
https://cloud.google.com/gpu/
The timeline given for public availability is 2017:
https://cloudplatform.googleblog.com/2016/11/announcing-GPUs-for-Google-Cloud-Platform.html
I would suggest that you think carefully about whether you want to "scale up" (getting a single very powerful machine to do your training) or "scale out" (distributing your training). In many cases, scaling out works out better and cheaper, and TensorFlow/Cloud ML are set up to help you do that.
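As a rough illustration of the "scale out" setup, here is a minimal sketch of how a TensorFlow (1.x-style) cluster can be described; the hostnames and task indices are hypothetical placeholders and the details depend on your actual VMs:

```python
import tensorflow as tf

# Hypothetical hostnames for illustration only; substitute your own VM addresses.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.internal:2222"],
    "worker": ["worker0.example.internal:2222",
               "worker1.example.internal:2222"],
})

# Each VM starts a server for its own role and task index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Place variables on the parameter server and ops on this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    # ... build your model here ...
    pass
```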
Here are directions on how to get TensorFlow going in a Jupyter notebook on a Google Compute Engine VM:
https://codelabs.developers.google.com/codelabs/cpb102-cloudml/#0
The first few steps are TensorFlow, the last steps are Cloud ML.
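Once the notebook is running, a quick sanity check (a sketch using the TensorFlow 1.x-era API) shows which devices TensorFlow can actually see, which is useful given that GPU instances are not generally available yet:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see on this VM (expect CPU-only without GPUs).
print(device_lib.list_local_devices())

# Run a trivial op to confirm the installation works end to end.
with tf.Session() as sess:
    print(sess.run(tf.constant("TensorFlow is working")))
```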

Related

Can we use Dask or other Python libraries for distributed computing to connect multiple free instances of Google Colab?

Sometimes, we need more GPU power than what Google provides on Colab for computation. Is it feasible to utilize distributed computing libraries like Ray or Dask to overcome this limitation? I understand that using free resources for this purpose might not be ideal, but I would still like to know if it's achievable. I attempted this before, but the address provided by the Dask client on Colab was not publicly accessible by other instances and seemed to be a local address.
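For context, a minimal sketch of the kind of attempt described above (worker counts and ports are illustrative only): a scheduler started inside a Colab VM typically binds to an address that is only reachable from inside that VM, which is the connectivity problem in question.

```python
from dask.distributed import Client, LocalCluster

# Start a scheduler/worker pair inside this Colab VM.
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# The scheduler address is usually something like tcp://127.0.0.1:<port>,
# i.e. a local address that other Colab instances cannot reach directly.
print(cluster.scheduler_address)
```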

Can I regain access to Google Colaboratory after hitting the usage limit?

According to Google's FAQ page, Google often restricts heavy users from using GPUs and TPUs on Google Colaboratory. I think I ran into trouble for this reason. However, the document does not state whether users Google has restricted can use the service again. If you know about this issue, please share your experience. Thank you.
For reference, I'll post my experience here: though I could not use Google Colab at the time, the restriction was lifted after one day.

Can multiple Colab notebooks share the same Runtime?

In Q1 2019, I ran some experiments and I noticed that Colab notebooks with the same Runtime type (None/GPU/TPU) would always share the same Runtime (i.e., the same VM). For example, I could write a file to disk in one Colab notebook and read it in another Colab notebook, as long as both notebooks had the same Runtime type.
However, I tried again today (October 2019) and it now seems that each Colab notebook gets its own dedicated Runtime.
My questions are:
When did this change happen? Was this change announced anywhere?
Is this always true now? Will Runtimes sometimes be shared and sometimes not?
What is the recommended way to communicate between two Colab notebooks? I'm guessing Google Drive?
Thanks
Distinct notebooks are indeed isolated from one another. Isolation isn't configurable.
For file sharing, I think you're right that Drive is the best bet as described in the docs:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
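As a minimal sketch of that approach, each notebook can mount the same Drive and read/write a shared path (the file name below is just an example):

```python
from google.colab import drive

# Mount Google Drive in this notebook's runtime (prompts for authorization).
drive.mount('/content/drive')

# Notebook A writes a file...
with open('/content/drive/My Drive/shared_result.txt', 'w') as f:
    f.write('hello from notebook A')

# ...and notebook B, after mounting Drive the same way, can read it back.
with open('/content/drive/My Drive/shared_result.txt') as f:
    print(f.read())
```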
I have found no easy way of running multiple notebooks within the same runtime. That said, I have no idea how this affects the quota. On my own machine, I would limit GPU memory per script and run several Python scripts in parallel. Colab doesn't let you do this, and in my view, if you don't use all of the GPU's memory, that shouldn't be counted the same as occupying the whole GPU for 12 or 24 hours; Google could pool your tasks with those of other users.
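For reference, limiting per-process GPU memory on your own machine looks roughly like this with the TensorFlow 1.x API (the fraction is arbitrary); Colab itself gives you no equivalent control over how the GPU is shared:

```python
import tensorflow as tf

# Ask TensorFlow to claim only ~30% of GPU memory so other scripts can share the card.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build and run your graph here ...
    pass
```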

Is there any way to do federated learning with multiple real machines using the tensorflow-federated API?

I am studying the tensorflow-federated API in order to do federated learning with multiple real machines.
But I found an answer on this site saying that federated learning across real multiple machines is not supported.
Is there really no way to do federated learning with multiple real machines?
Even if I set up a network for federated learning with two client PCs and one server PC, is it impossible to build that system using the TensorFlow Federated API?
Or, even if I adapt the code, can't I build the system I want?
If the code can be modified to configure this, can you give me a tip? If not, when will there be an example that runs on real machines?
In case you are still looking for something: if you're not bound to TensorFlow, you could have a look at PySyft, which uses PyTorch. Here is a practical example of an FL system built with one server and two Raspberry Pis as clients.
TFF is really about expressing the federated computations you wish to execute. In terms of physical deployments, TFF includes two distinct runtimes: one "reference executor" which simply interprets the syntactic artifact that TFF generates, serially, all in Python and without any fancy constructs or optimizations; another still under development, but demonstrated in the tutorials, which uses asyncio and hierarchies of executors to allow for flexible executor architectures. Both of these are really about simulation and FL research, and not about deploying to devices.
In principle, this may address your question (in particular, see tff.framework.RemoteExecutor). But I assume that you are asking more about deployment to "real" FL systems, e.g. data coming from sources that you don't control. This is really out of scope for TFF. From the FAQ:
Although we designed TFF with deployment to real devices in mind, at this stage we do not currently provide any tools for this purpose. The current release is intended for experimentation uses, such as expressing novel federated algorithms, or trying out federated learning with your own datasets, using the included simulation runtime.
We anticipate that over time the open source ecosystem around TFF will evolve to include runtimes targeting physical deployment platforms.
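To make the distinction concrete, here is a rough sketch of the simulation-runtime workflow (assuming a TFF version from around the time of this answer; names such as build_federated_averaging_process have since been reorganized), where "clients" are just in-memory datasets rather than separate machines:

```python
import tensorflow as tf
import tensorflow_federated as tff

def model_fn():
    # A tiny Keras model wrapped for TFF; shapes are illustrative only.
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Dense(1, input_shape=(10,))
    ])
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=(
            tf.TensorSpec(shape=[None, 10], dtype=tf.float32),
            tf.TensorSpec(shape=[None, 1], dtype=tf.float32),
        ),
        loss=tf.keras.losses.MeanSquaredError(),
    )

# Federated averaging over simulated clients (each "client" is a tf.data.Dataset).
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.1),
)
state = iterative_process.initialize()
client_data = [
    tf.data.Dataset.from_tensor_slices(
        (tf.random.normal([20, 10]), tf.random.normal([20, 1]))
    ).batch(5)
    for _ in range(2)
]
state, metrics = iterative_process.next(state, client_data)
print(metrics)
```

Everything here runs inside one Python process; pointing the computation at physical client devices is exactly the part TFF does not provide tooling for yet.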

Can I set workloads in MXNet when using a distributed environment (multiple nodes)?

I want to ask whether I can set different workloads when I use a distributed computing environment with MXNet. I have read some tutorials on distributed GPUs.
But I want to use a distributed-nodes (CPU) environment and assign a different workload to each node. Can I do that? If yes, can I get some examples of it?
Thank you for your answer!
Yes, it is supported. Check this link, which shows that you can specify work_load_list according to the GPUs or CPUs across which you want to distribute your workload.
http://mxnet.io/how_to/multi_devices.html#advanced-usage
Also, check the Python API reference (http://mxnet.io/api/python/model.html#mxnet.model.FeedForward); the work_load_list parameter can be set when calling model.FeedForward.fit(...).
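A rough sketch of what that might look like with the old mx.model.FeedForward API referenced above (the network, data iterator, and the 70/30 split are placeholders):

```python
import mxnet as mx

# Placeholder network and data iterator for illustration.
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='softmax')
train_iter = mx.io.NDArrayIter(
    data=mx.nd.random.uniform(shape=(100, 20)),
    label=mx.nd.zeros((100,)),
    batch_size=10,
)

# Two CPU contexts, with an uneven work split given via work_load_list.
model = mx.model.FeedForward(symbol=net, ctx=[mx.cpu(0), mx.cpu(1)], num_epoch=1)
model.fit(X=train_iter, work_load_list=[0.7, 0.3])
```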
Hope this helps!