TPU custom chip available with Google Cloud ML - tensorflow

Which type of hardware is used as part of Google Cloud ML when using TensorFlow? Is only CPU available, or are Tensor Processing Units (custom cards) also available?
cf. this article

Cloud ML currently focuses on CPUs. GPUs and TPUs will be available in the future.

Cloud TPUs are available to the public as of 2018-06-27: https://cloud.google.com/tpu/docs/release-notes
This was announced at Google Next '18:
https://www.blog.google/products/google-cloud/empowering-businesses-and-developers-to-do-more-with-ai/

At the time of writing (December 2017), GPUs are available, see https://cloud.google.com/ml-engine/docs/training-overview
If you use the gcloud command-line utility, you can, for example, add the --scale-tier BASIC_GPU option when submitting jobs to ML Engine. This currently runs your TensorFlow code on a Tesla K80.
There is also a CUSTOM scale tier which allows more complex configurations and gives access to P100 GPUs (see https://cloud.google.com/ml-engine/docs/using-gpus).
The TPU service is in 'alpha' status according to https://cloud.google.com/tpu/ and one needs to sign up to learn more.

Related

What are paid alternative services to Kaggle?

Kaggle provides 30 hours of GPU usage for free per week. Sometimes, I max out this quota and I need more, or I need a more powerful GPU.
I tried a Google Colab Pro subscription, but it was slower than Kaggle. Are there any cloud computing services for deep learning similar to Kaggle or Google Colab Pro? I prefer a fixed type of GPU for a fixed number of hours at a fixed monthly fee, which is why I do not like AWS: it is pay-as-you-use, and in general AWS looks too complex for my needs, which are learning deep learning by implementing and experimenting with different architectures.

How to access Spark DataFrame data on the GPU from ML libraries such as PyTorch or TensorFlow

Currently I am studying the usage of Apache Spark 3.0 with Rapids GPU Acceleration. In the official spark-rapids docs I came across this page which states:
There are cases where you may want to get access to the raw data on the GPU, preferably without copying it. One use case for this is exporting the data to an ML framework after doing feature extraction.
To me this sounds as if one could make data that is already available on the GPU from some upstream Spark ETL process directly available to a framework such as Tensorflow or PyTorch. If this is the case how can I access the data from within any of these frameworks? If I am misunderstanding something here, what is the quote exactly referring to?
The link you reference really only lets you access the data still sitting on the GPU; using that data in another framework, like TensorFlow or PyTorch, is not that simple.
TL;DR: Unless you have a library explicitly set up to work with the RAPIDS accelerator, you probably want to run your ETL with RAPIDS, then save it, and launch a new job to train your models using that data (a rough sketch of that workflow follows the issues below).
There are still a number of issues that you would need to solve. We have worked on these in the case of XGBoost, but it has not been something that we have tried to tackle for TensorFlow or PyTorch yet.
The big issues are:
Getting the data to the correct process. Even if the data is on the GPU, because of security, it is tied to a given user process. PyTorch and Tensorflow generally run as python processes and not in the same JVM that Spark is running in. This means that the data has to be sent to the other process. There are several ways to do this, but it is non-trivial to try and do it as a zero-copy operation.
The format of the data is not what TensorFlow or PyTorch want. The data for RAPIDS is in an Arrow-compatible format. TensorFlow and PyTorch have APIs for importing data in standard formats from the CPU, but it might take a bit of work to get the data into a format that the frameworks want and to find an API that lets you pull it in directly from the GPU.
Sharing GPU resources. Spark only recently added support for scheduling GPUs. Prior to that, people would just launch a single Spark task per executor and a single Python process, so that the Python process would own the entire GPU when doing training or inference. With the RAPIDS accelerator the GPU is not free any more and you need a way to share the resources. RMM provides some of this if both libraries are updated to use it and they are in the same process, but in the case of PyTorch and TensorFlow they are typically in Python processes, so figuring out how to share the GPU is hard.
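To make the TL;DR above concrete, here is a minimal sketch of the "save from the ETL job, then train in a separate job" workflow. The file paths, column names, and toy PyTorch loading code are illustrative assumptions rather than spark-rapids APIs, and this is an ordinary round trip through Parquet on disk or object storage, not a zero-copy GPU handoff.

```python
# --- ETL job: Spark (optionally with the RAPIDS accelerator) writes features out ---
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("feature-etl").getOrCreate()
raw = spark.read.parquet("raw_data.parquet")                  # hypothetical input path
features = raw.selectExpr("feature_1", "feature_2", "label")  # hypothetical columns
features.write.mode("overwrite").parquet("features.parquet")
spark.stop()

# --- Separate training job: load the saved features and hand them to PyTorch ---
import pandas as pd
import torch

df = pd.read_parquet("features.parquet")  # needs pyarrow or fastparquet installed
x = torch.tensor(df[["feature_1", "feature_2"]].to_numpy(), dtype=torch.float32)
y = torch.tensor(df["label"].to_numpy(), dtype=torch.float32)
if torch.cuda.is_available():
    # This is a fresh host-to-device copy; the GPU buffers Spark used are not reused.
    x, y = x.cuda(), y.cuda()
```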

Equivalents of Google Colab with more RAM

I'm currently working on a project creating simulations from defined models. Most of the testing on down-sized models has taken place in Google Colab to make use of the GPU accelerator option. However, when up-scaling to the full-sized models I now exceed the maximum RAM for Google Colab. Is there an equivalent service which allows for 25 GB of RAM rather than 12 GB? Access to GPU acceleration is still essential.
Note: I am outside the US so Google Colab Pro is not an option

How to use TPU in TensorFlow custom training loop?

I don't use Keras, and I want to use TPUs on Google Colab. Questions:
Can tf.Session automatically use TPUs?
What do tf.contrib.tpu.TPUDistributionStrategy, tf.contrib.tpu.rewrite, tf.contrib.cluster_resolver.TPUClusterResolver do in TPU computing? Are they all necessary?
Which TensorFlow version are you running? Currently the firmware for TPUs on Google Colab only supports 1.14 (I may be wrong about the exact version, but it's definitely 1.x); however, if you are using TF 2.0, there is TPU support for nightly-2.x on GCP, so perhaps you can give that a try!
Note that in 2.0, you would want to get rid of any "sessions" because that is no longer a thing. Check out the TPUStrategy docs here for more information: https://www.tensorflow.org/guide/distributed_training#tpustrategy
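For reference, here is a minimal sketch of a custom (non-Keras-fit) training loop on a Colab TPU with TF 2.x, where TPUClusterResolver plus tf.distribute.TPUStrategy replace tf.Session and the tf.contrib.tpu APIs. The toy variable, loss, and dataset are illustrative assumptions; on older 2.x releases the strategy class is tf.distribute.experimental.TPUStrategy and strategy.run is called strategy.experimental_run_v2.

```python
import tensorflow as tf

# Connect to the Colab TPU and build a TPUStrategy (tpu="" picks up the Colab TPU).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

GLOBAL_BATCH = 64  # split across the 8 TPU cores

# Variables (and the optimizer) must be created under the strategy scope.
with strategy.scope():
    w = tf.Variable(0.0)
    optimizer = tf.keras.optimizers.SGD(0.01)  # only the optimizer object is from Keras

# Toy regression data: y = 3x.
x = tf.random.uniform([1024])
dataset = tf.data.Dataset.from_tensor_slices((x, 3.0 * x)).batch(GLOBAL_BATCH)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        per_example_loss = tf.square(y_batch - w * x_batch)
        # Scale by the *global* batch size so gradients aggregate correctly across replicas.
        loss = tf.nn.compute_average_loss(per_example_loss, global_batch_size=GLOBAL_BATCH)
    grads = tape.gradient(loss, [w])
    optimizer.apply_gradients(zip(grads, [w]))
    return loss

for step, (x_batch, y_batch) in enumerate(dist_dataset):
    per_replica_loss = strategy.run(train_step, args=(x_batch, y_batch))
    # Per-replica losses were scaled by the global batch size, so SUM gives the batch loss.
    loss_value = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)
    print(step, float(loss_value))
```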

Does Google Cloud ML only support distributed TensorFlow for multi-GPU training jobs?

I'd like to run a TensorFlow application using multiple GPUs on Cloud ML.
My TensorFlow application is written in the non-distributed paradigm that is outlined here.
From what I understand, if I want to use Cloud ML to run this same application with multiple GPUs, then the application must use the CUSTOM scale tier, and I need to set up parameter servers and worker servers, which seems to be a distributed-TensorFlow paradigm. Link here
Is this the only way to run multiple GPU training jobs on Cloud ML?
Is there a guide that helps me scope the changes required to turn my multi-GPU (tower-based) training application into a distributed TensorFlow application?
You can use the CUSTOM tier with only a single master node and no workers/parameter servers; those are optional parameters.
The complex_model_m_gpu machine type has 4 GPUs, and complex_model_l_gpu has 8.
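For context, this is roughly what the non-distributed, tower-based paradigm from the question looks like, and it can run as-is on a single CUSTOM master such as complex_model_m_gpu. Below is a minimal sketch in TF 1.x-style graph mode; the toy linear model, learning rate, and NUM_GPUS value are illustrative assumptions, not Cloud ML requirements.

```python
# Minimal sketch of non-distributed, tower-based (data-parallel) multi-GPU training.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

NUM_GPUS = 4  # complex_model_m_gpu exposes 4 GPUs on the single master node

def tower_loss(x, y):
    # Shared variables: reused across towers via the enclosing variable scope.
    w = tf.get_variable("w", shape=[], initializer=tf.zeros_initializer())
    b = tf.get_variable("b", shape=[], initializer=tf.zeros_initializer())
    return tf.reduce_mean(tf.square(y - (w * x + b)))

# Toy data split evenly across the towers.
x = tf.random_uniform([NUM_GPUS * 32])
y = 3.0 * x + 2.0
x_shards = tf.split(x, NUM_GPUS)
y_shards = tf.split(y, NUM_GPUS)

opt = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []
with tf.variable_scope("model"):
    for i in range(NUM_GPUS):
        with tf.device("/gpu:%d" % i):
            loss = tower_loss(x_shards[i], y_shards[i])
            tower_grads.append(opt.compute_gradients(loss))
        tf.get_variable_scope().reuse_variables()  # share w, b with later towers

# Average each variable's gradients over the towers, then apply a single update.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = opt.apply_gradients(averaged)

config = tf.ConfigProto(allow_soft_placement=True)  # fall back to CPU if fewer GPUs
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
```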