I'm working remotely and from time to time I need to use the GPU for model training. I connect to the company network using ssh. Is there a way to see if someone is currently using the GPU for training?
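If the machines have NVIDIA GPUs and the standard driver tools installed (an assumption; adjust for your hardware), a quick check over SSH is:

nvidia-smi            # per-GPU utilization, memory use, and the processes currently using each GPU
watch -n 5 nvidia-smi # refresh the report every 5 seconds to watch for ongoing training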
Do Google Cloud ML predictions run on multiple devices, or a single device?
I find my Google Cloud ML predictions take ~7 sec, but when running my model locally with a Flask server on a 4-core machine it takes ~1.8-2.1 sec.
Is there a way to increase the number of devices/resources I am using on Google Cloud ML?
Yes, you can use more resources to serve your predictions. However, the feature is still at the alpha stage and is only available to a selected list of accounts that opted in as "Trusted Testers". Please contact cloudml-feedback@google.com if you need help setting up the prediction service with multiple cores.
Kubernetes supports GPUs as an experimental feature. Does it work on Google Container Engine? Do I need some special configuration to enable it? I want to be able to run machine learning workloads, but I want to use Python 3, which isn't available in CloudML.
GPUs on Google Container Engine are now available in Alpha. Sign up form.
Beware that alpha cluster limitations apply: they cannot be upgraded, and they will be auto-deleted in 30 days.
Disclaimer: I work at GCP.
I am afraid this is not supported out of the box. When creating a regular instance in Google Compute Engine (GCE), you are able to select GPU specs for your machine. On the other hand, when creating a cluster, these options are not available. I imagine this will be available sooner or later, but not at the moment.
As an alternative, you can create several GCE instances and build a cluster using tools like kubeadm or following guides like Kubernetes the hard way: https://github.com/kelseyhightower/kubernetes-the-hard-way
I've not tested it, but as long as GPU VMs are just machine types, these two steps should make it feasible (see the untested sketch after the steps):
UPDATE: Main site for Custom Machine Types: https://cloud.google.com/custom-machine-types/
1- Create a GPU Custom Machine Type: https://cloud.google.com/compute/docs/gpus/
You can add GPUs to any non-shared-core predefined machine type or custom machine type that you are able to create in a zone
2- When creating nodes, choose your custom machine type in your cluster or node pool: https://cloud.google.com/container-engine/docs/clusters/operations
--machine-type: The Google Compute Engine machine type (e.g. n1-standard-1) to use for instances in this container cluster. If unspecified, the default machine type is n1-standard-1
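An untested sketch of those two steps as gcloud commands; the machine type, accelerator type, zone, and names below are placeholders, so substitute your own:

# Step 1: verify the GPU-equipped machine shape works as a regular GCE instance
gcloud compute instances create gpu-test \
  --machine-type n1-standard-4 \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --maintenance-policy TERMINATE \
  --zone us-east1-d

# Step 2: create a cluster whose nodes use the same machine type
gcloud container clusters create ml-cluster \
  --machine-type n1-standard-4 \
  --zone us-east1-d

Note that step 2 only sets the machine type; whether the GPUs actually get attached to the cluster nodes is exactly the part I haven't verified.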
So far, I've seen people using TensorFlow in Azure, as in this link.
Also, taking advantage of Ubuntu on Windows, TensorFlow can be run on a Windows PC as well. Here is the link.
However, during a conversation with Windows Azure engineer Hai Ning, it came out that "Azure ML PaaS VMs use Windows OS; TensorFlow is not supported on Windows as of now."
Hence, there is no direct way of running TensorFlow in Azure ML.
Has anyone figured out a workaround that allows running TensorFlow in Azure ML?
Quick update for you. As of TensorFlow r0.12 there is now a native TensorFlow package for Windows. I have it running successfully on my Windows 10 laptop. See this blog post from Google for more information.
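For reference, on a 64-bit Python 3.5 installation on Windows, this should be as simple as the following (use the tensorflow-gpu package instead if you have a CUDA-capable GPU):

pip install tensorflow
python -c "import tensorflow as tf; print(tf.__version__)"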
While working on the Udacity Deep Learning assignments, I encountered a memory problem. I need to switch to a cloud platform. I worked with AWS EC2 before, but now I would like to try Google Cloud Platform (GCP). I will need at least 8 GB of memory. I know how to use Docker locally but have never tried it in the cloud.
Is there any ready-made solution for running Tensorflow on GCP?
If not, which service (Compute Engine or Container Engine) would make it easier to get started?
Any other tip is also appreciated!
Summing up the answers:
AI Platform Notebooks - One-click Jupyter Lab environment
Deep Learning VMs images - Raw VMs with ML libraries pre-installed
Deep Learning Container Images - Containerized versions of the DLVM images
Cloud ML
Manual installation on Compute Engine. See instructions below.
Instructions to manually run TensorFlow on Compute Engine:
Create a project
Open the Cloud Shell (a button at the top)
List machine types: gcloud compute machine-types list. You can change the machine type I used in the next command.
Create an instance:
gcloud compute instances create tf \
--image container-vm \
--zone europe-west1-c \
--machine-type n1-standard-2
Run sudo docker run -d -p 8888:8888 --name tf b.gcr.io/tensorflow-udacity/assignments:0.5.0 (change the image name to the desired one)
Find your instance in the dashboard and edit the default network.
Add a firewall rule to allow your IP, with protocol tcp and port 8888 (see the gcloud equivalent after these steps).
Find the External IP of the instance from the dashboard. Open IP:8888 on your browser. Done!
When you are finished, delete the created instance to avoid charges.
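If you prefer the command line over the dashboard for the firewall and cleanup steps, a rough gcloud equivalent (the rule name is a placeholder, and YOUR_IP is your own address):

gcloud compute firewall-rules create allow-tf-8888 \
  --allow tcp:8888 \
  --source-ranges YOUR_IP/32

# when finished, delete the instance to avoid charges
gcloud compute instances delete tf --zone europe-west1-c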
This is how I did it and it worked. I am sure there is an easier way to do it.
More Resources
You might be interested to learn more about:
Google Cloud Shell
Container-Optimized Google Compute Engine Images
Google Cloud SDK for a more responsive shell and more.
Good to know
"The contents of your Cloud Shell home directory persist across projects between all Cloud Shell sessions, even after the virtual machine terminates and is restarted"
To list all available image versions: gcloud compute images list --project google-containers
Thanks to @user728291, @MattW, @CJCullen, and @zain-rizvi
Google Cloud Machine Learning is open to the world in Beta form today. It provides TensorFlow as a Service so you don't have to manage machines and other raw resources. As part of the Beta release, Datalab has been updated to provide commands and utilities for machine learning. Check it out at: http://cloud.google.com/ml.
Google has a Cloud ML platform in a limited Alpha.
Here is a blog post and a tutorial about running TensorFlow on Kubernetes/Google Container Engine.
If those aren't what you want, the TensorFlow tutorials should all be able to run on either AWS EC2 or Google Compute Engine.
You can now also use pre-configured Deep Learning images. They have everything required for TensorFlow.
This is an old question, but there are new, even easier options now:
If you want to run TensorFlow with Jupyter Lab
Use GCP AI Platform Notebooks, which gives you one-click access to a Jupyter Lab notebook with TensorFlow pre-installed (you can also use PyTorch, R, or a few other frameworks instead if you prefer).
If you just want to use a raw VM
If you don't care about Jupyter Lab and just want a raw VM with TensorFlow pre-installed, you can instead create a VM using GCP's Deep Learning VM Images. These DLVM images give you a VM with TensorFlow pre-installed and are all set up to use GPUs if you want. (AI Platform Notebooks use these DLVM images under the hood.)
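For example, creating a TensorFlow DLVM from the command line looks roughly like this; the image family, accelerator type, and zone here are illustrative, so check the current documentation for valid values:

gcloud compute instances create my-tf-vm \
  --image-family tf-latest-gpu \
  --image-project deeplearning-platform-release \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --maintenance-policy TERMINATE \
  --metadata install-nvidia-driver=True \
  --zone us-east1-d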
If you'd like to run it on both your laptop and the cloud
Finally, if you want to be able to run TensorFlow both on your personal laptop and in the cloud and are comfortable using Docker, you can use GCP's Deep Learning Container Images. They contain the exact same setup as the DLVM images, but packaged as a container instead, so you can launch them anywhere you like.
Extra benefit: If you're running this container image on your laptop, it's 100% free :D
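A minimal sketch of running one of these images locally with Docker (the image name is illustrative; browse the deeplearning-platform-release registry for current tags):

# JupyterLab will then be reachable at http://localhost:8080
docker run -d -p 8080:8080 gcr.io/deeplearning-platform-release/tf2-cpu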
I'm not sure there is a need for you to stay on the Google Cloud platform. If you are able to use other products, you might save a lot of time, and some money.
If you are using TensorFlow, I would recommend a platform called TensorPort. It is exclusively for TensorFlow and is the easiest platform I am aware of. Code and data are loaded with git, and they provide a Python module for automatic toggling of paths between the remote machine and your local machine. They also provide some boilerplate code for setting up distributed computing if you need it. Hope this helps.