More than one GPU on vast.ai - tensorflow

Does anyone with experience using vast.ai for cloud GPU computing know whether, when renting more than one GPU, you need to do some setup to take advantage of the extra GPUs?
I ask because I can't notice any difference in speed when renting 6 or 8 GPUs instead of just one. I'm new to using vast.ai for cloud GPU computing.
I am using this default Docker image:
Official docker images for deep learning framework TensorFlow (http://www.tensorflow.org)
Successfully loaded tensorflow/tensorflow:nightly-gpu-py3
And just installing keras afterwards:
pip install keras
I have also checked the available GPUs using this and all the GPUs are detected correctly:
from keras import backend as K
K.tensorflow_backend._get_available_gpus()  # lists the GPU devices visible to the TensorFlow backend
cheers

Solution:
Finally I found the solution myself: I just used another Docker image with an older version of TensorFlow (2.0.0), and the problem disappeared.
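For reference, TensorFlow/Keras does not split training across multiple GPUs automatically; under TF 2.x you normally have to opt in with a distribution strategy. A minimal sketch (the toy model and batch size below are placeholders, not taken from the original post):

import tensorflow as tf

# Multiple GPUs are not used automatically: MirroredStrategy replicates the model
# on every visible GPU and splits each batch across the replicas.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Hypothetical toy model; substitute your own architecture here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# model.fit(x_train, y_train, batch_size=256)  # a larger global batch size helps multi-GPU scaling

Without a strategy scope like this, Keras runs everything on a single GPU, which would explain seeing no speedup with 6 or 8 GPUs.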

Related

Why has the GPU stopped working for me in Google Colab?

I am a university professor trying to learn deep learning for a possible class in the future. I have been using Google Colab with GPU support for the past couple of months. Just recently, the GPU device is not found, even though I am doing everything I have done in the past. I can't imagine that I have done anything wrong, because I am just working through tutorials from books and the TensorFlow 2.0 tutorials site.
TensorFlow 2 on Colab GPUs was recently broken due to an upgrade from CUDA 10.0 to CUDA 10.1. As of this afternoon, the issue should be resolved for the TensorFlow builds bundled with Colab. That is, if you run the following magic command:
%tensorflow_version 2.x
then import tensorflow will import a working, GPU-compatible tensorflow 2.0 version.
Note, however, if you attempt to install a version of tensorflow using pip install tensorflow-gpu or similar, the result may not work in Colab due to system incompatibilities.
See https://colab.research.google.com/notebooks/tensorflow_version.ipynb for more information.
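A quick way to confirm the fix took effect is to list the GPUs the runtime can see (this uses the standard TF 2.x device-listing API; some older 2.0 builds expose it as tf.config.experimental.list_physical_devices instead):

import tensorflow as tf

print(tf.__version__)                          # should report a 2.x build
print(tf.config.list_physical_devices('GPU'))  # expect an entry like /physical_device:GPU:0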

tf.test.is_gpu_available() returns False on GCP

I am training a CNN on GCP's notebook using a Tesla V100. I've trained a simple yolo on my own custom data and it was pretty fast but not very accurate. So, I decided to write my own code from scratch to solve the specific aspects of the problem that I want to tackle.
I have tried to run my code on Google Colab prior to GCP, and it went well. Tensorflow detects the GPU and is able to use it whether it was a Tesla K80 or T4.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
tf.test.is_gpu_available() #>>> True
My problem is that this same function returns False on the GCP notebook, as if TensorFlow is unable to use the GPU it detected on the GCP VM. I don't know of any command that forces TensorFlow to use the GPU over the CPU, since it does that automatically.
I have already tried installing, uninstalling, and reinstalling several versions of tensorflow, tensorflow-gpu and tf-nightly-gpu (1.13 and 2.0dev, for instance), but it yielded nothing.
[screenshot: output of nvidia-smi]
Have you tried using GCP's AI Platform Notebooks instead? They offer VMs that are pre-configured with Tensorflow and have all required GPU drivers installed.
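If you want to narrow the problem down before switching environments, one way (using only standard TensorFlow calls that exist on both 1.x and 2.x) is to distinguish a CPU-only build from a driver/runtime issue:

import tensorflow as tf
from tensorflow.python.client import device_lib

# False here means a CPU-only wheel is installed (e.g. plain "tensorflow" on 1.x),
# no matter what nvidia-smi reports.
print(tf.test.is_built_with_cuda())

# If the build is CUDA-enabled but no GPU device appears below, the driver or CUDA
# runtime on the VM is a more likely culprit than TensorFlow itself.
print([d.name for d in device_lib.list_local_devices()])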

Can I run TensorFlow, Keras, and PyTorch for deep learning projects on the latest, highest-spec iMac Pro without Nvidia GPUs?

I love my iMac and do not mind paying top dollar for it. However, I need to run TensorFlow, Keras, and PyTorch for deep learning projects. Can I run them on the latest, maxed-out-spec iMac Pro?
TensorFlow 1.8 supports ROCm; I don't know how it performs next to Nvidia's CUDA,
but that means that if you have a GPU (Radeon) that supports ROCm, you can use TensorFlow with GPU support.
Running TensorFlow on the CPU is possible but extremely slow, and can be added to the definition of torture.

Running Tensorboard without CUDA support

Is it possible to run Tensorboard on a machine without CUDA support?
I'm working at a computation center (via ssh) which has two major clusters:
CPU-Cluster which is a general workhorse without CUDA support (no dedicated GPU)
GPU-Cluster with dedicated GPUs e.g. for running neural networks with tensorflow-gpu.
Access to the GPU cluster is limited to training jobs and the like, so I can't afford to run TensorBoard on a machine with CUDA support. Instead, I'd like to run TensorBoard on the CPU cluster.
With the TensorBoard bundled with TF I get import errors due to missing CUDA support.
It seems reasonable that the official TensorBoard should have a CPU-only mode. Is this true?
I've also found an unofficial standalone TensorBoard version (github.com/dmlc/tensorboard); does this work without CUDA support?
Solved my problem: just install tensorflow instead of tensorflow-gpu.
It didn't work for me for a while because my virtual environment (conda) hadn't properly removed tensorflow-gpu.
TensorBoard is not limited by whether a machine has a GPU or not.
As far as I know, all TensorBoard does is parse event pb files and display them on the web. There is no heavy computation involved, so it doesn't need a GPU.
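For completeness, TensorBoard can also be started from Python on the CPU cluster once the CPU-only tensorflow package is installed. This is only a sketch, and /path/to/logs is a placeholder for wherever the GPU cluster writes its event files:

from tensorboard import program
import time

tb = program.TensorBoard()
# /path/to/logs is a placeholder for the event-file directory produced on the GPU cluster.
tb.configure(argv=[None, "--logdir", "/path/to/logs", "--port", "6006"])
url = tb.launch()            # starts TensorBoard in a background thread
print("TensorBoard listening on", url)

while True:                  # keep the process alive when run as a standalone script
    time.sleep(60)

The usual tensorboard --logdir command works just as well; the programmatic API is only convenient when the CPU node is driven from a Python job script.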

TensorFlow - which Docker image to use?

From TensorFlow Download and Setup under
Docker installation I see:
b.gcr.io/tensorflow/tensorflow latest 4ac133eed955 653.1 MB
b.gcr.io/tensorflow/tensorflow latest-devel 6a90f0a0e005 2.111 GB
b.gcr.io/tensorflow/tensorflow-full latest edc3d721078b 2.284 GB
I know 2. & 3. are with source code and I am using 2. for now.
What is the difference between 2. & 3.?
Which one is recommended for "normal" use?
TLDR:
First of all - thanks for Docker images! They are the easiest and cleanest way to start with TF.
A few asides about the images:
there is no PIL
there is no nano (but there is vi) and apt-get cannot find it. Yes, I could probably configure the repos for it, but why isn't it there out of the box?
There are four images:
b.gcr.io/tensorflow/tensorflow: TensorFlow CPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel: CPU binary image plus source code.
b.gcr.io/tensorflow/tensorflow:latest-gpu: TensorFlow GPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel-gpu: GPU binary image plus source code.
And the two properties of concern are:
1. CPU or GPU
2. no source or plus source
CPU or GPU: CPU
For a first-time user it is highly recommended to avoid the GPU version, as it can be anywhere from difficult to impossible to get working. The reason is that not all machines have an NVidia graphics chip that meets the requirements. You should first get TensorFlow working to understand it, then move on to the GPU version if you want/need it.
From TensorFlow Build Instructions
Optional: Install CUDA (GPUs on Linux)
In order to build or run TensorFlow with GPU support, both Cuda
Toolkit 7.0 and CUDNN 6.5 V2 from NVIDIA need to be installed.
TensorFlow GPU support requires having a GPU card with
NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
no source or plus source: no source
The docker images will work without needing the source. You should only want or need the source if you need to rebuild TensorFlow for some reason such as adding a new OP.
The standard recommendation for someone new to using TensorFlow is to start with the CPU version without the source.
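As a quick sanity check after starting the CPU image (e.g. with docker run -it b.gcr.io/tensorflow/tensorflow), something along the lines of the classic hello-world from the old getting-started docs should run without any CUDA setup. Note this uses the pre-2.0 graph/Session API that these images shipped with:

import tensorflow as tf

# Build a tiny graph and run it in a Session (pre-2.0 API).
hello = tf.constant('Hello, TensorFlow!')
a = tf.constant(2)
b = tf.constant(3)

with tf.Session() as sess:
    print(sess.run(hello))   # b'Hello, TensorFlow!'
    print(sess.run(a + b))   # 5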