Running multiple GPUs in Theano Jupyter notebooks, implementing theano.gpuarray.use - gpu

I have a Linux system with three GPUs. I am using Keras with Theano to run CNNs. In the past, when I was using Theano 0.8.x, I was able to assign a particular GPU to a Jupyter notebook window using the following:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu2")
This allowed me to run three versions of the same cnn model using different hyper-parameters.
I very recently updated both Keras (to 2.0) and Theano (to 0.9). This required me to set up the gpuarray backend.
Running just one Jupyter notebook with a model works fine; gpu1 is selected by Theano. However, when I start up a second notebook with the same model, Theano tries to use the GPU assigned to the first notebook, causing a memory usage problem and ultimately causing the CNN model to run on the CPU rather than on one of the two remaining GPUs.
Is there a way to select the GPU that each Jupyter notebook runs on in Theano 0.9, as I was able to in Theano 0.8.x?
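One possible approach, sketched under assumptions rather than confirmed by the thread: with the gpuarray backend, Theano names devices cuda0, cuda1, ... instead of gpu0, gpu1, ..., and the device can be chosen through the THEANO_FLAGS environment variable, provided it is set before theano is first imported in that notebook. The helper name select_theano_gpu is hypothetical, and the import is left commented out so the snippet runs without Theano installed.

```python
import os

def select_theano_gpu(index):
    """Point this process at one GPU via THEANO_FLAGS (hypothetical helper).

    Assumes the gpuarray backend's device names: cuda0, cuda1, ...
    Must run before the first `import theano` in the notebook.
    """
    flag = "device=cuda{}".format(index)
    existing = os.environ.get("THEANO_FLAGS", "")
    # Drop any earlier device=... entry so the new choice is the only one.
    kept = [f for f in existing.split(",") if f and not f.startswith("device=")]
    os.environ["THEANO_FLAGS"] = ",".join(kept + [flag])
    return os.environ["THEANO_FLAGS"]

# e.g. in the third notebook:
select_theano_gpu(2)
# import theano  # import only after the flag is set
```

Each notebook kernel is a separate process, so calling this with a different index in the first cell of each notebook should keep the three models on separate GPUs.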

Related

TensorFlow Keras Sequential API GPU usage

When using TensorFlow's Keras sequential API is there any way to force my model to be trained on a certain piece of hardware? My understanding is that if there is a GPU to use (and I have tensorflow-gpu installed) I will, by default, do my training on the GPU.
Do I have to switch to a different API to gain more control over where my model is deployed?
I am a keras user and I work on ubuntu. I specify a certain GPU as follows:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
where 0 is the index of the GPU. By default, TensorFlow uses the first GPU (index 0) if there are several on your computer. You can list your GPUs by running the following command in a terminal:
nvidia-smi
or
watch -n 1 -d nvidia-smi
if you want the output to refresh every second. The index of each GPU appears in the GPU column at the left of the table (circled in red in the original screenshot).
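To read the GPU indices from a script rather than from the table, a minimal sketch could use nvidia-smi's CSV query mode. The function names parse_gpu_rows and list_gpus are hypothetical; the sketch assumes nvidia-smi is on the PATH and returns an empty list when it is missing or fails.

```python
import shutil
import subprocess

def parse_gpu_rows(text):
    """Turn 'index, name, used' CSV lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        index, used = parts[0], parts[-1]
        name = ", ".join(parts[1:-1])
        gpus.append({
            "index": int(index),
            "name": name,
            # Some GPUs do not report memory; keep None in that case.
            "memory_used_mib": int(used) if used.isdigit() else None,
        })
    return gpus

def list_gpus():
    """Query nvidia-smi; returns [] if the tool is unavailable or errors out."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_rows(result.stdout)

for gpu in list_gpus():
    print(gpu["index"], gpu["name"], gpu["memory_used_mib"], "MiB used")
```

The index printed here is the number to put in CUDA_VISIBLE_DEVICES; picking the GPU with the least memory in use is one way to avoid landing on a device another job already occupies.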

TF Keras NAN Loss when using multiple GPUs

System:
Ubuntu 18.04 LTS
(2) NVIDIA GTX 1080Ti GPUs 11GB
Driver Version: 440.33.01
CUDA Version: 10.0
I am currently using Tensorflow 2.0 (Python) and the tf.keras library to train a CNN.
However, I am encountering an issue when I try to train my model by calling model.fit(). After I begin training, the loss is normal for the first 1-2 steps of the first epoch, but after that it suddenly becomes NaN. If I try to stop the kernel that is running the training script, the whole computer freezes.
This issue only happens when using multiple GPUs. The code I'm using works perfectly fine on a single GPU. I have wrapped all of my code inside the scope of a tf.distribute.MirroredStrategy using with strategy.scope():. I am feeding my network with data from a tf.data.Dataset (though this error occurs regardless of the data I'm using to train).
I then ran some tests:
1) I tried to replace the data in my dataset with random numbers from a distribution, but the loss still went to NaN.
2) I also tried feeding the numpy arrays directly to .fit(), but that didn't solve the issue.
3) I tried using different optimizers (Adam, RMSprop, SGD), batch sizes (4, 8, 16, 32), and learning rates, none of which helped to solve this problem.
4) I swapped out my network for a simple Multi-layer Perceptron, but the error persisted.
This doesn't appear to be an OOM issue, since the data is relatively small and running watch -n0.1 nvidia-smi reveals that memory usage never exceeds 30% on either of my GPUs. There doesn't seem to be any warning or error in the console output that might hint at the issue either.
Any help is appreciated

Google Colab GPU speed-up works with 2.x, but not with 1.x

In https://colab.research.google.com/notebooks/gpu.ipynb, which I assume is an official demonstration of GPU speed-up by Google, if I follow the steps, the GPU speed-up (around 60 times faster than with CPU) using TensorFlow 2.x works. However, if I want to use version 1.15 as in https://colab.research.google.com/drive/12dduH7y0GPztxSM0AFlfpjj8FU5x8YSv (the only change compared to the notebook from the first link is getting rid of "%tensorflow_version 2.x" both times), tf.test.gpu_device_name() returns the string /device:GPU:0 but there is no speed-up. I would really like to use a TensorFlow version between 1.5 and 1.15 though, as the code I want to run uses functions removed in TensorFlow 2.x. Does anyone know how to use TensorFlow 1.x while still getting the GPU speed-up?
In your notebook, your code is not actually executed, since you didn't call session.run() or tf.enable_eager_execution(). In TensorFlow 1.x, operations only build a graph until one of these is used.
Add tf.enable_eager_execution() at the top of your code and you'll see the real difference between CPU and GPU times.

How to specify which GPU to use when running tensorflow?

We have a DGX-1 in our lab.
I see many tasks running on different GPUs.
For the MLPerf Docker application, I can use NV_GPU=x to choose which GPU to use.
However, when I tried the same approach with my Python Keras/TensorFlow code, the load did not go to the specified GPU.
You could use CUDA_VISIBLE_DEVICES to specify the GPU to be used by your model:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # Must be a string; makes only GPUs 0 and 1 visible to the model
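Environment variables must be strings, so the value is the comma-separated text "0,1", and it has to be set before TensorFlow initializes CUDA (in practice, before the import). A small sketch of that, where the helper name set_visible_gpus is hypothetical and the TensorFlow import is commented out so the snippet runs on its own:

```python
import os

def set_visible_gpus(indices):
    """Expose only the given GPU indices to this process (hypothetical helper)."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value

set_visible_gpus([0, 1])
# import tensorflow as tf  # import only after the variable is set
```

Inside the process, the visible GPUs are renumbered starting from 0, so after set_visible_gpus([2]) the single visible device shows up as GPU 0.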

Does tensorflow automatically detect GPU or do I have to specify it manually?

I have code written in TensorFlow that I run on CPUs, and it runs fine.
I am moving to a new machine that has GPUs. I ran the code on the new machine, but the training speed did not improve as expected (it takes almost the same time).
I understood that TensorFlow automatically detects GPUs and runs the operations on them (https://www.quora.com/How-do-I-automatically-put-all-my-computation-in-a-GPU-in-TensorFlow) & (https://www.tensorflow.org/tutorials/using_gpu).
Do I have to change the code to make it run the operations on GPUs manually (for now I have a single GPU)? And what would be gained by doing that manually?
Thanks
If the GPU version of TensorFlow is installed and if you don't assign all your tensors to CPU, some of them should be assigned to GPU.
To find out which devices (CPU, GPU) are available to TensorFlow, you can use this:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
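To narrow that listing down to GPUs only, a sketch along these lines could be used. It relies on the name and device_type attributes of the entries returned by list_local_devices(); the function name visible_gpu_names is hypothetical, and it returns an empty list when TensorFlow cannot be imported.

```python
def visible_gpu_names():
    """Return names such as '/device:GPU:0' for GPUs TensorFlow can see."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]

names = visible_gpu_names()
print(names if names else "No GPU visible to TensorFlow; ops will run on the CPU")
```

An empty result on a machine that has a GPU usually points at the installation (for example, a CPU-only TensorFlow build) rather than at the model code.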
Regarding the question of performance, it's quite a broad subject, and it really depends on your model, your data and so on. Here are a few broad remarks on TensorFlow performance.