Sending a TensorFlow job to a remote GPU - tensorflow

I am a systems researcher (student) and I'm new to tensorflow.
My setup is as follows -
I have two GPUs, say X and Y, both CUDA-capable. I have an Ubuntu VM on the machine that hosts GPU X (the VM itself is not CUDA-capable: no GPU passthrough).
The VM, X and Y are all on the same LAN and reachable.
Problem - I know that if I install TensorFlow on my VM, it will detect that the CUDA toolkit and NVIDIA driver are absent and run its jobs on the CPU. But what if I wanted to use GPU X or Y to run the TensorFlow job? Is there a provision in TensorFlow to make this happen?
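TensorFlow 1.x does have a provision for this: its distributed runtime. You run a tf.train.Server on the GPU machine and connect to it from the VM over gRPC, pinning ops to the remote device. A minimal sketch, assuming a hypothetical worker address of 192.168.1.10:2222 (substitute your own LAN address):

```python
# Cluster definition: a single "worker" job hosted on the machine that
# owns GPU X. The address 192.168.1.10:2222 is a placeholder.
cluster_def = {"worker": ["192.168.1.10:2222"]}

# On the GPU machine (which has the NVIDIA driver and CUDA), start a
# server for that cluster and leave it running:
#
#   import tensorflow as tf  # TF 1.x
#   server = tf.train.Server(tf.train.ClusterSpec(cluster_def),
#                            job_name="worker", task_index=0)
#   server.join()
#
# On the CPU-only VM, pin ops to the remote GPU and open a gRPC session;
# the graph then executes on the worker, not locally:
#
#   with tf.device("/job:worker/task:0/device:GPU:0"):
#       c = tf.matmul(a, b)
#   with tf.Session("grpc://192.168.1.10:2222") as sess:
#       result = sess.run(c)
```

Note the VM only needs the CPU build of TensorFlow; the driver and CUDA toolkit live on the worker machine.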

Related

Blas xGEMMBatched launch failed on 3080ti x 2 GPUs Tensorflow 1.14, CUDA 10.0, CudNN 7.6.5

I am working on an extension of the following method, which involves retraining the network.
https://github.com/brownvc/matryodshka
The PC I am using has the following specifications:
Ubuntu 20.04 LTS
NVIDIA 3080ti (x2) GPUs
64 GB RAM
I created the conda environment from the provided yml file, and separately installed cudatoolkit=10.0.130 and cudnn=7.6.5.
Training with the provided code then returns this error when the GPUs are used:
Blas xGEMMBatched launch failed
On the other hand, the training runs without error if I disable GPU support.
I tried the available solutions, such as setting config.gpu_options.allow_growth to True, but to no avail.
I don't think it is an OOM error, as there is no such text in the log file.
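For reference, the allow_growth setting the asker describes is passed through a ConfigProto when the session is created. A minimal sketch for TF 1.14 (the question's version); using tf.compat.v1 here so the same lines also run on 2.x:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # the question uses TF 1.14; compat.v1 keeps this valid on 2.x too

config = tf1.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand, not all up front
sess = tf1.Session(config=config)       # the config only takes effect if passed here
```

A common pitfall is setting allow_growth but creating the session (or letting Keras create one) without the config, in which case the option silently does nothing.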

GCP GPU is not detected in Keras

I'm running a UNet Keras model on a GCP instance with one NVIDIA Tesla P4 GPU, but it does not detect the GPU and runs on the CPU instead. P.S. I installed the drivers and tensorflow-gpu, but it won't work. How do I fix this issue?
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-1): /proc/driver/nvidia/version does not exist
Num GPUs Available: 0
You need to first install the driver. Follow this instruction
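The log line above is the giveaway: TensorFlow checks /proc/driver/nvidia/version, and its absence means the NVIDIA kernel driver is not loaded on the instance. A small diagnostic sketch (assumes a Linux host; nvidia-smi is only available once the driver is installed):

```python
import os
import subprocess

# TensorFlow's cuda_diagnostics checks this file; if it is missing,
# the kernel driver is not loaded, so no GPU can be detected.
driver_loaded = os.path.exists("/proc/driver/nvidia/version")

if driver_loaded:
    # With the driver installed, nvidia-smi should list the Tesla P4.
    print(subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True).stdout)
else:
    print("NVIDIA kernel driver not loaded - install the driver first.")
```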

Tensorflow's Multi_Gpu_Model crashes Ubuntu 18.04 OS when using all 4 GPUs

I am setting up a multi-GPU model using TensorFlow on an Ubuntu 18.04 LTS desktop. I run the code on four NVIDIA RTX 2080 Ti GPUs and build the model on the CPU. The same code works on Windows 10, but on Ubuntu it crashes and the system reboots. Where should I look, and what should I change? Is it the OS or the code?
with tf.device("/cpu:0"):
    model = create_image_model()

# make the model parallel
model = multi_gpu_model(model, gpus=G)
Try adding tf.ConfigProto(allow_soft_placement=True) to your session or estimator config. If that doesn't help, try switching the IOMMU off in the UEFI/BIOS settings.
https://www.tensorflow.org/guide/using_gpu
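A sketch of the suggestion above, applied to a plain session (multi_gpu_model is a TF 1.x / standalone-Keras API, so tf.compat.v1 is used to keep this runnable on either major version):

```python
import tensorflow as tf

tf1 = tf.compat.v1  # multi_gpu_model predates TF 2.x

# allow_soft_placement lets TensorFlow fall back to another device when an
# op has no kernel for the requested one, instead of erroring out.
config = tf1.ConfigProto(allow_soft_placement=True)
sess = tf1.Session(config=config)
```

With standalone Keras you would register this session via keras.backend.set_session(sess) before building the model, so the parallelized graph inherits the config.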

Nvidia Titan X (Pascal) Tensorflow Windows 10

My operating system is Windows 10 and I am using Keras with the TensorFlow backend on the CPU. I want to buy the NVIDIA Titan X (Pascal) GPU, as it is recommended for TensorFlow on the NVIDIA website:
http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-configurations.html
They recommend Ubuntu 14.04 as the OS.
Does anybody know if I can use TensorFlow with the NVIDIA Titan X (Pascal) GPU on my Windows 10 machine?
Thanks a lot.

theano - use external GPU only for ML and integrated GPU for display

I have a CPU with an integrated GPU. I also have an external GPU that I have been using for ML. What I want is to use the integrated GPU only for display and dedicate the external GPU to NN training (in order to free some memory).
In the BIOS I have set the external GPU to be the primary GPU, but also set both to be active, so they both work. After I boot the system I can plug the monitor into either of them and both work.
The problem is that when I plug the monitor into the motherboard (integrated GPU), Theano stops using the external GPU:
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
Is there a way to explicitly point Theano to the external GPU? Here is the relevant part of my .theanorc:
[global]
floatX = float32
device = gpu
I have a similar system to yours. For Linux, installing Bumblebee worked:
sudo apt-get install bumblebee-nvidia
(adapt to your distro's package manager)
Then launch Python via:
optirun python
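An alternative (or complement) to Bumblebee is to name the device explicitly in .theanorc instead of the generic `device = gpu`. On Theano's old CUDA backend, `gpu0` selects the first enumerated CUDA device; this sketch assumes the external GPU enumerates as device 0 (check the ordering with nvidia-smi):

```
[global]
floatX = float32
device = gpu0
```

The same flag can be passed per run without editing the file, e.g. THEANO_FLAGS='device=gpu0' before the python command.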