GCP GPU is not detected in Keras - tensorflow

I'm running a UNet Keras model on a GCP instance with one NVIDIA Tesla P4 GPU, but TensorFlow does not detect the GPU and runs on the CPU instead. P.S. I installed the drivers and tensorflow-gpu, but it still won't work. How can I fix this?
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (instance-1): /proc/driver/nvidia/version does not exist
Num GPUs Available: 0

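You need to install the NVIDIA driver first: the log says /proc/driver/nvidia/version does not exist, which means the kernel driver is not loaded on the instance. Follow this instruction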
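A quick way to confirm the diagnosis is to look for the same file the log complains about. The following is a minimal sketch, assuming a Linux host; the helper name `nvidia_driver_version` is hypothetical and only the path comes from the log line in the question.

```python
from pathlib import Path

# Path taken from the log line in the question; it only exists on Linux
# hosts where the NVIDIA kernel driver is loaded.
NVIDIA_VERSION_FILE = "/proc/driver/nvidia/version"


def nvidia_driver_version(version_file=NVIDIA_VERSION_FILE):
    """Return the driver's version text, or None if the file is missing."""
    path = Path(version_file)
    if not path.is_file():
        return None
    return path.read_text().strip()


if __name__ == "__main__":
    version = nvidia_driver_version()
    if version is None:
        print("NVIDIA kernel driver is not loaded; install the driver first")
    else:
        print(version)
```

If this prints the "not loaded" message, reinstalling TensorFlow will not help, because the problem sits below the Python layer.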

Related

Using the RTX 3070 laptop GPU for CNN model training with a windows system

I'm trying to use my laptop's RTX 3070 GPU for CNN model training because I have to run an exhaustive grid search to tune the hyperparameters. I tried many different methods, but I could not get it working. Can anyone kindly point me in the right direction?
I followed this procedure:
Installed the NVIDIA CUDA Toolkit 11.2
Installed NVIDIA cuDNN 8.1 by downloading and pasting the files (bin,include,lib) into the NVIDIA GPU Computing Toolkit/CUDA/V11.2
Set up the environment variables by adding the paths for both bin and libnvvm to the system path.
Installed tensorflow 2.11 and python 3.8 in a new conda environment.
However, I was unable to set up the system to use the available GPU. The code seems to use only the CPU, and the following query gives the output below.
Query:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Output:
TensorFlow version: 2.11.0
Num GPUs Available: 0
Am I missing something here, or does anyone else have the same issue?
You should use the DirectML plugin. GPU support on native Windows was dropped starting with TensorFlow 2.11 (2.10 was the last release that supported it), so you need to use the DirectML plugin or install TensorFlow in WSL2.
You can follow the tutorial here to install
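The version rule in this answer can be written down as a small check. This is a sketch rather than an official API: the function names are hypothetical, and the cutoff of 2.10 is the last release with native-Windows GPU support according to the TensorFlow install guide.

```python
import platform

# Assumption (from the TensorFlow install guide): 2.10 is the last release
# that supports the GPU on native Windows.
LAST_NATIVE_WINDOWS_GPU_RELEASE = (2, 10)


def major_minor(version):
    """Turn a version string such as '2.11.0' into the tuple (2, 11)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def needs_directml_or_wsl2(tf_version, system=None):
    """Return True if this TensorFlow version cannot use the GPU natively.

    `system` defaults to the current platform, e.g. 'Windows' or 'Linux'.
    """
    system = system or platform.system()
    return (
        system == "Windows"
        and major_minor(tf_version) > LAST_NATIVE_WINDOWS_GPU_RELEASE
    )


if __name__ == "__main__":
    # The setup from the question: TensorFlow 2.11.0 on native Windows.
    print(needs_directml_or_wsl2("2.11.0", system="Windows"))
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would sort "2.9" after "2.11".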

Anaconda installed CUDA, cuDNN and TensorFlow. Why doesn't it find the GPU?

I'm using Anaconda prompt to install:
Tensorflow 2.10.0
cudatoolkit 11.3.1
cudnn 8.2.1
I'm using Windows 11 and an NVIDIA RTX 3070 graphics card, and all the drivers have been updated.
I also tried downloading another version of CUDA and cuDNN as .exe installers directly from the CUDA website and added the directories to the system path. The folder looks like this:
But whenever I type in:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
It gives me Num GPUs Available: 0
And sure enough, it uses my CPU for all the computation.
It succeeded once when I used Colab with the GPU as the accelerator and then created a session using my GPU; that time the GPU load was maxed out. But later, for reasons I don't understand, I could no longer use my own GPU for training in Colab, or even their default free GPU.
Please help. ChatGPT doesn't give me correct information, since it only refers to knowledge from before 2020. It keeps telling me to install 'tensorflow-gpu', which has already been removed.
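When several CUDA and cuDNN versions are installed side by side, it can help to ask TensorFlow which versions its binary was built against. The sketch below assumes a TensorFlow release that provides `tf.sysconfig.get_build_info()`; the helper `summarize_build` is hypothetical, and the exact format of the version strings in that dictionary may differ between platforms.

```python
def summarize_build(build_info):
    """Describe the CUDA-related entries of a get_build_info() dictionary."""
    if not build_info.get("is_cuda_build", False):
        return "CPU-only TensorFlow build: installing CUDA will not add GPU support"
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"Built against CUDA {cuda} and cuDNN {cudnn}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # Compare this with the cudatoolkit and cudnn versions in the
        # conda environment.
        print(summarize_build(tf.sysconfig.get_build_info()))
        print(tf.config.list_physical_devices("GPU"))
```

If the reported versions do not match what conda installed, that mismatch is a reasonable first thing to investigate.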

How to access NVIDIA Quadro K4000 from my remote desktop Windows 10?

I have been trying to access the NVIDIA Quadro K4000 GPU on my remote Windows 10 desktop. I need it for TensorFlow object detection with version 2.9 or greater. For TensorFlow 2.9 or higher, I installed CUDA 11.2, cuDNN and Visual Studio 2019 according to the build configuration. It runs perfectly on my local PC and shows my laptop's local GPU after running this code:
import tensorflow as tf
import os
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
But the same code does not show any GPU device when I connect to my remote desktop with the NVIDIA Quadro K4000 GPU.
This line returns an empty list:
tf.config.list_physical_devices('GPU')
I have tried everything from editing the PATH variable to changing settings with 'gpedit.msc' from the Run dialog, but in vain. I still cannot use my GPU remotely, and I have been stuck for a long time.
Please help me.
I solved the issue. I was unaware of my GPU's compute capability and of the CUDA and cuDNN versions I was using on the system. It took me two days, but I now have GPU access with tensorflow-gpu version 2.0.0.
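The compute capability check mentioned in this answer amounts to comparing two "major.minor" numbers. Here is a minimal sketch with hypothetical helper names; the values used in the example are placeholders, so look up your GPU's capability in NVIDIA's documentation and the minimum your TensorFlow build requires in its release notes.

```python
def parse_capability(text):
    """Turn a compute capability such as '8.6' into the tuple (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(gpu_capability, minimum_capability):
    """Return True if the GPU meets the minimum required capability."""
    return parse_capability(gpu_capability) >= parse_capability(minimum_capability)


if __name__ == "__main__":
    # Placeholder values, not the actual numbers for any particular card
    # or TensorFlow release.
    print(is_supported("3.0", "3.5"))
```

Parsing into integer tuples avoids the trap of comparing the strings directly, where "10.0" would sort before "9.0".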

Blas xGEMMBatched launch failed on 2 x 3080 Ti GPUs with TensorFlow 1.14, CUDA 10.0, cuDNN 7.6.5

I am working on an extension of the following method, which involves retraining the network.
https://github.com/brownvc/matryodshka
The PC I am using has the following specifications:
Ubuntu 20.04 LTS
NVIDIA 3080ti (x2) GPUs
64 GB RAM
I created the conda environment from the provided yml file and separately installed cudatoolkit==10.0.130 and cudnn=7.6.5.
Training with the provided code then returns this error when the GPUs are used:
Blas xGEMMBatched launch failed
On the other hand, the training runs without error if I disable GPU support.
I tried the available solutions to this problem, without success. For example:
I set config.gpu_options.allow_growth to True.
I don't think it is an OOM error, as there is no such text in the log file.

tf.test.is_gpu_available() returns False on GCP

I am training a CNN on GCP's notebook using a Tesla V100. I've trained a simple yolo on my own custom data and it was pretty fast but not very accurate. So, I decided to write my own code from scratch to solve the specific aspects of the problem that I want to tackle.
I ran my code on Google Colab before moving to GCP, and it went well: TensorFlow detected the GPU and was able to use it, whether it was a Tesla K80 or a T4.
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
tf.test.is_gpu_available() #>>> True
My problem is that the same function returns False on the GCP notebook, as if TensorFlow were unable to use the GPU it detected on the GCP VM. I don't know of any command that forces TensorFlow to use the GPU over the CPU, since it does that automatically.
I have already tried uninstalling and reinstalling several versions of tensorflow, tensorflow-gpu and tf-nightly-gpu (1.13 and 2.0dev, for instance), but nothing changed.
output of nvidia-smi
Have you tried using GCP's AI Platform Notebooks instead? They offer VMs that are pre-configured with Tensorflow and have all required GPU drivers installed.
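Because the question switches between several TensorFlow versions, the GPU check itself may need to adapt. The sketch below uses a hypothetical helper, `count_gpus`, that prefers `tf.config.list_physical_devices` where it exists and falls back to the `tf.test.is_gpu_available()` call from the question otherwise. It takes the module as an argument so it can be tried without a GPU.

```python
def count_gpus(tf_module):
    """Return the number of GPUs reported by the given TensorFlow module.

    On releases without list_physical_devices this can only report
    1 (at least one GPU is usable) or 0.
    """
    config = getattr(tf_module, "config", None)
    # Newer releases expose the call on tf.config, some older ones only on
    # tf.config.experimental.
    for owner in (config, getattr(config, "experimental", None)):
        lister = getattr(owner, "list_physical_devices", None)
        if lister is not None:
            return len(lister("GPU"))
    # Fall back to the check used in the question.
    return 1 if tf_module.test.is_gpu_available() else 0


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("Num GPUs Available:", count_gpus(tf))
```

A result of 0 on the GCP VM alongside a working nvidia-smi would suggest the problem lies in the TensorFlow build or its CUDA libraries rather than in the driver.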