Unable to configure TensorFlow to use GPU acceleration in Ubuntu 16.04

I am trying to install TensorFlow on Ubuntu 16.04 (on Google Cloud). What I have done so far is create a Compute Engine instance and attach an NVIDIA Tesla K80 to it.
I have also made sure that the proper version of TensorFlow (version 1.14.0) is installed, that
CUDA version 8.0 is installed,
and that
cuDNN version 6.0 is installed, as per the TensorFlow GPU - CUDA mapping.
When I run a simple TensorFlow program, I get
Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
Can anyone please let me know where I am going wrong? Is the instance selection correct?
Please do let me know, and thanks for your help.

The CUDA and cuDNN versions that have been tested with TensorFlow 1.14 are 10.0 and 7.4, respectively.
More information about version compatibility can be found here.

Related

Upgrading Cudnn version in Vertex AI Notebook [Kernel Restarting Problem]

Problem: the cuDNN version is incompatible with TensorFlow and CUDA; the kernel dies and I am unable to start training in Vertex AI.
Current versions:
import tensorflow as tf
from tensorflow.python.platform import build_info as build
print(f"tensorflow version: {tf.__version__}")
print(f"Cuda Version: {build.build_info['cuda_version']}")
print(f"Cudnn version: {build.build_info['cudnn_version']}")
tensorflow version: 2.10.0
Cuda Version: 11.2
Cudnn version: 8
As per the information (shown in the attached screenshot) available here, the cuDNN version must be 8.1.
A similar question related to upgrading cuDNN in Google Colab has been asked here. However, it does not solve my issue, and every other online source I found is helpful for Anaconda environments only.
How can I upgrade the Cudnn in my case?
Thank you.
I tried several combinations of TensorFlow, CUDA, and cuDNN versions in Google Colab, and the following combination worked [OS: Ubuntu 20.04]:
tensorflow version: 2.9.2
Cuda Version: 11.2
Cudnn version: 8
Therefore, I downgraded the TensorFlow version in Vertex AI from 2.10.0 to 2.9.2 and it worked (this solved only the incompatibility issue). I'm still searching for a solution to the kernel restarting.
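As a minimal sketch of what the fix amounts to, the versions reported by build_info earlier in this question can be compared against the tested configuration for that release (pure Python, no TensorFlow required; the table below only contains the two releases discussed here, and note that build_info may report only the cuDNN major version, e.g. "8"):

```python
# Tested (CUDA, cuDNN) pairs for the two TensorFlow releases in this question.
TESTED = {
    "2.9.2":  ("11.2", "8.1"),
    "2.10.0": ("11.2", "8.1"),
}

def check(tf_version, cuda, cudnn):
    """Compare installed CUDA/cuDNN versions to the tested pair for a release."""
    expected = TESTED.get(tf_version)
    if expected is None:
        return "unknown release - consult the tested-configurations table"
    if (cuda, cudnn) == expected:
        return "matches the tested configuration"
    return f"tested configuration for {tf_version} is CUDA {expected[0]} / cuDNN {expected[1]}"

# The environment in this question: TF 2.10.0 with CUDA 11.2 and cuDNN "8"
print(check("2.10.0", "11.2", "8"))
# tested configuration for 2.10.0 is CUDA 11.2 / cuDNN 8.1
```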
UPDATE:
The kernel-restarting problem was fixed after I changed the kernel from Tensorflow 2 (Local) to Python (Local) in Vertex AI's notebook, as shown in the attached image [the kernel-changing option is available at the top right, near the bug symbol].

Anaconda installed CUDA, cuDNN and TensorFlow, so why doesn't it find the GPU?

I'm using Anaconda prompt to install:
Tensorflow 2.10.0
cudatoolkit 11.3.1
cudnn 8.2.1
I'm using Windows 11 and an RTX 3070 Nvidia graphics card, and all the drivers have been updated.
I also tried downloading another version of CUDA and cuDNN as .exe files directly from the CUDA website and
added the directories to the system path. The folder looks like this:
But whenever I type in:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
It gives me Num GPUs Available: 0
And sure enough, it uses my CPU for all computation.
It once succeeded when I used Colab, selected GPU as the accelerator, and created a session using my GPU; that time the GPU load was maximized. But later, I don't know how, I could no longer use my own GPU for training in Colab, or even their default free GPU.
Please help. ChatGPT doesn't give me correct information since it only has knowledge from before 2020; it keeps asking me to install 'tensorflow-gpu', which has already been removed.

How to deal with CUDA version?

How to set up different versions of CUDA in one OS?
Here is my problem: the latest TensorFlow with GPU support requires CUDA 11.2, whereas PyTorch works with 11.3. So what is the solution for installing both libraries on Windows and Ubuntu?
One solution is to use a Docker container environment, which would only need the NVIDIA driver to be of version XYZ.AB; in this way, you can use both the PyTorch and TensorFlow versions.
A very good starting point for your problem would be this one (ML-WORKSPACE): https://github.com/ml-tooling/ml-workspace
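The reason the container approach works is that each CUDA runtime release only requires a minimum driver version, so one sufficiently new host driver can serve containers with different CUDA runtimes. As a rough sketch, here is that check in pure Python; the minimum-driver numbers are my recollection of the table in NVIDIA's CUDA release notes and should be treated as assumptions to verify against the current release notes:

```python
# Approximate minimum Linux driver versions for a few CUDA runtime releases
# (assumed values - verify against NVIDIA's CUDA release notes).
MIN_DRIVER = {
    "9.0":  (384, 81),
    "10.0": (410, 48),
    "11.2": (460, 27),
    "11.3": (465, 19),
}

def driver_supports(driver_version, cuda_runtime):
    """True if an installed driver meets the minimum for a CUDA runtime."""
    major, minor = (int(x) for x in driver_version.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda_runtime]

# Driver 460.73.01 (the setup in the next question) is new enough for
# a CUDA 11.2 runtime, but not for 11.3:
print(driver_supports("460.73.01", "11.2"))  # True
print(driver_supports("460.73.01", "11.3"))  # False
```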

Using TensorFlow with GPU takes a long time loading CUDA-related libraries

Machine Setting:
GPU: GeForce RTX 3060
Driver Version: 460.73.01
CUDA Driver Version: 11.2
Tensorflow: tensorflow-gpu 1.14.0
CUDA Runtime Version: 10.0
cudnn: 7.4.1
Note:
The CUDA runtime and cuDNN versions fit the guide from the official TensorFlow documentation.
I've also tried tensorflow-gpu 2.0; still the same problem.
Problem:
I am using TensorFlow for an object detection task. My situation is that the program gets stuck at
2021-06-05 12:16:54.099778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
for several minutes.
and then gets stuck at the next loading step
2021-06-05 12:21:22.212818: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
for an even longer time. You may check log.txt for log details.
After waiting for around 30 minutes, the program starts running and works well.
However, whenever the program invokes self.session.run(...), it loads the same two CUDA-related libraries (libcublas and libcudnn) again, which is time-wasting and annoying.
I am confused about where the problem comes from and how to resolve it. Could anyone help?
Discussion Issue on Github
===================================
Update
After #talonmies's help, the problem was resolved by resetting the environment with correct version matching among the GPU, CUDA, cuDNN and TensorFlow. Now it works smoothly.
Generally, if there is any incompatibility between the TF, CUDA and cuDNN versions, you may observe this behavior.
For the GeForce RTX 3060, support starts from CUDA 11.x. Once you upgrade to TF 2.4 or TF 2.5, your issue will be resolved.
For the benefit of the community, here is the tested build configuration:
CUDA Support Matrix
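The point above generalizes: each GPU architecture has a minimum CUDA toolkit that can target it. As a sketch, the Ampere and Volta entries below follow from this thread (the RTX 3060 needs CUDA 11.x; Volta is supported from CUDA 9.0, per the GV100 answer further down); the Turing entry and the architecture names are standard NVIDIA terminology I am adding for illustration:

```python
# Minimum CUDA toolkit version able to generate code for each architecture.
MIN_CUDA_FOR_ARCH = {
    "volta":  (9, 0),   # e.g. Quadro GV100
    "turing": (10, 0),  # e.g. RTX 20xx series
    "ampere": (11, 0),  # e.g. GeForce RTX 3060
}

def toolkit_supports(arch, cuda_version):
    """True if a CUDA toolkit version can target a given GPU architecture."""
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    return (major, minor) >= MIN_CUDA_FOR_ARCH[arch]

# CUDA 10.0 (shipped with TF 1.14) cannot target an RTX 3060; CUDA 11.2 can:
print(toolkit_supports("ampere", "10.0"))  # False
print(toolkit_supports("ampere", "11.2"))  # True
```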

How to run tensorflow-gpu on Nvidia Quadro GV100?

I am currently working as a working student, and I am having trouble installing tensorflow-gpu on a machine with an Nvidia Quadro GV100 GPU.
On the TensorFlow homepage I found that I need to install CUDA 9.0 and cuDNN 7.x in order to run tensorflow-gpu 1.9. The problem is that I can't find a suitable CUDA version supporting the GV100. Could it be that there is no such CUDA version yet? Is it possible that one can't use the GV100 for tensorflow-gpu?
Sorry for the stupid question, I am new to installing DL frameworks :-)
Thank you very much for your help!
On the Tensorflow homepage I found out that I need to install CUDA 9.0 and Cudnn 7.x in order to run Tensorflow-gpu 1.9.
That is if you want to install a pre-built TensorFlow binary distribution. In that case you need to use the version of CUDA that the TensorFlow binaries were built against, which in this case is CUDA 9.0.
The problem is that I can't find a suitable CUDA version supporting the GV100
The CUDA 9.0 and later toolkits fully support Volta cards, and that should include the Quadro GV100. The driver that ships with CUDA 9.0 is a 384-series driver, which won't support your GPU. If you are referring to a driver support issue, then the solution would be to install the recommended driver for your GPU and install only the CUDA toolkit from the CUDA 9.0 bundle, not the toolkit and driver, which is the default.
Otherwise you can use CUDA 9.1 or 9.2, which should support your GPU with their supplied drivers, but you will then need to build TensorFlow yourself from source.