Using TensorFlow with GPU takes a long time to load CUDA-related libraries - tensorflow

Machine Setting:
GPU: GeForce RTX 3060
Driver Version: 460.73.01
CUDA Driver Version: 11.2
TensorFlow: tensorflow-gpu 1.14.0
CUDA Runtime Version: 10.0
cuDNN: 7.4.1
Note:
The CUDA runtime and cuDNN versions match the guide in the official TensorFlow documentation.
I also tried tensorflow-gpu 2.0 and had the same problem.
Problem:
I am using TensorFlow for an object detection task. The program gets stuck at
2021-06-05 12:16:54.099778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
for several minutes,
and then gets stuck at the next loading step
2021-06-05 12:21:22.212818: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
for even longer. See log.txt for the full log.
After waiting for around 30 minutes, the program starts running and WORKS WELL.
However, whenever the program invokes self.session.run(...), it loads the same two CUDA-related libraries (libcublas and libcudnn) again, which wastes time and is annoying.
I don't understand where the problem comes from or how to resolve it. Can anyone help?
Discussion issue on GitHub
===================================
Update
With @talonmies's help, the problem was resolved by rebuilding the environment with CUDA, cuDNN and TensorFlow versions that match each other and the GPU. Now it works smoothly.

In general, you can observe this behavior when the TF, CUDA and cuDNN versions are incompatible.
For the GeForce RTX 3060, support starts with CUDA 11.x. Once you upgrade to TF 2.4 or TF 2.5, your issue will be resolved.
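A GPU runs kernels compiled for its own architecture, or PTX code that the driver JIT-compiles at load time; the latter is a plausible cause of the multi-minute stalls in the question. The sketch below checks whether a CUDA toolkit natively targets a given compute capability. The table is a hand-written assumption covering cards from this thread (the RTX 3060 and 3070 are compute capability 8.6, first supported natively by CUDA 11.1; the Quadro GV100 is 7.0, first supported by CUDA 9.0), and the names MIN_CUDA_FOR_CC and cuda_supports are hypothetical.

```python
# Hypothetical lookup: first CUDA toolkit (major, minor) with native support
# for each compute capability discussed in this thread. Hand-written
# assumption, not queried from CUDA or TensorFlow.
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta, e.g. Quadro GV100
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere data-center parts
    (8, 6): (11, 1),  # Ampere consumer parts, e.g. GeForce RTX 3060 / 3070
}


def parse_version(text):
    """Turn a version string such as '11.2' into a tuple of ints: (11, 2)."""
    return tuple(int(part) for part in text.split("."))


def cuda_supports(compute_capability, cuda_version):
    """Return True if the CUDA toolkit natively targets this compute capability.

    Raises KeyError for capabilities missing from the illustrative table.
    """
    required = MIN_CUDA_FOR_CC[compute_capability]
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # The question's setup: RTX 3060 (8.6) with the CUDA 10.0 runtime of tensorflow-gpu 1.14.
    print(cuda_supports((8, 6), "10.0"))
    # CUDA 11.2, the version TF 2.5 through 2.10 were tested with.
    print(cuda_supports((8, 6), "11.2"))
```

A False result does not always mean the GPU is unusable (PTX JIT may still work), only that the toolkit does not ship native code for that card.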
For the benefit of the community, here are the tested build configurations:
CUDA Support Matrix
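To compare a setup against the tested build configurations, a lookup table is enough. The entries below are a hand-copied subset of TensorFlow's published Linux GPU table, limited to the releases mentioned in this thread; treat them as an assumption and re-check the official table for any other version. The names TESTED_GPU_BUILDS, version_matches and check_setup are hypothetical.

```python
# Hand-copied subset of TensorFlow's tested Linux GPU build table
# (assumption: re-check the official table before relying on it).
TESTED_GPU_BUILDS = {
    # TF major.minor: (CUDA, cuDNN)
    "1.9": ("9.0", "7"),
    "1.14": ("10.0", "7.4"),
    "2.0": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
    "2.9": ("11.2", "8.1"),
    "2.10": ("11.2", "8.1"),
}


def _parts(version):
    return [int(p) for p in version.split(".")]


def version_matches(expected, installed):
    """True if `installed` agrees with every component given in `expected`.

    '7' accepts 7.4 or 7.6; '8.1' rejects 8.2 and also a bare '8'.
    """
    want, have = _parts(expected), _parts(installed)
    return have[: len(want)] == want


def check_setup(tf_version, cuda_version, cudnn_version):
    """Return human-readable mismatches; an empty list means the setup matches."""
    key = ".".join(tf_version.split(".")[:2])  # '1.14.0' -> '1.14'
    if key not in TESTED_GPU_BUILDS:
        return [f"TF {tf_version} is not in this illustrative table"]
    want_cuda, want_cudnn = TESTED_GPU_BUILDS[key]
    problems = []
    if not version_matches(want_cuda, cuda_version):
        problems.append(f"CUDA {cuda_version} installed, TF {key} was tested with {want_cuda}")
    if not version_matches(want_cudnn, cudnn_version):
        problems.append(f"cuDNN {cudnn_version} installed, TF {key} was tested with {want_cudnn}")
    return problems


if __name__ == "__main__":
    print(check_setup("1.14.0", "10.0", "7.4.1"))
    print(check_setup("1.14.0", "8.0", "6.0"))
```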

Related

Upgrading the cuDNN version in a Vertex AI Notebook [Kernel Restarting Problem]

Problem: The cuDNN version is incompatible with TensorFlow and CUDA; the kernel dies and I am unable to start training in Vertex AI.
Current versions:
import tensorflow as tf
from tensorflow.python.platform import build_info as build
print(f"tensorflow version: {tf.__version__}")
print(f"Cuda Version: {build.build_info['cuda_version']}")
print(f"Cudnn version: {build.build_info['cudnn_version']}")
Output:
tensorflow version: 2.10.0
Cuda Version: 11.2
Cudnn version: 8
According to the information available here (shown in the attached screenshot), the cuDNN version must be 8.1.
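The snippet above reads a private module; TensorFlow 2.3 and later expose the same dictionary publicly as tf.sysconfig.get_build_info(). These values describe what the TensorFlow binary was compiled against, not the libraries currently installed, and the cuDNN entry may hold only the major version ("8"), so it cannot by itself confirm 8.1. A minimal sketch, assuming the "is_cuda_build", "cuda_version" and "cudnn_version" keys (present on Linux GPU builds); describe_build is a hypothetical helper.

```python
def describe_build(info):
    """Summarise a tf.sysconfig.get_build_info()-style dict in one line.

    The values are what TensorFlow was compiled against, not the
    libraries currently installed on the machine.
    """
    # Fall back to the presence of 'cuda_version' if 'is_cuda_build' is absent.
    if not info.get("is_cuda_build", "cuda_version" in info):
        return "CPU-only build"
    cuda = info.get("cuda_version", "unknown")
    cudnn = info.get("cudnn_version", "unknown")
    return f"built against CUDA {cuda}, cuDNN {cudnn}"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        print(tf.__version__, describe_build(tf.sysconfig.get_build_info()))
```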
A similar question about upgrading cuDNN in Google Colab has been asked here, but it does not solve my issue. Other online sources only cover Anaconda environments.
How can I upgrade cuDNN in my case?
Thank you.
I tried several combinations of TensorFlow, CUDA and cuDNN versions in Google Colab, and the following combination worked [OS: Ubuntu 20.04]:
tensorflow version: 2.9.2
Cuda Version: 11.2
Cudnn version: 8
Therefore, I downgraded TensorFlow in Vertex AI from 2.10.0 to 2.9.2, and it worked (this solved only the incompatibility issue). I'm still looking for a solution to the kernel restarts.
UPDATE:
The kernel restarting problem was fixed after I changed the kernel from Tensorflow 2 (Local) to Python (Local) in Vertex AI's Notebook, as shown in the attached image [the kernel selector is at the top right, near the bug symbol].

Installed CUDA, cuDNN and TensorFlow with Anaconda; why doesn't TensorFlow find the GPU?

I'm using Anaconda prompt to install:
Tensorflow 2.10.0
cudatoolkit 11.3.1
cudnn 8.2.1
I'm using Windows 11 and an NVIDIA RTX 3070 graphics card, and all the drivers have been updated.
I also tried downloading another version of CUDA and cuDNN as .exe installers directly from the CUDA website, and added the directories to the system PATH. The folder looks like this:
But whenever I type in:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
It prints Num GPUs Available: 0
And of course everything is computed on my CPU instead.
It worked once, when I used Colab with GPU as the accelerator and then created a session using my GPU; at that time, the GPU load was maxed out. But since then, for reasons I don't understand, I can't use my own GPU for training in Colab, or even Colab's default free GPU.
Please help. ChatGPT doesn't give me correct information, since its knowledge only goes up to 2020. It keeps telling me to install 'tensorflow-gpu', which has already been removed.
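When list_physical_devices('GPU') comes back empty, two facts narrow things down: whether the installed wheel was built with CUDA at all (tf.test.is_built_with_cuda()), and the platform, since TensorFlow 2.10 was the last release with GPU support on native Windows (later versions need WSL2 for GPU use). TensorFlow 2.10 was tested with CUDA 11.2 and cuDNN 8.1, not the 11.3.1 / 8.2.1 listed above. The sketch turns these facts into hints; gpu_hints and its messages are illustrative assumptions, not a TensorFlow API.

```python
import platform


def gpu_hints(system, tf_version, built_with_cuda, num_gpus):
    """Return a list of likely reasons TensorFlow sees no GPU (illustrative)."""
    if num_gpus > 0:
        return []
    hints = []
    major, minor = (int(p) for p in tf_version.split(".")[:2])
    if system == "Windows" and (major, minor) > (2, 10):
        hints.append("TF > 2.10 has no native-Windows GPU support; use WSL2 or TF 2.10")
    if not built_with_cuda:
        hints.append("this TensorFlow wheel was built without CUDA")
    if not hints:
        hints.append("CUDA/cuDNN libraries were not found or do not match this build")
    return hints


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        gpus = tf.config.list_physical_devices("GPU")
        for hint in gpu_hints(platform.system(), tf.__version__,
                              tf.test.is_built_with_cuda(), len(gpus)):
            print(hint)
```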

I am getting an error importing tensorflow as tf

While importing tensorflow, I get:
Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-28 00:21:19.206030: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
System: HP 245 G5 notebook
Operating system: Ubuntu 18.04
How to solve the problem?
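The message means the dynamic loader could not find libcudart.so.10.1, the CUDA 10.1 runtime this TensorFlow build expects; as the following log line says, it can be ignored on a machine without a GPU set up. To check outside TensorFlow whether such a library is loadable, ctypes can attempt the same kind of dlopen. A minimal, Linux-oriented sketch; can_load is a hypothetical helper and the library names are taken from the logs in this thread.

```python
import ctypes
import ctypes.util


def can_load(library_name):
    """Try to dlopen a shared library by name; return True on success."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ("libcudart.so.10.1", "libcudnn.so.7"):
        print(name, "loadable" if can_load(name) else "NOT found on the loader path")
    # find_library searches by short name ('cudart') and may return None.
    print("cudart found as:", ctypes.util.find_library("cudart"))
```

If the file exists on disk but is still not loadable, its directory is typically missing from LD_LIBRARY_PATH.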
It seems you are trying to use the GPU version of TensorFlow and have installed conflicting software versions for it.
Note: GPU support is available only on Ubuntu and Windows with CUDA-enabled cards.
If you have a CUDA-enabled card, follow the instructions below.
As stated in the TensorFlow documentation, the software requirements are as follows:
NVIDIA GPU drivers - 418.x or higher
Cuda - 10.1 (TensorFlow >= 2.1.0)
cuDNN - 7.6
Make sure you have these exact versions of the software mentioned above. See this
Also, check the system requirements here.
To download the software mentioned above, see here.
To install TensorFlow, follow the instructions provided here to install the necessary packages correctly.
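The driver requirement can be checked from Python by asking nvidia-smi for the driver version and comparing it with the 418.x minimum listed above. A sketch, assuming nvidia-smi is on PATH; driver_at_least and query_driver_version are hypothetical helpers.

```python
import shutil
import subprocess


def driver_at_least(version_text, minimum=(418, 0)):
    """Compare a driver string such as '460.73.01' against (major, minor)."""
    parts = tuple(int(p) for p in version_text.strip().split(".")[:2])
    return parts >= minimum


def query_driver_version():
    """Return the first GPU's driver version from nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if result.returncode == 0 and lines else None


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not available: is an NVIDIA driver installed?")
    else:
        print(version, "OK" if driver_at_least(version) else "too old for CUDA 10.1")
```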

Unable to configure tensorflow to use GPU acceleration in Ubuntu 16.04

I am trying to install TensorFlow on Ubuntu 16.04 (on Google Cloud). So far I have created a compute instance and added an NVIDIA Tesla K80 to it.
I also made sure that the proper version of TensorFlow (1.14.0) is installed, along with CUDA 8.0 and cuDNN 6.0, as per the TensorFlow GPU/CUDA mapping.
When I run a simple tensorflow program, I get
Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
Can anyone tell me what I am doing wrong? Is my instance selection correct?
Please let me know, and thanks for your help.
The CUDA and cuDNN versions tested with TensorFlow 1.14 are 10.0 and 7.4, respectively.
More information about version compatibility can be found here.
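The device list in the error is itself a clue: an XLA_GPU device without a plain GPU device commonly appears when TensorFlow can see the card but could not load the CUDA/cuDNN versions it was built for, which fits the 8.0/6.0 versus 10.0/7.4 mismatch. The sketch classifies such a list; device_types and diagnose are hypothetical helpers, and device_lib in the main block is an internal TensorFlow module rather than public API.

```python
def device_types(device_names):
    """Map names like '/job:localhost/.../device:XLA_GPU:0' to types, e.g. 'XLA_GPU'."""
    return {name.split("device:")[-1].split(":")[0] for name in device_names}


def diagnose(device_names):
    """Return a short diagnosis based on which device types are present."""
    types = device_types(device_names)
    if "GPU" in types:
        return "GPU device available"
    if "XLA_GPU" in types:
        return "GPU visible only via XLA: check CUDA/cuDNN versions against the TF build"
    return "no GPU visible at all: check the driver and the instance's GPU"


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib  # internal TF module
    except ImportError:
        print("TensorFlow is not installed")
    else:
        names = [d.name for d in device_lib.list_local_devices()]
        print(diagnose(names))
```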

How to run tensorflow-gpu on Nvidia Quadro GV100?

I am currently working as a working student, and I am having trouble installing tensorflow-gpu on a machine with an NVIDIA Quadro GV100 GPU.
On the TensorFlow homepage I found that I need to install CUDA 9.0 and cuDNN 7.x in order to run tensorflow-gpu 1.9. The problem is that I can't find a CUDA version that supports the GV100. Could it be that no CUDA version supports it yet? Is it possible that the GV100 can't be used with tensorflow-gpu?
Sorry for the stupid question, I am new to installing DL frameworks :-)
Thank you very much for your help!
On the Tensorflow homepage I found out that I need to install CUDA 9.0 and Cudnn 7.x in order to run Tensorflow-gpu 1.9.
That is true if you want to install a pre-built TensorFlow binary distribution. In that case you need to use the version of CUDA that the TensorFlow binaries were built against, which in this case is CUDA 9.0.
The problem is that I can't find a suitable CUDA version supporting the GV100
The CUDA 9.0 and later toolkits fully support Volta cards, and that should include the Quadro GV100. The driver that ships with CUDA 9.0 is from the 384 series, which won't support your GPU. If you are referring to a driver support issue, the solution is to install the recommended driver for your GPU and install only the CUDA toolkit from the CUDA 9.0 bundle, rather than the toolkit and the driver, which is the default.
Otherwise you can use CUDA 9.1 or 9.2, which should have support for your GPU with their supplied drivers, but you will then need to build Tensorflow yourself from source.
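The rule of thumb in this answer (a prebuilt binary needs the CUDA version it was built against; any other toolkit means building from source) fits in a few lines. The build-CUDA table is an assumption limited to releases discussed in this thread, and install_route is a hypothetical name.

```python
# Assumed CUDA (major.minor) each prebuilt tensorflow(-gpu) wheel was built against.
PREBUILT_CUDA = {"1.9": "9.0", "1.14": "10.0", "2.10": "11.2"}


def install_route(tf_version, cuda_version):
    """Return 'prebuilt' if the toolkit matches the wheel's CUDA, else 'build from source'."""
    key = ".".join(tf_version.split(".")[:2])
    built_with = PREBUILT_CUDA.get(key)
    if built_with is None:
        return "unknown release: check the tested build table"
    # Compare only major.minor, so '9.0.176' counts as '9.0'.
    if ".".join(cuda_version.split(".")[:2]) == built_with:
        return "prebuilt"
    return "build from source"


if __name__ == "__main__":
    print(install_route("1.9.0", "9.0"))
    print(install_route("1.9.0", "9.2"))
```

This covers only the toolkit; as the answer notes, the driver itself can be newer than the one bundled with CUDA 9.0.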