Tensorflow: RC 0.10 3X Slower than 0.9 - tensorflow

I am compiling the current master version from source. If I compile using CUDA 7.5 and CUDNN 4.0 I get the following compilation error:
ERROR: /home/rob/tensorflow/tensorflow/contrib/rnn/BUILD:45:1: undeclared inclusion(s) in rule '//tensorflow/contrib/rnn:python/ops/_lstm_ops_gpu':
this rule is missing dependency declarations for the following files included by 'tensorflow/contrib/rnn/kernels/lstm_ops_gpu.cu.cc':
'/usr/local/cuda-7.5/include/cuda_runtime.h'
'/usr/local/cuda-7.5/include/host_config.h'
'/usr/local/cuda-7.5/include/builtin_types.h'
[etc...]
If I compile with CUDNN 5.1, everything compiles and runs but the execution time is roughly 3X longer for a training script I am currently running compared to the same using the 0.9.0 release installed via pip.
I also tried the pip version of 0.10.rc0 (gpu) and saw the same 3X slow down vs. version 0.9.0
I am using Ubuntu 14.04, py 3.4 and a Tesla K40c gpu. Bazel is version 0.3.1
What is the cause of the 3X slow down of ver 0.10.0rc0 and is there any way to regain the prior performance?
Secondarily, how could I eliminate the build errors when using CUDNN 4?

The relative slowness of 0.10.0rc0 is a confirmed bug that is being addressed. More information and status can be found in this thread.

Related

Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0

I get this error when I run the model.fit_generator code to train images using the CNN model. I don't understand the error, and what should I do? Can anyone help me?
this is the full error description
Loaded runtime CuDNN library: 8.0.5, but the source was compiled with: 8.1.0. CuDNN library needs to have a matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, ensure the library loaded at runtime is compatible with the version specified during compile configuration.
I had the same error "tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0."
I solved it by downgrading TensorFlow. The error says that the installed TensorFlow version is too new for the cuDNN version available on Google Colab. I used TensorFlow 2.4.0, plus all the dependencies required by version 2.4.0.
This page lists which TensorFlow version to use for cuDNN compatibility: https://www.tensorflow.org/install/source
The installed library versions should always match the versions that the package you want to use was compiled against.
You can download the version you need from the NVIDIA website, or use conda for package management. It will handle all dependencies for you.
You can install Miniconda and run conda install -c anaconda tensorflow-gpu to get it sorted for you. If you need a specific version of Python, you can create an environment with it.
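The compatibility rule quoted in the error message (matching major version, runtime minor version equal or higher) can be sketched in a few lines of Python. The function names below are made up for illustration; this is not how TensorFlow itself performs the check.

```python
def parse_version(text):
    """Turn a version string such as '8.0.5' into a tuple of ints (8, 0, 5)."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(runtime, compiled):
    """Apply the rule from the error message: the loaded cuDNN must have the
    same major version as the build and an equal or higher minor version."""
    run_major, run_minor = parse_version(runtime)[:2]
    built_major, built_minor = parse_version(compiled)[:2]
    return run_major == built_major and run_minor >= built_minor


# The combination from the question fails because minor 0 < minor 1.
print(cudnn_runtime_ok("8.0.5", "8.1.0"))  # False
print(cudnn_runtime_ok("8.1.0", "8.1.0"))  # True
```

By this rule, upgrading the runtime cuDNN to 8.1.0 or later in the 8.x series, or downgrading TensorFlow to a build compiled against cuDNN 8.0, both resolve the mismatch.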
My solution:
After confirming that my CUDA and cuDNN versions were compatible with TensorFlow, I first thought the system had simply not picked up the changes after the installation. Several restarts showed that this was not, and could not be, the problem, so I started checking all the software on the system that depends on CUDA and cuDNN. I uninstalled MATLAB along the way, but that did not help. Later it occurred to me that PyTorch is also tied to CUDA and cuDNN. I checked the PyTorch version and found that I was using torch 1.8, which was built for CUDA 11.1 with cuDNN 8.0.5 as the corresponding version. That was the cause: I upgraded PyTorch and the error went away.
I have faced the same issue. It seems that each TensorFlow version requires a specific cuDNN version.
Check the link for required versions.
https://www.tensorflow.org/install/source#gpu
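To find out which CUDA and cuDNN versions an installed TensorFlow was compiled against, recent TensorFlow 2.x releases expose tf.sysconfig.get_build_info() (to my knowledge from 2.3 onward, so treat that version number as an assumption). A minimal sketch that degrades gracefully when TensorFlow or that function is missing; the helper name tensorflow_build_info is hypothetical:

```python
def tensorflow_build_info():
    """Return TensorFlow's build information as a plain dict, or None if
    TensorFlow is not installed or is too old to provide it."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    get_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_info is None:
        return None
    return dict(get_info())


info = tensorflow_build_info()
if info is None:
    print("TensorFlow build info is not available in this environment")
else:
    # CPU-only builds do not carry these keys, hence the fallback text.
    print("Compiled against CUDA: ", info.get("cuda_version", "not a CUDA build"))
    print("Compiled against cuDNN:", info.get("cudnn_version", "not a CUDA build"))
```

The reported values can then be compared with the table at the link above.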
Thanks for this answer:
My solution:
After confirming that my cuda and cudnn versions are compatible with
tensorflow, I first thought that the system ...
It helped me a lot, but I solved the problem in a different way.
I found that PyTorch 1.8 is compatible with cuDNN 8.1.0. So, instead of upgrading PyTorch, I overwrote the cuDNN 8.0.5 DLL with the cuDNN 8.1.0 one in the directory D:\Program Files\Python37\Lib\site-packages\torch\lib. You can find this location with Everything, which is always helpful.
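If a file search tool is not at hand, the cuDNN libraries bundled inside installed Python packages can also be located from Python. This is only a sketch: find_cudnn_files is a hypothetical helper, and matching on the substring "cudnn" in the file name is an assumption about how the libraries are named.

```python
import sysconfig
from pathlib import Path


def find_cudnn_files(root):
    """Return every file under `root` whose name contains 'cudnn', sorted."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*cudnn*") if p.is_file())


# Search the site-packages directory of the current interpreter.
site_packages = sysconfig.get_paths()["purelib"]
for path in find_cudnn_files(site_packages):
    print(path)
```

Any hit under a package other than TensorFlow (for example torch\lib) is a candidate for the library that gets loaded first.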

Using TensorFlow with GPU taking a long time for loading library related to CUDA

Machine Setting:
GPU: GeForce RTX 3060
Driver Version: 460.73.01
CUDA Driver Version: 11.2
Tensorflow: tensorflow-gpu 1.14.0
CUDA Runtime Version: 10.0
cudnn: 7.4.1
Note:
The CUDA Runtime and cuDNN versions match the guide in the official TensorFlow documentation.
I've also tried tensorflow-gpu 2.0 and still have the same problem.
Problem:
I am using TensorFlow for an object detection task. The program gets stuck at
2021-06-05 12:16:54.099778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
for several minutes.
It then gets stuck at the next loading step
2021-06-05 12:21:22.212818: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
for an even longer time. You may check log.txt for the full log.
After waiting for around 30 minutes, the program starts running and WORKS WELL.
However, whenever the program invokes self.session.run(...), it loads the same two CUDA libraries (libcublas and libcudnn) again, which wastes time and is annoying.
I am confused about where the problem comes from and how to resolve it. Could anyone help?
Discussion Issue on Github
===================================
Update
With @talonmies's help, the problem was resolved by rebuilding the environment with correctly matching versions of the GPU driver, CUDA, cuDNN and TensorFlow. Now it works smoothly.
In general, you can observe this behavior whenever the TF, CUDA and cuDNN versions are incompatible.
For the GeForce RTX 3060, support starts from CUDA 11.x. Once you upgrade to TF 2.4 or TF 2.5, your issue will be resolved.
For the benefit of the community, here are the tested build configurations:
CUDA Support Matrix
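The point about the RTX 3060 can be illustrated with a small lookup. The table below is an abbreviated sketch written from memory of NVIDIA's documentation, not taken from this thread: it maps the major compute capability of a GPU to the first CUDA release that shipped binaries for that architecture, and the names in it are made up for illustration. The RTX 3060 is an Ampere card (compute capability 8.6), so it falls under major version 8.

```python
# First CUDA toolkit release with binary support for each GPU architecture,
# keyed by major compute capability. Abbreviated and written from memory;
# check NVIDIA's CUDA documentation before relying on it.
FIRST_CUDA_FOR_ARCH = {
    6: (8, 0),   # Pascal
    7: (9, 0),   # Volta and Turing
    8: (11, 0),  # Ampere, including the GeForce RTX 30 series
}


def cuda_supports_gpu(compute_capability_major, cuda_version):
    """Return True if the given CUDA runtime (major, minor) is at least the
    first release that supports the GPU architecture."""
    return cuda_version >= FIRST_CUDA_FOR_ARCH[compute_capability_major]


print(cuda_supports_gpu(8, (10, 0)))  # False: the setup from the question
print(cuda_supports_gpu(8, (11, 2)))  # True
```

With a CUDA 10.0 runtime the check fails for an Ampere card, which is consistent with the advice above to move to a TensorFlow release built against CUDA 11.x.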

Install Tensorflow 2.x only for CPU using PIP

How do you install only the CPU version of TensorFlow 2.x using pip?
In the past, it was possible to install the two versions (CPU and GPU) separately.
Since I am running the scripts on a non-GPU device (no NVIDIA card, only an Intel card without CUDA support), I am getting the following error:
2020-04-14 23:28:14.632879: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-04-14 23:28:14.632902: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
In the past my workaround was to use a CPU only version.
Thanks for the hints in advance
You can choose the CPU-only version of tensorflow depending on your python version.
Check the list here:
https://www.tensorflow.org/install/pip#package-location
e.g. you will need to do the following for Python 3.8:
pip3 install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow_cpu-2.3.0-cp38-cp38-manylinux2010_x86_64.whl
Issue solved after installing a CPU only version.
I installed tensorflow-cpu, pinned to the release version. Somehow the automatic fallback to CPU did not work in my setup.
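To confirm which TensorFlow distribution actually ended up in the environment, the installed package metadata can be queried with the standard library (Python 3.8 or newer). This is a sketch; the list of distribution names is an assumption based on the package names mentioned in these threads, and installed_tensorflow_packages is a hypothetical helper.

```python
from importlib import metadata

# Distribution names to look for; extend the tuple if you use other variants.
TF_DISTRIBUTIONS = ("tensorflow", "tensorflow-cpu", "tensorflow-gpu")


def installed_tensorflow_packages():
    """Return a dict mapping each installed TensorFlow distribution to its version."""
    found = {}
    for name in TF_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this variant is not installed
    return found


print(installed_tensorflow_packages() or "no TensorFlow distribution installed")
```

If both a GPU and a CPU variant show up, uninstalling all of them and reinstalling only the CPU-only package avoids ambiguity about which one gets imported.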

Tensorflow can find right cudnn in one python file but fail in another

I am trying to use the GPU version of TensorFlow to train and test my deep learning model, but I have run into a problem. When I train my model in one Python file, everything works and tensorflow-gpu is used properly. I then save the model as a pretrained one in graph.pb format and try to reuse it in another Python file.
Then I got the following error messages.
E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN
library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major
and minor version needs to match or have higher minor version in case of
CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN
library. If building from sources, make sure the library loaded at runtime
is compatible with the version specified during compile configuration.
I checked my cuDNN version, and it is in fact 7.4.2. I also checked my environment path settings: /cuda/v9.0/bin, cuda/v9.0/lib/x64 and /cuda/v9.0/include are all there.
So why does this happen, and how can I solve it?
--
cuda:v9.0
cudnn:7.4.2 (I think; I copied the cuDNN files manually)
windows 10
python: 3.5
If you have multiple cuDNN versions installed through different channels, such as an Anaconda package and a manual Windows installation, you need to remove the older version so that your code detects the latest one, and then reinstall tensorflow-gpu.
You can follow this guide for installation based on OS.
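To see whether more than one cuDNN copy is reachable, and which one comes first, the directories on the search path can be scanned in order. This is a sketch under some assumptions: cudnn_on_search_path is a hypothetical helper, it matches on the substring "cudnn" in the file name, and it looks at PATH because that is what Windows uses to locate DLLs (on Linux, LD_LIBRARY_PATH would be the variable to pass in).

```python
import os
from pathlib import Path


def cudnn_on_search_path(search_path=None):
    """Return cuDNN-looking files found in the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue  # skip empty or stale path entries
        for candidate in sorted(directory.glob("*cudnn*")):
            if candidate.is_file():
                hits.append(candidate)
    return hits


for hit in cudnn_on_search_path():
    print(hit)
```

If the list contains copies in more than one directory, the earliest directory generally wins, so an older cuDNN placed ahead of the intended one would explain a version in the error message that differs from the one you installed.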

tensorflow compiled with cuda 9.1

I'm a beginner with TensorFlow and I want to install the GPU version with CUDA 9.0. The problem is that my GPU (NVIDIA MX150) doesn't work with CUDA 9.0 (only with CUDA 9.1). So I tried to compile my own version of TensorFlow with CUDA 9.1, but I'm still stuck because the compiler hits an error during the build and it doesn't compile. I don't know why, and it's very frustrating. So if you have an already compiled version of TensorFlow with CUDA 9.1, I'm very interested!
Thanks in advance!
I had the exact same problem a few weeks ago.
The problem is that the current version of TensorFlow (1.7) does not support CUDA 9.1. Please check this issue comment and the discussion below it.
Here are some options that I found:
Compile TensorFlow from source yourself
Find an existing whl file, e.g. I fixed the issue by using a whl from the repo (same as Y. Luo's answer)
Downgrade to CUDA 9.0
Wait until Tensorflow supports CUDA9.1 :)
If you don't have to use tensorflow 1.7, this repository might have what you want. Just to be clear, I never tried any of them myself.
If you need to install on Windows, this repository might be helpful.