TensorFlow can find the right cuDNN in one Python file but fails in another - tensorflow

I am trying to use the GPU version of TensorFlow to train and test my deep learning model, but here comes the problem. When I train my model in one Python file, things go well and tensorflow-gpu works properly. Then I save my model as a pretrained one in graph.pb format and try to reuse it in another Python file.
Then I get the following error message.
E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
I checked my cuDNN version; it is in fact 7.4.2. I also checked my environment path settings: /cuda/v9.0/bin, /cuda/v9.0/lib/x64, and /cuda/v9.0/include are all there.
So why does this happen, and how can I solve it?
--
cuda: v9.0
cudnn: 7.4.2 (I think; I copied those cuDNN files in manually)
Windows 10
Python: 3.5

If you have multiple copies of cuDNN installed through various channels, such as an Anaconda package and a manual Windows installation, you need to remove the older version so that your code detects the latest one, and then reinstall tensorflow-gpu.
You can follow this guide for installation instructions based on your OS.
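If you are unsure which copy wins, here is a quick diagnostic sketch, assuming Windows and a cuDNN 7.x DLL name; it lists every cudnn64*.dll visible on PATH, and the first hit is the one TensorFlow loads:

    # List every cuDNN DLL reachable through PATH; stale copies from other
    # installs (e.g. an old Anaconda package) will show up here too.
    # The cudnn64*.dll pattern is an assumption for cuDNN 7.x on Windows.
    import glob
    import os

    for d in os.environ["PATH"].split(os.pathsep):
        for hit in glob.glob(os.path.join(d, "cudnn64*.dll")):
            print(hit)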

Related

Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0

I get this error when I run model.fit_generator to train images using a CNN model. I don't understand the error; what should I do? Can anyone help me?
This is the full error description:
Loaded runtime CuDNN library: 8.0.5, but the source was compiled with: 8.1.0. CuDNN library needs to have a matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, ensure the library loaded at runtime is compatible with the version specified during compile configuration.
I had the same error: "tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0."
I solved it by downgrading TensorFlow: here it says that you are using a new version of TensorFlow that is not compatible with the Google Colab cuDNN version. I used TensorFlow 2.4.0 plus all the dependencies required by version 2.4.0.
This page lists which version of TensorFlow to use for cuDNN compatibility: https://www.tensorflow.org/install/source
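If you are unsure what your installed wheel expects, recent TensorFlow releases report it directly; a quick check sketch, assuming TF 2.3 or later where this API exists:

    # Print the CUDA/cuDNN versions this TensorFlow wheel was compiled
    # against, to compare with the table at tensorflow.org/install/source.
    # tf.sysconfig.get_build_info() is available on TF 2.3+ only.
    import tensorflow as tf

    info = tf.sysconfig.get_build_info()
    print(tf.__version__)
    print(info["cuda_version"], info["cudnn_version"])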
You should always have library versions installed that match the versions the package you want to use was compiled against.
You can download the version you need from the NVIDIA website, or use conda for package management; it will handle all the dependencies for you.
You can install Miniconda and run conda install -c anaconda tensorflow-gpu to get it sorted for you. If you need a specific version of Python, you can create an environment with it.
My solution:
After confirming that my CUDA and cuDNN versions were compatible with TensorFlow, I first thought the system simply had not synchronized after the installation completed. After several restarts it was clear that this was not and could not be the problem, so I started checking every piece of software on the system that depends on CUDA and cuDNN. I uninstalled MATLAB along the way, but that was useless. Later I realized that PyTorch is also tied to CUDA and cuDNN. I checked my PyTorch version and found I was using torch 1.8, which is built for CUDA 11.1 and ships with cuDNN 8.0.5, and that was the copy being loaded. Case solved: I upgraded PyTorch and the error was gone.
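If you suspect the same conflict, here is a quick check sketch that prints the CUDA/cuDNN versions bundled with the installed PyTorch wheel:

    # torch.version.cuda and torch.backends.cudnn.version() report what the
    # installed PyTorch wheel ships; e.g. 8005 means cuDNN 8.0.5.
    import torch

    print(torch.__version__)               # e.g. 1.8.0
    print(torch.version.cuda)              # e.g. 11.1
    print(torch.backends.cudnn.version())  # e.g. 8005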
I have faced the same issue. It seems that each TensorFlow version requires a specific cuDNN version.
Check the link for the required versions.
https://www.tensorflow.org/install/source#gpu
Thanks for this answer:
My solution:
After confirming that my cuda and cudnn versions are compatible with tensorflow, I first thought that the system ...
It helped me a lot, but I used a different way to solve the problem.
I found that PyTorch 1.8 is compatible with cuDNN 8.1.0. So, instead of upgrading PyTorch, I overwrote the cuDNN 8.0.5 DLLs with the cuDNN 8.1.0 ones in the directory D:\Program Files\Python37\Lib\site-packages\torch\lib. You can find this location with Everything, a file search tool that is always helpful.
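You can also locate that directory without a search tool; a small sketch:

    # Find the lib directory inside the installed PyTorch package and list
    # the cuDNN libraries it ships (the files the answer above overwrites).
    import glob
    import os
    import torch

    torch_lib = os.path.join(os.path.dirname(torch.__file__), "lib")
    for f in glob.glob(os.path.join(torch_lib, "cudnn*")):
        print(f)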

Does PyInstaller include CUDA?

I am working on a Python script (I use Python 3.7.3) that uses tensorflow-gpu (1.14.0), and I used PyInstaller 3.5 to convert this script to an executable. I am using CUDA 10.0 and cuDNN 7.6.1, and my graphics card is an NVIDIA GeForce GTX 960M. I recently uninstalled CUDA to test whether the executable of the Python script still runs, and surprisingly it still runs on the GPU, which no longer works when I run the Python script directly.
My question is, can this executable be run on systems without the CUDA toolkit but with a CUDA-capable graphics card?
According to this documentation PyInstaller will make and store a private copy of all of the dependent external libraries which Python code relies on when building a single file executable.
Therefore it is safe to assume that your executable runs irrespective of the installation status of the CUDA toolkit because it has a full private copy of the necessary CUDA libraries internally which it uses when the executable is run.
According to the GitHub issues in the official repository (here and here for example) CUDA libraries are usually dynamically loaded at run-time and not at link-time, so they are typically not included in the final exe file (or folder) with the result that the exe file won't work on a machine without CUDA installed. The solution (please refer to the linked issues too) is to put the DLLs necessary to run the exe in its dist folder (if generated without the --onefile option) or install the CUDA runtime on the target machine.
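Here is a sketch of that DLL-copying step, assuming a Windows build without --onefile; the toolkit path and DLL names below are examples for CUDA 10.0 / cuDNN 7.6 and would need to match your install:

    # Copy the CUDA/cuDNN DLLs next to the frozen executable so it runs on
    # machines without the CUDA toolkit installed. All paths and DLL names
    # here are assumptions; adjust them to your versions.
    import shutil
    from pathlib import Path

    cuda_bin = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin")
    dist = Path("dist/my_script")  # hypothetical PyInstaller output folder

    for dll in ["cudart64_100.dll", "cublas64_100.dll", "cudnn64_7.dll"]:
        src = cuda_bin / dll
        if src.exists():
            shutil.copy2(src, dist / dll)
            print("copied", dll)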
The behaviour you're seeing may be due to your specific TF version loading the libraries differently from what is described above, but it is not the expected behaviour nowadays.

How to see tensorflow build configuration?

I am trying to build tensorflow from source on a remote server (with no superuser privileges) because I got this error when I simply installed with pip:
Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
I completed all the steps listed here successfully, but I still get the same error as above, despite setting the cuDNN version to 7.1.2 before building.
Is there any way I can see the configurations to verify that they have been set properly?
After running ./configure, a file named .tf_configure.bazelrc is generated; you can inspect that file.
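For example, a minimal sketch that prints the CUDA/cuDNN settings recorded there, assuming you run it from the root of the TensorFlow source tree:

    # Show the CUDA/cuDNN related entries that ./configure wrote, such as
    # the TF_CUDA_VERSION and TF_CUDNN_VERSION action_env lines.
    with open(".tf_configure.bazelrc") as f:
        for line in f:
            if "CUDA" in line or "CUDNN" in line:
                print(line.rstrip())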

Precompiled TensorFlow - Programmatic way to get the CUDA & CUDNN version it was built against

I am wondering if there is a programmatic way to find out against which CUDA and CUDNN version an installed tensorflow version was built?
For example, a compiled Tensorflow installation can return which CXX11_ABI_FLAG was used while it was built:
python -c "import tensorflow as tf; print(tf.CXX11_ABI_FLAG)" -> 0
The background is the following:
I'm building TensorFlow ops following the "adding an op" guide with a TensorFlow binary installation. This uses the pre-compiled TF to retrieve the required include paths and compile flags, to make sure that TensorFlow and the new op are compatible.
But since our systems have multiple CUDA & CUDNN versions, I also need to provide the path of the desired versions to the compiler, e.g. for CUDA 8.0 the flag -L /usr/local/cuda-8.0/lib64/ to specify its path.
But this also opens up an error path, as the op can now be built successfully against a different CUDA/CUDNN version, which leads to errors much later at run time.
So I want to create a safety check, to ensure that the CUDA/CUDNN paths lead to the same versions, as the ones Tensorflow was built against.
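Recent TensorFlow releases (2.3 and later) expose the build versions programmatically via tf.sysconfig.get_build_info(); on older 1.x builds this API does not exist. A sketch of such a safety check under that assumption, with an example toolkit path:

    # Compare the CUDA version encoded in the toolkit path passed to the
    # compiler with the version TF was built against. The path below is an
    # example; in practice it would come from your build configuration.
    import re
    import tensorflow as tf

    cuda_path = "/usr/local/cuda-8.0/lib64/"  # the -L path you pass
    path_version = re.search(r"cuda-(\d+\.\d+)", cuda_path).group(1)

    built_against = tf.sysconfig.get_build_info()["cuda_version"]
    if not built_against.startswith(path_version):
        raise RuntimeError(
            f"TF was built against CUDA {built_against}, but the op would "
            f"link against CUDA {path_version}"
        )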

Support for Nvidia CUDA Toolkit 9.2

What is the reasoning behind tensorflow-gpu being bound to a specific version of NVIDIA's CUDA Toolkit? The current version appears to look for 9.0 specifically and will not work with anything greater. For example, I installed the latest Toolkit 9.2 and added it to the PATH, but tensorflow-gpu will not work with it and complains that it is looking for 9.0.
I can see major version updates not being supported, but a minor release?
That's a good question. According to NVidia's website,
The CUDA driver is backward compatible, meaning that applications compiled against a particular version of the CUDA will continue to work on subsequent (later) driver releases.
So technically, it should not be a problem to support later iterations of a CUDA driver. And in practice, you will find working non-official pre-built binaries with later versions of CUDA and CuDNN on the net [1], [2]. Even easier to install, the tensorflow-gpu package installed from conda currently comes bundled with CUDA 9.2.
When asked on the topic, a dev answered,
The answer to why is driver issues in the ones required by 9.1, not many new features we need in cuda 9.1, and a few more minor issues.
So the reason looks rather vague -- he might mean that CUDA 9.1 (and 9.2) require graphics card drivers that are perhaps a bit too recent to be really convenient, but that is an uneducated guess.
If NVidia is right about binary compatibility, you may try to simply rename or link your CUDA 9.2 library as a CUDA 9.0 library and it should work. But I would save all my work before attempting this... and the fact that people go as far as recompiling tensorflow to support later CUDA versions may be a hint on how this could end.
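Here is a sketch of that risky rename/link trick on Linux; the paths below are assumptions for a default CUDA 9.2 install, and as said above, this can end badly:

    # Expose the CUDA 9.2 runtime under the soname that the prebuilt TF
    # binary looks for. Make sure this directory is on LD_LIBRARY_PATH.
    import os

    src = "/usr/local/cuda-9.2/lib64/libcudart.so.9.2"
    dst = "/usr/local/cuda-9.2/lib64/libcudart.so.9.0"  # name TF expects
    if os.path.exists(src) and not os.path.exists(dst):
        os.symlink(src, dst)

Note that TF loads several CUDA libraries (cuBLAS, cuFFT, cuDNN, ...), so each of them would need the same treatment.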
When you download TF, you download a pre-built binary file.
In the build process, TF is hard-linked against a specific version of CUDA, so you cannot use it with different CUDA versions.
If you want to work with a newer (or sometimes older) version of CUDA, you will need to build TF from source (check how here).
Or, if you really don't want to build it yourself, check these repos where others publish specific TF binaries; a few examples:
https://github.com/mind/wheels
https://github.com/yaroslavvb/tensorflow-community-wheels
https://github.com/fo40225/tensorflow-windows-wheel
For your convenience, here are the CUDA + cuDNN versions that are required for each prebuilt TensorFlow version
(I list only the TF versions that I have worked with; older TF versions may use older versions of CUDA as well):
before TF v1.5: CUDA 8.0 and cuDNN 6
starting from 1.5: prebuilt binaries are built against CUDA 9 and cuDNN 7
The issue is not with the NVIDIA drivers but with TensorFlow itself. I spent an hour trying to make it work, and finally realized that if you download the pre-built binary from googleapis.com, it is hard-coded to load libcudart.so.9.0! If you have both CUDA 9.0 and 9.2 installed, TensorFlow will work (but it is actually loading the dynamic libraries from 9.0). (BTW, I installed TF using Anaconda.)
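You can verify which copy was actually loaded; a diagnostic sketch for Linux:

    # After importing TensorFlow, list the CUDA libraries the process has
    # mapped, to see whether the 9.0 or the 9.2 runtime was picked up.
    import tensorflow  # forces the CUDA libraries to load on a GPU build

    with open("/proc/self/maps") as f:
        libs = {line.split()[-1] for line in f
                if "libcudart" in line or "libcudnn" in line}
    for lib in sorted(libs):
        print(lib)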
A cleaner approach is to build TF from source. It's not too complicated.