Support for Nvidia CUDA Toolkit 9.2 - tensorflow

Why is tensorflow-gpu bound to a specific version of NVIDIA's CUDA Toolkit? The current version appears to look for 9.0 specifically and will not work with anything newer. For example, I installed the latest Toolkit, 9.2, and added it to the path, but tensorflow-gpu will not work with it and complains that it is looking for 9.0.
I can understand major version updates not being supported, but a minor release?

That's a good question. According to NVIDIA's website,
The CUDA driver is backward compatible, meaning that applications compiled against a particular version of the CUDA will continue to work on subsequent (later) driver releases.
So technically, it should not be a problem to support later CUDA driver releases. And in practice, you can find unofficial pre-built binaries for later versions of CUDA and cuDNN on the net [1], [2]. Even easier to install, the tensorflow-gpu package from conda currently comes bundled with CUDA 9.2.
When asked on the topic, a dev answered,
The answer to why is driver issues in the ones required by 9.1, not many new features we need in cuda 9.1, and a few more minor issues.
So the reason looks rather vague. He might mean that CUDA 9.1 (and 9.2) require graphics card drivers that are perhaps a bit too recent to be really convenient, but that is an uneducated guess.
If NVIDIA is right about binary compatibility, you could try simply renaming or symlinking your CUDA 9.2 library as a CUDA 9.0 library, and it should work. But I would save all my work before attempting this, and the fact that people go as far as recompiling TensorFlow to support later CUDA versions may be a hint at how this could end.

When you download TF, you download a pre-built binary.
During the build, TF is linked against a specific version of CUDA, so you cannot use it with a different CUDA version.
If you want to work with a newer (or sometimes older) version of CUDA, you will need to build TF from source (check how here).
Or, if you really don't want to build it yourself, check these repos, where other people publish TF binaries for specific configurations. A few examples:
https://github.com/mind/wheels
https://github.com/yaroslavvb/tensorflow-community-wheels
https://github.com/fo40225/tensorflow-windows-wheel
For your convenience, here are the CUDA and cuDNN versions required by each prebuilt TensorFlow version.
(I only list the TF versions I have worked with; older TF versions may use older versions of CUDA as well.)
Before TF 1.5: CUDA 8.0 and cuDNN 6.
From TF 1.5 on: prebuilt binaries are built against CUDA 9 and cuDNN 7.

The issue is not with the NVIDIA drivers but with TensorFlow itself. I spent an hour trying to make it work and finally realized that the pre-built binary downloaded from googleapis.com is hard-coded to load libcudart.so.9.0! If you have both CUDA 9.0 and 9.2 installed, TensorFlow will work (but it is actually loading the dynamic libraries from 9.0). (BTW, I installed TF using Anaconda.)
A cleaner approach is to build TF from source. It's not too complicated.
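To see which runtime the loader can actually find, a quick check along these lines may help. It is only a sketch: it uses the standard-library ctypes module to try opening a library by name, and the sonames in the demo are Linux-style assumptions (on Windows the 9.0 runtime DLL is typically cudart64_90.dll). The helper name can_load is made up for this example.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # Assumed Linux sonames; adjust them for your platform.
    for name in ("libcudart.so.9.0", "libcudart.so.9.2"):
        print(name, "loadable" if can_load(name) else "not found")
```

If libcudart.so.9.0 is reported as not found while 9.2 is loadable, that matches the situation described above: the prebuilt binary asks for the 9.0 soname by name and will not pick up 9.2 on its own.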

Related

How to deal with CUDA version?

How to set up different versions of CUDA in one OS?
Here is my problem: the latest TensorFlow with GPU support requires CUDA 11.2, whereas PyTorch works with 11.3. So what is the solution for installing both libraries on Windows and Ubuntu?
One solution is to use a Docker container environment. The host then only needs the NVIDIA driver at version XYZ.AB, and each container carries its own CUDA toolkit, so you can use both PyTorch and TensorFlow.
A very good starting point for your problem would be ML-WORKSPACE: https://github.com/ml-tooling/ml-workspace

Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0

I get this error when I run model.fit_generator to train a CNN model on images. I don't understand the error. What should I do? Can anyone help me?
This is the full error description:
`Loaded runtime CuDNN library: 8.0.5, but the source was compiled with: 8.1.0. CuDNN library needs to have a matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, ensure the library loaded at runtime is compatible with the version specified during compile configuration.`
I had the same error: "tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0."
I solved it by downgrading TensorFlow. The message means you are using a newer version of TensorFlow that is not compatible with the cuDNN version on Google Colab. I used TensorFlow 2.4.0 plus all the dependencies required by version 2.4.0.
This page lists which TensorFlow version to use for cuDNN compatibility: https://www.tensorflow.org/install/source
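The rule quoted in the error message (matching major version, equal or higher minor version) can be written down as a small check. This is just a sketch of that rule as stated in the message, not TensorFlow's actual implementation; the function names are made up for illustration.

```python
def parse_version(text):
    """Turn a version string such as '8.0.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the error message: same major, minor >= compiled."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


print(cudnn_runtime_ok("8.0.5", "8.1.0"))  # False: minor 0 is lower than 1
print(cudnn_runtime_ok("8.1.0", "8.1.0"))  # True
```

So with a build compiled against 8.1.0, either the runtime cuDNN has to go up to at least 8.1, or TensorFlow has to go down to a build compiled against 8.0, which is what the downgrade above achieves.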
You should always have the library versions installed that match the versions your package was compiled against.
You can download the version you need from the NVIDIA website or use conda for package management. It will handle all the dependencies for you.
You can install Miniconda and run conda install -c anaconda tensorflow-gpu to get it sorted out. If you need a specific version of Python, you can create an environment with it.
My solution:
After confirming that my CUDA and cuDNN versions were compatible with TensorFlow, I first thought the system had not picked up the changes after the installation. Several restarts showed that this was not the problem, so I started checking everything on the system that depends on CUDA or cuDNN. I uninstalled MATLAB along the way, but that did not help. Later it occurred to me that PyTorch also depends on CUDA and cuDNN. I checked my PyTorch version and found that I was using torch 1.8, which was built for CUDA 11.1, and the corresponding cuDNN is 8.0.5. That explained the error. I upgraded PyTorch and the problem was solved.
I have faced the same issue. It seems that each TensorFlow version requires a specific cuDNN version.
Check this link for the required versions:
https://www.tensorflow.org/install/source#gpu
Thanks for this answer:
My solution:
After confirming that my cuda and cudnn versions are compatible with
tensorflow, I first thought that the system ...
It helped me a lot, but I solved the problem in a different way.
I found that PyTorch 1.8 is compatible with cuDNN 8.1.0. So instead of upgrading PyTorch, I overwrote the cuDNN 8.0.5 DLLs with the cuDNN 8.1.0 ones in the directory D:\Program Files\Python37\Lib\site-packages\torch\lib. You can find this location with the Everything search tool, which is always helpful.

Can CUDA 10.0 and 10.1 be on the same system?

I want support for both Visual Studio 2019 (which needs CUDA 10.1) and TensorFlow-GPU 1.14 (which needs CUDA 10.0) on a Windows PC. Is there a way to do this?
I simply installed CUDA 10.0 and CUDA 10.1 and added both directories to the CUDA_PATH environment variable. cuDNN is already installed.
The result is that Visual Studio can detect CUDA but TensorFlow cannot.
Yes, more than one version of the CUDA toolkit can exist on a system and be used by different applications.
How are you installing TensorFlow-GPU? If you're compiling it yourself, during configuration you can specify the path to whichever version of CUDA that you want to use. If you're installing a pre-built set of binaries (e.g. using something like Anaconda) then that's already been built against a specific version of the CUDA toolkit; you'll need to fetch a different version of the binaries compiled for whichever CUDA toolkit you want, or build it yourself.
If you use Anaconda to install TensorFlow-GPU, you should also receive the correct version of the CUDA toolkit that's needed to run whichever version of TensorFlow-GPU that you've installed; it takes care of those dependencies for you.
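With two toolkits installed side by side on Windows, one way to steer a single Python process toward a particular one is to put that toolkit's bin directory at the front of PATH before importing TensorFlow. The sketch below assumes that the CUDA DLLs are located through PATH in your setup and that the installer has set a per-version variable such as CUDA_PATH_V10_0. Both are assumptions to verify on your machine, and prepend_cuda_bin is a hypothetical helper name.

```python
import os


def prepend_cuda_bin(cuda_root, environ=None):
    """Put <cuda_root>/bin at the front of PATH so its DLLs are found first."""
    environ = os.environ if environ is None else environ
    bin_dir = os.path.join(cuda_root, "bin")
    # Drop empty entries and any existing copy of bin_dir to avoid duplicates.
    parts = [
        p for p in environ.get("PATH", "").split(os.pathsep)
        if p and p != bin_dir
    ]
    environ["PATH"] = os.pathsep.join([bin_dir] + parts)
    return bin_dir


if __name__ == "__main__":
    # CUDA_PATH_V10_0 is assumed to be set by the CUDA 10.0 installer.
    root = os.environ.get("CUDA_PATH_V10_0")
    if root:
        prepend_cuda_bin(root)
        # import tensorflow as tf  # import only after PATH is adjusted
```

Note that CUDA_PATH is meant to hold a single toolkit root, so putting two directories into it, as in the question, is unlikely to help TensorFlow find the 10.0 DLLs.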

Tensorflow can find right cudnn in one python file but fail in another

I am trying to use the GPU version of TensorFlow to train and test my deep learning model, but I have run into a problem. When I train my model in one Python file, everything goes well and tensorflow-gpu works properly. Then I save my model as a pretrained one in graph.pb format and try to reuse it in another Python file.
There I get the following error message:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN
library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major
and minor version needs to match or have higher minor version in case of
CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN
library. If building from sources, make sure the library loaded at runtime
is compatible with the version specified during compile configuration.
I checked my cuDNN version, and it is in fact 7.4.2. I also checked my PATH environment settings: /cuda/v9.0/bin, cuda/v9.0/lib/x64 and /cuda/v9.0/include are all there.
So why does this happen, and how can I solve it?
--
cuda:v9.0
cudnn:7.4.2 (I think, I copy those cudnn files manually)
windows 10
python: 3.5
If you have multiple copies of cuDNN installed through different channels, such as an Anaconda package and a manual Windows installation, you need to remove the older version so that your code picks up the latest one, and then reinstall tensorflow-gpu.
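To find out whether more than one copy is around, you can walk the PATH directories in order and list every cuDNN file found. This is a rough sketch using only the standard library; the pattern cudnn64_*.dll is the usual Windows naming and is an assumption here, and find_on_path is a made-up helper name.

```python
import glob
import os


def find_on_path(pattern, path=None):
    """List files matching pattern in each PATH directory, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if directory:
            hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    # Assumed Windows file name pattern for cuDNN; adjust it for your install.
    for hit in find_on_path("cudnn64_*.dll"):
        print(hit)
```

If more than one file is printed, the earliest one in PATH order is the likeliest to be loaded. The file name only carries the major version, so two copies of cudnn64_7.dll can still be 7.1.4 and 7.4.2 respectively.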
You can follow this guide for installation based on OS.

tensorflow compiled with cuda 9.1

I'm a beginner with TensorFlow and I want to install the GPU version with CUDA 9.0. The problem is that my GPU (NVIDIA MX150) doesn't work with CUDA 9.0 (only with CUDA 9.1). So I tried to compile my own version of TensorFlow with CUDA 9.1, but I'm still stuck because the compiler hits an error during the build and it doesn't compile. I don't know why, and it's very frustrating. So if you have an already compiled version of TensorFlow with CUDA 9.1, I'm very interested!
Thanks in advance!
I had the exact same problem a few weeks ago.
The problem is that the current version of TensorFlow (1.7) does not support CUDA 9.1; please check this issue comment and the discussion below it.
Here are some options that I found:
Compile TensorFlow from source yourself
Find an existing whl file. For example, I fixed the issue by using a whl from the repo (same as Y. Luo's answer)
Downgrade to CUDA 9.0
Wait until TensorFlow supports CUDA 9.1 :)
If you don't have to use tensorflow 1.7, this repository might have what you want. Just to be clear, I never tried any of them myself.
If you need to install on Windows, this repository might be helpful.