Can I install CUDA 10.2 for use with TensorFlow 2.1, or does it have to be CUDA 10.1?

Can I install CUDA 10.2 to use TensorFlow 2.1, or does it have to be CUDA 10.1?
I am using Ubuntu 18.04 and have an NVIDIA Quadro P5000.

Providing the solution here (in the answer section), even though it is already present in the comments, for the benefit of the community.
No. According to the TensorFlow documentation, TensorFlow >= 2.1.0 supports CUDA 10.1; please refer to the compatible version details.

PyTorch needs CUDA 10.2 but TensorFlow needs CUDA 10.1. Is this a joke?
No: you can use CUDA 10.2 with TensorFlow 2.1.0 to 2.3.0.
It is quite simple.
WHY:
When you run "import tensorflow", TensorFlow searches for a library named 'libcudart.so.X.Y' in the directories on LD_LIBRARY_PATH. For TensorFlow 2.1.0 to 2.3.0, which are built against CUDA 10.1, that library is 'libcudart.so.10.1'. A CUDA 10.2 installation does not provide 'libcudart.so.10.1', so loading it fails with an error.
For this purpose the CUDA 10.2 libraries are close enough to the CUDA 10.1 ones to stand in for them, so we can solve the problem with symbolic links.
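The lookup described above can be approximated in a few lines of Python. This is only a sketch: the helper name `find_library` is made up for illustration, and it checks nothing but the directories passed in, whereas the real dynamic loader also consults other locations such as the ldconfig cache.

```python
import os


def find_library(soname, search_path):
    """Return the first existing file named `soname` in `search_path`.

    `search_path` uses the platform path separator (":" on Linux), like
    LD_LIBRARY_PATH. Hypothetical helper, not part of TensorFlow or CUDA.
    """
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, soname)
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for name in ("libcudart.so.10.1", "libcudart.so.10.2"):
        print(name, "->", find_library(name, path))
```

On a machine with only CUDA 10.2 installed, one would expect the first name to come back as None until the symbolic link exists.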
HOW:
cd /usr/local/cuda-10.2/targets/x86_64-linux/lib/
ln -s libcudart.so.10.2.89 libcudart.so.10.1
cd /usr/local/cuda-10.2/extras/CUPTI/lib64
ln -s libcupti.so.10.2.75 libcupti.so.10.1
cd /usr/local/cuda-10.2/lib64
ln -s libcudnn.so.8 libcudnn.so.7
Then open /etc/profile (for example with vim /etc/profile) and add these lines:
export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
Finally, reload the profile:
source /etc/profile
Done!
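Since the exact file names (for example libcudart.so.10.2.89) depend on the installed patch release, it may be worth checking that the links actually resolve. A minimal sketch, assuming the three link paths from the steps above; `check_links` is a hypothetical helper:

```python
import os


def check_links(paths):
    """Classify each path as 'ok', 'dangling' (broken symlink), or 'missing'."""
    status = {}
    for path in paths:
        if os.path.exists(path):      # True only if the link target exists
            status[path] = "ok"
        elif os.path.islink(path):    # a link exists but points nowhere
            status[path] = "dangling"
        else:
            status[path] = "missing"
    return status


if __name__ == "__main__":
    # Link locations taken from the steps above; adjust to your installation.
    links = [
        "/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.1",
        "/usr/local/cuda-10.2/extras/CUPTI/lib64/libcupti.so.10.1",
        "/usr/local/cuda-10.2/lib64/libcudnn.so.7",
    ]
    for path, state in check_links(links).items():
        print(state, path)
```

A "dangling" result usually means the target file name in the ln -s command did not match the file that is really installed.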

Related

Tensorflow 1.15 cannot detect gpu with Cuda10.1

I have installed both TensorFlow 2.2.0 and TensorFlow 1.15.0 (via pip install tensorflow-gpu==1.15.0). TensorFlow 2 is installed in the base environment of Anaconda 3, while TensorFlow 1 is installed in a separate environment.
TensorFlow 2.2.0 recognizes the GPU in a simple test:
import tensorflow as tf
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
# output: Default GPU Device: /device:GPU:0
But TensorFlow 1.15.0 cannot detect the GPU.
For your information, my system environment is Python + CUDA 10.1 + VS 2015.
TensorFlow versions 1.15.0 to 1.15.3 (the latest version) are all compiled against CUDA 10.0. Downgrading from CUDA 10.1 to CUDA 10.0 solved the problem.
Also be aware of the Python version. It is recommended to install the TensorFlow .whl file (as listed at https://nero-mirror.stanford.edu/pypi/simple/tensorflow-gpu/) that matches your specific Python version. For installation, see "How do I install a Python package with a .whl file?"
TensorFlow 1.15 expects CUDA 10.0, but I managed to make it work on a system with CUDA 10.1 by installing the following packages with Anaconda: cudatoolkit (10.0) and cudnn (7.6.5). So, after running
conda install cudatoolkit=10.0
conda install cudnn=7.6.5
TensorFlow 1.15 was able to find and use the GPU (on a system that has CUDA 10.1 installed).
PS: I understand your environment is Windows-based, but this question comes up on Google for the same problem on Linux (where I tested this solution). It might be useful on Windows as well.
It might have to do with the version compatibility of TF, CUDA and cuDNN. This post discusses it thoroughly.
Have you tried installing Anaconda? It downloads all the requirements and makes the setup easy with just a few clicks.

Advice on tensorflow 1.13 on cuda 10.1

I rushed a bit and upgraded to Ubuntu 18.10, couldn't find a cuda 10.0 version for it so went for cuda 10.1... which I understand tensorflow doesn't support yet.
Would you advise reverting back to Ubuntu 18.04 or patiently waiting for a compatible tensorflow release?
tensorflow 1.13 doesn't work with cuda 10.1 because of the following
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory.
tensorflow is looking for libcublas.so.10.0 whereas cuda provides libcublas.so.10.1.0.105.
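The error message itself says which library version TensorFlow was built against. As an illustration, the missing name and version can be pulled out of the message with a regular expression; `missing_library` is a hypothetical helper and the pattern is only assumed to cover messages shaped like the one quoted above.

```python
import re

# Matches e.g. "libcublas.so.10.0: cannot open shared object file"
_MISSING = re.compile(r"(lib[\w+\-]+)\.so\.([\d.]+): cannot open shared object file")


def missing_library(message):
    """Return (library name, expected version) from a loader error, or None."""
    match = _MISSING.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    msg = ("ImportError: libcublas.so.10.0: cannot open shared object file: "
           "No such file or directory")
    print(missing_library(msg))
```

For the message above this yields the library libcublas and the version 10.0, which is the CUDA release that this TensorFlow build expects.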
Older CUDA Toolkit versions are available here: https://developer.nvidia.com/cuda-toolkit-archive
TensorFlow provided binaries went from CUDA 9.0 to CUDA 10.0, skipping CUDA 9.1 and 9.2. So I would not recommend waiting for CUDA 10.1 TensorFlow binaries.
It is highly likely that CUDA 10.0 would run fine on Ubuntu 18.10, but if that is not the case, then yes revert back to 18.04.

Is Tensorflow 1.12 compatible with CUDA 10.1?

I've been able to successfully set up an Ubuntu 18.04 server with nvidia-smi 418.39, Driver version 418.39, and CUDA 10.1
I now have a user who wants to run TensorFlow but insists that it is not compatible with CUDA 10.1, only CUDA 10. There is no statement confirming this online anywhere that I can find, nor is it in any release patch notes from TF. Because setting this system up was kind of a pain to do, I'm a little hesitant to try downgrading just one version.
Does anyone have verification whether TensorFlow 1.12 does or does not work with CUDA 10.1?
I can confirm that even tf 1.13.1 only works with CUDA 10.0 for me, not 10.1.
I don't know whether a symlink would work, though.
If you try to run tf 1.13.1 on CUDA 10.1, it will give you "ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory"
TensorFlow 1.12 (and even later versions 1.13.1 and 2.0.0-alpha0) could not be built against CUDA 10.1, thus can be considered incompatible.
I have tried building TensorFlow from source with GPU support. The TensorFlow versions I considered were 1.13.1 and 2.0.0-alpha0. The machine I used runs CentOS 7.6 with GCC 4.8.5. I have the NVIDIA Driver version 418.67 installed (which has the release date 2019.5.7 and supports CUDA Toolkit 10.1).
I succeeded in building both TensorFlow versions with CUDA 10.0 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.0). Note that you don't need to have the GPU attached to the machine (especially if you're using a VM in the cloud) while you're building TensorFlow with GPU support.
However, when I switched to CUDA 10.1 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.1), neither of these TensorFlow versions could be built. Besides the change in the location of libcublas, another source of the error is that no libcudart.so* files are found in cuda-10.1/lib64/ (while they do exist in cuda-10.0/lib64/).
I can also confirm that tf 1.13.1 does not work with CUDA 10.1. While importing tensorflow you will get the following error:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Running ldconfig -v shows the difference:
libcublas.so.10.0 vs libcublas.so.10.1.0.105

tensorflow failed to create session: CUDA driver version is insufficient for CUDA runtime version [duplicate]

I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow documentation.
TL;DR) See this table: https://www.tensorflow.org/install/source#gpu
Generally:
Check the CUDA version:
cat /usr/local/cuda/version.txt
and cuDNN version:
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
and install one of the combinations listed in the table linked above.
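The cuDNN check above relies on three #define lines in the header. A small sketch of the same extraction in Python follows; `parse_cudnn_version` is a hypothetical helper that works on the header text, and it assumes the macros are written as plain integer defines as in cudnn.h for cuDNN 7.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header, or None."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None  # macro not present in this header
        values.append(int(match.group(1)))
    return tuple(values)


if __name__ == "__main__":
    # Path used in the answer above; it may differ on your system.
    with open("/usr/local/cuda/include/cudnn.h") as header:
        print(parse_cudnn_version(header.read()))
```

Note that from cuDNN 8 onwards these macros live in cudnn_version.h rather than cudnn.h, so the file to read depends on the installed release.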
The linked table gives an overview of the officially supported/tested combinations of CUDA and TensorFlow on Linux, macOS and Windows.
Minor configurations:
Since the given specifications below in some cases might be too broad, here is one specific configuration that works:
tensorflow-gpu==1.12.0
cuda==9.0
cuDNN==7.1.4
The corresponding cudnn can be downloaded here.
Tested build configurations
Please refer to https://www.tensorflow.org/install/source#gpu for an up-to-date compatibility chart (for official TF wheels).
(Figures omitted, updated May 20, 2020: Linux GPU, Linux CPU, macOS GPU, macOS CPU, Windows GPU, Windows CPU.)
Updated as of Dec 5, 2020: for updated information, please refer to the linked pages for Linux and for Windows.
The compatibility table given in the tensorflow site does not contain specific minor versions for cuda and cuDNN. However, if the specific versions are not met, there will be an error when you try to use tensorflow.
For tensorflow-gpu==1.12.0 and cuda==9.0, the compatible cuDNN version is 7.1.4, which can be downloaded from here after registration.
You can check your cuda version using
nvcc --version
cuDNN version using
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
tensorflow-gpu version using
pip freeze | grep tensorflow-gpu
UPDATE:
Since TensorFlow 2.0 has been released, I will share the compatible CUDA and cuDNN versions for it as well (for Ubuntu 18.04).
tensorflow-gpu = 2.0.0
cuda = 10.0
cuDNN = 7.6.0
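The two combinations reported in this answer can be written down as a small lookup, which makes it easy to compare against what the commands above print. This is a sketch only: the table holds just the versions mentioned here, not the official compatibility chart, and `is_reported_combo` is a hypothetical helper.

```python
# Combinations reported as working in this answer (not an official list).
REPORTED_COMBOS = {
    "1.12.0": ("9.0", "7.1.4"),   # tensorflow-gpu 1.12.0: CUDA 9.0, cuDNN 7.1.4
    "2.0.0": ("10.0", "7.6.0"),   # tensorflow-gpu 2.0.0: CUDA 10.0, cuDNN 7.6.0
}


def _parts(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(piece) for piece in version.split("."))


def _starts_with(installed, expected):
    """True if `installed` begins with the components of `expected`."""
    want = _parts(expected)
    return _parts(installed)[: len(want)] == want


def is_reported_combo(tf_version, cuda_version, cudnn_version):
    """Return True/False for known TF versions, None if TF version is unknown."""
    expected = REPORTED_COMBOS.get(tf_version)
    if expected is None:
        return None
    cuda, cudnn = expected
    return _starts_with(cuda_version, cuda) and _starts_with(cudnn_version, cudnn)


if __name__ == "__main__":
    print(is_reported_combo("2.0.0", "10.0.130", "7.6.0"))
    print(is_reported_combo("2.0.0", "10.1.243", "7.6.0"))
```

The prefix comparison lets a full toolkit version such as 10.0.130 match the listed 10.0, while 10.1.243 does not.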
If you are coding in a Jupyter notebook and want to check which CUDA version TF is using, run the following commands directly in a Jupyter cell:
!conda list cudatoolkit
!conda list cudnn
And to check whether the GPU is visible to TF:
import tensorflow as tf
tf.test.is_gpu_available(
    cuda_only=False, min_cuda_compute_capability=None
)
You can use this configuration for CUDA 10.0 (10.1 does not work as of 3/18); this runs for me:
tensorflow>=1.12.0
tensorflow_gpu>=1.4
To install a specific tensorflow-gpu version:
pip install tensorflow-gpu==1.4.0
Thanks for the first answer.
Something about backward compatibility.
I can successfully install tensorflow-2.4.0 with cuda-11.1 and cudnn 8.0.5.
Source: https://www.tensorflow.org/install/source#gpu
I had installed CUDA 10.1 and cuDNN 7.6 by mistake. You can use the following configuration (this worked for me, as of 9/10):
Tensorflow-gpu == 1.14.0
CUDA 10.1
CUDNN 7.6
Ubuntu 18.04
But I had to create symlinks for it to work, because this TensorFlow version is built for CUDA 10.0.
sudo ln -s /opt/cuda/targets/x86_64-linux/lib/libcublas.so /opt/cuda/targets/x86_64-linux/lib/libcublas.so.10.0
sudo cp /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/local/cuda-10.1/lib64/
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10 /usr/local/cuda-10.1/lib64/libcublas.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.10 /usr/local/cuda/lib64/libcusolver.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 /usr/local/cuda/lib64/libcurand.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10 /usr/local/cuda/lib64/libcufft.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda/lib64/libcudart.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.10 /usr/local/cuda/lib64/libcusparse.so.10.0
And I added the following to my ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/targets/x86_64-linux/lib/
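The order of these entries matters, because directories earlier in LD_LIBRARY_PATH are searched first, and sourcing ~/.bashrc repeatedly keeps adding the same entries. As a rough illustration of the resulting search order, the sketch below builds such a value without duplicates; `prepend_unique` is a hypothetical helper, not something the shell or CUDA provides.

```python
def prepend_unique(current, new_dirs):
    """Put `new_dirs` in front of the colon-separated `current` value.

    Earlier entries win, empty entries are dropped, and a trailing slash is
    ignored when comparing directories.
    """
    seen = set()
    ordered = []
    for directory in list(new_dirs) + current.split(":"):
        if not directory:
            continue
        normalized = directory.rstrip("/") or "/"
        if normalized not in seen:
            seen.add(normalized)
            ordered.append(normalized)
    return ":".join(ordered)


if __name__ == "__main__":
    existing = "/usr/local/cuda/lib64:/opt/cuda/targets/x86_64-linux/lib/"
    print(prepend_unique(existing, ["/usr/local/cuda-10.1/lib64"]))
```

With the exports above, /usr/local/cuda-10.1/lib64 ends up first, so a library found there shadows a library of the same name in the later directories.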
I had a similar problem after upgrading to TF 2.0. The cuDNN version that TF was reporting did not match what Ubuntu 18.04 thought I had installed. It said I was using cuDNN 7.5.0, but apt thought I had the right version installed.
What I eventually had to do was grep recursively in /usr/local for CUDNN_MAJOR, and I found that /usr/local/cuda-10.0/targets/x86_64-linux/include/cudnn.h did indeed specify the version as 7.5.0.
/usr/local/cuda-10.1 got it right, and /usr/local/cuda pointed to /usr/local/cuda-10.1, so it was (and remains) a mystery to me why TF was looking at /usr/local/cuda-10.0.
Anyway, I just moved /usr/local/cuda-10.0 to /usr/local/old-cuda-10.0 so TF couldn't find it any more and everything then worked like a charm.
It was all very frustrating, and I still feel like I just did a random hack. But it worked :) and perhaps this will help someone with a similar issue.
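The recursive search described above can also be scripted, which lists every cuDNN header under a directory together with the version it declares. A minimal sketch, assuming the headers use plain integer defines; `find_cudnn_headers` is a hypothetical helper, and cudnn_version.h is included because cuDNN 8 keeps the macros there.

```python
import os
import re

_DEFINE = re.compile(
    r"^\s*#\s*define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", re.MULTILINE
)


def find_cudnn_headers(root):
    """Map each cuDNN header found under `root` to its 'major.minor.patch'."""
    found = {}
    # os.walk does not descend into symlinked directories by default, so a
    # /usr/local/cuda symlink is not reported twice.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename not in ("cudnn.h", "cudnn_version.h"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, errors="replace") as header:
                    text = header.read()
            except OSError:
                continue  # unreadable file, skip it
            values = {key: int(number) for key, number in _DEFINE.findall(text)}
            if len(values) == 3:
                found[path] = "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**values)
    return found


if __name__ == "__main__":
    for path, version in sorted(find_cudnn_headers("/usr/local").items()):
        print(version, path)
```

Seeing two different versions in the output, as in the situation described above, points to a stale installation that something may still be picking up.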
