Tensorflow-gpu with conda: where is CUDA_HOME specified? - tensorflow

In the past, installing tensorflow-gpu required installing CUDA and cuDNN separately and adding their paths to LD_LIBRARY_PATH and CUDA_HOME in the environment.
Now, a simple conda install tensorflow-gpu==1.9 takes care of everything. Removing the CUDA_HOME and LD_LIBRARY_PATH from the environment has no effect whatsoever on tensorflow-gpu.
Question: where is the path to CUDA specified for TensorFlow when installing it with anaconda?

When you install tensorflow-gpu, it installs two other conda packages:
cudatoolkit: 9.0-h13b8566_0
cudnn: 7.1.2-cuda9.0_0
And if you look carefully at the tensorflow dynamic shared object, it uses RPATH to pick up these libraries on Linux:
(tflow) $ ldd $CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so | grep -E 'cufft|curand'
libcufft.so.9.0 => /tmp/wani.1533146686/tflow/lib/python3.6/site-packages/tensorflow/python/../../../../libcufft.so.9.0 (0x00007fbb9454a000)
libcurand.so.9.0 => /tmp/wani.1533146686/tflow/lib/python3.6/site-packages/tensorflow/python/../../../../libcurand.so.9.0 (0x00007fbb905e4000)
(tflow) $ ldd $CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so | grep cudnn
libcudnn.so.7 => /tmp/wani.1533146686/tflow/lib/python3.6/site-packages/tensorflow/python/../../../../libcudnn.so.7 (0x00007fd73b55d000)
The only thing required from you is libcuda.so.1, which is usually available in the standard library search directories once you install the NVIDIA driver.
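To see at a glance where each library resolves, the ldd output above can be parsed. This is a minimal sketch, not part of the original answer: the function names parse_ldd and inside_prefix are hypothetical, and it assumes the usual "name => path (0xaddress)" line format shown above.

```python
import posixpath
import re

# Matches ldd lines of the form "libname => /resolved/path (0xaddress)".
# Lines such as "libcuda.so.1 => not found" do not match and are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each resolved shared-library name in ldd output to its path."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def inside_prefix(path, prefix):
    """Return True if path, with '..' segments collapsed, lies under prefix."""
    normalized = posixpath.normpath(path)
    return normalized.startswith(prefix.rstrip("/") + "/")
```

Passing the resolved paths and $CONDA_PREFIX to inside_prefix shows whether the CUDA libraries come from the conda environment (via RPATH) or from a system-wide install.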

In my case, the following command took care of it automatically:
$ sudo apt install nvidia-cuda-toolkit
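To confirm that the driver library is visible to the dynamic loader without starting TensorFlow, something like the following can be used. It is only a sketch: driver_library_loadable is a hypothetical helper, and a successful load says nothing about whether the driver version is new enough for a given CUDA toolkit.

```python
import ctypes


def driver_library_loadable(name="libcuda.so.1"):
    """Return True if the dynamic loader can find and load the given library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing from the loader's search path
        # or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", driver_library_loadable())
```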

Related

Tensorflow GPU Not detected : Centos

I used to run TensorFlow on a multi-GPU system.
However, at some point the following code started using the CPU only.
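import tensorflow as tf
tf.debugging.set_log_device_placement(True)  # log which device each op is placed on
strategy = tf.distribute.MirroredStrategy()  # mirrors variables across all visible GPUs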
Moreover, the physical device check returns an empty list:
tf.config.list_physical_devices('GPU')
nvidia-smi, however, shows the GPUs correctly.
Environment
NVIDIA_SMI: 418.87.00
Driver ver: 418.87.00
CUDA ver: 10.1
Tensorflow: 2.4.1
CuDNN:
How do I handle this issue?
TensorFlow 2.4 is built against cuDNN 8.0 and CUDA 11.0.
So upgrade cuDNN and CUDA. Note that CUDA 11.0 also needs a newer driver than 418.87 (the 450 series or later).
If you are not using Anaconda, update the system paths and make sure no earlier version is left on them.
e.g.,
/usr/local/cuda/bin/nvcc --version
Conda install:
# conda update --force conda ## if needed
# conda update conda ## if needed
conda activate <env>
conda install cudatoolkit
conda install -c anaconda cudnn
conda list cuda
conda list cudnn
Here is a script for a manual install, which you will probably need even if you use conda:
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
sudo apt-get update
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
# Install TensorRT. Requires that libcudnn8 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
libnvinfer-dev=7.1.3-1+cuda11.0 \
libnvinfer-plugin7=7.1.3-1+cuda11.0
Have you changed anything in your environment?
I would suggest installing CUDA 11 and cuDNN 8.0 together with TensorFlow 2.4.0 or above.
Then give it a try. Hopefully that resolves your problem.

How to install latest cuDNN to conda?

In conda, the latest version of cudnn is:
cudnn 7.3.1 cuda10.0_0 anaconda
But I need 7.4.2 for tensorflow-gpu 1.13.
How do I install cuDNN 7.4.2 in conda?
conda update --force conda
conda update conda
conda install -c anaconda cudnn
conda list cudnn
You can install with conda-forge
conda install -c conda-forge cudnn
https://anaconda.org/conda-forge/cudnn
It is more up to date than the anaconda channel. For example, as of today the latest version of cudnn on anaconda is still 7.6.5, while conda-forge has 8.2.0.53.
Same applies to cudatoolkit package.
1. Uninstall cudnn: conda uninstall cudnn
2. Uninstall any tensorflow dependencies: conda uninstall tensorflow
3. Install tensorflow using pip: pip install tensorflow
4. Install cuDNN and the CUDA Toolkit following the instructions here: https://www.tensorflow.org/install/gpu#linux_setup
5. Use PyCharm or Spyder to run scripts that use tensorflow
The best approach is to install both cudatoolkit and cuDNN through conda, for the best compatibility. But in some cases people need the latest version. Moreover, the CUDA packages are sometimes updated on different schedules: at the time of writing, conda provides cudatoolkit 11.0 but not yet cuDNN 8.0, which is what happened in my case. There is a workaround for this problem.
Install cudatoolkit in the conda environment and download the latest cuDNN version matching the installed cudatoolkit from the NVIDIA cuDNN page. Extract the archive with tar and copy the cuDNN files into your anaconda environment:
sudo cp cuda/include/cudnn*.h /anaconda3/envs/<your environment here>/include
sudo cp cuda/lib64/libcudnn* /anaconda3/envs/<your environment here>/lib
sudo chmod a+r /anaconda3/envs/<your environment here>/include/cudnn*.h /anaconda3/envs/<your environment here>/lib/libcudnn*
In the snippet above, the "cuda" path is the extracted cuDNN folder. This workaround was tested with tensorflow 2.4, cudatoolkit 11.0 and cuDNN 8.0.4.
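The same copy step can be sketched in Python. This is an illustration under assumptions, not the answer's method: copy_cudnn is a hypothetical helper, cudnn_dir stands for the extracted "cuda" folder, and env_prefix for the environment directory (for example the value of $CONDA_PREFIX). If the environment lives in your home directory, no sudo should be needed.

```python
import glob
import os
import shutil


def copy_cudnn(cudnn_dir, env_prefix):
    """Copy cuDNN headers and libraries from an extracted archive into an env.

    Returns the list of files created under env_prefix.
    """
    copied = []
    plan = (("include/cudnn*.h", "include"), ("lib64/libcudnn*", "lib"))
    for pattern, subdir in plan:
        target_dir = os.path.join(env_prefix, subdir)
        os.makedirs(target_dir, exist_ok=True)
        for source in sorted(glob.glob(os.path.join(cudnn_dir, pattern))):
            target = os.path.join(target_dir, os.path.basename(source))
            # follow_symlinks=False copies links such as libcudnn.so as links
            # instead of duplicating the library they point to.
            shutil.copy2(source, target, follow_symlinks=False)
            copied.append(target)
    return copied
```

The sketch assumes the target files do not exist yet; remove an older cuDNN from the environment first.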
This is how I installed cudnn.
1. Download the cudnn tar file for the version you want from NVIDIA and extract it.
You will then see a "cuda" folder containing the cudnn files.
2. Copy the cudnn files into the lib and include folders of your conda env:
sudo cp cuda/include/cudnn*.h anaconda3/envs/"your_env_name"/include
sudo cp cuda/lib64/libcudnn* anaconda3/envs/"your_env_name"/lib
Here anaconda3 is your anaconda installation folder.
In my case, it worked.
It was not possible to do this with conda at the time the question was asked, which is why this approach was suggested. Now it is possible, so follow the other answers.

Which TensorFlow and CUDA version combinations are compatible?

I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow documentation.
TL;DR) See this table: https://www.tensorflow.org/install/source#gpu
Generally:
Check the CUDA version:
cat /usr/local/cuda/version.txt
and cuDNN version:
grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
and install a combination listed in the table linked above.
That table gives an overview of the officially supported and tested combinations of CUDA and TensorFlow on Linux, macOS and Windows.
Minor configurations:
Since the given specifications below in some cases might be too broad, here is one specific configuration that works:
tensorflow-gpu==1.12.0
cuda==9.0
cuDNN==7.1.4
The corresponding cudnn can be downloaded here.
Tested build configurations
Please refer to https://www.tensorflow.org/install/source#gpu for an up-to-date compatibility chart (for official TF wheels).
(Figures, updated May 20, 2020: tested build configurations for Linux GPU, Linux CPU, macOS GPU, macOS CPU, Windows GPU and Windows CPU.)
Updated as of Dec 5 2020: for up-to-date information, please refer to the links for Linux and for Windows.
The compatibility table given in the tensorflow site does not contain specific minor versions for cuda and cuDNN. However, if the specific versions are not met, there will be an error when you try to use tensorflow.
For tensorflow-gpu==1.12.0 and cuda==9.0, the compatible cuDNN version is 7.1.4, which can be downloaded from here after registration.
You can check your cuda version using
nvcc --version
cuDNN version using
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
tensorflow-gpu version using
pip freeze | grep tensorflow-gpu
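The output of these checks is free-form text. As a minimal sketch, the CUDA release can be extracted from the output of nvcc --version like this. The helper name parse_nvcc_release and the sample line are assumptions; the sketch relies on the output containing a fragment of the form "release 10.1".

```python
import re


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Assumed sample of the relevant line, for illustration only.
    sample = "Cuda compilation tools, release 10.1, V10.1.243"
    print(parse_nvcc_release(sample))
```

In practice the text would come from running nvcc, for example through subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout.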
UPDATE:
Since tensorflow 2.0 has been released, I will share the compatible cuda and cuDNN versions for it as well (for Ubuntu 18.04).
tensorflow-gpu = 2.0.0
cuda = 10.0
cuDNN = 7.6.0
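The two combinations reported in this answer can be turned into a small check against what is installed. This is only a sketch: the REPORTED table contains nothing beyond the versions quoted above, it is not the official compatibility chart, and matches_reported is a hypothetical helper.

```python
# (CUDA, cuDNN) versions reported as working in the answer above.
REPORTED = {
    "1.12.0": ("9.0", "7.1.4"),
    "2.0.0": ("10.0", "7.6.0"),
}


def _components(version):
    """Split '10.0.130' into (10, 0, 130)."""
    return tuple(int(part) for part in version.split("."))


def _agrees(installed, wanted):
    """True if installed starts with every component listed in wanted."""
    wanted_parts = _components(wanted)
    return _components(installed)[: len(wanted_parts)] == wanted_parts


def matches_reported(tf_version, cuda, cudnn):
    """Return True/False for a known TF version, None for an unknown one."""
    if tf_version not in REPORTED:
        return None
    wanted_cuda, wanted_cudnn = REPORTED[tf_version]
    return _agrees(cuda, wanted_cuda) and _agrees(cudnn, wanted_cudnn)
```

For example, matches_reported("2.0.0", "10.1", "7.6.0") returns False because the CUDA minor version differs.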
If you are working in a Jupyter notebook and want to check which CUDA version TF is using, run the following commands directly in a Jupyter cell:
!conda list cudatoolkit
!conda list cudnn
and to check if the gpu is visible to tf:
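import tensorflow as tf
# Deprecated in later TF 2.x releases in favor of tf.config.list_physical_devices('GPU')
tf.test.is_gpu_available(
    cuda_only=False, min_cuda_compute_capability=None
)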
This configuration runs for me with CUDA 10.0 (10.1 did not work as of 3/18):
tensorflow>=1.12.0
tensorflow_gpu>=1.4
Install version tensorflow gpu:
pip install tensorflow-gpu==1.4.0
Thanks for the first answer.
A note on backward compatibility:
I was able to install tensorflow 2.4.0 successfully with CUDA 11.1 and cuDNN 8.0.5.
Source: https://www.tensorflow.org/install/source#gpu
I had installed CUDA 10.1 and cuDNN 7.6 by mistake. You can use the following configuration (this worked for me as of 9/10):
Tensorflow-gpu == 1.14.0
CUDA 10.1
CUDNN 7.6
Ubuntu 18.04
But I had to create symlinks for it to work, because this TensorFlow build expects the CUDA 10.0 library names.
sudo ln -s /opt/cuda/targets/x86_64-linux/lib/libcublas.so /opt/cuda/targets/x86_64-linux/lib/libcublas.so.10.0
sudo cp /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/local/cuda-10.1/lib64/
sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10 /usr/local/cuda-10.1/lib64/libcublas.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.10 /usr/local/cuda/lib64/libcusolver.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10 /usr/local/cuda/lib64/libcurand.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10 /usr/local/cuda/lib64/libcufft.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda/lib64/libcudart.so.10.0
sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.10 /usr/local/cuda/lib64/libcusparse.so.10.0
I also added the following to my ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda/targets/x86_64-linux/lib/
I had a similar problem after upgrading to TF 2.0. The cuDNN version that TF was reporting did not match what Ubuntu 18.04 thought I had installed. It said I was using cuDNN 7.5.0, but apt thought I had the right version installed.
What I eventually had to do was grep recursively in /usr/local for CUDNN_MAJOR, and I found that /usr/local/cuda-10.0/targets/x86_64-linux/include/cudnn.h did indeed specify the version as 7.5.0.
/usr/local/cuda-10.1 got it right, and /usr/local/cuda pointed to /usr/local/cuda-10.1, so it was (and remains) a mystery to me why TF was looking at /usr/local/cuda-10.0.
Anyway, I just moved /usr/local/cuda-10.0 to /usr/local/old-cuda-10.0 so TF couldn't find it any more and everything then worked like a charm.
It was all very frustrating, and I still feel like I just did a random hack. But it worked :) and perhaps this will help someone with a similar issue.
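The recursive grep described above can be sketched in Python so that all cuDNN headers and their versions are listed side by side. The helper find_cudnn_headers is hypothetical. It also looks at cudnn_version.h because cuDNN 8 and later keep the version macros in that file rather than in cudnn.h.

```python
import os
import re

VERSION_KEYS = ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL")
DEFINE = re.compile(
    r"^\s*#\s*define\s+(CUDNN_MAJOR|CUDNN_MINOR|CUDNN_PATCHLEVEL)\s+(\d+)",
    re.MULTILINE,
)


def find_cudnn_headers(root):
    """Map each cuDNN header under root to the 'major.minor.patch' it declares."""
    found = {}
    # os.walk does not follow directory symlinks by default, so a link such as
    # /usr/local/cuda -> /usr/local/cuda-10.1 is not reported twice.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename not in ("cudnn.h", "cudnn_version.h"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, errors="replace") as handle:
                values = dict(DEFINE.findall(handle.read()))
            if len(values) == len(VERSION_KEYS):
                found[path] = ".".join(values[key] for key in VERSION_KEYS)
    return found
```

Calling find_cudnn_headers("/usr/local") on a machine like the one described would show the stale 7.5.0 header next to the newer one.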

cudnn path shows 7.0.5.15 but the system only recognizes cudnn 7102

I am trying to run a bit of python code that uses tensorflow-gpu. However, when the process tries to run, I'm getting the following error:
2018-04-13 20:03:49.215876: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-13 20:03:49.220783: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
However, I typed env and it listed CUDNN_VERSION=7.0.5.15 and LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
I installed cudnn 7.0.5 by downloading and copying the relevant files into /usr/local/cuda/
Why is this error occurring? I am using a Kubernetes-backed cluster.
I fixed it using this post from askubuntu
Pasting the instructions from that post here:
Step 0: Install cuda from the standard repositories. (See How can I install CUDA on Ubuntu 16.04?)
Step 1: Register an nvidia developer account and download cudnn here (about 80 MB)
Step 2: Check where your cuda installation is. For the installation from the repository it is /usr/lib/... and /usr/include. Otherwise, it will be /usr/local/cuda/ or /usr/local/cuda-<version>. You can check it with which nvcc or ldconfig -p | grep cuda
Step 3: Copy the files:
$ cd folder/extracted/contents
$ sudo cp -P include/cudnn.h /usr/include
$ sudo cp -P lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
$ sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*
Basically, the cuDNN installation instructions only tell you to copy the cudnn.h and libcudnn* files into the CUDA folder. In addition to that, you also need to copy those files into the system's main include and library folders (/usr/include and /usr/lib/x86_64-linux-gnu in the steps above). That will fix this problem.
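The numbers in the error message from the question are packed version codes. cuDNN 7 headers define the code as CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL, so 7102 is 7.1.2 and 7005 is 7.0.5. A minimal sketch of the decoding follows; the function names are hypothetical, and the "compatibility version" is simply read off the log as the code with the patch level dropped.

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7-style version code into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(code):
    """Drop the patch level, as in the 'compatibility version' of the log."""
    return code - code % 100


if __name__ == "__main__":
    for code in (7102, 7005):
        print(code, decode_cudnn_version(code), compatibility_version(code))
```

So the process loaded a cuDNN 7.1.2 library at run time even though the environment variable advertised 7.0.5.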

cuda install error on Ubuntu 17.04

abigail@abilina:~/Downloads$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
Selecting previously unselected package cuda-repo-ubuntu1604.
(Reading database ... 205999 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1604_8.0.61-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1604 (8.0.61-1) ...
Setting up cuda-repo-ubuntu1604 (8.0.61-1) ...
Warning: The postinst maintainerscript of the package cuda-repo-ubuntu1604
Warning: seems to use apt-key (provided by apt) without depending on gnupg or gnupg2.
Warning: This will BREAK in the future and should be fixed by the package maintainer(s).
Note: Check first if apt-key functionality is needed at all - it probably isn't!
Warning: apt-key should not be used in scripts (called from postinst maintainerscript of the package cuda-repo-ubuntu1604)
OK
abigail@abilina:~/Downloads$ sudo apt-get install cuda
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda : Depends: cuda-8-0 (>= 8.0.61) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
My ubuntu version is 17.04. Does this mean my Linux currently can't install CUDA? I want to install TensorFlow with GPU support.
Per suggestion:
abigail@abilina:~/Downloads$ sudo apt-get -f install
Reading package lists... Done
Building dependency tree
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
The correct package for 17.04 (zesty) is nvidia-cuda-toolkit:
https://launchpad.net/ubuntu/zesty/amd64/nvidia-cuda-toolkit
apt-get install nvidia-cuda-toolkit
Keep in mind that CUDA installed through apt ends up in a different location than /usr/local/cuda. If other tools expect that path, create /usr/local/cuda manually and symlink (ln -s) the include, lib64 and bin directories into it.
I have successfully installed CUDA 8.0 + the latest patch from NVIDIA on Ubuntu 17.04:
Download the .run file from https://developer.nvidia.com/cuda-downloads, choosing Ubuntu 16.04 (Base Installer)
You will not be able to install it by just running, because it is looking for a file called InstallUtils.pm which is not present in Ubuntu 17.04, but curiously, is present in the .run file - so: unpack the .run file using ./cuda*.run --tar mxvf
copy InstallUtils.pm (should be in the /bin path) to /usr/lib/x86_64-linux-gnu/perl-base
Run the installer (You may want to say no to the driver install step to keep the one you install through apt - I'm using 381.22, because 375.26, which is provided by the .run file does not support my 1080ti)
gcc 6 is not supported by CUDA 8.0, but this is easily remedied for compiling the sample files: add export EXTRA_NVCCFLAGS="-Xcompiler -std=c++98" to your .bashrc file, and comment out the following lines, together with the matching #endif, in one of the headers (I think it was host_config.h, but you'll see which one once you try to compile):
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
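To see which compiler versions this preprocessor check rejects, the condition can be mirrored in Python. This is a sketch only: gcc_rejected is a hypothetical name, and the logic is a direct transcription of the #if line above.

```python
def gcc_rejected(major, minor):
    """Mirror of: __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)."""
    return major > 5 or (major == 5 and minor > 3)


if __name__ == "__main__":
    for version in ((4, 9), (5, 3), (5, 4), (6, 3)):
        print(version, "rejected" if gcc_rejected(*version) else "accepted")
```

The gcc 6.3 that ships with Ubuntu 17.04 falls in the rejected range, which is why the header stops the build.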
This is all from memory, so hopefully it's accurate enough.
I managed to find this solution thanks to these useful posts:
https://devtalk.nvidia.com/default/topic/983777/can-t-locate-installutils-pm-in-inc/
https://devtalk.nvidia.com/default/topic/949770/cuda-8-0rc-supporting-gcc6-/
For ubuntu 17.04, I had to use cuda 9.0 (deb version)
https://developer.nvidia.com/cuda-release-candidate-download
I couldn't get it to work otherwise. Cuda 8.0 needs gcc 5.3.1 but cuda 9.0 is compatible with gcc 6.3.0 which is installed on ubuntu 17.04 automatically.
More precisely, this is what I did:
On Ubuntu 17.04, install CUDA 9.0. At the moment you can download the beta version:
https://developer.nvidia.com/cuda-release-candidate-download
I downloaded the .deb file and haven't had any problems. Follow the steps recommended on the download page for CUDA 9.0:
sudo dpkg -i cuda-repo-ubuntu1704-9-0-local-rc_9.0.103-1_amd64.deb
sudo apt-key add /var/cuda-repo-9.0-local-rc/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
Then follow the post-installation steps from the NVIDIA instructions (i.e., setting PATH and LD_LIBRARY_PATH):
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
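The ${VAR:+:${VAR}} expansion in these exports adds a colon and the old value only when the variable is already set and non-empty. That avoids a trailing colon, which matters because an empty entry in LD_LIBRARY_PATH is treated as the current directory. A minimal Python sketch of the same rule, with prepend_path as a hypothetical name:

```python
def prepend_path(new_dir, current):
    """Prepend new_dir to a colon-separated list without leaving a stray ':'."""
    if current:
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-9.0/lib64", ""))
    print(prepend_path("/usr/local/cuda-9.0/bin", "/usr/bin:/bin"))
```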
CUDA 9 is compatible with gcc 6.3.0 (which comes with 17.04). I used arch=sm_52, and sometimes I have to run 'make clean' on my makefiles first.
Installing Cuda 9.0 was the simplest solution in my case.
Alternatively, if you'd prefer cuda 8, you can download the deb file and then use the command
dpkg-deb -x cuda_8.*.deb /usr/local/cuda-8.0
to extract the contents from the deb file and have them placed in the desired directory.
Source: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#advanced-setup