How to install older versions of CUDA in a conda environment?

I've been struggling to install an older version of CUDA in a conda environment. In practice, I need to install TensorFlow 1.14 and CUDA 8.0, with the nvcc compiler.
This is what I've tried:
conda create -n tf114-cuda8 tensorflow-gpu=1.14 cudatoolkit=8.0 python=3.6
source activate tf114-cuda8
At this point, nvcc is not available, because it is not included with default conda installation. Thus, I ran:
conda install -c conda-forge nvcc_linux-64=9.2
...but no luck! Now I have nvcc under the environment directory ~/venvs/conda/miniconda3/envs/tf114-cuda8/, but it does not seem to be the right one. In fact, if I run:
nvcc --version
it returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
What has gone wrong? Why is the nvcc version 10.1? How can I install TensorFlow 1.14 with CUDA 8.0 and nvcc included?
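To see which nvcc binary the shell actually resolves, and which release it reports, something like the following may help. It is only a sketch: the helper names `parse_nvcc_release` and `report_nvcc` are made up for illustration, and the regular expression assumes the "release X.Y" wording shown in the output above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the CUDA release (e.g. '10.1') found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def report_nvcc():
    """Print the first nvcc found on PATH together with the release it reports."""
    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        print("nvcc not found on PATH")
        return
    result = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True)
    print(nvcc_path, "->", parse_nvcc_release(result.stdout))


# Parse the last line of the output quoted in the question.
print(parse_nvcc_release("Cuda compilation tools, release 10.1, V10.1.243"))
```

Calling `report_nvcc()` inside the activated environment shows whether the nvcc on PATH lives under the environment directory or somewhere else, for example under a system-wide CUDA install.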

Related

Could not load dynamic library 'libcudart.so.11.0';

I am trying to use Tensorflow 2.7.0 with GPU, but I am constantly running into the same issue:
2022-02-03 08:32:31.822484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/username/.cache/pypoetry/virtualenvs/poetry_env/lib/python3.7/site-packages/cv2/../../lib64:/home/username/miniconda3/envs/project/lib/
2022-02-03 08:32:31.822528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
This issue has already appeared multiple times here and on GitHub. However, the solutions usually proposed are to a) download the missing CUDA files, b) downgrade/upgrade to the correct CUDA version, or c) set the correct LD_LIBRARY_PATH.
I have already been using my PC with CUDA-enabled PyTorch, and I did not have a single issue there. My nvidia-smi reports version 11.0, which is exactly the one I want to have. Also, if I try to run:
import os
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
print(os.path.exists(os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")))
it returns True. This is exactly the part of LD_LIBRARY_PATH from the error message, where Tensorflow, apparently, cannot see the libcudart.so.11.0 (which IS there).
Is there something really obvious that I am missing?
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
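Checking that the file exists is not quite the same as checking that the dynamic loader can open it. A minimal sketch of the second check, using ctypes from the standard library, is below. The helper name `can_load` is made up for illustration. Note that the loader reads LD_LIBRARY_PATH when the process starts, so changing it from inside an already running Python process normally has no effect on this test.

```python
import ctypes


def can_load(library_name):
    """Try to open a shared library by name; return (ok, error_message)."""
    try:
        ctypes.CDLL(library_name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


ok, error = can_load("libcudart.so.11.0")
print("loadable:", ok)
if not ok:
    # The loader's own message is often more specific than a plain existence check.
    print("loader said:", error)
```

Run it from the same shell and environment in which TensorFlow is started, so that both see the same LD_LIBRARY_PATH.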
First, find out where "libcudart.so.11.0" is located.
If your error stack names a different library, substitute its name in the command below:
sudo find / -name 'libcudart.so.11.0'
On my system the result shows where "libcudart.so.11.0" is:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0
If the command prints nothing, make sure you have installed CUDA and the other components your system requires.
Second, add that path to the environment file.
# edit /etc/profile
sudo vim /etc/profile
# append the path to "LD_LIBRARY_PATH" in the profile file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib
# reload the profile so the change takes effect
source /etc/profile
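After reloading the profile, you can check which directories in LD_LIBRARY_PATH actually contain the library. A small sketch, assuming a colon-separated path as on Linux (the function name `dirs_containing` is just illustrative):

```python
import os


def dirs_containing(library_name, search_path):
    """Return the directories of a colon-separated search path that contain library_name."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries produced by leading, trailing or doubled colons.
        if directory and os.path.exists(os.path.join(directory, library_name)):
            hits.append(directory)
    return hits


print(dirs_containing("libcudart.so.11.0", os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list means none of the listed directories holds the file, so the export did not take effect in the current shell or points at the wrong location.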
You may also refer to this link
A third thing you may try is:
conda install cudatoolkit
I installed the correct versions of CUDA (11.3) and cuDNN (8.2.1) for TF 2.8, based on https://www.tensorflow.org/install/source#gpu, using the following commands:
conda uninstall cudatoolkit
conda install cudnn
Then I exported LD_LIBRARY_PATH (the dynamic linker's search path) after finding the library location with:
sudo find / -name 'libcudnn*'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usr/miniconda3/envs/tf2/lib/
After that, the system was able to find the required libraries and use the GPU for training.
Hope it helped.
I faced the same issue with TensorFlow 2.9 and CUDA 11.7 on Arch Linux x86_64 with two NVIDIA GPUs (1080 Ti / Titan RTX) and solved it:
It is not strictly necessary to follow the compatibility matrix exactly (I used CUDA 11.7 where the matrix lists 11.2, so a newer minor version). However, I did downgrade Python 3 according to the TensorFlow compatibility matrix (3.10 to 3.7).
Note that you can have multiple CUDA versions installed and switch between them with a symlink on Linux. (Windows is probably a bit different.)
Setup with conda and Python 3.7:
sudo pacman -S base-devel cudnn
conda activate tf-2.9
conda uninstall cudatoolkit && conda install cudnn
I also had to update gcc for another library (off topic):
conda install -c conda-forge gcc=12.1.0
I added this snippet for debugging, following the TensorFlow GPU docs:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
I now see 2 GPUs detected instead of 0, and training time is divided by 10.
nvidia-smi reports memory usage maxed out and the power draw raised from 9 W to 150 W, confirming that the GPU is in use (the other one was left idle).
Root cause: cuDNN was not installed system-wide.

Tensorflow + Pytorch install Cudatoolkit 11.2

I have a Windows 10 machine with an NVIDIA 3080 card and CUDA Toolkit 11.2.
I want to install PyTorch in addition to TensorFlow, which works 100% fine so far.
If I understand the description correctly, the CUDA toolkit installed inside the Python environment is “independent” of the CUDA Toolkit version installed for Windows.
I tried to install PyTorch with this command, but it does not recognize the GPU.
pip install torch==1.9.0+cu102 torchvision==0.10.0+cu102 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
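One thing worth checking is the `+cu102` suffix: it selects wheels built against CUDA 10.2, regardless of the 11.2 toolkit installed on Windows. As far as I know, Ampere cards such as the 3080 need CUDA 11.x builds, which may be why the GPU is not recognized; treat that as an assumption to verify. The sketch below only decodes the tag. The helper name and the `ASSUMED_MIN_CUDA_FOR_AMPERE` constant are made up for illustration.

```python
import re

# Assumption, not taken from the question: Ampere GPUs need at least a CUDA 11.0 build.
ASSUMED_MIN_CUDA_FOR_AMPERE = (11, 0)


def cuda_version_from_wheel_tag(requirement):
    """Extract (major, minor) from a '+cuXYZ' local version tag, or None if there is none."""
    match = re.search(r"\+cu(\d+)(\d)(?!\d)", requirement)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


requested = cuda_version_from_wheel_tag("torch==1.9.0+cu102")
print("wheel built for CUDA:", requested)
print("new enough for Ampere (assumed minimum):", requested >= ASSUMED_MIN_CUDA_FOR_AMPERE)
```

If the second line prints False, trying a wheel with a CUDA 11.x tag for the same torch version would be the next thing to test.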

I already have a CUDA toolkit installed, why is conda installing CUDA again?

I have installed CUDA version 11.2 and cuDNN version 8.1 on Ubuntu:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
When I installed tensorflow-gpu in a conda environment, it installed CUDA and cuDNN again.
Why is it happening?
How to stop conda from installing cuda and cudnn again?
Can I just use cuda and cudnn that I have already installed? If yes, how?
Why is it happening?
Conda expects to manage any packages you install and all their dependencies. The intention is that you literally never have to install anything else by hand for any packages they distribute in their own channel. If a GPU accelerated package requires a CUDA runtime, conda will try to select and install a correctly versioned CUDA runtime for the version of the Python package it has selected for installation.
How to stop conda from installing cuda and cudnn again?
You probably can't, or at least not without ending up with a non-functional TensorFlow installation. But see here -- what conda installs is only the necessary, correctly versioned CUDA runtime components that make its GPU-accelerated packages work. The one thing it doesn't (and can't) install is a GPU driver for the hardware.
Can I just use cuda and cudnn that I have already installed?
You say you installed CUDA 11.2. If you look at the conda output, you can see that it wants to install a CUDA 10.2 runtime. As you are now fully aware, versioning is critical to Tensorflow and a Tensorflow build requiring CUDA 10.2 won't work with CUDA 11.2. So even if you were to stop conda from performing the dependency installation, there is a version mismatch so it wouldn't work.
If yes, how?
See above.
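To see which CUDA runtime conda actually put into an environment, you can inspect the output of `conda list --json`. This is a rough sketch: it assumes each entry has "name" and "version" fields, and the sample string is made-up illustrative data, not output from the asker's machine.

```python
import json


def cuda_packages(conda_list_json):
    """Map the CUDA-related package names in `conda list --json` output to their versions."""
    wanted = {"cudatoolkit", "cudnn"}
    packages = json.loads(conda_list_json)
    return {pkg["name"]: pkg["version"] for pkg in packages if pkg["name"] in wanted}


# Made-up sample in the assumed shape of `conda list --json` output.
sample = '[{"name": "cudatoolkit", "version": "10.2.89"}, {"name": "numpy", "version": "1.19.2"}]'
print(cuda_packages(sample))
```

In practice you would feed it the real output, for example from `subprocess.run(["conda", "list", "--json"], capture_output=True, text=True).stdout`.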

Can I install cuda 10.2 for using tensorflow 2.1 or it has to be cuda 10.1?

Can I install cuda 10.2 for using tensorflow 2.1 or it has to be cuda 10.1?
I am using ubuntu 18.04 and I have a NVIDIA Quadro P5000.
Providing the solution here (Answer Section), even though it is present in the Comment Section, for the benefit of the community.
No. As per the TensorFlow documentation, TensorFlow >= 2.1.0 supports CUDA 10.1; please refer to the compatible version details.
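For quick reference, the pairing can be kept in a small lookup table. The entries below are transcribed from memory of the "tested build configurations" table on tensorflow.org, so treat them as assumptions and verify them against that page. The names `TESTED_CONFIGS` and `expected_cuda` are made up for illustration.

```python
# (CUDA, cuDNN) per TensorFlow minor release. Transcribed from memory: verify before relying on it.
TESTED_CONFIGS = {
    "1.14": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
    "2.7": ("11.2", "8.1"),
}


def expected_cuda(tf_version):
    """Look up the (CUDA, cuDNN) pair for a TensorFlow version such as '2.1.0'."""
    minor_release = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS.get(minor_release)


print(expected_cuda("2.1.0"))
```

A result of None just means the release is not in this abbreviated table.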

Pytorch need CUDA 10.2 but Tensorflow need cuda 10.1. Is it a joke?

No, you can use CUDA 10.2 with TensorFlow 2.0.
It is quite simple.
WHY:
When you run "import tensorflow", TensorFlow searches for a library named 'libcudart.so.$.$' in LD_LIBRARY_PATH. For TensorFlow 2.1.0-2.3.0, which is built against CUDA 10.1, that is 'libcudart.so.10.1'. With CUDA 10.2 installed there is no 'libcudart.so.10.1', so an error occurs.
In practice CUDA 10.1 and CUDA 10.2 are close enough for this purpose that we can work around the problem with symbolic links. Keep in mind that this is a workaround, not an officially supported configuration.
HOW:
# alias the CUDA 10.2 runtime under the 10.1 name
cd /usr/local/cuda-10.2/targets/x86_64-linux/lib/
ln -s libcudart.so.10.2.89 libcudart.so.10.1
# same for CUPTI
cd /usr/local/cuda-10.2/extras/CUPTI/lib64
ln -s libcupti.so.10.2.75 libcupti.so.10.1
# and for cuDNN
cd /usr/local/cuda-10.2/lib64
ln -s libcudnn.so.8 libcudnn.so.7
# add the following exports to /etc/profile
vim /etc/profile
export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
# reload the profile so the change takes effect
source /etc/profile
Done!
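If you prefer to script the linking step, the same idea can be sketched in Python. The function name `alias_library` is made up for illustration; it only creates the link when nothing exists at the alias name yet, and it carries the same caveat as the manual version: this is an unsupported workaround.

```python
import os


def alias_library(lib_dir, real_name, alias_name):
    """Create a relative symlink alias_name -> real_name in lib_dir; return False if it already exists."""
    alias_path = os.path.join(lib_dir, alias_name)
    if os.path.lexists(alias_path):
        # Do not overwrite an existing file or link.
        return False
    os.symlink(real_name, alias_path)
    return True


# Example call (needs write access to the directory, so probably root):
# alias_library("/usr/local/cuda-10.2/targets/x86_64-linux/lib/",
#               "libcudart.so.10.2.89", "libcudart.so.10.1")
```

Because the link target is given as a bare file name, the link stays valid if the whole directory is moved.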

Failed linking MKL when building TensorFlow

I am trying to build TensorFlow v1.13 with MKL. The build succeeds and the pip package is created correctly, but when I test the package, MKL is clearly not being used in the end, i.e. when I run the following in Python 3.6.8:
import tensorflow
print("Is MKL enabled?{}".format(tensorflow.pywrap_tensorflow.IsMklEnabled()))
It returns "False"
I am operating on CentOS 7 in the following conda environment:
# Name                Version        Build             Channel
blas                  1.0            mkl
ca-certificates       2019.1.23      0
certifi               2018.1.18      py36_2            intel
cloog                 0.18.0         0
cython                0.29.6         py36he6710b0_0
gcc                   4.8.5          7
gmp                   6.1.2          h6c8ec71_1
icc_rt                2019.3         intel_199         intel
intel-openmp          2019.3         intel_199         intel
intelpython           2019.3         0                 intel
isl                   0.12.2         0
keras-applications    1.0.7          pypi_0            pypi
keras-preprocessing   1.0.9          pypi_0            pypi
libedit               3.1.20181209   hc058e9b_0
libffi                3.2.1          hd88cf55_4
libgcc-ng             8.2.0          hdf63c60_1
libgfortran-ng        7.3.0          hdf63c60_0
libstdcxx-ng          8.2.0          hdf63c60_1
mkl                   2019.3         intel_199         intel
mkl-dnn               0.14           2                 intel
mkl_fft               1.0.11         py36h7b7c402_0    intel
mkl_random            1.0.2          py36h7b7c402_4    intel
mock                  2.0.0          py36_0
mpc                   1.0.3          hec55b23_5
mpfr                  3.1.5          h11a74b3_2
ncurses               6.1            he6710b0_1
numpy                 1.16.2         py36h7e9f1db_0
numpy-base            1.16.2         py36hde5b4d6_0
openmp                2018.0.3       intel_0           intel
openssl               1.1.1b         h7b6447c_1
pbr                   5.1.3          py_0
pip                   19.0.3         py36_0
python                3.6.8          h0371630_0
readline              7.0            h7b6447c_5
setuptools            40.8.0         py36_0
six                   1.12.0         py36_0
sqlite                3.27.2         h7b6447c_0
tbb                   2019.4         intel_199         intel
tk                    8.6.8          hbc83047_0
wheel                 0.33.1         py36_0
xz                    5.2.4          h14c3975_4
zlib                  1.2.11         h7b6447c_3
The Bazel version is 0.23.0.
The GCC version is 7.3.1 (see below).
I use the following commands to build the TensorFlow package:
scl enable devtoolset-7 bash
bazel build --config=mkl --config=opt --copt=-march=x86-64 --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 --copt=-O2 --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Obviously, many warnings occur, but none of them is related to MKL. Noteworthy, I am using GCC7 here. I tried GCC4 and GCC5. I am using
That didn't change the problem. Does anyone have a clue as to why MKL is not linked?
As TensorFlow 1.13 is the latest release, it might not be stable (https://www.tensorflow.org/versions/).
Installing TensorFlow 1.11 with the steps below worked fine.
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.11
conda create -n <conda_environment_name> python=3.6
source activate <conda_environment_name>
conda install numpy bazel
conda install -c intel mkl-dnn
conda install -c conda-forge keras-preprocessing
Configure your build by running the following at the root of your TensorFlow source tree:
./configure
Build with support for Intel® MKL-DNN:
bazel build --config=mkl -c opt //tensorflow/tools/pip_package:build_pip_package
Create a directory tfwheels to hold the wheel files and run the commands below:
mkdir tfwheels
bazel-bin/tensorflow/tools/pip_package/build_pip_package /home/<your_path>/tfwheels/tensorflow_pkg
pip install /home/<your_path>/tfwheels/tensorflow_pkg/tensorflow-<version>-cp36-cp36m-linux_x86_64.whl
For TensorFlow 1.12 the steps are similar.
Refer to:
https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
Hope this helps.