Advice on tensorflow 1.13 on cuda 10.1 - tensorflow

I rushed a bit and upgraded to Ubuntu 18.10, couldn't find a cuda 10.0 version for it so went for cuda 10.1... which I understand tensorflow doesn't support yet.
Would you advise reverting back to Ubuntu 18.04 or patiently waiting for a compatible tensorflow release?

tensorflow 1.13 doesn't work with cuda 10.1 because of the following error:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory.
tensorflow is looking for libcublas.so.10.0 whereas cuda provides libcublas.so.10.1.0.105.
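One way to confirm which library name is actually loadable is to ask the dynamic loader directly. The following is a minimal sketch using only the Python standard library. The helper name `can_load` is made up for illustration, `libcublas.so.10.0` comes from the error message above, and `libcublas.so.10` is an assumption about the name a cuda 10.1 install exposes.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False


if __name__ == "__main__":
    # First the name tensorflow 1.13 asks for, then the name a
    # cuda 10.1 install may expose instead (an assumption).
    for name in ("libcublas.so.10.0", "libcublas.so.10"):
        print(name, "loadable" if can_load(name) else "NOT loadable")
```

If the first name is reported as not loadable, `import tensorflow` will most likely fail with the same ImportError.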

Older versions of the cuda toolkit are available here: https://developer.nvidia.com/cuda-toolkit-archive
The binaries TensorFlow provides went from CUDA 9.0 straight to CUDA 10.0, skipping CUDA 9.1 and 9.2. So I would not recommend waiting for CUDA 10.1 TensorFlow binaries.
It is highly likely that CUDA 10.0 would run fine on Ubuntu 18.10, but if that is not the case, then yes revert back to 18.04.

Related

Tensorflow 1.15 cannot detect gpu with Cuda10.1

I have installed both tensorflow 2.2.0 and tensorflow 1.15.0(by pip install tensorflow-gpu==1.15.0). The tensorflow 2 is installed in the base environment of Anaconda 3, while the tensorflow 1 is installed in a separate environment.
The tensorflow 2.2.0 can recognize gpu based on a simple test:
import tensorflow as tf

# gpu_device_name() returns an empty string when no GPU is visible
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
# output: Default GPU Device: /device:GPU:0
But the tensorflow 1.15.0 can not detect gpu.
For your information, my system environment is python + cuda 10.1 + vs 2015.
The tensorflow versions 1.15.0 to 1.15.3 (the latest version) are all compiled against Cuda 10.0. Downgrading from cuda 10.1 to cuda 10.0 solved the problem.
Also be aware of the python version. It is recommended to install the tensorflow .whl file (as listed at https://nero-mirror.stanford.edu/pypi/simple/tensorflow-gpu/) for the specific python version. As for installation, see How do I install a Python package with a .whl file?
Tensorflow 1.15 expects cuda 10.0, but I managed to make it work on a system with cuda 10.1 by installing the following packages with Anaconda: cudatoolkit (10.0) and cudnn (7.6.5). So, after running
conda install cudatoolkit=10.0
conda install cudnn=7.6.5
tensorflow 1.15 was able to find and use the GPU (on a system that still has cuda 10.1 installed).
PS: I understand your environment is Windows based, but this question pops on Google for the same problem happening on Linux (where I tested this solution). Might be useful also on Windows.
It might have to do with the version compatibility of TF, Cuda and CuDNN. This post has it discussed thoroughly.
Have you tried installing Anaconda? It downloads all the requirements and makes it easy for you with just a few clicks.

Can I install cuda 10.2 for using tensorflow 2.1 or it has to be cuda 10.1?

I am using ubuntu 18.04 and I have a NVIDIA Quadro P5000.
Providing the solution here (Answer Section), even though it is present in the Comment Section, for the benefit of the community.
No. As per the Tensorflow documentation, TensorFlow >= 2.1.0 supports CUDA 10.1; please refer to the compatible version details.
Pytorch needs CUDA 10.2 but Tensorflow needs cuda 10.1. Is it a joke?
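The version pairings mentioned on this page can be collected into a small lookup. This is only a sketch: the table covers just the releases discussed here (1.13 to 1.15 on cuda 10.0, 2.1 to 2.3 on cuda 10.1), and the names `PREBUILT_CUDA` and `required_cuda` are hypothetical, not part of any TensorFlow API.

```python
# CUDA toolkit versions the prebuilt tensorflow-gpu wheels were compiled
# against, limited to the releases mentioned on this page.
PREBUILT_CUDA = [
    # (first TF version, last TF version, CUDA toolkit)
    ((1, 13), (1, 15), "10.0"),
    ((2, 1), (2, 3), "10.1"),
]


def required_cuda(tf_version):
    """Return the CUDA version for a TF version string, or None if unknown."""
    # Only major.minor matters here: "1.15.3" -> (1, 15)
    major_minor = tuple(int(part) for part in tf_version.split(".")[:2])
    for first, last, cuda in PREBUILT_CUDA:
        if first <= major_minor <= last:
            return cuda
    return None


if __name__ == "__main__":
    for version in ("1.13.1", "1.15.0", "2.1.0", "2.2.0"):
        print(version, "->", required_cuda(version))
```

For versions outside the table the function returns None rather than guessing; the official compatibility table remains the authoritative source.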
No, you can use cuda version 10.2 with tensorflow 2.0.
It is quite simple.
WHY:
When you run "import tensorflow", tensorflow searches for a library named 'libcudart.so.$.$' in LD_LIBRARY_PATH. For tensorflow 2.1.0-2.3.0, which are built against cuda 10.1, that is 'libcudart.so.10.1'. A cuda 10.2 install does not provide 'libcudart.so.10.1', so the import fails with an error.
In practice there is little difference between cuda 10.1 and cuda 10.2 for these libraries, so we can work around the problem with soft links.
HOW
cd /usr/local/cuda-10.2/targets/x86_64-linux/lib/
ln -s libcudart.so.10.2.89 libcudart.so.10.1
cd /usr/local/cuda-10.2/extras/CUPTI/lib64
ln -s libcupti.so.10.2.75 libcupti.so.10.1
cd /usr/local/cuda-10.2/lib64
ln -s libcudnn.so.8 libcudnn.so.7
vim /etc/profile
# add the following three lines to /etc/profile:
export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
# then reload the profile:
source /etc/profile
Done!
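To check whether the links above are visible where tensorflow will look, the directories in LD_LIBRARY_PATH can be scanned for the expected names. This is a rough sketch: the function name `find_in_library_path` is made up for illustration, and it only inspects LD_LIBRARY_PATH, whereas the real dynamic loader also consults the ldconfig cache and default system directories.

```python
import os


def find_in_library_path(soname, library_path=None):
    """Return full paths of `soname` found in the LD_LIBRARY_PATH directories."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, soname)
        # os.path.exists follows symlinks, so a dangling link does not count.
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The three names linked in the steps above.
    for name in ("libcudart.so.10.1", "libcupti.so.10.1", "libcudnn.so.7"):
        print(name, find_in_library_path(name) or "not found")
```

A name reported as "not found" means either the link was not created or its directory is missing from LD_LIBRARY_PATH.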

Is Tensorflow 1.12 compatible with CUDA 10.1?

I've been able to successfully set up an Ubuntu 18.04 server with nvidia-smi 418.39, Driver version 418.39, and CUDA 10.1
I now have a user who wants to run TensorFlow but insists that it is not compatible with CUDA 10.1, only CUDA 10. There is no statement confirming this online anywhere that I can find, nor is it in any release patch notes from TF. Because setting this system up was kind of a pain to do, I'm a little hesitant to try downgrading just one version.
Does anyone have verification whether TensorFlow 1.12 does or does not work with CUDA 10.1?
I can confirm that even tf 1.13.1 only works with CUDA 10.0 for me, not 10.1.
I don't know whether a symlink would work, though.
If you try to run tf 1.13.1 on CUDA 10.1, it will give you "ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory"
TensorFlow 1.12 (and even later versions 1.13.1 and 2.0.0-alpha0) could not be built against CUDA 10.1, thus can be considered incompatible.
I have tried building TensorFlow from source with GPU support. The TensorFlow versions I considered were 1.13.1 and 2.0.0-alpha0. The machine I used runs CentOS 7.6 with GCC 4.8.5. I have the NVIDIA Driver version 418.67 installed (which has the release date 2019.5.7 and supports CUDA Toolkit 10.1).
I succeeded in building both TensorFlow versions with CUDA 10.0 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.0). Note that you don't need to have the GPU attached to the machine (especially if you're using a VM in the cloud) while you're building TensorFlow with GPU support.
However, when I switched to CUDA 10.1 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.1), neither of these TensorFlow versions could be built. Besides the change in the location of libcublas, another source of the error is that no libcudart.so* files are found in cuda-10.1/lib64/ (while they do exist in cuda-10.0/lib64/).
I can also confirm that tf 1.13.1 does not work with CUDA 10.1. While importing tensorflow you will get the following error
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
running ldconfig -v shows the difference
libcublas.so.10.0 vs libcublas.so.10.1.0.105
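The comparison can be automated by parsing the link lines that `ldconfig -v` prints. The sketch below works on text that has already been captured; the function name `sonames_for` is hypothetical, and the sample text only illustrates the expected line format rather than real output from any machine.

```python
import re

# Matches link lines such as "\tlibcublas.so.10 -> libcublas.so.10.1.0.105"
_LINK_LINE = re.compile(r"^\s+(\S+) -> (\S+)")


def sonames_for(library, ldconfig_output):
    """Map soname -> real file for entries whose soname starts with `library`."""
    found = {}
    for line in ldconfig_output.splitlines():
        match = _LINK_LINE.match(line)
        if match and match.group(1).startswith(library + ".so"):
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    # Illustrative sample only; on a real system, capture `ldconfig -v` instead.
    sample = (
        "/usr/lib/x86_64-linux-gnu:\n"
        "\tlibcublas.so.10 -> libcublas.so.10.1.0.105\n"
    )
    available = sonames_for("libcublas", sample)
    print(available)
    print("libcublas.so.10.0 present:", "libcublas.so.10.0" in available)
```

The loader matches library names exactly, so a missing `libcublas.so.10.0` key corresponds to the ImportError shown above.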

Does the tensorflow gpu version use cudnn by default

As the title says, I have installed the tensorflow gpu version and I want to know whether tensorflow-gpu uses CUDNN by default.
From the Tensorflow 1.6.0 release notes: prebuilt binaries are now built against CUDA 9.0 and cuDNN 7.

Tensorflow GPU build with CPU

Can the tensorflow GPU version be used on CPU-only machines?
I'm getting some issues with Tensorflow 1.5 regarding cuda libraries not being found.
Has anybody tried this?
The CUDA 9 library is required for tensorflow-gpu version 1.5 and above, and CUDA 9 must be installed with a CUDA-capable GPU. You should probably install the CPU version of Tensorflow.