Can the TensorFlow GPU version be used on CPU-only machines?
I am getting errors with TensorFlow 1.5 about CUDA libraries not being found.
Has anybody tried this?
tensorflow-gpu 1.5 and above requires the CUDA 9 libraries, and CUDA 9 has to be installed on a machine with a CUDA-capable GPU. On a CPU-only machine you should install the CPU version of TensorFlow instead.
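As a rough illustration of that advice, the sketch below picks a pip package name depending on whether a CUDA runtime library can be located. The helper names (cuda_runtime_found, choose_tensorflow_package) are made up for this example. Finding the library does not prove that a usable GPU or driver is present, and on Windows the CUDA DLLs carry a version suffix, so this probe may miss them.

```python
import ctypes.util


def cuda_runtime_found():
    """Return True if the system linker can locate a CUDA runtime library.

    This only checks that a 'cudart' library is discoverable. It does not
    verify the CUDA version or that a CUDA-capable GPU is installed.
    """
    return ctypes.util.find_library("cudart") is not None


def choose_tensorflow_package(has_cuda):
    """Hypothetical helper: map the probe result to a TF 1.x pip package name."""
    return "tensorflow-gpu" if has_cuda else "tensorflow"


if __name__ == "__main__":
    print("Suggested package:", choose_tensorflow_package(cuda_runtime_found()))
```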
Related
I am trying to run ./TTS/bin/train_tacotron.py with GPU in Powershell.
I followed these instructions, which got me pretty far: the config is read, the model restored, but as training is about to start, I get the message:
UserWarning: NVIDIA GeForce RTX 3060 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
The instructions specified don't really help. I tried installing the most recent stable version of PyTorch, as well as trying 1.7.1 (as opposed to 1.8.0 as recommended in the instructions I linked), but I got the same message.
How can I get this to run on my GPU?
Side note: I was successfully able to run training on my GPU in WSL, but it froze after a few hundred epochs, so I wanted to try Powershell to see if it made a difference.
To work properly with your current CUDA setup, you need to pin cudatoolkit to version 11.3. Run the following commands:
conda uninstall cudatoolkit
conda install cudatoolkit=11.3 -c pytorch
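After reinstalling, one way to sanity-check the result is to compare the GPU's compute capability with the architectures the PyTorch build was compiled for. The sketch below is a simplified check, not PyTorch's own logic: it assumes a binary built for sm_XY runs on devices with the same major version and an equal or higher minor version, and it ignores PTX (compute_) entries. The function names are hypothetical; torch.cuda.get_arch_list() is assumed to be available, which should hold for recent releases.

```python
def parse_sm(arch):
    """Split an 'sm_XY' string into (major, minor), e.g. 'sm_86' -> (8, 6)."""
    digits = arch.split("_")[1]
    return int(digits[:-1]), int(digits[-1])


def binary_supports(capability, arch_list):
    """Simplified check: is there an sm_ binary usable on this device?

    Assumes a binary for sm_XY runs on devices with the same major version
    and an equal or higher minor version. 'compute_' entries are skipped.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        a_major, a_minor = parse_sm(arch)
        if a_major == major and a_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(cap, archs, binary_supports(cap, archs))
```

With the architecture list from the warning in the question, binary_supports((8, 6), ...) is False, which matches the reported incompatibility.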
I have installed both TensorFlow 2.2.0 and TensorFlow 1.15.0 (via pip install tensorflow-gpu==1.15.0). TensorFlow 2 is installed in the base environment of Anaconda 3, while TensorFlow 1 is installed in a separate environment.
TensorFlow 2.2.0 recognizes the GPU in a simple test:
import tensorflow as tf
# gpu_device_name() returns an empty string when no GPU is found
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
# output: Default GPU Device: /device:GPU:0
But TensorFlow 1.15.0 cannot detect the GPU.
For your information, my system environment is Python + CUDA 10.1 + VS 2015.
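To run the same check in both environments, a version-tolerant helper might look like the sketch below. It assumes that tf.config.list_physical_devices exists in TF 2.1 and later and that TF 1.15 exposes it under tf.config.experimental; the function name visible_gpus is made up for this example.

```python
def visible_gpus():
    """Return the GPU devices TensorFlow reports, or [] if none or no TF."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # Newer releases expose list_physical_devices directly on tf.config;
    # TF 1.15 is assumed to provide it under tf.config.experimental.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return list(lister("GPU"))


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", visible_gpus())
```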
TensorFlow versions 1.15.0 to 1.15.3 (the latest version at the time of writing) are all compiled against CUDA 10.0. Downgrading from CUDA 10.1 to CUDA 10.0 solved the problem.
Also check your Python version. It is recommended to install the tensorflow-gpu .whl file (as listed at https://nero-mirror.stanford.edu/pypi/simple/tensorflow-gpu/) that matches your specific Python version. As for installation, see "How do I install a Python package with a .whl file?"
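Picking the right wheel mostly comes down to matching the CPython tag in the file name. A minimal sketch, assuming the usual wheel naming scheme where the tag appears as a segment such as -cp37-; the helper names and the example file name in the usage note are illustrative only.

```python
import sys


def cpython_tag(version_info=None):
    """Return the CPython wheel tag, e.g. (3, 7) -> 'cp37'."""
    major, minor = (version_info or sys.version_info)[:2]
    return "cp{}{}".format(major, minor)


def wheel_matches(filename, tag):
    """Check whether a wheel file name carries the given Python tag."""
    return "-{}-".format(tag) in filename


if __name__ == "__main__":
    print("This interpreter needs wheels tagged:", cpython_tag())
```

For example, wheel_matches("tensorflow_gpu-1.15.0-cp37-cp37m-win_amd64.whl", "cp37") is True, while the same file does not match "cp36".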
TensorFlow 1.15 expects CUDA 10.0, but I managed to make it work on a system with CUDA 10.1 by installing the following packages with Anaconda: cudatoolkit (10.0) and cudnn (7.6.5). So, after running
conda install cudatoolkit=10.0
conda install cudnn=7.6.5
TensorFlow 1.15 was able to find and use the GPU (on a system that otherwise uses CUDA 10.1).
PS: I understand your environment is Windows based, but this question pops up on Google for the same problem on Linux (where I tested this solution). It might also be useful on Windows.
It might have to do with the version compatibility of TensorFlow, CUDA and cuDNN. This post discusses it thoroughly.
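The version pairings reported on this page can be collected into a small lookup table. The sketch below only covers the releases mentioned here (it is not a complete compatibility matrix), and the helper names are made up for illustration.

```python
# CUDA versions that prebuilt TensorFlow binaries were built against,
# limited to the releases mentioned on this page.
TF_CUDA = {
    "1.5": "9.0",
    "1.6": "9.0",
    "1.13": "10.0",
    "1.15": "10.0",
}


def expected_cuda(tf_version):
    """Return the CUDA version for a TF version string, or None if unknown."""
    key = ".".join(tf_version.split(".")[:2])
    return TF_CUDA.get(key)


def is_match(tf_version, installed_cuda):
    """True if the installed CUDA version starts with the expected one."""
    expected = expected_cuda(tf_version)
    return expected is not None and installed_cuda.startswith(expected)


if __name__ == "__main__":
    print(expected_cuda("1.15.3"), is_match("1.13.1", "10.1"))
```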
Have you tried installing Anaconda? It downloads all the requirements and makes the setup easy with just a few clicks.
I rushed a bit and upgraded to Ubuntu 18.10. I couldn't find a CUDA 10.0 version for it, so I went for CUDA 10.1, which I understand TensorFlow doesn't support yet.
Would you advise reverting back to Ubuntu 18.04 or patiently waiting for a compatible tensorflow release?
TensorFlow 1.13 doesn't work with CUDA 10.1 because of the following error:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory.
TensorFlow is looking for libcublas.so.10.0, whereas CUDA 10.1 provides libcublas.so.10.1.0.105.
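Whether the dynamic loader can actually open a given library name can be probed directly, without importing TensorFlow. This is a minimal sketch using ctypes; can_load is a made-up helper name, and the second library name in the usage section is an assumption about what a CUDA 10.1 install exposes.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # The first name is the one from the ImportError above; the second
    # is an assumed name for the library shipped with CUDA 10.1.
    for name in ("libcublas.so.10.0", "libcublas.so.10"):
        print(name, can_load(name))
```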
Older CUDA toolkit versions are available here: https://developer.nvidia.com/cuda-toolkit-archive
TensorFlow's prebuilt binaries went from CUDA 9.0 to CUDA 10.0, skipping CUDA 9.1 and 9.2. So I would not recommend waiting for CUDA 10.1 TensorFlow binaries.
It is highly likely that CUDA 10.0 will run fine on Ubuntu 18.10; if it does not, then yes, revert to 18.04.
I've been able to successfully set up an Ubuntu 18.04 server with nvidia-smi 418.39, Driver version 418.39, and CUDA 10.1
I now have a user who wants to run TensorFlow but insists that it is not compatible with CUDA 10.1, only CUDA 10. There is no statement confirming this online anywhere that I can find, nor is it in any release patch notes from TF. Because setting this system up was kind of a pain to do, I'm a little hesitant to try downgrading just one version.
Does anyone have verification whether TensorFlow 1.12 does or does not work with CUDA 10.1?
I can confirm that even tf 1.13.1 only works with CUDA 10.0 for me, not 10.1.
I don't know whether a symlink would work, though.
If you try to run tf 1.13.1 on CUDA 10.1, it will give you: "ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory".
TensorFlow 1.12 (and even the later versions 1.13.1 and 2.0.0-alpha0) could not be built against CUDA 10.1, and can thus be considered incompatible.
I have tried building TensorFlow from source with GPU support. The TensorFlow versions I considered were 1.13.1 and 2.0.0-alpha0. The machine I used runs CentOS 7.6 with GCC 4.8.5. I have the NVIDIA Driver version 418.67 installed (which has the release date 2019.5.7 and supports CUDA Toolkit 10.1).
I succeeded in building both TensorFlow versions with CUDA 10.0 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.0). Note that you don't need to have the GPU attached to the machine (especially if you're using a VM in the cloud) while you're building TensorFlow with GPU support.
However, when I switched to CUDA 10.1 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.1), neither of these TensorFlow versions could be built. Besides the change in the location of libcublas, another source of the error is that no libcudart.so* files are found in cuda-10.1/lib64/ (while they do exist in cuda-10.0/lib64/).
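To compare what each toolkit directory actually contains, a small listing helper can be enough. This is a sketch only: find_libs is a hypothetical name, and the /usr/local/cuda-10.x prefixes are assumed default install locations that may differ on your machine.

```python
import glob
import os


def find_libs(lib_dir, pattern):
    """Return sorted file names in lib_dir that match a glob pattern."""
    matches = glob.glob(os.path.join(lib_dir, pattern))
    return sorted(os.path.basename(path) for path in matches)


if __name__ == "__main__":
    # Assumed default install prefixes; adjust to the actual locations.
    for root in ("/usr/local/cuda-10.0/lib64", "/usr/local/cuda-10.1/lib64"):
        for pattern in ("libcudart.so*", "libcublas.so*"):
            print(root, pattern, find_libs(root, pattern))
```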
I can also confirm that tf 1.13.1 does not work with CUDA 10.1. When importing tensorflow you will get the following error:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Running ldconfig -v shows the difference:
libcublas.so.10.0 vs libcublas.so.10.1.0.105
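The same comparison can be automated by scanning ldconfig output for libcublas entries. The sketch below works on text you pass in (for example the captured output of ldconfig -v); the function names are hypothetical, and the exact output layout may vary between systems.

```python
import re

# Matches e.g. 'libcublas.so.10.0' or 'libcublas.so.10.1.0.105'
CUBLAS_RE = re.compile(r"libcublas\.so\.(\d+(?:\.\d+)*)")


def cublas_versions(ldconfig_text):
    """Return the distinct libcublas version suffixes found in the text."""
    return sorted(set(CUBLAS_RE.findall(ldconfig_text)))


def satisfies(required, versions):
    """True if a library with exactly the required suffix is present."""
    return required in versions


if __name__ == "__main__":
    sample = "libcublas.so.10.1.0.105"
    found = cublas_versions(sample)
    print(found, satisfies("10.0", found))
```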
As the title says, I have installed the TensorFlow GPU version, and I want to know whether tensorflow-gpu uses cuDNN by default.
From the TensorFlow 1.6.0 release notes: "Prebuilt binaries are now built against CUDA 9.0 and cuDNN 7."
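On newer installs you may be able to read the CUDA and cuDNN versions from the build metadata instead of the release notes. The sketch below assumes tf.sysconfig.get_build_info() is present, which to my knowledge only holds for later TensorFlow 2.x releases and not for the 1.x versions discussed here; it returns None when the function or TensorFlow itself is missing. The helper name tf_build_info is made up.

```python
def tf_build_info():
    """Return TensorFlow's build info as a dict, or None if unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    getter = getattr(tf.sysconfig, "get_build_info", None)
    if getter is None:
        # Older releases are assumed not to provide this function.
        return None
    return dict(getter())


if __name__ == "__main__":
    info = tf_build_info()
    if info is None:
        print("Build info not available in this environment")
    else:
        print("CUDA:", info.get("cuda_version"),
              "cuDNN:", info.get("cudnn_version"))
```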