here's the code that i use to check if tf.gpu is working or not
import tensorflow as tf
if tf.test.gpu_device_name():
print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
print("Please install GPU version of TF")
and here's the error
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
2020-11-22 21:53:40.971514: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-11-22 21:53:40.971756: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
To use the GPU with Tensorflow, you must install the gpu version of Tensorflow
python -m pip install tensorflow-gpu
Make sure that you are also using a 64 bit version of python, as it will only work with those parameters.
EDIT:
As of Tensorflow 2.0+, both the CPU and GPU versions of Tensorflow have been packaged together.
To get Tensorflow to work with your GPU, you need to download cuDNN. Depending on what CUDA version you have, you will need to place some header files and some dll files in the file location of where you installed CUDA.
Related
I was trying to set up GPU to be compatible with Tensorflow on Windows 11 but was encountering a problem when attempting to verify that it had been setup correctly. I have a GPU driver installed and ran the following command in Miniconda under the 'tf' environment as suggested by step 5 of the Tensorflow installation instructions for Windows Native (https://www.tensorflow.org/install/pip#windows-native):
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
However, when I go to check that the GPU has been setup correctly, I encounter the following message:
2022-12-27 01:05:04.628568: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-12-27 01:05:04.628893: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-12-27 01:05:06.913025: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-12-27 01:05:06.913317: W
~and then after several other lines of similar error messages~
tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-12-27 01:05:06.915294: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
I can't figure out what is wrong, given that I've merely followed the Tensorflow installation steps. Any ideas on what the problem could be or what I should try next?
Please ensure you have checked the mentioned Hardware requirements and Software requirements in the same link to enable GPU support. Also set the path to the bin directory after installing these software.
Now, follow the Step-by-step instructions to install TensorFlow with GPU setup after installing conda
conda create --name tf python=3.9
conda activate tf
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install --upgrade pip
pip install "tensorflow-gpu<2.11"
to verify the GPU setup:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
I tried a lot of things before I could finally figure out this approach. There are a lot of videos and blogs asking to install the Cuda toolkit and cuDNN from the website. Checking the compatible version. But this is not required anymore all you have to do is the following
pip install tensorflow-gpu
pip install cuda
pip install cudnn
then use the following code to check if your GPU is active in the current notebook
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.config.list_physical_devices('GPU')
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
tf.test.is_built_with_cuda()
tf.debugging.set_log_device_placement(True)
I just want to confirm, if these steps are enough to enable GPU in jupyter notebook or am I missing something here?
If you installed the compatible versions of CUDA and cuDNN (relative to your GPU), Tensorflow should use that since you installed tensorflow-gpu. If you want to be sure, run a simple demo and check out the usage on the task manager.
I am trying to use Tensorflow 2.7.0 with GPU, but I am constantly running into the same issue:
2022-02-03 08:32:31.822484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/username/.cache/pypoetry/virtualenvs/poetry_env/lib/python3.7/site-packages/cv2/../../lib64:/home/username/miniconda3/envs/project/lib/
2022-02-03 08:32:31.822528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
This issue has already appeared multiple times here & on github. However, the solutions usually proposed to a) download the missing CUDA files, b) downgrade/upgrade to the correct CUDA version, c) set the correct LD_LIBRARY_PATH.
I have been already using my PC with CUDA-enabled PyTorch, and I did not have a single issue there. My nvidia-smi returns 11.0 version, which is exactly the only I want to have. Also, if I try to run:
import os
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
print(os.path.exists(os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")))
it returns True. This is exactly the part of LD_LIBRARY_PATH from the error message, where Tensorflow, apparently, cannot see the libcudart.so.11.0 (which IS there).
Is there something really obvious that I am missing?
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Firstly:
Can you find out where the "libcudart.so.11.0" is
If you lost it at error stack, you can replace the "libcudart.so.11.0" by your word in below:
sudo find / -name 'libcudart.so.11.0'
Outputs in my system. This result shows where the "libcudart.so.11.0" is in the system:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0
If the result shows nothing, please make sure you have install cuda or other staff that must install in your system.
Second, add the path to environment file.
# edit /etc/profile
sudo vim /etc/profile
# append path to "LD_LIBRARY_PATH" in profile file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib
# make environment file work
source /etc/profile
You may also refer to this link
Third thing you may try is:
conda install cudatoolkit
Installing the correct version of cuda 11.3 and cudnn 8.2.1 for tf2.8. Based on this blog https://www.tensorflow.org/install/source#gpu using following commands.
conda uninstall cudatoolkit
conda install cudnn
Then exporting LD path - dynamic link loader path after finding location by
this sudo find / -name 'libcudnn' System was able to find required libraries and use GPU for training.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usr/miniconda3/envs/tf2/lib/
Hope it helped.
Faced the same issue with tensorflow 2.9 and cuda 11.7 on arch linux x86_64 with 2 nvidia gpus (1080ti / titan rtx) and solved it:
It is not absolutely necessary to respect the compatibility matrix (cuda 11.7 vs 11.2 so minor superior version). But python 3 version was downgraded according to the tensorflow comp matrix (3.10 to 3.7).
Note that you can have multiple cuda version installed and manage it by symlink on linux. (win should be different a bit)
setup with conda and python 3.7
sudo pacman -S base-devel cudnn
conda activate tf-2.9
conda uninstall cudatoolkit && conda install cudnn
I've also had to update gcc for another lib (out of topic)
conda install -c conda-forge gcc=12.1.0
added the snippet for debug according to tf-gpu docs
import tensorflow as tf
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
I now see 2 gpu detected instead of 0, training time is divided by 10.
nvidia-smi reports ram usage maxed and power level raised from 9W to 150W validating the usage of the gpu (the other was left idle).
Rootcause: cudnn was not installed system-wide.
I've been able to successfully set up an Ubuntu 18.04 server with nvidia-smi 418.39, Driver version 418.39, and CUDA 10.1
I now have a user who wants to run TensorFlow but insists that it is not compatible with CUDA 10.1, only CUDA 10. There is no statement confirming this online anywhere that I can find, nor is it in any release patch notes from TF. Because setting this system up was kind of a pain to do, I'm a little hesitant to try downgrading just one version.
Does anyone have verification whether TensorFlow 1.12 does or does not work with CUDA 10.1?
I can confirm that even tf 1.13.1 only works with CUDA 10.0 for me, not 10.1.
Don't know if symlink will work through.
If you try to run tf 1.13.1 on CUDA 10.1, it will give you "ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory"
TensorFlow 1.12 (and even later versions 1.13.1 and 2.0.0-alpha0) could not be built against CUDA 10.1, thus can be considered incompatible.
I have tried building TensorFlow from source with GPU support. The TensorFlow versions I considered were 1.13.1 and 2.0.0-alpha0. The machine I used runs CentOS 7.6 with GCC 4.8.5. I have the NVIDIA Driver version 418.67 installed (which has the release date 2019.5.7 and supports CUDA Toolkit 10.1).
I succeeded in building both TensorFlow versions with CUDA 10.0 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.0). Note that you don't need to have the GPU attached to the machine (especially if you're using a VM in the cloud) while you're building TensorFlow with GPU support.
However, when I switched to CUDA 10.1 and cuDNN 7.6.0 + NCCL 2.4.7 (for CUDA 10.1), none of these TensorFlow versions could be built. Besides the changes in location of libcublas, another source of the error is no libcudart.so* are found in cuda-10.1/lib64/ (while they do exist in cuda-10.0/lib64/).
I can also confirm that tf 1.13.1 does not work with CUDA 10.1. While importing tensorflow you will get the following error
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
running ldconfig -v shows the difference
libcublas.so.10.0 vs libcublas.so.10.1.0.105
I already install tensorflow GPU support.
try install xgboost on tensorflow by
'conda install -c anaconda py-xgboost'
I wonder the xgboost what GPU support or not.
I don't install https://xgboost.readthedocs.io/en/latest/build.html#building-with-gpu-support
only tensorflow GPU support.
Do i need install xgboost Gpu support or not??? if i want use xgboost with GPU support
You can check if your xgboost is compiled for gpu, just try to run some model with tree_method='gpu_hist' or another gpu method (here).
If it would raise an error that xgboost's not compiled for gpu, then reinstall it following the instructions that you have found.
Probably, you don't need install CUDA (if you have successfully installed tensorflow-gpu and it works, then CUDA must be installed already), but you definitely should build gpu-supported xgboost.