I was trying to set up my GPU for TensorFlow on Windows 11, but ran into a problem when attempting to verify that it had been set up correctly. I have a GPU driver installed and ran the following command in Miniconda under the 'tf' environment, as suggested by step 5 of the TensorFlow installation instructions for Windows Native (https://www.tensorflow.org/install/pip#windows-native):
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
However, when I go to check that the GPU has been set up correctly, I encounter the following message:
2022-12-27 01:05:04.628568: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-12-27 01:05:04.628893: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-12-27 01:05:06.913025: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
~and then after several other lines of similar error messages~
2022-12-27 01:05:06.913317: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-12-27 01:05:06.915294: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
I can't figure out what is wrong, given that I've simply followed the TensorFlow installation steps. Any ideas on what the problem could be or what I should try next?
Please ensure you have checked the Hardware requirements and Software requirements listed at the same link to enable GPU support. Also set the path to the bin directory after installing this software, as sketched below.
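As a sketch of what setting the path means in practice, you can prepend the directory that contains the CUDA DLLs to PATH before importing TensorFlow. The directory below is a hypothetical default install location; with the conda-forge packages above, the DLLs usually live in the environment's Library\bin folder instead:

import os

# Hypothetical CUDA bin directory; adjust to wherever the DLLs live on your machine.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"
os.environ["PATH"] = cuda_bin + os.pathsep + os.environ["PATH"]
# On Python 3.8+ you may also need: os.add_dll_directory(cuda_bin)

import tensorflow as tf  # import only after PATH is updated
print(tf.config.list_physical_devices("GPU"))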
Now, follow the step-by-step instructions to install TensorFlow with GPU support after installing conda:
conda create --name tf python=3.9
conda activate tf
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install --upgrade pip
pip install "tensorflow-gpu<2.11"
To verify the GPU setup:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
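If the printed list is still empty, it can help to check which CUDA and cuDNN versions your TensorFlow build actually expects. A minimal sketch, assuming a recent TF 2.x release where tf.sysconfig.get_build_info() is available:

import tensorflow as tf

# Shows the CUDA/cuDNN versions this binary was compiled against; these must
# match the cudatoolkit/cudnn versions installed into the conda environment.
info = tf.sysconfig.get_build_info()
print("CUDA: ", info.get("cuda_version"))
print("cuDNN:", info.get("cudnn_version"))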
Related
I am trying to use Tensorflow 2.7.0 with GPU, but I am constantly running into the same issue:
2022-02-03 08:32:31.822484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/username/.cache/pypoetry/virtualenvs/poetry_env/lib/python3.7/site-packages/cv2/../../lib64:/home/username/miniconda3/envs/project/lib/
2022-02-03 08:32:31.822528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
This issue has already appeared multiple times here and on GitHub. However, the solutions usually proposed are to a) download the missing CUDA files, b) downgrade/upgrade to the correct CUDA version, or c) set the correct LD_LIBRARY_PATH.
I have already been using my PC with CUDA-enabled PyTorch, and I did not have a single issue there. My nvidia-smi reports CUDA version 11.0, which is exactly the one I want. Also, if I run:
import os
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
print(os.path.exists(os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")))
it returns True. This is exactly the part of LD_LIBRARY_PATH from the error message where TensorFlow apparently cannot see libcudart.so.11.0 (which IS there).
Is there something really obvious that I am missing?
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Firstly:
Find out where "libcudart.so.11.0" is. If you lost track of it in the error stack, replace "libcudart.so.11.0" below with the name of your own missing library:
sudo find / -name 'libcudart.so.11.0'
Output on my system, showing where "libcudart.so.11.0" is located:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0
If the command finds nothing, make sure you have installed CUDA and the other required software on your system.
Second, add the path to the environment file.
# edit /etc/profile
sudo vim /etc/profile
# append path to "LD_LIBRARY_PATH" in profile file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib
# make environment file work
source /etc/profile
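After opening a new shell, a quick sanity check that the variable is actually visible to Python (a minimal sketch, assuming the path above):

import os

# LD_LIBRARY_PATH is read when the process starts, so run this from a fresh shell
print("/usr/local/cuda-11.1/targets/x86_64-linux/lib"
      in os.environ.get("LD_LIBRARY_PATH", ""))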
You may also refer to this link
The third thing you may try is:
conda install cudatoolkit
Install the correct versions, CUDA 11.3 and cuDNN 8.2.1 for TF 2.8 (based on https://www.tensorflow.org/install/source#gpu), using the following commands:
conda uninstall cudatoolkit
conda install cudnn
Then export the LD path (the dynamic link loader path) after finding the location with sudo find / -name 'libcudnn*'. With this, the system was able to find the required libraries and use the GPU for training:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usr/miniconda3/envs/tf2/lib/
Hope it helped.
Faced the same issue with TensorFlow 2.9 and CUDA 11.7 on Arch Linux x86_64 with 2 NVIDIA GPUs (1080 Ti / Titan RTX), and solved it.
It is not absolutely necessary to match the compatibility matrix exactly (CUDA 11.7 instead of 11.2, a slightly higher minor version, worked). The Python version, however, was downgraded according to the TensorFlow compatibility matrix (3.10 to 3.7).
Note that you can have multiple CUDA versions installed and manage them via symlinks on Linux (Windows should differ a bit).
Setup with conda and Python 3.7:
sudo pacman -S base-devel cudnn
conda activate tf-2.9
conda uninstall cudatoolkit && conda install cudnn
I also had to update gcc for another library (off-topic):
conda install -c conda-forge gcc=12.1.0
I added this snippet for debugging, per the TF GPU docs:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
I now see 2 GPUs detected instead of 0, and training time is divided by 10.
nvidia-smi reports memory usage maxed out and the power level raised from 9 W to 150 W, confirming that the GPU is in use (the other was left idle).
Root cause: cuDNN was not installed system-wide.
Here's the code that I use to check whether the TF GPU is working:
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
And here's the error:
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
2020-11-22 21:53:40.971514: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-11-22 21:53:40.971756: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
To use the GPU with TensorFlow, you must install the GPU version of TensorFlow:
python -m pip install tensorflow-gpu
Make sure that you are also using a 64-bit version of Python, as the GPU build only works with 64-bit Python.
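A quick way to confirm the interpreter is 64-bit:

import struct

# 64-bit interpreters use 8-byte pointers
print("64-bit Python:", struct.calcsize("P") * 8 == 64)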
EDIT:
As of TensorFlow 2.0+, the CPU and GPU versions of TensorFlow are packaged together.
To get TensorFlow to work with your GPU, you need to download cuDNN. Depending on which CUDA version you have, you will need to place some header files and some DLL files into the directory where you installed CUDA.
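As a rough check that the files ended up somewhere Windows can find them, you can try loading the cuDNN DLL directly (a sketch; the DLL name depends on your cuDNN major version, e.g. cudnn64_8.dll for cuDNN 8):

import ctypes

# Windows only: try to load the cuDNN DLL the same way TensorFlow would.
try:
    ctypes.WinDLL("cudnn64_8.dll")
    print("cuDNN DLL found")
except OSError:
    print("cuDNN DLL not found - check the CUDA bin directory and PATH")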
System information
- OS: Linux Ubuntu 16.04
- TensorFlow installed from: binary (pip install)
- Python version: 3.5
- Installed using virtualenv? pip? conda?: pip and virtualenv
Problem description
I was following the tutorial for using the Intel Neural Compute Stick 2 for object detection: https://towardsdatascience.com/speed-up-predictions-on-low-power-devices-using-neural-compute-stick-and-openvino-98f3ae9dcf41
In the example, I install the prerequisites using the command
sudo ./opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/install_prerequisites/install_prerequisites.sh
TensorFlow was installed with the prerequisites; I also installed TensorFlow using pip install. But when I run the next command
mo_tf.py \
--input_model ~/Downloads/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb \
--tensorflow_use_custom_operations_config /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/extensions/front/tf/ssd_support.json \
--tensorflow_object_detection_api_pipeline_config ~/Downloads/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config \
--data_type FP16
I get the following error:
F tensorflow/core/platform/cpu_feature_guard.cc:37]
The tensorflow library was compiled to use AVX instructions, but these aren't available in your machine
Aborted (core dumped)
I am getting the same error when I try to import TensorFlow.
What should I do to solve this error?
The error message indicates that the machine does not support AVX. Is that so? You can refer to this link to check: How to tell if a Linux machine supports AVX/AVX2 instructions?
If your machine does not support AVX, the solution would be to build TensorFlow from source with those instructions disabled.
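A minimal sketch for checking the CPU flags on Linux without extra tools:

# Linux only: /proc/cpuinfo lists the instruction set extensions the CPU supports
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
print("AVX: ", "avx" in flags)
print("AVX2:", "avx2" in flags)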
I'm a newbie when it comes to AWS and TensorFlow, and I've been learning about CNNs over the last week via Udacity's Machine Learning course.
Now I need to use a GPU on an AWS instance. I launched a p2.xlarge instance of the Deep Learning AMI with Source Code (CUDA 8, Ubuntu), since that's what they recommended.
But now it seems that TensorFlow is not using the GPU at all; it is still training on the CPU. I did some searching and found some answers to this problem, but none of them seemed to work.
When I run the Jupyter notebook, it still uses the CPU.
What do I do to get it to run on the GPU and not the CPU?
The problem of tensorflow not detecting GPU can possibly be due to one of the following reasons.
Only the tensorflow CPU version is installed in the system.
Both tensorflow CPU and GPU versions are installed in the system, but the Python environment is preferring CPU version over GPU version.
Before proceeding to solve the issue, we assume that the installed environment is an AWS Deep Learning AMI having CUDA 8.0 and tensorflow version 1.4.1 installed. This assumption is derived from the discussion in comments.
To solve the problem, we proceed as follows:
Check the installed version of tensorflow by executing the following command from the OS terminal.
pip freeze | grep tensorflow
If only the CPU version is installed, then remove it and install the GPU version by executing the following commands.
pip uninstall tensorflow
pip install tensorflow-gpu==1.4.1
If both CPU and GPU versions are installed, then remove both of them, and install the GPU version only.
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu==1.4.1
At this point, if all the dependencies of tensorflow are installed correctly, the tensorflow GPU version should work fine. A common error at this stage (as encountered by the OP) is a missing cuDNN library, which can result in the following error while importing tensorflow into a Python module:
ImportError: libcudnn.so.6: cannot open shared object file: No such file or directory
It can be fixed by installing the correct version of NVIDIA's cuDNN library. TensorFlow 1.4.1 depends on cuDNN 6.0 and CUDA 8, so we download the corresponding version from the cuDNN archive page (Download Link). We have to log in to an NVIDIA developer account to download the file, so it is not possible to fetch it using command-line tools such as wget or curl. A possible solution is to download the file on the host system and use scp to copy it onto AWS.
Once copied to AWS, extract the file using the following command:
tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz
The extracted directory should have structure similar to the CUDA toolkit installation directory. Assuming that CUDA toolkit is installed in the directory /usr/local/cuda, we can install cuDNN by copying the files from the downloaded archive into corresponding folders of CUDA Toolkit installation directory followed by linker update command ldconfig as follows:
cp cuda/include/* /usr/local/cuda/include
cp cuda/lib64/* /usr/local/cuda/lib64
ldconfig
After this, we should be able to import the tensorflow GPU version into our Python modules.
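For example, a quick check that a GPU device is now registered (a sketch for TF 1.x):

from tensorflow.python.client import device_lib

# With the GPU build and cuDNN in place, a /device:GPU:0 entry should appear
print([x.name for x in device_lib.list_local_devices()
       if x.device_type == 'GPU'])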
A few considerations:
If we are using Python3, pip should be replaced with pip3.
Depending on user privileges, the commands pip, cp and ldconfig may need to be run with sudo.
I am installing the latest TensorFlow library on my Ubuntu 16.04 machine.
For this, I downloaded and installed the latest CUDA toolkit and cuDNN libraries.
After installation, I checked it using the following commands.
(/home/naseer/anaconda2/) naseer#naseer-Virtual-Machine:~/anaconda2$ python
Python 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:102] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /usr/local/cuda-8.0.61/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:2259] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
What does the above output mean? Does it mean that TensorFlow will run correctly on my NVIDIA GPU enabled system, or do I need to do something else?
My local Directory Structure:
I have added the following screenshot showing the various library paths in my local directories.
My Understanding
I have a feeling that it is trying to open the CUDA library at the path /usr/local/cuda-8.0.61/lib64 when in fact the existing paths are /usr/local/cuda-8.0/lib64 and /usr/local/cuda/lib64. I tried to rename that path, but it still did not work.
Updates (Conflicting Directory Structure)
To run TensorFlow, you have to install cuDNN. There are two possible ways:
1. Installing cuDNN for all Users:
This is the way that the official TensorFlow documentation describes.
Here, cuDNN is installed into the folder /usr/local/cuda. That way, cuDNN can be used by all users on that machine. The instructions are taken from the TensorFlow documentation:
Download the correct cuDNN version. For TensorFlow r1.1, that would be cuDNN v5.1 for CUDA 8.0.
Unpack the .tgz file. Open a terminal, navigate to the folder where you downloaded cuDNN, and call
tar xvzf cudnn-8.0-linux-x64-v5.1-ga.tgz
Note: this is just an example, check the file name before calling this.
This will create a new folder called cuda, which contains two subfolders include and lib64, containing all cuDNN files.
Move the downloaded files to /usr/local/cuda. You will need sudo rights for this!
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
And that's already it. TensorFlow should now work as expected.
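A quick way to confirm the library is now loadable (a sketch; libcudnn.so.5 is the soname for cuDNN v5.1):

import ctypes

# If the files were copied correctly and the loader can see /usr/local/cuda/lib64,
# this should succeed without any extra LD_LIBRARY_PATH settings.
try:
    ctypes.CDLL("libcudnn.so.5")
    print("cuDNN loaded")
except OSError as e:
    print("cuDNN not found:", e)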
2. Installing cuDNN locally:
If you do not have admin rights, or you want to have different cuDNN versions on your machine, you can install cuDNN to any folder of your choice, and then set the paths correctly. This method is proposed in this answer on StackOverflow and is explained in the official NVIDIA installation instructions.
Steps 1 and 2 are the same as above.
Move the extracted cuda folder to a location of your choice.
Add this directory to the $LD_LIBRARY_PATH environment variable. In a terminal, you can do this by calling
export LD_LIBRARY_PATH=/path/to/cudnn/lib64:$LD_LIBRARY_PATH
where /path/to/cudnn is the place where you moved cuDNN in the previous step. Note the lib64 at the end!
Usually, you'll have to call this every time before starting TensorFlow. To avoid this, you can edit the file ~/.bashrc and add this line at the bottom of the file. This will automatically add cuDNN to the path every time you start a terminal window.
With that, TensorFlow will be able to find cuDNN and work as expected.
To run GPU-enabled TensorFlow 1.4, you should first install CUDA 8 (+ patch 2) and cuDNN v6.0; you may find this step-by-step installation guide useful.
After installing the CUDA 8 drivers you will need to install cuDNN v6.0:
Download the cuDNN v6.0 driver. The driver can be downloaded from here; please note that you will need to register first.
Copy the driver to the remote machine (scp -r -i ...)
Extract the cuDNN files from the .tgz archive and copy them to the target directory:
tar xvzf cudnn-8.0-linux-x64-v6.0.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Update your bash file
nano ~/.bashrc
Add the following lines to the end of the bash file:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${PATH}
Install the libcupti-dev library
sudo apt-get install libcupti-dev
Install pip
sudo apt-get install python-pip
sudo pip install --upgrade pip
Install TensorFlow
sudo pip install tensorflow-gpu
Test the installation by running the following within the Python command line:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

get_available_gpus()
For a single GPU the output should be similar to:
2017-11-22 03:18:15.187419: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-22 03:18:17.986516: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-22 03:18:17.986867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2017-11-22 03:18:17.986896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
[u'/device:GPU:0']