"successfully opened CUDA library" not showing while importing tensorflow - tensorflow

I have installed Cuda 11.2 and cuDNN 8.1 and TensorFlow 2.9.1 in my local system.
C:\Users\my>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:15:10_Pacific_Standard_Time_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
tf.print(tf. __version__)
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
2.9.1
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
And environment variable is also set correctly set(Screen shot atteched).Environment variables
However, when I do import TensorFlow, I don't see "successfully opened CUDA library" messages
The following log is being printed in jupyter noteboook.
2022-08-13 12:37:34.012384: I TensorFlow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-13 12:37:35.492829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created
device /job:localhost/replica:0/task:0/device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5

Related

tensorflow-gpu recognizes XLA-CPU instead of GPU

I am trying to install keras-gpu on PC with Tesla V100 and Windows Server 2019. I installed some version (2.4.3) and found that my GPU is not working. I need to install any 2.x.x version of keras with GPU support.
I have installed CUDA 10.1 cudnn 8.0.5 and after many attempts also tried 11.2 version with cudnn 8.1.1 (Also tried 11.5). And started searching version of tensorflow which can find my GPU.
for 10.1:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
I am using this code to check all:
import tensorflow
print(tensorflow.__version__)
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
My output:
2021-11-06 10:39:16.326880: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2.3.0
2021-11-06 10:39:21.177512: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-06 10:39:21.208333: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25d395509b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-06 10:39:21.217997: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-11-06 10:39:21.261861: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-11-06 10:39:21.677227: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-06 10:39:21.692028: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: windows-freqgpu
2021-11-06 10:39:21.700398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: windows-freqgpu
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 881354854201867138
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5868137251793075209
physical_device_desc: "device: XLA_CPU device"
]
Tesla V100 here is XLA_CPU. how to fix this?
You could try installing tensorflow-gpu 2.2.x or 2.3.x which are compatible with CUDA 10.1, as can be checked in the tested build configurations below:
https://www.tensorflow.org/install/source#gpu
If you look at tested build configurations, you will see that tensorflow 2.4.0 is tested for CUDA 11.0. Looking at software requirements on tensorflow GPU support page (https://www.tensorflow.org/install/gpu#software_requirements) you can see that CUDA 11.2 seems to be recommended only for Tensorflow >= 2.5.0.
It is unlikely that your GPU is recognized as a 'XLA_CPU' device. Here 'XLA' stands for 'accelerated linear algebra' (https://www.tensorflow.org/xla). It's a domain specific compiler that can be used both for CPUs and GPUs. For more details you could take a look at this what is XLA_GPU and XLA_CPU for tensorflow. It is more likely that your GPU is simply not detected, as evidenced by this line in your output.
2021-11-06 10:39:21.677227: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
As was mentioned by #talonmies it was driver related problem. To be more precise Tesla driver related problem. I had updated driver, but Tesla requires specific versions of driver for different CUDA versions.
Also for common GPUs CUDA brings correct driver itself.
Correct installation for Tesla v100/Windows Server 2019/CUDA 10.1:
Install CUDA (10.1 in my case)
install driver which fits this CUDA version (427.60)
Install cuDNN (7.6.5)

Mozilla TTS in PowerShell: "UserWarning: NVIDIA GeForce RTX 3060 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation

I am trying to run ./TTS/bin/train_tacotron.py with GPU in Powershell.
I followed these instructions, which got me pretty far: the config is read, the model restored, but as training is about to start, I get the message:
UserWarning: NVIDIA GeForce RTX 3060 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
The instructions specified don't really help. I tried installing the most recent stable version of PyTorch, as well as trying 1.7.1 (as opposed to 1.8.0 as recommended in the instructions I linked), but I got the same message.
How can I get this to run on my GPU?
Side note: I was successfully able to run training on my GPU in WSL, but it froze after a few hundred epochs, so I wanted to try Powershell to see if it made a difference.
In order to work properly with your current CUDA version, you need to specify the version 11.3 to cudatoolkit. Execute the following commands:
conda uninstall cudatoolkit
conda install cudatoolkit=11.3 -c pytorch

The kernel appears to have died. It will restart automatically tensorflow2.0

CUDA Version: 10.0 tensorflow 2.0.0-alpha0 jupyter notebook
python 3.5.6 cuda toolkit 10.0 ubuntu 18.10
define CUDNN_MAJOR 7 define CUDNN_MINOR 5 define
CUDNN_PATCHLEVEL 1 define CUDNN_VERSION (CUDNN_MAJOR * 1000 +
CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL) include "driver_types.h"
The kernel appears to have died. It will restart automatically.
strange thing is cudnn version I installed is 7.5.1.
But error message is keep saying that Loaded runtime CuDNN library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. I have no idea why 7.3.1 cudnn library came from.
This message come out after I installed '2.0.0-alpha0' tensorflow.
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID:
0000:42:00.0 totalMemory: 11.90GiB freeMemory: 11.10GiB 2019-05-14
08:03:49.028136: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible
gpu devices: 0 2019-05-14 08:03:49.028424: I
tensorflow/stream_executor/platform/default/dso_loader.cc:42]
Successfully opened dynamic library libcudart.so.10.0 2019-05-14
08:03:49.029904: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device
interconnect StreamExecutor with strength 1 edge matrix: 2019-05-14
08:03:49.029928: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0
2019-05-14 08:03:49.029938: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0: N
2019-05-14 08:03:49.030490: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created
TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with
10796 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus
id: 0000:42:00.0, compute capability: 6.1) 2019-05-14 08:03:57.812557:
I tensorflow/stream_executor/platform/default/dso_loader.cc:42]
Successfully opened dynamic library libcublas.so.10.0 2019-05-14
08:03:58.069834: I
tensorflow/stream_executor/platform/default/dso_loader.cc:42]
Successfully opened dynamic library libcudnn.so.7 2019-05-14
08:03:58.730756: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328]
Loaded runtime CuDNN library: 7.3.1 but source was compiled with:
7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a
binary install, upgrade your CuDNN library. If building from sources,
make sure the library loaded at runtime is compatible with the version
specified during compile configuration. 2019-05-14 08:03:58.732680: E
tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Loaded runtime CuDNN
library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library
major and minor version needs to match or have higher minor version in
case of CuDNN 7.0 or later version. If using a binary install, upgrade
your CuDNN library. If building from sources, make sure the library
loaded at runtime is compatible with the version specified during
compile configuration. 2019-05-14 08:03:58.733465: F
tensorflow/core/kernels/conv_grad_input_ops.cc:955] Check failed:
stream->parent()->GetConvolveBackwardDataAlgorithms(
conv_parameters.ShouldIncludeWinogradNonfusedAlgo(stream->parent()),
&algorithms)
This is the main message when kernel died
question: jupyter notebook restart with kernel died. with above message.
Need to fix it.

With multiple versions of CUDA installed, how can I make Tensorflow-GPU use a specific version of CUDA on Windows

I currently have two versions of CUDA installed on my computer: 9.0 and 10.0. I have some Python modules that require CUDA 9.0 and some that require 10.0. For example, the version of Tensorflow-GPU I use requires CUDA 10.0. When I try to start training, I get the following error message:
2019-05-23 10:59:35.911847: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-05-23 10:59:39.907756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:84:00.0
totalMemory: 15.90GiB freeMemory: 14.98GiB
2019-05-23 10:59:39.919434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
Traceback (most recent call last):
File "wider_faces_inference.py", line 137, in <module>
output_dict_array = run_inference_for_images(image_np_list, detection_graph)
File "wider_faces_inference.py", line 74, in run_inference_for_images
with tf.Session() as sess:
File "C:\ProgramData\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "C:\ProgramData\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
This I believe is because tensorflow is not looking for the right version of CUDA. I wonder how I can make tensorflow use the correct version of CUDA.
EDIT:
To add a bit more information:
The version of Tensorflow I installed was compiled against CUDA 10.0. I installed CUDA 10.0 and Tensorflow-GPU first, and tensorflow worked just fine. Then I installed CUDA 9.0, and after installation, tensorflow stopped working.
Each version of CUDA comes with a driver that you can choose to install; newer versions of NVidia drivers support older versions of CUDA, but the reverse isn't true. The driver that comes with CUDA 9.0 is not able to run CUDA 10.0 applications.
All that you need to do is install the latest NVidia driver (or generally, any NVidia driver that has been released since CUDA 10.0) in order to have support for CUDA 9.X and 10.0 applications. The path of least resistance might be to reinstall the driver that came with CUDA 10.0, but you should get the most recent driver regardless.

Tensorflow doesn't seem to see my gpu

I've tried tensorflow on both cuda 7.5 and 8.0, w/o cudnn (my GPU is old, cudnn doesn't support it).
When I execute device_lib.list_local_devices(), there is no gpu in the output. Theano sees my gpu, and works fine with it, and examples in /usr/share/cuda/samples work fine as well.
I installed tensorflow through pip install. Is my gpu too old for tf to support it? gtx 460
I came across this same issue in jupyter notebooks. This could be an easy fix.
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
You can check if it worked with:
tf.test.gpu_device_name()
Update 2020
It seems like tensorflow 2.0+ comes with gpu capabilities therefore
pip install tensorflow should be enough
Summary:
check if tensorflow sees your GPU (optional)
check if your videocard can work with tensorflow (optional)
find versions of CUDA Toolkit and cuDNN SDK, compatible with your tf version
install CUDA Toolkit
install cuDNN SDK
pip uninstall tensorflow; pip install tensorflow-gpu
check if tensorflow sees your GPU
* source - https://www.tensorflow.org/install/gpu
Detailed instruction:
check if tensorflow sees your GPU (optional)
from tensorflow.python.client import device_lib
def get_available_devices():
local_device_protos = device_lib.list_local_devices()
return [x.name for x in local_device_protos]
print(get_available_devices())
# my output was => ['/device:CPU:0']
# good output must be => ['/device:CPU:0', '/device:GPU:0']
check if your card can work with tensorflow (optional)
my PC: GeForce GTX 1060 notebook (driver version - 419.35), windows 10, jupyter notebook
tensorflow needs Compute Capability 3.5 or higher. (https://www.tensorflow.org/install/gpu#hardware_requirements)
https://developer.nvidia.com/cuda-gpus
select "CUDA-Enabled GeForce Products"
result - "GeForce GTX 1060 Compute Capability = 6.1"
my card can work with tf!
find versions of CUDA Toolkit and cuDNN SDK, that you need
a) find your tf version
import sys
print (sys.version)
# 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
import tensorflow as tf
print(tf.__version__)
# my output was => 1.13.1
b) find right versions of CUDA Toolkit and cuDNN SDK for your tf version
https://www.tensorflow.org/install/source#linux
* it is written for linux, but worked in my case
see, that tensorflow_gpu-1.13.1 needs: CUDA Toolkit v10.0, cuDNN SDK v7.4
install CUDA Toolkit
a) install CUDA Toolkit 10.0
https://developer.nvidia.com/cuda-toolkit-archive
select: CUDA Toolkit 10.0 and download base installer (2 GB)
installation settings: select only CUDA
(my installation path was: D:\Programs\x64\Nvidia\Cuda_v_10_0\Development)
b) add environment variables:
system variables / path must have:
D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\bin
D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\libnvvp
D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\extras\CUPTI\libx64
D:\Programs\x64\Nvidia\Cuda_v_10_0\Development\include
install cuDNN SDK
a) download cuDNN SDK v7.4
https://developer.nvidia.com/rdp/cudnn-archive (needs registration, but it is simple)
select "Download cuDNN v7.4.2 (Dec 14, 2018), for CUDA 10.0"
b) add path to 'bin' folder into "environment variables / system variables / path":
D:\Programs\x64\Nvidia\cudnn_for_cuda_10_0\bin
pip uninstall tensorflow
pip install tensorflow-gpu
check if tensorflow sees your GPU
- restart your PC
- print(get_available_devices())
- # now this code should return => ['/device:CPU:0', '/device:GPU:0']
If you are using conda, you might have installed the cpu version of the tensorflow. Check package list (conda list) of the environment to see if this is the case . If so, remove the package by using conda remove tensorflow and install keras-gpu instead (conda install -c anaconda keras-gpu. This will install everything you need to run your machine learning codes in GPU. Cheers!
P.S. You should check first if you have installed the drivers correctly using nvidia-smi. By default, this is not in your PATH so you might as well need to add the folder to your path. The .exe file can be found at C:\Program Files\NVIDIA Corporation\NVSMI
When I look up your GPU, I see that it only supports CUDA Compute Capability 2.1. (Can be checked through https://developer.nvidia.com/cuda-gpus) Unfortunately, TensorFlow needs a GPU with minimum CUDA Compute Capability 3.0.
https://www.tensorflow.org/get_started/os_setup#optional_install_cuda_gpus_on_linux
You might see some logs from TensorFlow checking your GPU, but ultimately the library will avoid using an unsupported GPU.
The following worked for me, hp laptop. I have a Cuda Compute capability
(version) 3.0 compatible Nvidia card. Windows 7.
pip3.6.exe uninstall tensorflow-gpu
pip3.6.exe uninstall tensorflow-gpu
pip3.6.exe install tensorflow-gpu
I had a problem because I didn't specify the version of Tensorflow so my version was 2.11. After many hours I found that my problem is described in install guide:
Caution: TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin
Before that, I read most of the answers to this and similar questions. I followed #AndrewPt answer. I already had installed CUDA but updated the version just in case, installed cudNN, and restarted the computer.
The easiest solution for me was to downgrade to 2.10 (you can try different options mentioned in the install guide). I first uninstalled all of these packages (probably it's not necessary, but I didn't want to see how pip messed up versions at 2 am):
pip uninstall keras
pip uninstall tensorflow-io-gcs-filesystem
pip uninstall tensorflow-estimator
pip uninstall tensorflow
pip uninstall Keras-Preprocessing
pip uninstall tensorflow-intel
because I wanted only packages required for the old version, and I didn't do it for all required packages for 2.11 version. After that I installed tensorflow 2.10:
pip install tensorflow<2.11
and it worked.
I used this code to check if GPU is visible:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
So as of 2022-04, the tensorflow package contains both CPU and GPU builds. To install a GPU build, search to see what's available:
λ conda search tensorflow
Loading channels: done
# Name Version Build Channel
tensorflow 0.12.1 py35_1 conda-forge
tensorflow 0.12.1 py35_2 conda-forge
tensorflow 1.0.0 py35_0 conda-forge
…
tensorflow 2.5.0 mkl_py39h1fa1df6_0 pkgs/main
tensorflow 2.6.0 eigen_py37h37bbdb1_0 pkgs/main
tensorflow 2.6.0 eigen_py38h63d3545_0 pkgs/main
tensorflow 2.6.0 eigen_py39h855417c_0 pkgs/main
tensorflow 2.6.0 gpu_py37h3e8f0e3_0 pkgs/main
tensorflow 2.6.0 gpu_py38hc0e8100_0 pkgs/main
tensorflow 2.6.0 gpu_py39he88c5ba_0 pkgs/main
tensorflow 2.6.0 mkl_py37h9623b36_0 pkgs/main
tensorflow 2.6.0 mkl_py38hdc16138_0 pkgs/main
tensorflow 2.6.0 mkl_py39h31650da_0 pkgs/main
You can see that there are builds of TF 2.6.0 that support Python 3.7, 3.8 and 3.9, and that are built for MKL (Intel CPU), Eigen, or GPU.
To narrow it down, you can use wildcards in the search. This will find any Tensorflow 2.x version that is built for GPU, for instance:
λ conda search tensorflow=2*=gpu*
Loading channels: done
# Name Version Build Channel
tensorflow 2.0.0 gpu_py36hfdd5754_0 pkgs/main
tensorflow 2.0.0 gpu_py37h57d29ca_0 pkgs/main
tensorflow 2.1.0 gpu_py36h3346743_0 pkgs/main
tensorflow 2.1.0 gpu_py37h7db9008_0 pkgs/main
tensorflow 2.5.0 gpu_py37h23de114_0 pkgs/main
tensorflow 2.5.0 gpu_py38h8e8c102_0 pkgs/main
tensorflow 2.5.0 gpu_py39h7dc34a2_0 pkgs/main
tensorflow 2.6.0 gpu_py37h3e8f0e3_0 pkgs/main
tensorflow 2.6.0 gpu_py38hc0e8100_0 pkgs/main
tensorflow 2.6.0 gpu_py39he88c5ba_0 pkgs/main
To install a specific version in an otherwise empty environment, you can use a command like:
λ conda activate tf
(tf) λ conda install tensorflow=2.6.0=gpu_py39he88c5ba_0
…
The following NEW packages will be INSTALLED:
_tflow_select pkgs/main/win-64::_tflow_select-2.1.0-gpu
…
cudatoolkit pkgs/main/win-64::cudatoolkit-11.3.1-h59b6b97_2
cudnn pkgs/main/win-64::cudnn-8.2.1-cuda11.3_0
…
tensorflow pkgs/main/win-64::tensorflow-2.6.0-gpu_py39he88c5ba_0
tensorflow-base pkgs/main/win-64::tensorflow-base-2.6.0-gpu_py39hb3da07e_0
…
As you can see, if you install a GPU build, it will automatically also install compatible cudatoolkit and cudnn packages. You don't need to manually check versions for compatibility, or manually download several gigabytes from Nvidia's website, or register as a developer, as it says in other answers or on the official website.
After installation, confirm that it worked and it sees the GPU by running:
λ python
Python 3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.6.0'
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Getting conda to install a GPU build and other packages you want to use is another story, however, because there are a lot of package incompatibilities for me. I think the best you can do is specify the installation criteria using wildcards and cross your fingers.
This tries to install any TF 2.x version that's built for GPU and that has dependencies compatible with Spyder and matplotlib's dependencies, for instance:
λ conda install tensorflow=2*=gpu* spyder matplotlib
For me, this ended up installing a two year old GPU version of tensorflow:
matplotlib pkgs/main/win-64::matplotlib-3.5.1-py37haa95532_1
spyder pkgs/main/win-64::spyder-5.1.5-py37haa95532_1
tensorflow pkgs/main/win-64::tensorflow-2.1.0-gpu_py37h7db9008_0
I had previously been using the tensorflow-gpu package, but that doesn't work anymore. conda typically grinds forever trying to find compatible packages to install, and even when it's installed, it doesn't actually install a gpu build of tensorflow or the CUDA dependencies:
λ conda list
…
cookiecutter 1.7.2 pyhd3eb1b0_0
cryptography 3.4.8 py38h71e12ea_0
cycler 0.11.0 pyhd3eb1b0_0
dataclasses 0.8 pyh6d0b6a4_7
…
tensorflow 2.3.0 mkl_py38h8557ec7_0
tensorflow-base 2.3.0 eigen_py38h75a453f_0
tensorflow-estimator 2.6.0 pyh7b7c402_0
tensorflow-gpu 2.3.0 he13fc11_0
I have had an issue where I needed the latest TensorFlow (2.8.0 at the time of writing) with GPU support running in a conda environment. The problem was that it was not available via conda. What I did was
conda install cudatoolkit==11.2
pip install tensorflow-gpu==2.8.0
Although I've cheched that the cuda toolkit version was compatible with the tensorflow version, it was still returning an error, where libcudart.so.11.0 was not found. As a result, GPUs were not visible. The remedy was to set environmental variable LD_LIBRARY_PATH to point to your anaconda3/envs/<your_tensorflow_environment>/lib with this command
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<user>/anaconda3/envs/<your_tensorflow_environment>/lib
Unless you make it permanent, you will need to create this variable every time you start a terminal prior to a session (jupyter notebook). It can be conveniently automated by following this procedure from conda's official website.
In my case, I had a working tensorflow-gpu version 1.14 but suddenly it stopped working. I fixed the problem using:
pip uninstall tensorflow-gpu==1.14
pip install tensorflow-gpu==1.14
I experienced the same problem on my Windows OS. I followed tensorflow's instructions on installing CUDA, cudnn, etc., and tried the suggestions in the answers above - with no success.
What solved my issue was to update my GPU drivers. You can update them via:
Pressing windows-button + r
Entering devmgmt.msc
Right-Clicking on "Display adapters" and clicking on the "Properties" option
Going to the "Driver" tab and selecting "Updating Driver".
Finally, click on "Search automatically for updated driver software"
Restart your machine and run the following check again:
from tensorflow.python.client import device_lib
local_device_protos = device_lib.list_local_devices()
[x.name for x in local_device_protos]
Sample output:
2022-01-17 13:41:10.557751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
2022-01-17 13:41:10.558125: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2022-01-17 13:41:10.562095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2022-01-17 13:45:11.392814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-01-17 13:45:11.393617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2022-01-17 13:45:11.393739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2022-01-17 13:45:11.401271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 1391 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
>>> [x.name for x in local_device_protos]
['/device:CPU:0', '/device:GPU:0']