What is the reason that TensorFlow does not detect the GPU on Windows?

I have installed CUDA 9.0 on my machine, which has an NVIDIA GTX 1080 graphics card. When I run the command nvcc --version, I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:32_Central_Daylight_Time_2017
Cuda compilation tools, release 9.0, V9.0.176
I followed the steps on the official TensorFlow site to install TF with GPU support, but it is still using the CPU.
I tried both the pip install and the Anaconda install, with the same result: neither detected the GPU. I then tried many other tutorials on the web whose authors were able to detect the GPU, but I still cannot.
What can the reason be? Has anything changed in the new GPU version of TF? If yes, what is the latest documentation for installing TF with GPU support? If not, what am I doing wrong?
Thanks!
Update1: TensorFlow is really wasting my time, which is very annoying. At first I decided to build TF from source to use it with CUDA 10, but I could not build it successfully on either Windows 10 or Ubuntu 18.04. So I gave up and decided to use CUDA 9.0, which is not supported on Ubuntu 18.04, so I came back to Windows, but even there the prebuilt TF library is not working.
I don't know why TF still uses CUDA 9.0 when CUDA 10.0 has already been officially released, or why TF still does not support Python 3.7. The same goes for MS Build Tools 2015 when 2017 already exists, and for many more tools. TF relies on old versions of these tools, which causes a lot of problems for people who must uninstall the newer versions they are still using. It is very annoying...
Update2: nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 417.71 Driver Version: 417.71 CUDA Version: 9.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 WDDM | 00000000:01:00.0 On | N/A |
| 27% 35C P8 8W / 180W | 498MiB / 8192MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1264 C+G Insufficient Permissions N/A |
| 0 2148 C+G ...0108.0_x64__8wekyb3d8bbwe\HxOutlook.exe N/A |
| 0 4360 C+G ...mmersiveControlPanel\SystemSettings.exe N/A |
| 0 7332 C+G C:\Windows\explorer.exe N/A |
| 0 7384 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 8488 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 9704 C+G ...osoft.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 10588 C+G ...al\Google\Chrome\Application\chrome.exe N/A |
| 0 10904 C+G ...x64__8wekyb3d8bbwe\Microsoft.Photos.exe N/A |
| 0 12608 C+G ...DIA GeForce Experience\NVIDIA Share.exe N/A |
| 0 13000 C+G ...241.0_x64__8wekyb3d8bbwe\Calculator.exe N/A |
| 0 14668 C+G ...ng4wbp0\app\DellMobileConnectClient.exe N/A |
| 0 17628 C+G ...2.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A |
| 0 18060 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
+-----------------------------------------------------------------------------+

I finally figured out the problem; this may help others. It is a bug in TF 1.12. I removed it and installed TF 1.11, which is able to detect the GPU.
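A quick way to confirm whether a given TensorFlow install sees the GPU, before and after switching versions, is to list the devices it registers. The sketch below is only an illustration: it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, while the `gpu_report` helper and the layout of the dictionary it returns are hypothetical names chosen for this example.

```python
def gpu_report():
    """Return the TensorFlow version and the devices it registers.

    `gpu_report` is a hypothetical helper; only the TensorFlow calls are real.
    """
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError as exc:
        # TensorFlow is missing or failed to load in this environment.
        return {"tensorflow": None, "devices": [], "gpus": [], "error": str(exc)}

    local_devices = device_lib.list_local_devices()
    devices = [d.name for d in local_devices]
    # Keep only entries whose device type is exactly "GPU".
    gpus = [d.name for d in local_devices if d.device_type == "GPU"]
    return {"tensorflow": tf.__version__, "devices": devices, "gpus": gpus, "error": None}


if __name__ == "__main__":
    report = gpu_report()
    print("TensorFlow:", report["tensorflow"])
    print("Devices:   ", report["devices"])
    print("GPUs:      ", report["gpus"] or "none detected")
```

Running this once per installed version makes it easy to see which release registers the GPU and which only reports the CPU.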
Some suggestions to the TF team:
Before releasing a new version, please make sure that it works on all operating systems without bugs (the bug I ran into is really a very low-level one).
Please also update your third-party dependencies to support the newest versions, e.g. CUDA 10, which I had installed on my machine but had to downgrade to 9.0 in order to install TF, which was annoying. The same goes for VS 2015, Python 3.7, and so on.
Please update the documentation for both installing and building from source. Describe everything clearly: what is needed, what to install, and how to build all of the required tools and utilities. According to the documentation, building TF from source is very easy, but in reality it was not; like others, I hit a lot of errors, so I was unable to build from source.
So far I have found TF the most annoying framework to build and install. TF is very sensitive: the probability of errors during both building and installing is very high.
Good Luck!!

Related

Stuck with enabling GPUs for Tensorflow in WSL2 under Windows 10

I can't get Tensorflow 2 to use my GPUs under WSL2. I am aware of this question, but GPU support is now (supposedly) no longer experimental.
Windows is on the required 21H2 version, which should support the WSL2 GPU connection.
Windows 10 Pro, 21H2, build 19044.1706
The PC has two GPUs:
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-19c8549a-4b8d-5d70-456b-776ceece4b0f)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-2a946756-0472-fb90-f1a4-b40cce1bba4f)
I had installed Ubuntu under WSL2 some time ago:
PS C:\Users\jem-m> wsl --status
Default Distribution: Ubuntu-20.04
Default Version: 2
...
Kernel version: 5.10.16
In the Windows PowerShell, I can run nvidia-smi.exe, which gives me
PS C:\Users\jem-m> nvidia-smi.exe
Mon May 16 18:13:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:08:00.0 On | N/A |
| 23% 31C P8 10W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... WDDM | 00000000:41:00.0 Off | N/A |
| 23% 31C P8 12W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
while the nvidia-smi in the WSL2 Ubuntu shell gives
(testenv) jem-mosig:~/ $ nvidia-smi [17:48:30]
Mon May 16 17:49:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:08:00.0 On | N/A |
| 23% 34C P8 10W / 250W | 784MiB / 11264MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 23% 34C P8 13W / 250W | 784MiB / 11264MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note the same driver and CUDA version, but different NVIDIA-SMI version.
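To make that comparison less error-prone than reading two tables by eye, the header line of each `nvidia-smi` output can be parsed and compared field by field. This is a minimal sketch: `parse_smi_header` and the regular expression are illustrative names and assume the three-field header format shown in the outputs above; older drivers that print no "CUDA Version" field will not match.

```python
import re

# Matches the header line printed by the nvidia-smi versions shown above.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return the smi, driver and cuda version strings, or None if no header is found."""
    match = HEADER_RE.search(text)
    return match.groupdict() if match else None


# Header lines copied from the two outputs above.
windows = "| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |"
wsl2 = "| NVIDIA-SMI 510.68.02 Driver Version: 512.77 CUDA Version: 11.6 |"

a, b = parse_smi_header(windows), parse_smi_header(wsl2)
for key in ("smi", "driver", "cuda"):
    status = "same" if a[key] == b[key] else "differs"
    print(f"{key:>6}: {a[key]:<10} vs {b[key]:<10} ({status})")
```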
This seems to indicate that CUDA works under WSL2 as it is supposed to. But when I run
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# 2022-05-17 12:13:05.016328: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
# []
in python inside WSL2 I get [], so no GPU is recognized by Tensorflow. This is Python 3.8.0 and Tensorflow 2.4.1 freshly installed in a new Miniconda environment inside Ubuntu WSL2. I don't know what is going wrong. Any suggestions?
Addendum
I don't get any error messages when importing Tensorflow. But some warnings are produced when working with it. E.g., when I run
import tensorflow as tf
print(tf.__version__)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.compile(loss="mse")
print(model.predict([[0.]]))
I get
2.4.1
2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 10:38:28.792411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-17 10:38:28.794356: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz
[[0. 0. 0.]]
These don't seem to be GPU related, though.
Dr. Snoopy got me onto the right track. Despite the fact that the TF website says that "The TensorFlow pip package includes GPU support for CUDA®-enabled cards", I still needed to run conda install tensorflow-gpu, and it worked! Now
import tensorflow as tf
from tensorflow.python.client import device_lib
print("devices: ", [d.name for d in device_lib.list_local_devices()])
print("GPUs: ", tf.config.list_physical_devices('GPU'))
print("TF v.: ", tf.__version__)
gives lots of debug messages and
devices: ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
TF v.: 2.4.1

Tensorflow is not detecting my GPUs. What shall I do (May 2021)?

TF Version : 2.4.1
CUDA Version : 11.1
tf.test.is_gpu_available() -- returns --> FALSE
tf.test.is_built_with_cuda() -- returns --> TRUE
I tried reverting TF to 2.4.0, but that didn't work.
I have also tried:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
But nothing seems to work; TF is just not detecting my GPUs.
EDIT 1:
Output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 3090 Off | 00000000:43:00.0 Off | N/A |
| 30% 40C P8 27W / 300W | 5MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:81:00.0 Off | N/A |
| 64% 63C P2 179W / 300W | 24043MiB / 24268MiB | 59% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2362 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 2564 G /usr/bin/gnome-shell 12MiB |
| 1 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 14304 C python3 24035MiB |
+-----------------------------------------------------------------------------+
While running tf.test.is_gpu_available(), I get the following warning:
WARNING:tensorflow:From Spell_correction.py:35: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-05-07 21:46:21.855460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-07 21:46:21.856690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:43:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-05-07 21:46:21.856716: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-07 21:46:21.856735: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-07 21:46:21.856747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-07 21:46:21.856759: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-07 21:46:21.856771: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-07 21:46:21.856829: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.1/lib64
2021-05-07 21:46:21.856846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-07 21:46:21.856856: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 21:46:21.856863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-07 21:46:21.942589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-07 21:46:21.942626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-07 21:46:21.942633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
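In a log like the one above, the decisive lines are the `W` entries from `dso_loader.cc`: they name the exact shared library that could not be opened, which here is `libcusolver.so.10` while `LD_LIBRARY_PATH` points at `/usr/local/cuda-11.1/lib64`. A small sketch for pulling those names out of a saved log follows; `missing_libraries` is a hypothetical helper, and the pattern assumes the message wording shown in this log.

```python
import re

# Wording taken from the dso_loader warning in the log above.
MISSING_LIB_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Two lines copied from the log above, shortened to the relevant part.
log = """
I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.1/lib64
"""

print(missing_libraries(log))  # prints ['libcusolver.so.10']
```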
Another Observation:
Pytorch is detecting GPU, while TF is not.
torch.cuda.is_available() --> TRUE
tf.test.is_gpu_available() --> FALSE
If you use Ubuntu 20.04, I suggest following the steps from here. I recently had the same problem.
You have
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A
Try to get the latest NVIDIA driver (465) and CUDA 11.3. In my case, nvidia-smi shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
What I did:
(1) I uninstalled NVIDIA and CUDA completely (see here, and be careful).
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
(2) I downloaded the NVIDIA driver as a .run file and simply ran sudo bash NVIDIA*.run
(3) I downloaded cuDNN and performed the following, as described here:
tar -xzvf cudnn-11.3-*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Also check your .bashrc file, as described here:
cd ~
gedit .bashrc  # or: nano .bashrc
# add this at the end:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/targets/x86_64-linux/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then, pip install tensorflow-gpu==2.4.1
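After editing .bashrc and opening a new shell, it can help to check that the directories on `LD_LIBRARY_PATH` exist and actually contain the copied cuDNN files. The following is a rough sketch of such a check; `find_on_library_path` is a hypothetical helper, and the `libcudnn` prefix is taken from the copy commands above.

```python
import os
from pathlib import Path


def find_on_library_path(prefix, path_value=None):
    """Map each LD_LIBRARY_PATH entry to the files in it whose names start with `prefix`.

    An entry that is not an existing directory maps to None.
    """
    if path_value is None:
        path_value = os.environ.get("LD_LIBRARY_PATH", "")
    hits = {}
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        directory = Path(entry)
        if not directory.is_dir():
            hits[entry] = None
            continue
        hits[entry] = sorted(
            p.name for p in directory.iterdir() if p.name.startswith(prefix)
        )
    return hits


if __name__ == "__main__":
    for directory, files in find_on_library_path("libcudnn").items():
        if files is None:
            print(f"{directory}: directory does not exist")
        else:
            print(f"{directory}: {files or 'no libcudnn files'}")
```

An entry reported as missing usually points to a typo in the export line, for example a wrong version suffix in the directory name.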

Nvidia GeForce 210 compute issue on Ubuntu 18.04

I am using Ubuntu 18.04 (dual-booted with Windows).
nvidia-smi
This is the output I got when I ran the above command in my Ubuntu 18.04 terminal:
Fri Oct 9 09:33:56 2020
+------------------------------------------------------+
| NVIDIA-SMI 340.108 Driver Version: 340.108 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 210 Off | 0000:01:00.0 N/A | N/A |
| 35% 52C P8 N/A / N/A | 368MiB / 1023MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Before that, I followed these steps to install the required driver on my system:
sudo add-apt-repository --remove ppa:graphics-drivers/ppa
sudo apt-get purge nvidia*
sudo apt autoremove
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo shutdown -r now
When I tried to run the Geekbench 5 compute benchmark, it stopped while running Histogram Equalization. This is the output of ./geekbench5 --compute OpenCL, run in the folder where I extracted Geekbench 5:
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load
cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
Geekbench 5.2.4 Tryout : https://www.geekbench.com/
Geekbench 5 is in tryout mode.
Geekbench 5 requires an active Internet connection when in tryout mode, and
automatically uploads test results to the Geekbench Browser. Other features
are unavailable in tryout mode.
Buy a Geekbench 5 license to enable offline use and remove the limitations of
tryout mode.
If you would like to purchase Geekbench you can do so online:
https://store.primatelabs.com/v5
If you have already purchased Geekbench, enter your email address and license
key from your email receipt with the following command line:
./geekbench5 -r <email address> <license key>
Running Gathering system information
System Information
Operating System Ubuntu 18.04.5 LTS 4.15.0-118-generic x86_64
Model To be filled by O.E.M. To be filled by O.E.M.
Motherboard O.E.M Intel H81
BIOS American Megatrends Inc. 4.6.5
Processor Information
Name Intel Core i5-4460
Topology 1 Processor, 4 Cores
Identifier GenuineIntel Family 6 Model 60 Stepping 3
Base Frequency 3.20 GHz
L1 Instruction Cache 32.0 KB x 2
L1 Data Cache 32.0 KB x 2
L2 Cache 256 KB x 2
L3 Cache 6.00 MB
Memory Information
Size 7.75 GB
OpenCL Information
Platform Vendor NVIDIA Corporation
Platform Name NVIDIA CUDA
Device Vendor NVIDIA Corporation
Device Name GeForce 210
Device Driver Version 340.108
Maximum Frequency 1.23 GHz
Compute Units 2
Device Memory 1024 MB
OpenCL
Running Sobel
Running Canny
Running Stereo Matching
Running Histogram Equalization
[1009/093329:ERROR:src/interface/console/consolemain.cpp(808)] Geekbench encountered an internal error and cannot continue. Please contact support#primatelabs.com for assistance.
Internal error message: clCreateImage returned -40.
Also, when I ran the Geekbench 5 compute benchmark on Windows 10 (same machine, through the GUI), it stalled at Histogram Equalization.
I have no idea why this is happening. Is anything really wrong with my GPU, the driver, or something else? I searched online, reinstalled the driver, and rebooted the system, but the results are the same. Can someone please help?
Your driver installation is fine, but your GPU is 11 years old and does not support some of the more recent features of the OpenCL standard. The Geekbench error code -40 (CL_INVALID_IMAGE_SIZE in OpenCL) means that the image size Geekbench uses for one of its benchmarks is not supported by your GPU. This causes the benchmark to crash. Maybe an older version of Geekbench still works.
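For reference, numeric OpenCL status codes such as the -40 in "clCreateImage returned -40" map to named constants in the OpenCL headers. The sketch below decodes a message of that form using a deliberately partial table; `describe_cl_error` and `OPENCL_ERRORS` are illustrative names, and any code not listed is simply reported as unknown.

```python
import re

# Partial table of OpenCL status codes (values as defined in CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -10: "CL_IMAGE_FORMAT_NOT_SUPPORTED",
    -30: "CL_INVALID_VALUE",
    -39: "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR",
    -40: "CL_INVALID_IMAGE_SIZE",
}


def describe_cl_error(message):
    """Extract the status code from a '... returned <code>' message and name it."""
    match = re.search(r"returned\s+(-?\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return code, OPENCL_ERRORS.get(code, "unknown (not in this partial table)")


print(describe_cl_error("Internal error message: clCreateImage returned -40."))
# prints (-40, 'CL_INVALID_IMAGE_SIZE')
```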

What is the correct version of CUDNN for CUDA 11.0

I want to start using tensorflow-gpu, and I looked some stuff up, and found out that I need to ensure that I have both CUDA and CUDNN. So, I opened up the command prompt and ran the command nvidia-smi to check my CUDA version:
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi
Tue Jun 02 14:13:03 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 445.87 Driver Version: 445.87 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 40C P8 N/A / N/A | 77MiB / 4096MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU PID Type Process name GPU Memory |
| Usage |
|=============================================================================|
| 0 10488 C+G ...n64\EpicGamesLauncher.exe N/A |
| 0 12636 C+G ...4\UnrealCEFSubProcess.exe N/A |
+-----------------------------------------------------------------------------+
Having seen that my CUDA version is 11.0, I went to NVIDIA's website to select a version of cuDNN that works with CUDA 11.0, but the latest ones currently support only up to CUDA 10.2. What should I do? Can I use the one for CUDA 10.2?
What nvidia-smi shows is not the CUDA version that you have installed, but the maximum CUDA version that your driver supports.
CUDA 11.0 has been announced but not released yet (as of June 2nd 2020), so you should use CUDA 10.2 as it's the latest available version.
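The distinction can be checked mechanically: `nvcc --version` reports the toolkit that is actually installed, while the `nvidia-smi` header reports the highest CUDA version the driver supports. The sketch below compares the two from captured text; the function names are hypothetical, the sample strings are in the format printed by the two tools, and the plain tuple comparison is a simplification that does not model NVIDIA's compatibility rules in detail.

```python
import re


def toolkit_release(nvcc_output):
    """Return the installed toolkit release as (major, minor), or None if not found."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def driver_max_cuda(smi_output):
    """Return the driver's maximum supported CUDA version as (major, minor), or None."""
    m = re.search(r"CUDA Version:\s+(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_supported(nvcc_output, smi_output):
    """True if the installed toolkit is not newer than what the driver reports."""
    toolkit = toolkit_release(nvcc_output)
    limit = driver_max_cuda(smi_output)
    if toolkit is None or limit is None:
        return None  # one of the two versions could not be determined
    return toolkit <= limit


nvcc_text = "Cuda compilation tools, release 11.1, V11.1.105"
smi_text = "| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |"
print(toolkit_release(nvcc_text), driver_max_cuda(smi_text), toolkit_supported(nvcc_text, smi_text))
# prints (11, 1) (11, 2) True
```

If `nvcc` is not on the PATH at all, that is itself a hint that no toolkit is installed, whatever `nvidia-smi` shows.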
A couple of weeks ago, I upgraded three of them to the new cuda_11.0.2, driver 450.51.06 and cuDNN 8.0.
My environment:
x86-64
CentOS 7 with gcc 4.8.5 (sudo doesn't work in CentOS; log in as root)
I downloaded cuda_11.0.2-450.51.05_linux.run
I took a risk, but it went fine. The NVIDIA cuDNN support matrix said:
Compute > 3.5, toolkit = 11.0, and driver r450
So the driver and toolkit minor versions don't matter.
I installed it and went through the pre-installation, post-installation and recommended steps.
Everything went fine.
This is very important:
cuDNN installed, but I couldn't run the examples.
If you are an engineer, you have been through this kind of dilemma, which comes from skipping some small details.
gcc 4.8.5 is for installing the toolkit and driver.
cuDNN 8.0 needs gcc 5 or above, for C++11 or C++14, in the toolchain.
So what I did (I have a lot of devtoolset versions in my environment) was to choose version 6.0 instead of 5, so as not to be on the borderline.
Reinstall it and you will be fine.
Regarding TensorFlow: it has nothing to do with cuDNN other than through Keras for Python, if I get this right.

libtensorflow_framework.so: undefined symbol: cuDevicePrimaryCtxGetState

I have installed tensorflow-gpu with conda successfully. When I test it with import tensorflow, I get the issue mentioned above. Any idea? I have checked my drivers; the NVIDIA toolkit and cuDNN are installed correctly, and I have set PATH, LD_LIBRARY_PATH and CUDA_HOME accordingly.
...Fri Nov 23 12:00:18 2018
+------------------------------------------------------+
| NVIDIA-SMI 340.107 Driver Version: 340.107 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro FX 5600 Off | 0000:02:00.0 N/A | N/A |
| 61% 77C P0 N/A / N/A | 2MiB / 1535MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
You need at least the minimum supported CUDA version (it seems to be 7, see https://askubuntu.com/questions/988787/nvidia-cuda-theano-could-not-find-symbol-cudeviceprimaryctxgetstate) with cuDNN 3.
Upgrade your drivers if possible to get this version.
Otherwise, use either tensorflow-mkl or, for older CPU models, tensorflow-eigen.
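Whether the installed driver library exports the symbol from the error message can be tested directly, without importing TensorFlow. The sketch below uses `ctypes` for that; `has_symbol` is a hypothetical helper, and the library name `libcuda.so.1` is the one that appears in error messages of this kind on Linux.

```python
import ctypes


def has_symbol(library, symbol):
    """Return True or False depending on whether `library` exports `symbol`.

    Returns None if the library itself cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    try:
        # Attribute access on a CDLL object looks the symbol up in the library.
        getattr(lib, symbol)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    for name in ("cuDevicePrimaryCtxGetState", "cuDevicePrimaryCtxRetain"):
        print(name, "->", has_symbol("libcuda.so.1", name))
```

A result of False means the driver library is present but too old to provide that function, which matches the advice to upgrade the driver where the hardware allows it.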
I had the same problem - I had already installed the required NVIDIA drivers on my Ubuntu machine, but when I tried to import Tensorflow I got the same error.
So for me the problem was that I was using the Nouveau driver instead of the NVIDIA driver. To fix the issue, go to System Settings->Software & Updates->Additional Drivers, select the option Using NVIDIA binary driver ..., and click the Apply Changes button. Then just reboot and you're done.