Stable Baselines 3 DQN Model refuses to use CUDA even though it recognizes my GPU - tensorflow

As the title states, my DQN model refuses to use the GPU for a custom environment with Stable Baselines 3:
from stable_baselines3 import DQN
model = DQN("MlpPolicy", env, device="cuda")
My GPU is an RTX 2070 Super.
Installed CUDA version is 10.1.
Installed cuDNN version is 7.5.0 (for CUDA 10.1).
Installed TensorFlow version is 2.2.
CUDA works when I use TensorFlow for machine learning on its own, but it does not seem to work with Stable Baselines 3. It always falls back to the CPU, even though printing the available CUDA devices right before creating the model shows one device, which is my RTX 2070 Super.
All plugins load properly when TensorFlow initializes, and there are no errors.
Unfortunately, I have not been able to find anything about this issue.
I have tried many different CUDA versions, changed my TensorFlow version, and installed the specific tensorflow-gpu package, but nothing has changed.
Here is the log from running my code:
2023-02-01 16:28:20.409397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2023-02-01 16:28:21.691033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2023-02-01 16:28:21.713878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-02-01 16:28:21.714097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Num GPUs Available: 1
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
2023-02-01 16:28:21.717235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2023-02-01 16:28:21.719513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2023-02-01 16:28:21.720370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2023-02-01 16:28:21.723773: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2023-02-01 16:28:21.725979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2023-02-01 16:28:21.732353: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2023-02-01 16:28:21.732490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
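Stable Baselines 3 is built on PyTorch, not TensorFlow, so the log above only shows that TensorFlow can see the GPU. The "Using cpu device" line suggests that the installed PyTorch cannot, and SB3 appears to fall back to the CPU when PyTorch reports no CUDA device. A minimal diagnostic sketch, assuming only that it runs in the same Python environment as the training script; `torch_cuda_status` and `choose_device` are hypothetical helper names, while `torch.cuda.is_available()` and `torch.version.cuda` are standard PyTorch attributes:

```python
import importlib.util


def torch_cuda_status():
    """Report whether PyTorch (the library Stable Baselines 3 runs on) can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "torch_cuda_build": None}
    import torch

    return {
        "torch_installed": True,
        # False for CPU-only wheels, or when the CUDA driver cannot be initialised
        "cuda_available": bool(torch.cuda.is_available()),
        # CUDA version the PyTorch wheel was built against; None for CPU-only builds
        "torch_cuda_build": torch.version.cuda,
    }


def choose_device():
    """Hypothetical helper: the device string to request from the model."""
    return "cuda" if torch_cuda_status()["cuda_available"] else "cpu"


if __name__ == "__main__":
    print(torch_cuda_status())
    # e.g. model = DQN("MlpPolicy", env, device=choose_device())
    print("Device to request:", choose_device())
```

If `torch_cuda_build` comes back as None, the installed PyTorch is most likely a CPU-only build, in which case installing a CUDA-enabled PyTorch build (rather than changing TensorFlow or tensorflow-gpu) is the probable fix.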

Related

Unable to get tensorflow to recognize my GPU on Windows 10

I had previously installed the tensorflow-gpu 2.5.0 package using conda on my machine in two environments. In a Python 3.7 environment, everything worked well. In a Python 3.8 environment, I was having issues, but I found that if I loaded certain libraries first, I was able to get TensorFlow to recognize my GPU.
Unfortunately, I ended up with some issues in those environments, and I had to remove them.
Now, no matter what I try, I cannot get TensorFlow to load cudart64_110.dll and, therefore, recognize my GPU. I have tried installing the tensorflow-gpu 2.5.0 package with conda, and I have tried following the instructions on the official tensorflow.org page, but in neither case will TensorFlow load the cudart64_110.dll library.
However, I also have spacy installed in my environment, and if I import spacy, I get the following messages:
2022-09-16 13:42:58.588234: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-09-16 13:42:58.588417: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-09-16 13:43:00.551253: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-09-16 13:43:00.555761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.425GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2022-09-16 13:43:00.556393: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-09-16 13:43:00.556491: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-09-16 13:43:00.556787: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-09-16 13:43:00.557350: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-09-16 13:43:00.557720: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-09-16 13:43:00.558789: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-09-16 13:43:00.559043: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-09-16 13:43:00.559383: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-09-16 13:43:00.559746: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
So, clearly, some of the CUDA libraries are being opened correctly, but not all.
I've tried uninstalling miniconda & reinstalling it, but I get the same issues.
I'm at a loss as to why tensorflow continues to give me these problems.
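The log shows TensorFlow opening some CUDA DLLs but not cudart64_110.dll or curand64_10.dll, so one quick check is whether those files exist in any directory on the search path at all. A minimal sketch, assuming the DLLs are meant to be found through PATH (on Linux the equivalent variable would typically be LD_LIBRARY_PATH); `find_libraries` is a hypothetical helper that only looks for files and does not try to load them:

```python
import os


def find_libraries(names, search_path=None, env_var="PATH"):
    """Map each library file name to the first directory entry that contains it, or None."""
    if search_path is None:
        search_path = os.environ.get(env_var, "")
    directories = [d for d in search_path.split(os.pathsep) if d]
    found = {}
    for name in names:
        found[name] = None
        for directory in directories:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                found[name] = candidate
                break
    return found


if __name__ == "__main__":
    # File names taken from the log above
    wanted = ["cudart64_110.dll", "curand64_10.dll", "cublas64_11.dll", "cudnn64_8.dll"]
    for name, location in find_libraries(wanted).items():
        print(f"{name:20} {location or 'NOT FOUND on PATH'}")
```

A library reported as found here but still not loaded by TensorFlow would point to a version or architecture mismatch rather than a missing file.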

Exact CUDA / cudNN versions for TensorFlow 2.4.0

I'm trying to use Anaconda3 (2020.11 with Python 3.8.5 64-bit) with Tensorflow 2.4.0 on Windows 10 but I must say this technology seems to be still very... unstable!
It's really puzzling to understand that each library depends on an exact version of another library, not more, not less!
So far I managed to install:
Anaconda3 (2020.11 with Python 3.8.5 64-bit)
tensorflow 2.4.0
CUDA 11.0.2, runtime only, using network installer
cudnn-11.0-windows-x64-v8.0.4.30
GeForce drivers 461.09-desktop-win10-64bit-international-dch-whql
The board is a GeForce RTX 3070.
According to the manual at https://www.tensorflow.org/install/gpu, this should be OK, but unfortunately I'm still getting the dreaded "Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows" message.
Here's the complete trace:
2021-01-20 20:53:25.785203: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 20:53:29.173495: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-20 20:53:29.175299: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-20 20:53:29.213308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.755GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-20 20:53:29.213536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 20:53:29.237764: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 20:53:29.237865: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 20:53:29.244635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 20:53:29.247913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 20:53:29.262791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 20:53:29.268091: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-20 20:53:29.278049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 20:53:29.278203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 20:53:29.279054: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-20 20:53:29.281144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.755GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-20 20:53:29.281321: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-20 20:53:29.281786: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 20:53:29.282156: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 20:53:29.282961: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-20 20:53:29.283385: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-20 20:53:29.284167: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-20 20:53:29.284635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-20 20:53:29.286872: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 20:53:29.289197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-20 20:53:29.772262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-20 20:53:29.772375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-01-20 20:53:29.773599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-01-20 20:53:29.774277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6589 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-01-20 20:53:29.775166: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-20 20:53:30.414473: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-01-20 20:53:31.860756: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 20:53:32.450199: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 20:53:32.476605: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 20:53:33.172408: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-20 20:53:33.172484: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
Judging from the documentation, this might be caused by using the wrong combination of libraries, but I really don't have a clue. Is there any test I could run to troubleshoot this?
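One test that helps is to reduce the TensorFlow log to three lists: libraries opened, libraries that failed to load, and error-level (E or F) messages. A minimal sketch; it is not a TensorFlow API, and the regular expressions assume the line formats seen in the logs in this thread ("Successfully opened dynamic library", "Could not load dynamic library", and a one-letter level before the source file name):

```python
import re

# Patterns based on the TensorFlow log lines quoted in this thread
OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")
# Optional "date time: " prefix, then a one-letter level (I/W/E/F) and "file.cc:NN]"
LEVEL = re.compile(r"^(?:\S+ \S+: )?([IWEF]) \S+\] (.*)$")


def summarize_tf_log(text):
    """Split a TensorFlow startup log into opened libraries, failed loads and E/F messages."""
    opened, failed, errors = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        match = OPENED.search(line)
        if match and match.group(1) not in opened:
            opened.append(match.group(1))
        match = FAILED.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
        match = LEVEL.match(line)
        if match and match.group(1) in ("E", "F"):
            errors.append(match.group(2))
    return {"opened": opened, "failed": failed, "errors": errors}


if __name__ == "__main__":
    # Two lines copied from the trace above; paste the full log here instead
    sample = """\
2021-01-20 20:53:29.278049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 20:53:33.172408: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED"""
    for key, values in summarize_tf_log(sample).items():
        print(key, values)
```

Run on the full trace above, this would list every library as opened, nothing as failed, and only the two cuda_dnn.cc errors, which suggests all the DLLs are present and the failure happens when cuDNN initializes.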
I think the problem is the GeForce driver 461.09-desktop-win10-64bit-international-dch-whql. It includes CUDA 11.2, not 11.0. I guess you need to find a GeForce driver version compatible with your card that includes CUDA 11.0. It looks like version 450.36.06+ could work for you.
I'd recommend you uninstall CUDA 11.2 and the current drivers from your computer and install the older versions.
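One way to see which CUDA version an installed driver corresponds to is the header printed by nvidia-smi: its "CUDA Version" field is the highest CUDA version the driver supports, which is not necessarily the toolkit that is installed. A minimal parsing sketch; the helper names are hypothetical, and it assumes nvidia-smi is on PATH (it returns empty results otherwise):

```python
import re
import shutil
import subprocess


def driver_cuda_version(smi_output):
    """Extract (driver_version, max_cuda_version) from nvidia-smi's header text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return (driver.group(1) if driver else None, cuda.group(1) if cuda else None)


def query_nvidia_smi():
    """Run nvidia-smi if it is on PATH; return its stdout, or '' if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return ""
    result = subprocess.run([exe], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(driver_cuda_version(query_nvidia_smi()))
```

For illustration, a header line containing "Driver Version: 461.09" and "CUDA Version: 11.2" would come back as ('461.09', '11.2').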

Tensorflow 2.2 taking a long time to start

I am trying to run tensorflow on windows 10 with the following setup:
Anaconda3 with
python 3.8
tensorflow 2.2.0
GPU: RTX3090
cuda_10.1.243
cudnn-v7.6.5.32 for windows10-x64
Running the following code takes 5 to 10 minutes to print the output:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
I get the following output immediately, but then it hangs for a few minutes before proceeding:
2020-11-17 04:03:00.039069: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-11-17 04:03:00.042677: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-11-17 04:03:00.045041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-11-17 04:03:00.045775: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-11-17 04:03:00.049246: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-11-17 04:03:00.050633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-11-17 04:03:00.056731: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-11-17 04:03:00.056821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
Running the same code on Colab takes only a second.
Any suggestions?
Thanks
I don't understand why Mux's answer is downvoted, as he is right. NVIDIA Ampere GPUs can't run optimally on CUDA versions below 11.1, because the Ampere streaming multiprocessor (SM_86) is only supported from CUDA 11.1 onward; see https://forums.developer.nvidia.com/t/can-rtx-3080-support-cuda-10-1/155849/2
However, you could possibly solve your issue directly, without updating CUDA, by increasing the default JIT cache size with 'export CUDA_CACHE_MAXSIZE=2147483648', that is, by setting that environment variable to 2147483648 bytes (2 GiB). You will still have this long wait on the first startup, though; see https://www.tensorflow.org/install/gpu#hardware_requirements
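`export` is a bash command; in the Windows cmd shell the equivalent is `set CUDA_CACHE_MAXSIZE=2147483648`. The variable can also be set from the Python script itself, as long as that happens before CUDA is initialized, and setting it before TensorFlow is imported is the simplest way to ensure this. A minimal sketch under that assumption; `configure_jit_cache` and `timed` are hypothetical helpers, and timing the device listing in two separate runs is one way to see whether the second start is faster because the cache is reused:

```python
import importlib.util
import os
import time

# 2 * 1024**3 = 2147483648 bytes, i.e. 2 GiB (the value suggested above)
JIT_CACHE_BYTES = 2 * 1024 ** 3


def configure_jit_cache(max_bytes=JIT_CACHE_BYTES):
    """Set CUDA_CACHE_MAXSIZE for this process; only effective if CUDA is not initialised yet."""
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)
    return os.environ["CUDA_CACHE_MAXSIZE"]


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    configure_jit_cache()  # must happen before TensorFlow touches the GPU
    if importlib.util.find_spec("tensorflow") is None:
        print("TensorFlow is not installed in this environment")
    else:
        from tensorflow.python.client import device_lib

        devices, seconds = timed(device_lib.list_local_devices)
        print(f"Listed {len(devices)} devices in {seconds:.1f} s")
```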
The RTX 3090 has the Ampere architecture, which requires CUDA 11+.
Check out this guide:
https://medium.com/#dun.chwong/the-simple-guide-deep-learning-with-rtx-3090-cuda-cudnn-tensorflow-keras-pytorch-e88a2a8249bc
The reason is as Mux says.
Background:
See https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/ for full explanation.
The first stage compiles source device code to PTX virtual assembly, and the second stage compiles the PTX to binary code for the target architecture. The CUDA driver can execute the second stage compilation at run time, compiling the PTX virtual assembly “Just In Time” to run it.
So when an older software package runs on newer hardware, there is no precompiled binary code for the target architecture; the driver falls back to the PTX virtual assembly and triggers a runtime JIT compile for the new architecture. That means the cuDNN and cuBLAS kernels and TensorFlow's built-in kernels are all JIT compiled at startup, which causes the very long startup time in your case.
That is why Dan Pavlov suggests enabling JIT caching: you JIT compile only once, instead of on every startup.
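The fallback described above can be sketched as a small decision function. It follows a simplified version of NVIDIA's compatibility rules: a cubin built for sm_XY runs natively only on devices with the same major compute capability and an equal or higher minor version, while PTX for compute_XY can be JIT compiled for any device of equal or higher compute capability. The function names are hypothetical, the architecture strings follow the format that `tf.sysconfig.get_build_info()` prints (for example 'sm_61' or 'compute_70'), and the sketch covers only one binary's kernel list, whereas cuDNN and cuBLAS ship their own:

```python
def parse_arch(arch):
    """'sm_61' -> ('sm', 6, 1); 'compute_70' -> ('compute', 7, 0)."""
    kind, digits = arch.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def kernel_strategy(device_cc, build_archs):
    """Return ('native', arch), ('jit', arch) or ('unsupported', None) for a device."""
    major, minor = device_cc
    parsed = [parse_arch(a) for a in build_archs]
    # Cubins: same major version, minor version not newer than the device
    native = [(M, m) for kind, M, m in parsed if kind == "sm" and M == major and m <= minor]
    if native:
        return "native", "sm_%d%d" % max(native)
    # PTX: any virtual architecture not newer than the device can be JIT compiled
    ptx = [(M, m) for kind, M, m in parsed if kind == "compute" and (M, m) <= (major, minor)]
    if ptx:
        return "jit", "compute_%d%d" % max(ptx)
    return "unsupported", None


if __name__ == "__main__":
    archs = ["sm_35", "sm_37", "sm_52", "sm_60", "sm_61", "compute_70"]
    print(kernel_strategy((8, 6), archs))  # RTX 3090 (compute capability 8.6): ('jit', 'compute_70')
```

With an architecture list like this one, an Ampere card has no matching sm_8x cubin, so every kernel goes through the JIT path, which is the long startup described above.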

Could not load dynamic library 'cudart64_101.dll'

I recently installed TensorFlow 2.3.1 with CUDA 11.1.0 and cuDNN 8.0.4. Many forums say that CUDA 11.1 is backwards compatible with previous versions, and I also set the PATH variable as described in the TensorFlow installation guide, yet I still get the warning:
2020-10-05 13:55:42.704300: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-10-05 13:55:42.706817: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
But I do have an NVIDIA GeForce RTX 2070 Super card. How can I fix this issue?
I am using Python 3.8.
Thanks in advance!
According to TensorFlow's tested build configurations, CUDA 11.0 is the version used for TF 2.4.0.
For TF 2.3.0, the compatible CUDA version is 10.1 and the cuDNN version is 7.6.
For more details, please refer to the tested build configurations.
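Each TensorFlow release loads the CUDA runtime DLL for the exact version it was built with, so the missing file name itself indicates which toolkit to install: cudart64_101.dll belongs to CUDA 10.1, while CUDA 11.x uses cudart64_110.dll. A small lookup sketch; the table is a hand-copied subset covering only the Windows GPU builds discussed in this thread, the helper names are hypothetical, and requiring an exact major.minor match is a simplification, so check the official tested configurations for anything else:

```python
# Hand-copied subset of TensorFlow's tested GPU build configurations for Windows
TESTED_GPU_BUILDS = {
    "2.2": {"cuda": "10.1", "cudnn": "7.6", "cudart_dll": "cudart64_101.dll"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6", "cudart_dll": "cudart64_101.dll"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0", "cudart_dll": "cudart64_110.dll"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1", "cudart_dll": "cudart64_110.dll"},
}


def major_minor(version):
    """'11.1.0' -> '11.1'."""
    return ".".join(version.split(".")[:2])


def requirements_for(tf_version):
    """Look up the tested CUDA/cuDNN pair for a TensorFlow version such as '2.3.1'."""
    try:
        return TESTED_GPU_BUILDS[major_minor(tf_version)]
    except KeyError:
        raise ValueError(f"TensorFlow {tf_version} is not in this table") from None


def check_install(tf_version, installed_cuda, installed_cudnn):
    """Return human-readable mismatches; an empty list means the versions line up."""
    req = requirements_for(tf_version)
    problems = []
    if major_minor(installed_cuda) != req["cuda"]:
        problems.append(
            f"TensorFlow {tf_version} expects CUDA {req['cuda']} "
            f"(it will look for {req['cudart_dll']}), found CUDA {installed_cuda}"
        )
    if major_minor(installed_cudnn) != req["cudnn"]:
        problems.append(f"TensorFlow {tf_version} expects cuDNN {req['cudnn']}, found {installed_cudnn}")
    return problems


if __name__ == "__main__":
    # The setup from the question: TF 2.3.1, CUDA 11.1.0, cuDNN 8.0.4
    for problem in check_install("2.3.1", "11.1.0", "8.0.4"):
        print(problem)
```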

GPU problem for CUDA 11.0 and cuDNN 8.0.2

I have CUDA 11.0 and cuDNN 8.0.2, which are the recommended setup.
I have tensorflow-gpu 2.3 and Keras 2.4.
However, the GPUs are not used and I don't know why.
I checked with the following lines:
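import tensorflow as tf
# Deprecated in TF 2.x in favour of tf.config.list_physical_devices('GPU'), but still available
gpu_available = tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
print("GPU available? ", gpu_available)
built = tf.test.is_built_with_cuda()
print("tf is built with CUDA? ", built)
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')
print("Num GPUs used: ", len(gpus))
print("Num CPUs used: ", len(cpus))
# CUDA/cuDNN versions this TensorFlow binary was built against
print(tf.sysconfig.get_build_info())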
The output is the following:
GPU available? False
tf is built with CUDA? True
Num GPUs used: 0
Num CPUs used: 1
{'cuda_version': '10.1', 'cudnn_version': '7', 'cuda_compute_capabilities': ['sm_35', 'sm_37', 'sm_52', 'sm_60', 'sm_61', 'compute_70'], 'cpu_compiler': '/usr/bin/gcc-5', 'is_rocm_build': False, 'is_cuda_build': True}
It comes with the following error:
W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
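The build info printed above already shows the mismatch: this TensorFlow binary was built against CUDA 10.1 and cuDNN 7, while CUDA 11.0 is installed, and the missing libcudart.so.10.1 follows from that. A minimal sketch that makes the comparison explicit; `compare_build` is a hypothetical helper that takes the dictionary returned by `tf.sysconfig.get_build_info()` as an argument, and the installed CUDA version is passed in by hand because detecting it automatically is platform-specific:

```python
def compare_build(build_info, installed_cuda):
    """Compare tf.sysconfig.get_build_info() output with the installed CUDA version.

    Returns (ok, message).
    """
    if not build_info.get("is_cuda_build", False):
        return False, "This TensorFlow binary was not built with CUDA support"
    built = build_info.get("cuda_version", "")
    installed = ".".join(installed_cuda.split(".")[:2])
    if built == installed:
        return True, f"OK: TensorFlow was built for CUDA {built}"
    return False, (
        f"Mismatch: TensorFlow was built for CUDA {built} but CUDA {installed_cuda} is installed; "
        f"on Linux it will look for libcudart.so.{built}"
    )


if __name__ == "__main__":
    # Dictionary copied from the output above (some keys omitted)
    printed = {"cuda_version": "10.1", "cudnn_version": "7", "is_cuda_build": True}
    print(compare_build(printed, "11.0"))
```

With TensorFlow installed, `compare_build(tf.sysconfig.get_build_info(), "11.0")` would run the same check against the live build; the libcudart.so name in the message follows the naming seen in the log above.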
As stated in the TensorFlow documentation, the software requirements are as follows:
NVIDIA GPU drivers - 418.x or higher
Cuda - 10.1 (TensorFlow >= 2.1.0)
cuDNN - 7.6
See Link
You must have a Python version between 3.5 and 3.8.
Along with that, you need the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019.
You can download it here (link).
Check the full system requirements here (link).
Don't forget to add CUDA and cuDNN to your system PATH (see link).
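The Python-version requirement in this list can be checked from the interpreter itself. A minimal sketch; the 3.5 to 3.8 range is the one quoted above for these TensorFlow 2.x releases and will differ for other TensorFlow versions, and `python_supported` is a hypothetical helper:

```python
import sys

# Range quoted above; adjust for other TensorFlow releases
SUPPORTED = ((3, 5), (3, 8))


def python_supported(version_info=None, supported=SUPPORTED):
    """True if the (major, minor) Python version lies inside the supported range."""
    version = tuple((version_info or sys.version_info)[:2])
    low, high = supported
    return low <= version <= high


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {python_supported()}")
```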
This solution cleared the issue completely in my case, even though my system is set up with CUDA 10.2. I guess TensorFlow requires something specific to CUDA 10.1.
conda install cudatoolkit=10.1
Check https://github.com/tensorflow/tensorflow/issues/38578#issuecomment-710104168