I was following one of the online tutorials but I was getting this error:
Traceback (most recent call last):
File “”, line 20, in
detections = net.forward()
cv2.error: OpenCV(4.3.0) /home/blah/opencv/modules/dnn/src/layers/…/cuda4dnn/primitives/…/csl/cudnn/convolution.hpp:461: error: (-217:Gpu API call) CUDNN_STATUS_EXECUTION_FAILED in function ‘convolve_with_bias_activation’
It's a Python script, and I use the OpenCV dnn module with a pre-trained model.
This is my configuration:
Jetson Nano device
Ubuntu 18.04
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
– NVIDIA GPU arch: 53
– NVIDIA PTX archs:
– cuDNN: YES (ver 8.0)
OpenCV 4.3.0 built from source with OPENCV_DNN_CUDA=ON, CUDNN_VERSION='8.0', WITH_CUDA=ON, WITH_CUDNN=ON, and many other settings enabled
Python 3.7.7
This is a snippet of the code I am trying to run (it completes successfully if I don’t use the GPU). It fails at the line detections = net.forward()
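import cv2
import imutils
import numpy as np
from imutils.video import FPS

# args is the parsed command-line dictionary (its construction is not shown here)
CLASSES = ["background", "aeroplane"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
print("[INFO] setting preferable backend and target to CUDA...")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
print("[INFO] accessing video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)
writer = None
fps = FPS().start()
while True:
    (grabbed, frame) = vs.read()
    if not grabbed:
        break
    frame = imutils.resize(frame, width=400)
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # fails here when the CUDA backend is used
    for i in np.arange(0, detections.shape[2]):
        ...  # per-detection handling is not part of this snippet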


GPU not available with TensorFlow and Torch in Ubuntu 18.04

I have trouble with my GPU: it is not available in TensorFlow or in Torch.
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    print('The device name is: ', device_name)
else:
    print("GPU available")
--> The device name is: ''  # the device_name returned is empty
Here is the CUDA version:
$ nvcc --version
--> nvcc: NVIDIA (R) Cuda compiler driver
--> Copyright (c) 2005-2022 NVIDIA Corporation
--> Built on Tue_May__3_18:49:52_PDT_2022
--> Cuda compilation tools, release 11.7, V11.7.64
--> Build cuda_11.7.r11.7/compiler.31294372_0
The NVIDIA driver version:
--> NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
The tensorflow-gpu version:
import tensorflow as tf
print(tf.__version__)
--> 2.0.0
I read that CUDA 11.7 is compatible with NVIDIA driver 515.65.01. Any ideas, please?
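A minimal sketch of pulling the version numbers quoted above out of the tool output so they can be compared side by side. The helper names (`parse_nvcc_release`, `parse_smi_header`) are hypothetical, and the sample strings are copied from this post; on a real machine the text would come from running `nvcc --version` and `nvidia-smi`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA toolkit release (e.g. '11.7') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def parse_smi_header(smi_header):
    """Return (driver_version, cuda_version) from the nvidia-smi header line, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", smi_header
    )
    return (match.group(1), match.group(2)) if match else None


# Sample lines copied from the question; the helper names are hypothetical.
NVCC_SAMPLE = "Cuda compilation tools, release 11.7, V11.7.64"
SMI_SAMPLE = "NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7"

toolkit = parse_nvcc_release(NVCC_SAMPLE)
driver, driver_cuda = parse_smi_header(SMI_SAMPLE)
print("toolkit:", toolkit, "| driver:", driver, "| CUDA reported by driver:", driver_cuda)
```

Matching toolkit and driver numbers only show that the CUDA installation itself is consistent. Whether the installed TensorFlow release was built for that CUDA version is a separate check.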

ONNX converted TensorFlow saved model runs on CPU but not on GPU

I converted a TensorFlow saved model to ONNX format using tf2onnx:
python3 -m tf2onnx.convert --saved-model saved_model/ --output onnx/model.onnx --opset 11
The conversion worked fine and I can run inference with the ONNX model using CPU.
I installed onnxruntime-gpu to run inference on the GPU and encountered an error:
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Relu node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/ bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/ bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 2: out of memory ; GPU=0 ; hostname=coincoin; expr=cudaMalloc((void**)&p, size);
I am the only one using the GPU, which is a Titan RTX (24 GB of memory). The model runs fine on the GPU in its TensorFlow saved-model version, using 10 GB of the GPU's memory.
Versions are:
tensorflow 1.14.0
CUDA 10.0
CuDNN 7.6.5
onnx 1.6.0
onnxruntime 1.1.0
tf2onnx 1.9.2
python 3.6
Ubuntu 18.04
According to your information and versions, there are two possible ways to solve the problem.
First, the error says out of memory, so check whether GPU memory is still available. If it is not, limit the memory TensorFlow takes:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0"
config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)
Second, downgrade onnxruntime-gpu. Each onnxruntime-gpu release is built against a specific CUDA version; with CUDA 10.0, try onnxruntime-gpu==1.0.0 instead of 1.1.0, and check the ONNX Runtime release notes for the CUDA version each release requires.
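To make the memory suggestion above concrete, here is a small arithmetic sketch of what a `per_process_gpu_memory_fraction` value means in absolute terms for the 24 GB card from the question. The helper names are hypothetical, and the calculation assumes the fraction is taken of the card's total memory.

```python
def fraction_to_gb(total_gb, fraction):
    """Approximate memory cap (in GB) implied by a per-process memory fraction."""
    return total_gb * fraction


def gb_to_fraction(total_gb, needed_gb):
    """Smallest fraction of the card that covers `needed_gb`."""
    return needed_gb / total_gb


TOTAL_GB = 24.0     # Titan RTX, as stated in the question
TF_NEEDS_GB = 10.0  # what the TensorFlow saved model uses, per the question

cap_gb = round(fraction_to_gb(TOTAL_GB, 0.1), 2)
needed_fraction = round(gb_to_fraction(TOTAL_GB, TF_NEEDS_GB), 3)
print("cap at fraction 0.1:", cap_gb, "GB")
print("fraction covering 10 GB:", needed_fraction)
```

With these numbers, a fraction of 0.1 caps TensorFlow at roughly 2.4 GB, well under the 10 GB the saved model is reported to use, so a value around 0.42 would be needed if the TensorFlow model has to keep running in the same process.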

TensorFlow 2.0 can't use GPU, something wrong in cuDNN? Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

I am trying to understand and debug my code. I am trying to run prediction on the GPU with a CNN model developed with TF 2.0 / tf.keras, but I get the error messages below.
Could someone help me fix it?
Here is my environment configuration:
python 3.6.8
tensorflow-gpu 2.0.0-rc0
nvidia 418.x
CUDA 10.0
cuDNN 7.6+
and the log file:
2019-09-28 13:10:59.833892: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2019-09-28 13:11:00.228025: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2019-09-28 13:11:00.957534: E tensorflow/stream_executor/cuda/] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963310: E tensorflow/stream_executor/cuda/] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963416: W tensorflow/core/common_runtime/] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node mobilenetv2_1.00_192/Conv1/Conv2D}}]]
=====>GPU Available: True
=====> 4 Physical GPUs, 1 Logical GPUs
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "", line 162, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 915, in predict
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 722, in predict
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 393, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/", line 3625, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/", line 511, in call
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node mobilenetv2_1.00_192/Conv1/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ ]] [Op:__inference_keras_scratch_graph_10727]
Function call stack:
The code:
import numpy as np
import tensorflow as tf

if __name__ == "__main__":
    print("=====>GPU Available: ", tf.test.is_gpu_available())
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
            tf.config.experimental.set_memory_growth(gpus[0], True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print("=====>", len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)
    paras_path = "./paras/{}".format(int(2011))
    model = tf.keras.experimental.load_from_saved_model(paras_path)
    # INPUT_SHAPE is defined elsewhere in the script
    initial_tensor = np.zeros((1, INPUT_SHAPE, INPUT_SHAPE, 3))
You have to check that you have matching versions of CUDA, cuDNN and TensorFlow (and make sure all three are actually installed).
A few examples of working configurations are listed below (updated for recent versions of TensorFlow):
CUDA 11.3.1 + cuDNN + TensorFlow 2.7.0
CUDA 11.0 + cuDNN 8.0.4 + TensorFlow 2.4.0
CUDA 10.1 + cuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.2.0 / TensorFlow 2.3.0 (TF >= 2.1 requires CUDA >= 10.1)
CUDA 10.1 + cuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.1.0 (TF >= 2.1 requires CUDA >= 10.1)
CUDA 10.0 + cuDNN 7.6.3 + TensorFlow 1.13 / 1.14 / TensorFlow 2.0
CUDA 9.0 + cuDNN 7.0.5 + TensorFlow 1.10
Usually this error appears when you have an incompatible version of TensorFlow/CuDNN installed. In my case, this appeared when I tried using an older TensorFlow with a newer version of CuDNN.
If for some reason you get an error message like the following (and nothing happens afterwards):
Relying on the driver to perform ptx compilation
Solution: install the latest NVIDIA driver.
[SEEMS TO BE SOLVED IN TF >= 2.5.0] (see below):
Only for Windows users: some recent combinations of CUDA, cuDNN and TF may not work because of a bug (an improperly named .dll file). To handle that specific case, please consult this link: Tensorflow GPU Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
For those facing the above error on Windows: I solved it just by installing the cuDNN version compatible with the CUDA version already installed on the system.
The suitable version can be downloaded from NVIDIA's developer portal (Download cuDNN). You might need an NVIDIA account for it, which is easily created by providing an email address and filling in a questionnaire.
To check the CUDA version, run nvcc --version.
Once the suitable version is downloaded, extract the folder from the zip file.
Go to the bin folder of the extracted folder, copy cudnn64_7.dll, and paste it into CUDA's bin folder. In my case, CUDA is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin.
This would most probably solve the problem.
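The copy step described above could be scripted roughly as follows. This is a sketch, assuming the file is named `cudnn64_7.dll` (the cuDNN 7 DLL); the function name `copy_cudnn_dll` and the download location are hypothetical, and writing into Program Files normally needs an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn_dll(extracted_bin, cuda_bin, dll_name="cudnn64_7.dll"):
    """Copy the cuDNN DLL from the extracted archive's bin folder into CUDA's bin folder."""
    source = Path(extracted_bin) / dll_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; check the extracted cuDNN folder")
    target_dir = Path(cuda_bin)
    if not target_dir.is_dir():
        raise NotADirectoryError(f"{target_dir} is not an existing CUDA bin folder")
    # copy2 keeps the file's timestamps and returns the destination path.
    return Path(shutil.copy2(source, target_dir / dll_name))


if __name__ == "__main__":
    # Hypothetical download location; the CUDA path is the one quoted in the answer.
    extracted = Path.home() / "Downloads" / "cuda" / "bin"
    cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"
    try:
        print("copied to", copy_cudnn_dll(extracted, cuda_bin))
    except OSError as exc:
        print("nothing copied:", exc)
```

The function refuses to run if either folder is missing, so a wrong path fails loudly instead of silently creating a stray file.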
My system details:
Windows 10
CUDA 10.0
TensorFlow 2.0
GPU- Nvidia GTX 1060
I also found the blog post "Installing TensorFlow with CUDA and GPU support on Windows 10" very useful.
Check the instructions on the TensorFlow GPU instruction page for your OS. It resolved the issue for me on Ubuntu 16.04.6 LTS with TensorFlow 2.0.

TensorFlow GPU /device:GPU:0 not found on Ubuntu Bionic

I have a p3.2xlarge instance from Amazon EC2 with Ubuntu Bionic. It is supposed to have a GPU device installed, but when I run this code it says there is no GPU. This is a virtual machine, not a physical one, but there is still supposed to be a GPU. Note that I started TensorFlow using Docker, which should not work if the GPU were missing:
sudo docker run -it -p 8888:8888 tensorflow/tensorflow
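import tensorflow as tf

# The error below refers to /device:GPU:0, so the GPU is requested explicitly here
with tf.device('/device:GPU:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
with tf.Session() as sess:
    print(sess.run(c))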
I get this error:
InvalidArgumentError: Cannot assign a device for operation 'MatMul': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]
But there are NVIDIA drivers loaded, and you cannot load those without a device:
lspci -nn | grep '\[03'
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]
00:1e.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] [10de:1db1] (rev a1)
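As a rough sketch, the same check can be done programmatically by looking for NVIDIA's PCI vendor ID (10de) in the `lspci -nn` text. The regular expression and the function name `nvidia_devices` are hypothetical and only cover lines shaped like the two shown here.

```python
import re

# Matches lines such as:
# 00:1e.0 3D controller [0302]: NVIDIA Corporation GV100GL [...] [10de:1db1] (rev a1)
LSPCI_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<kind>.+?)\s+\[(?P<pci_class>[0-9a-f]{4})\]:\s+"
    r"(?P<name>.+?)\s+\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)

NVIDIA_VENDOR_ID = "10de"


def nvidia_devices(lspci_output):
    """Return the description of every listed device whose vendor ID is NVIDIA's."""
    found = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and match.group("vendor") == NVIDIA_VENDOR_ID:
            found.append(match.group("name"))
    return found


# The two lines printed by lspci in this post.
SAMPLE = """\
00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]
00:1e.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] [10de:1db1] (rev a1)
"""

devices = nvidia_devices(SAMPLE)
print(devices)
```

A non-empty result only shows that the host sees the card on the PCI bus; it does not show that a process inside a container can reach it.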
dpkg -l "*cuda*"
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
ii cuda-command-l 9.0.176-1 amd64 CUDA command-line tools
ii cuda-core-9-0 amd64 CUDA core tools
ii cuda-cublas-9- amd64 CUBLAS native runtime libraries
ii cuda-cudart-9- 9.0.176-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-de 9.0.176-1 amd64 CUDA Runtime native dev links, he
ii cuda-cufft-9-0 9.0.176-1 amd64 CUFFT native runtime libraries
ii cuda-curand-9- 9.0.176-1 amd64 CURAND native runtime libraries
ii cuda-cusolver- 9.0.176-1 amd64 CUDA solver native runtime librar
ii cuda-cusparse- 9.0.176-1 amd64 CUSPARSE native runtime libraries
ii cuda-driver-de 9.0.176-1 amd64 CUDA Driver native dev stub libra
ii cuda-license-9 9.0.176-1 amd64 CUDA licenses
ii cuda-misc-head 9.0.176-1 amd64 CUDA miscellaneous headers
ii cuda-repo-ubun 9.1.85-1 amd64 cuda repository configuration fil
un libcuda1-340 <none> <none> (no description available)
ii nvinfer-runtim 1.0-1 amd64 nvinfer-runtime-trt repository co
ii nvinfer-runtim 1-1 amd64 nvinfer-runtime-trt repository co
I am going to put this on hold temporarily. CUDA does not support Ubuntu Bionic yet. I tried to use the latest version of CUDA anyway and got these errors:
cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Mon Aug 13 10:23:44 2018
installer version: 396.37
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
nvidia-installer command line:
Using built-in stream user interface
-> Detected 8 CPUs online; setting concurrency level to 8.
-> Installing NVIDIA driver version 396.37.
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory. Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written. For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk. Please reboot your system and attempt NVIDIA driver installation again. Note if you later wish to reenable Nouveau, you will need to delete these files: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility libraries? (Answer: Yes)
-> Will install GLVND GLX client libraries.
-> Will install GLVND EGL client libraries.
-> Skipping GLX non-GLVND file: ""
-> Skipping GLX non-GLVND file: ""
-> Skipping GLX non-GLVND file: ""
-> Skipping EGL non-GLVND file: ""
-> Skipping EGL non-GLVND file: ""
-> Skipping EGL non-GLVND file: ""
-> Skipping GLX non-GLVND file: "./32/"
-> Skipping GLX non-GLVND file: ""
-> Skipping GLX non-GLVND file: ""
-> Skipping EGL non-GLVND file: "./32/"
-> Skipping EGL non-GLVND file: ""
-> Skipping EGL non-GLVND file: ""
Looking for install checker script at ./libglvnd_install_checker/
executing: '/bin/sh ./libglvnd_install_checker/'...
Checking for libglvnd installation.
Checking libGLdispatch...
Checking libGLdispatch dispatch table
Checking call through libGLdispatch
All OK
libGLdispatch is OK
Checking for libGLX
libGLX is OK
Checking for libEGL
eglGetDisplay failed
Checking entrypoint library
Checking call through libGLdispatch
Checking call through library
dlopen("") failed: cannot open shared object file: No such file or directory
Checking entrypoint library
Checking call through libGLdispatch
Checking call through library
glGetString was not called
Found libglvnd libraries:
Missing libglvnd libraries:
-> An incomplete installation of libglvnd was found. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries. (Answer: Abort installation.)
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at

How to run TensorFlow on AMD/ATI GPU?

After reading this tutorial, I checked the GPU session with this simple code:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# log_device_placement=True logs the device each op is assigned to
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    x = sess.run(c)
print(x)
The output was
2018-08-07 18:44:59.019144: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping: no known devices.
2018-08-07 18:44:59.019536: I tensorflow/core/common_runtime/] Device mapping:
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019902: I tensorflow/core/common_runtime/] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019926: I tensorflow/core/common_runtime/] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019934: I tensorflow/core/common_runtime/] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
[[ 22. 28.]
 [ 49. 64.]]
As you can see, no computation is done by the GPU.
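For longer logs, the device-placement lines can be tallied instead of read by eye. This is a minimal sketch; the regular expression and the name `ops_per_device` are hypothetical and assume placement lines of the form `name: (Op): /job:.../device:CPU:0`, as in the output above.

```python
import re
from collections import Counter

# Placement lines look like:
# MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
PLACEMENT = re.compile(
    r"^(?P<op>\S+): \((?P<kind>\w+)\): \S*/device:(?P<device>[A-Z]+:\d+)\s*$"
)


def ops_per_device(log_text):
    """Count how many ops the log assigns to each device (e.g. 'CPU:0', 'GPU:0')."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            counts[match.group("device")] += 1
    return counts


# Placement lines taken from the output in this post.
SAMPLE = """\
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
"""

counts = ops_per_device(SAMPLE)
gpu_used = any(device.startswith("GPU") for device in counts)
print(dict(counts))
print("GPU used:", gpu_used)
```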
When I changed the code to set the GPU options and the per-process memory fraction:
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.4
with tf.Session(config=conf) as sess:
    x = sess.run(c)
print(x)
The output was
2018-08-07 18:52:22.681221: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[[ 22. 28.]
 [ 49. 64.]]
What can I do to run the session on the GPU card? Thank you.
It is most certainly possible to run TensorFlow on AMD GPUs. About two years ago ROCm was released, which gets things done. However, there is a caveat: it runs only on Linux as of now, owing to its open-source origins. So if you are willing to use Linux, you can most certainly train your DL models on AMD GPUs. That said, the amount of support you will get is low, as the community is still not large enough. Search for ROCm and you will find instructions on how to get it set up and running on a Linux machine. Maybe it will work with WSL2 on Windows, but I have not tried it yet and so cannot comment on that.
Here is a link to the ROCm installation docs.
You can use TensorFlow.js, the JavaScript version of TensorFlow.
TensorFlow.js is not limited to particular hardware and can run on any GPU that supports WebGL.
The API is pretty similar to tf in Python, and the project provides scripts to convert your models from Python to JS.
I believe TensorFlow-GPU only supports NVIDIA GPU cards with CUDA Compute Capability >= 3.0.
The following TensorFlow variants are available for installation:
TensorFlow with CPU support only. If your system does not have a NVIDIA® GPU, you must install this version. This version of TensorFlow is usually easier to install, so even if you have an NVIDIA GPU, we recommend installing this version first.
TensorFlow with GPU support. TensorFlow programs usually run much faster on a GPU instead of a CPU. If you run performance-critical applications and your system has an NVIDIA® GPU that meets the prerequisites, you should install this version. See TensorFlow GPU support for details.