ONNX converted TensorFlow saved model runs on CPU but not on GPU - tensorflow

I converted a TensorFlow saved model to ONNX format using tf2onnx:
python3 -m tf2onnx.convert --saved-model saved_model/ --output onnx/model.onnx --opset 11
The conversion worked fine, and I can run inference with the ONNX model on the CPU.
I then installed onnxruntime-gpu to run inference on the GPU and got this error:
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Relu node. Name:'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/conv1/Relu' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:97 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:91 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 2: out of memory ; GPU=0 ; hostname=coincoin; expr=cudaMalloc((void**)&p, size);
Stacktrace:
I am the only one using the GPU, a Titan RTX with 24 GB of memory. The TensorFlow saved model version of the model runs fine on the GPU and uses 10 GB of its memory.
Versions:
tensorflow 1.14.0
CUDA 10.0
CuDNN 7.6.5
onnx 1.6.0
onnxruntime 1.1.0
tf2onnx 1.9.2
python 3.6
Ubuntu 18.04

Based on your error message and versions, there are two possible solutions:
The error says the GPU is out of memory. Check whether GPU memory is still available; if it is not, limit the fraction of GPU memory TensorFlow allocates:
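import tensorflow as tf
from tensorflow.keras.backend import set_session
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0"  # only expose GPU 0
# fraction of the GPU's total memory TensorFlow may allocate (0.1 of 24 GB is about 2.4 GB)
config.gpu_options.per_process_gpu_memory_fraction = 0.1
set_session(tf.Session(config=config))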
Downgrade onnxruntime-gpu. With your versions, CUDA 10.0 matches onnxruntime-gpu==1.0.0, not 1.1.0, which requires a newer CUDA (10.1) according to its release notes:
https://github.com/Microsoft/onnxruntime/releases/tag/v1.1.0
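For the first suggestion, the arithmetic is worth checking: per_process_gpu_memory_fraction is a fraction of the GPU's total memory, so 0.1 on a 24 GB Titan RTX caps TensorFlow at roughly 2.4 GB, well below the 10 GB the TensorFlow version of the model used. The sketch below computes the smallest fraction that would still fit the model. The helper name min_memory_fraction and the 10% headroom are illustrative assumptions, not part of any TensorFlow API. Note also that this setting only governs TensorFlow's allocator; onnxruntime manages its own GPU memory separately.

```python
def min_memory_fraction(needed_gb: float, total_gb: float, headroom: float = 0.1) -> float:
    """Smallest per_process_gpu_memory_fraction leaving room for needed_gb
    plus a relative safety margin (headroom, an assumed 10% by default)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    fraction = needed_gb * (1.0 + headroom) / total_gb
    if fraction > 1.0:
        raise ValueError("the model would not fit even with the whole GPU")
    return fraction


TOTAL_GB = 24.0   # Titan RTX, from the question
NEEDED_GB = 10.0  # memory the TensorFlow saved model used, from the question

print(f"0.1 of {TOTAL_GB:.0f} GB is only {0.1 * TOTAL_GB:.1f} GB")
print(f"smallest fraction with headroom: {min_memory_fraction(NEEDED_GB, TOTAL_GB):.2f}")
```

Instead of a fixed fraction, TensorFlow 1.x also offers config.gpu_options.allow_growth = True, which grows the allocation on demand.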

Related

TensorFlow compatibility with A100 GPU

I am new to deep learning. I have an A100 GPU with CUDA 11.6 installed. Using Conda, I installed tensorflow 1.15, tensorflow-gpu 1.15, cudatoolkit 10.0 and Python 3.7, but the code I am trying to run from GitHub comes with the note below, and it shows errors that I find difficult to interpret, so I cannot tell where I went wrong. The error is:
failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-06-30 09:37:12.049400: I tensorflow/stream_executor/stream.cc:4925] [stream=0x55d668879990,impl=0x55d668878ac0] did not memcpy device-to-host; source: 0x7f2fe2d0d400
2022-06-30 09:37:12.056385: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at iterator_ops.cc:867 : Cancelled: Operation was cancelled
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
     [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
     [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
     [[Hyperprior/truediv_3/_3633]]
NOTE: At the moment, we only support CUDA 10.0, Python 3.6-3.7, TensorFlow 1.15, and Tensorflow Compression 1.3. TensorFlow must be installed via pip, not conda. Unfortunately, newer versions of Tensorflow or Python will not work due to various constraints in the dependencies and in the TF binary API.

Stop TensorFlow from trying to load CUDA temporarily

I have this code to disable GPU usage:
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
w = tf.Variable(
[
[1.],
[2.]
])
I still get this output, and I am not sure why:
E:\MyTFProject\venv\Scripts\python.exe E:/MyTFProject/tfvariable.py
2021-11-03 14:09:16.971644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-11-03 14:09:16.971644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-03 14:09:19.563793: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-03 14:09:19.566793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: newtonpc
2021-11-03 14:09:19.567793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: mypc
2021-11-03 14:09:19.567793: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
TF Version: '2.6.1'
I am not able to stop it from loading the CUDA DLLs. I don't want to set up CUDA right now; maybe later.
I am using the latest PyCharm and installed TensorFlow with pip as described on the TensorFlow website.
You can try reinstalling the CPU-only version of TensorFlow. The links for each OS and Python version are available here:
https://www.tensorflow.org/install/pip?hl=fr#windows_1
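If reinstalling is not convenient right now, note that these lines are log messages rather than failures: even with CUDA_VISIBLE_DEVICES set to -1, the GPU-enabled pip package still probes for the CUDA runtime DLL and then falls back to the CPU. A minimal sketch, assuming that package, which hides TensorFlow's C++ log output through the TF_CPP_MIN_LOG_LEVEL environment variable. It silences the messages; it does not stop TensorFlow from probing for CUDA.

```python
import os

# Both variables must be set before TensorFlow is first imported in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide every CUDA device from TensorFlow
# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++ logs:
# "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

try:
    import tensorflow as tf  # later C++ log lines (cudart, cuInit, ...) are filtered
except ImportError:
    tf = None  # only here so the sketch also runs where TensorFlow is not installed
```

Level "2" still shows ERROR lines such as the cuInit message; "3" hides those too, at the cost of also hiding genuine errors.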

How to configure tensorflow with CPU support?

I am trying to run tensorflow with CPU support.
tensorflow:
Version: 1.14.0
Keras:
Version: 2.3.1
When I try to run the following piece of code:
from keras.preprocessing.image import ImageDataGenerator

def run_test_harness(trainX, trainY, testX, testY):
    # scale pixel values to [0, 1]
    datagen = ImageDataGenerator(rescale=1.0/255.0)
    train_it = datagen.flow(trainX, trainY, batch_size=1)
    test_it = datagen.flow(testX, testY, batch_size=1)
    model = define_model()  # defined elsewhere in the script (not shown)
    history = model.fit_generator(train_it, steps_per_epoch=len(train_it),
                                  validation_data=test_it, validation_steps=len(test_it),
                                  epochs=1, verbose=0)
I get the following error, as shown in the image:
I tried configuring Bazel for this, but it did not help. I would appreciate it if someone could point me to resources or help with the problem. Thank you.
EDIT (warning messages):
WARNING:tensorflow:From /home/neha/valiance/kerascpu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From /home/neha/valiance/kerascpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-10-22 12:41:36.023849: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-22 12:41:36.326420: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299965000 Hz
2020-10-22 12:41:36.327496: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5502350 executing computations on platform Host. Devices:
2020-10-22 12:41:36.327602: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-10-22 12:41:36.679930: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2020-10-22 12:41:36.890241: W tensorflow/core/framework/allocator.cc:107] Allocation of 3406823424 exceeds 10% of system memory.
^Z
[1]+ Stopped python3 model.py
You should try running your code on Google Colab. I think your PC does not have enough resources for the task you are trying to run, even though you are using a batch_size of 1.
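The allocator warning in the log puts a number on this. A minimal sketch of the arithmetic (the helper describe_allocation is hypothetical): 3406823424 bytes is exactly 3249 MiB, about 3.17 GiB, or roughly 852 million float32 values in a single tensor. Because TensorFlow emits this warning when one allocation exceeds 10% of the system memory it measured, that memory was below about 31.7 GiB. With batch_size=1, a tensor this large most likely belongs to the model itself (for example a large weight matrix) rather than to the input batch; that is an assumption, since define_model is not shown.

```python
GIB = 2 ** 30
FLOAT32_BYTES = 4
WARNING_THRESHOLD = 0.1  # TensorFlow warns when one allocation exceeds 10% of system memory


def describe_allocation(nbytes: int) -> dict:
    """Translate a raw allocation size from the TensorFlow log into readable figures."""
    return {
        "gib": nbytes / GIB,
        "float32_elements": nbytes // FLOAT32_BYTES,
        # The warning fired, so the memory TensorFlow measured was below this bound.
        "ram_upper_bound_gib": nbytes / WARNING_THRESHOLD / GIB,
    }


info = describe_allocation(3_406_823_424)  # size from the allocator.cc:107 warning
print(f"{info['gib']:.2f} GiB = {info['float32_elements']:,} float32 values")
print(f"system memory seen by TensorFlow < {info['ram_upper_bound_gib']:.1f} GiB")
```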

Getting CUDNN_STATUS_EXECUTION_FAILED error on Jetson Nano with OpenCV dnn module

I was following one of the online tutorials but I was getting this error:
Traceback (most recent call last):
  File "ssd_object_detection.py", line 20, in <module>
    detections = net.forward()
cv2.error: OpenCV(4.3.0) /home/blah/opencv/modules/dnn/src/layers/…/cuda4dnn/primitives/…/csl/cudnn/convolution.hpp:461: error: (-217:Gpu API call) CUDNN_STATUS_EXECUTION_FAILED in function 'convolve_with_bias_activation'
It is a Python script that uses the OpenCV dnn module with a pre-trained model.
This is my configuration:
Jetson Nano device
Ubuntu 18.04
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
– NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
– NVIDIA GPU arch: 53
– NVIDIA PTX archs:
– cuDNN: YES (ver 8.0)
OpenCV 4.3.0 built from source with OPENCV_DNN_CUDA=ON, CUDNN_VERSION='8.0', WITH_CUDA=ON, WITH_CUDNN=ON, and many other settings enabled
Python 3.7.7
This is a snippet of the code I am trying to run (it completes successfully if I don’t use the GPU). It fails at the line detections = net.forward()
import cv2
import imutils
import numpy as np
from imutils.video import FPS

# args holds the parsed command-line arguments in the full script (not shown)
CLASSES = ["background", "aeroplane"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
print("[INFO] setting preferable backend and target to CUDA...")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
print("[INFO] accessing video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)
writer = None
fps = FPS().start()
while True:
    (grabbed, frame) = vs.read()
    if not grabbed:
        break  # end of the video stream
    frame = imutils.resize(frame, width=400)
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # fails here with CUDNN_STATUS_EXECUTION_FAILED
    for i in np.arange(0, detections.shape[2]):
        ...  # rest of the loop body elided in the original post

How to run TensorFlow on AMD/ATI GPU?

After reading this tutorial https://www.tensorflow.org/guide/using_gpu, I tested GPU usage with this simple code:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2,3], name = 'a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape = [3,2], name = 'b')
c = tf.matmul(a, b)
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    x = sess.run(c)
    print(x)
The output was:
2018-08-07 18:44:59.019144: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Device mapping: no known devices.
2018-08-07 18:44:59.019536: I tensorflow/core/common_runtime/direct_session.cc:288] Device mapping:
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019902: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019926: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2018-08-07 18:44:59.019934: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
[[ 22.  28.]
 [ 49.  64.]]
As you can see, no computation is done on the GPU.
When I changed the code to set a GPU configuration with a per-process memory fraction:
conf = tf.ConfigProto()
conf.gpu_options.per_process_gpu_memory_fraction = 0.4
with tf.Session(config=conf) as sess:
    x = sess.run(c)
    print(x)
The output was:
2018-08-07 18:52:22.681221: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[[ 22.  28.]
 [ 49.  64.]]
What can I do to run the session on GPU card? Thank you.
It is certainly possible to run TensorFlow on AMD GPUs. About two years ago, ROCm was released, and it gets the job done. However, there is a caveat: as of now, it runs only on Linux. So if you are willing to use Linux, you can train your DL models on AMD GPUs. That said, the support you will get is limited, as the community is still not large. Search for ROCm to find instructions on setting it up and running it on a Linux machine. It may work with WSL2 on Windows, but I have not tried it yet, so I cannot comment on that.
Here is a link to the ROCm installation docs.
You can use TensorFlow.js, the JavaScript version of TensorFlow.
TensorFlow.js is not tied to NVIDIA hardware and can run on any GPU that supports WebGL.
The API is quite similar to TensorFlow in Python, and the project provides scripts to convert your models from Python to JS.
I believe TensorFlow-GPU only supports NVIDIA GPU cards with CUDA Compute Capability >= 3.0.
The following TensorFlow variants are available for installation:
TensorFlow with CPU support only. If your system does not have a NVIDIA® GPU, you must install this version. This version of TensorFlow is usually easier to install, so even if you have an NVIDIA GPU, we recommend installing this version first.
TensorFlow with GPU support. TensorFlow programs usually run much faster on a GPU instead of a CPU. If you run performance-critical applications and your system has an NVIDIA® GPU that meets the prerequisites, you should install this version. See TensorFlow GPU support for details.
https://www.tensorflow.org/install/install_linux
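To confirm which devices a given TensorFlow build can actually use (the question behind "Device mapping: no known devices" in the question's log), a short check like the following may help. It relies on tensorflow.python.client.device_lib.list_local_devices(), an internal module rather than a public API, though it has been available in both TF 1.x and 2.x; the helper local_gpu_names is hypothetical. With a standard CUDA build of TensorFlow on an AMD card, the list is expected to be empty.

```python
def local_gpu_names():
    """Names of the GPUs TensorFlow can see, or [] if none (or TensorFlow is missing)."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []  # TensorFlow is not installed
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


gpus = local_gpu_names()
if gpus:
    print("TensorFlow can use:", gpus)
else:
    print("No GPU visible to TensorFlow; ops will be placed on the CPU")
```

In TensorFlow 2.1 and later, tf.config.list_physical_devices("GPU") is the public equivalent.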