I am trying to compile a TensorFlow model using the UNET architecture (OS: Rocky Linux 8.6, GPU: Quadro P620, TensorFlow 2.11.0, CUDA 11.6). The model works fine on the CPU and on Google Colab, but when I try to run it on the GPU, the following error occurs during model.fit:
CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc(5750): 'cudnnBatchNormalizationForwardTrainingEx( cudnn.handle(), mode,
bn_ops, &one, &zero, x_descriptor.handle(), x.opaque(),
x_descriptor.handle(), side_input.opaque(), x_descriptor.handle(),
y->opaque(), scale_offset_descriptor.handle(), scale.opaque(),
offset.opaque(), exponential_average_factor, batch_mean_opaque,
batch_var_opaque, epsilon, saved_mean->opaque(),
saved_inv_var->opaque(), activation_desc.handle(), workspace.opaque(),
workspace.size(), reserve_space.opaque(), reserve_space.size())'
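For reference, a common first check in this situation is whether TensorFlow actually sees the GPU and whether memory growth is enabled, since running out of GPU memory on a small card like the P620 is a frequent trigger for cuDNN execution failures. A minimal sketch (not the original training code, and not necessarily the fix here):
import tensorflow as tf

# Show which GPUs TensorFlow can see, and ask it to allocate GPU memory on
# demand instead of grabbing the whole card up front.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)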
I am training an EfficientDet v2 model on a dataset in COCO JSON format on Colab. The model config is here:
gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir, batch_size=8, image_size=512, use_gpu=True,num_workers=2)
gtf.Model();
gtf.Set_Hyperparams(lr=0.0001, val_interval=1, es_min_delta=0.0, es_patience=0)
%%time
gtf.Train(num_epochs=10, model_output_dir="trained/");
I am facing the following issue while training:
I tried adding this code and restarting the runtime, but I am still facing the same issue:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
Can anyone help solve this?
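For reference, CUDA_LAUNCH_BLOCKING only takes effect if it is set before the CUDA context is initialized, i.e. before the framework behind gtf touches the GPU. A rough sketch of the ordering (the torch import is an assumption about what the wrapper uses underneath):
import os

# Must be set before any library initializes CUDA, otherwise it has no effect.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# Only import the deep-learning framework (and the gtf wrapper) afterwards.
import torch  # assumption: the EfficientDet pipeline runs on PyTorch underneath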
After choosing and downloading a model from the TensorFlow 2 Detection Model Zoo, it can be loaded as follows:
import tensorflow as tf
model = tf.saved_model.load('./efficientdet_d0_coco17_tpu-32/saved_model/')
However, it looks like one cannot extract the number of trainable variables, directly or indirectly, from the model variable, according to this investigation.
Nevertheless, training can continue with new data, since that is a typical use case for a pre-trained model. There must be a way to get the number of trainable variables, but I don't know how.
I tried:
tf.trainable_variables
# AttributeError: module 'tensorflow' has no attribute 'trainable_variables'
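For context, tf.trainable_variables was a TF1 graph-collection function that no longer exists in TF2, where trainable variables are attached to individual objects instead. A minimal sketch of that per-object access (whether the attribute is present on an object restored by tf.saved_model.load depends on what was tracked at export time):
import tensorflow as tf

model = tf.saved_model.load('./efficientdet_d0_coco17_tpu-32/saved_model/')

# Objects restored by tf.saved_model.load expose tracked variables only if
# they were recorded when the model was exported, hence the hasattr guard.
if hasattr(model, 'trainable_variables'):
    print(len(model.trainable_variables))

# For a tf.keras model built in the current program the attribute always exists:
# print(len(keras_model.trainable_variables))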
Environment:
TensorFlow 2.7.0 (implying CUDA 11.2, cuDNN 8.1)
Windows 10 x64
Python 3.9.7
NVIDIA GeForce MX150, Compute capability: 6.1
I have retrained a TensorFlow 2.0 model; it works as a one-class object detector, prepared with the Object Detection API v2 (https://tensorflow-object-detection-api-tutorial.readthedocs.io/).
After that I converted it to ONNX (tf2onnx.convert) and tested it - I got the same inference results.
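For reference, a typical tf2onnx invocation for a SavedModel looks roughly like this (paths and opset are assumptions, not the exact command used):
python -m tf2onnx.convert --saved-model exported_model/saved_model --output model.onnx --opset 13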
I have tested all of these pretrained models (downloaded from the TF model zoo, https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md):
ssd_mobilenet_v2_320x320_coco17_tpu-8
ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8
ssd_resnet50_v1_fpn_640x640_coco17_tpu-8
I retrained them using a small batch of data.
The problem is using it with GStreamer/DeepStream. As far as I have seen, GStreamer consumes either the ONNX model or the model after converting it to TensorRT. (If I provide ONNX, the model is of course also converted to TensorRT, but that is done by GStreamer right before running.)
I also tried the same pipeline: train -> convert to ONNX -> convert to TRT (or just providing the ONNX model to GStreamer). Same issue.
Error:
ERROR: [TRT]: [graph.cpp::computeInputExecutionUses::519] Error Code 9: Internal Error ((Unnamed Layer* 747) [Recurrence]: IRecurrenceLayer cannot be used to compute a shape tensor)
TensorRT Version: 8.2.1.8
tf2onnx Version: 1.9.3
Is there any chance of getting some help?
Or maybe I should skip the ONNX model and just convert it from TensorFlow to a TensorRT engine? Is that possible?
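For reference, the direct TensorFlow-to-TensorRT route usually goes through TF-TRT; a minimal sketch (paths are placeholders, and note that this yields a SavedModel with TensorRT-optimized segments rather than a standalone engine file):
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the exported SavedModel directly with TF-TRT, skipping ONNX.
# 'exported_model/saved_model' and 'trt_saved_model' are placeholder paths.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='exported_model/saved_model')
converter.convert()
converter.save('trt_saved_model')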
Of course I can upload the model if it would help.
BR!
My TensorFlow installation recognizes my GPU.
However, when I call model.fit() on my data, it shows:
Epoch 1/2, and then the kernel dies immediately.
If I run this in a separate virtual environment with no GPU, it works fine.
I have simplified the model architecture and reduced the training data to only ten points as a quick test, and it still fails.
Simple example
from numpy import loadtxt
import keras
from keras.models import Sequential
from keras.layers import Dense

# Minimal binary classifier: one hidden ReLU layer, sigmoid output
model = Sequential()
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

# X_train / y_train are loaded elsewhere (e.g. with loadtxt)
info = model.fit(X_train, y_train, epochs=2, batch_size=2, shuffle=True, verbose=1)
Versions:
Python 3.8.8
Num GPUs Available: 1
2.5.0-dev20210227
2.4.3
CUDA v11.2
I am going to answer my own question rather than deleting it, because maybe someone else is making the same simple mistake I was.
The main mistake I made was downloading the wrong CUDA version. You can check which versions are compatible at this link:
https://www.tensorflow.org/install/source#gpu
TLDR: Just follow this video:
https://www.youtube.com/watch?v=hHWkvEcDBO0
This also highlighted the importance of a virtual environment where you control the package versions to prevent incompatibilities.
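A quick way to confirm the versions line up after reinstalling is to check that TensorFlow actually sees the GPU, for example:
import tensorflow as tf

print(tf.__version__)
# Should list the GPU once the CUDA/cuDNN versions match the TensorFlow build
print(tf.config.list_physical_devices('GPU'))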
I had the same problem. I moved the code into a Python file and found the root cause. In my case the fix was copying the cuDNN DLL files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin. Check the following link as well:
Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
TensorFlow Object Detection API
I am using the TensorFlow Object Detection API to retrain MobileNet on my own dataset. The issue occurs when I try to run my inference graph after it has been both frozen and quantized.
System:
Ubuntu 16.04,
TensorFlow 1.2 (from source, CPU only),
Bazel 0.4.5
Issue:
Use the provided frozen_graph.pb from the model zoo.
Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph.
Run inference.
This works. However:
Re-train and produce my own frozen_graph.pb using object_detection/export_inference_graph.py.
Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph.
Run inference. <-- Produces error
This does NOT work, and the error I'm getting when attempting to run the graph is:
File "/home/unibap/TensorFlow/tensorflow-python2-sse4.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1298, in _do_call
  raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: The node 'Preprocessor/map/while/ResizeImage/ResizeBilinear/eightbit' has inputs from different frames. The input 'Preprocessor/map/while/ResizeImage/size' is in frame 'Preprocessor/map/while/Preprocessor/map/while/'. The input 'Preprocessor/map/while/ResizeImage/ResizeBilinear_eightbit/Preprocessor/map/while/ResizeImage/ExpandDims/quantize' is in frame ''.
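For context, here is a rough sketch of the kind of TF 1.x loading/inference code involved in the "run inference" step (the file name and tensor names are the usual Object Detection API ones and are assumptions here, as is the dummy input batch):
import numpy as np
import tensorflow as tf  # TF 1.x, matching the environment above

# Load the quantized frozen graph (placeholder file name)
with tf.gfile.GFile('quantized_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

# Run one inference pass on a dummy uint8 batch; 'image_tensor' and
# 'detection_boxes' are the standard Object Detection API node names.
image_batch = np.zeros((1, 300, 300, 3), dtype=np.uint8)
with tf.Session(graph=graph) as sess:
    boxes = sess.run('detection_boxes:0',
                     feed_dict={'image_tensor:0': image_batch})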
Since I can quantize and run the provided frozen_graph.pb, the issue has to be with the export tool, right? Which export tool was used to create the frozen_graph.pb files that are in the model zoo? Or how was the export tool called?
PS:
Quote from the comments in export_inference_graph.py, assuring me that it should produce a frozen graph if a checkpoint is provided:
"Optionally, one can freeze the graph by converting the weights in the provided
checkpoint as graph constants thereby eliminating the need to use a checkpoint
file during inference."
Best