I am trying to compile a TensorFlow model using the U-Net architecture (OS: Rocky Linux 8.6, GPU: Quadro P620, TensorFlow: 2.11.0, CUDA: 11.6). The model works fine on CPU and on Google Colab, but when I try to run it on the GPU, the following error occurs during model.fit:
in tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc(5750): 'cudnnBatchNormalizationForwardTrainingEx( cudnn.handle(), mode,
bn_ops, &one, &zero, x_descriptor.handle(), x.opaque(),
x_descriptor.handle(), side_input.opaque(), x_descriptor.handle(),
y->opaque(), scale_offset_descriptor.handle(), scale.opaque(),
offset.opaque(), exponential_average_factor, batch_mean_opaque,
batch_var_opaque, epsilon, saved_mean->opaque(),
saved_inv_var->opaque(), activation_desc.handle(), workspace.opaque(),
workspace.size(), reserve_space.opaque(), reserve_space.size())'
I am training an EfficientDet v2 model on a dataset in COCO JSON format on Colab. The model config is:
gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir, batch_size=8, image_size=512, use_gpu=True,num_workers=2)
gtf.Set_Hyperparams(lr=0.0001, val_interval=1, es_min_delta=0.0, es_patience=0)
gtf.Train(num_epochs=10, model_output_dir="trained/")
I am facing the following issue while training:
I tried adding this code and restarting the runtime, but I am still facing the same issue.
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
Can anyone help me solve this?
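One thing worth checking with the snippet above is ordering. `CUDA_LAUNCH_BLOCKING` is read when the CUDA context is created, so setting it after the framework has already touched the GPU has no effect. Below is a minimal sketch of that check. The helper name `set_env_before_import` is hypothetical, and the module names it looks for (`torch`, `tensorflow`) are assumptions, since the question does not say which backend the training library uses.

```python
import os
import sys


def set_env_before_import(name, value, modules=("torch", "tensorflow")):
    """Set an environment variable and report frameworks imported too early.

    Hypothetical helper: returns the names from `modules` that were already
    imported at the moment the variable was set. A non-empty result means
    CUDA may already have been initialised without the variable.
    """
    already_loaded = [m for m in modules if m in sys.modules]
    os.environ[name] = value
    return already_loaded


# Intended for the very first notebook cell, before any framework import.
loaded = set_env_before_import("CUDA_LAUNCH_BLOCKING", "1")
if loaded:
    print("Already imported, restart the runtime first:", loaded)
```

With the variable in effect, kernel launches run synchronously, so the Python traceback should point at the operation that actually failed rather than at a later call.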
After choosing and downloading a model from the TensorFlow 2 Detection Model Zoo, it can be loaded as follows:
import tensorflow as tf
model = tf.saved_model.load('./efficientdet_d0_coco17_tpu-32/saved_model/')
However, it looks like one cannot extract the number of trainable variables directly/indirectly from the model variable, according to this investigation.
Nevertheless, the model training can continue, with new data, as this is a typical use-case of a pre-trained model. There must be a way to get the number of trainable variables. But I don't know how.
I tried:
# AttributeError: module 'tensorflow' has no attribute 'trainable_variables'
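The error above suggests the call was made on the `tf` module itself, whereas variables belong to the loaded object. The sketch below shows how the count could be computed once a list of variables is available. Whether the object returned by `tf.saved_model.load` exposes a `trainable_variables` attribute for this particular model is an assumption that the question itself puts in doubt, so the helper returns `None` when the attribute is missing. The helper names are hypothetical, and a stand-in object is used so the example runs without TensorFlow.

```python
import math
from types import SimpleNamespace


def count_parameters(variables):
    """Return the total number of scalar elements across objects with a .shape."""
    return sum(math.prod(tuple(v.shape)) for v in variables)


def count_trainable(obj):
    """Count elements in obj.trainable_variables, or return None if it is absent."""
    variables = getattr(obj, "trainable_variables", None)
    if variables is None:
        return None
    return count_parameters(variables)


# Stand-in for a loaded model: one 3x3x16 kernel and one bias of length 16.
fake_model = SimpleNamespace(
    trainable_variables=[
        SimpleNamespace(shape=(3, 3, 16)),
        SimpleNamespace(shape=(16,)),
    ]
)
print(count_trainable(fake_model))  # 3*3*16 + 16 = 160
```

With the real model this would be called as `count_trainable(model)`. A `None` result would indicate that the loaded object does not expose the attribute and the variables have to be reached some other way.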
Tensorflow 2.7.0 (implying CUDA 11.2, cuDNN 8.1).
Windows 10 x64
Python 3.9.7
NVIDIA GeForce MX150, Compute capability: 6.1
I have retrained a TensorFlow 2 model. It works as a one-class object detector, prepared with the Object Detection API v2 (https://tensorflow-object-detection-api-tutorial.readthedocs.io/).
After that I have converted it to onnx (tf2onnx.convert) and tested - got the same inference results.
I have tested all pretrained models (downloaded from tf model zoo https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md):
I have retrained it by using some small batch of data.
The problem is using it with GStreamer/DeepStream. As far as I can see, GStreamer consumes either the ONNX model or the model after it has been converted to TensorRT. (If I provide the ONNX model, it is also converted to TensorRT, but that is done by GStreamer right before running.)
I also tried the same pipeline: train -> convert to ONNX -> convert to TRT (or just provide the ONNX model to GStreamer). Same issue.
ERROR: [TRT]: [graph.cpp::computeInputExecutionUses::519] Error Code
9: Internal Error ((Unnamed Layer* 747) [Recurrence]: IRecurrenceLayer
cannot be used to compute a shape tensor)
TensorRT Version:
tf2onnx Version: 1.9.3
Is there any chance to get some help?
Or maybe I should skip the onnx model and just convert it from tensorflow to tensorRT engine? Is it possible?
Of course I can upload the model if it would help.
TensorFlow recognizes my GPU.
However, when I call model.fit() on my data it shows:
epoch(1/2) and then the kernel dies immediately
If I run this in a separate virtual environment with no GPU, it works fine.
I have simplified the model architecture and reduced the number of training points to only ten as a quick test, and it still fails.
Simple example
from numpy import loadtxt
import keras  # needed for keras.optimizers below
from keras.models import Sequential
from keras.layers import Dense
# X_train and y_train are the ten training points, loaded beforehand
model = Sequential()
model.add(Dense(1, activation='sigmoid'))
opt = keras.optimizers.Adam(learning_rate=.001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
info = model.fit(X_train, y_train, epochs=2, batch_size=2, shuffle=True, verbose=1)
Python 3.8.8
Num GPUs Available 1
cuda v11.2
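When a notebook kernel dies, the underlying error message is usually lost with it. One way to recover it is to run the same code in a child process and capture what it writes to stderr. The following is a minimal sketch of that idea. The helper name `run_isolated` is hypothetical, and the demo string is a stand-in for the real training code.

```python
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run Python source in a child process and return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


# Stand-in for the training script; replace with the real model.fit() code.
demo = "import sys; sys.stderr.write('simulated crash'); sys.exit(3)"
returncode, stderr = run_isolated(demo)
print(returncode)
print(stderr)
```

A non-zero return code together with the captured stderr (for example a missing-library message) should give more to go on than the silent kernel restart. On POSIX systems a negative return code means the child was killed by a signal.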
I am going to answer my own question rather than deleting this because maybe someone else will be making the same simple mistake I was.
The main mistake I made was installing the wrong CUDA version. You can check which versions are compatible at this link:
TLDR: Just follow this video:
This also highlighted the importance of a virtual environment where you control the package versions to prevent incompatibilities.
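To make the version check concrete, here is a small sketch that compares an installed CUDA version against the tested configuration for a given TensorFlow release. The two rows in the table are reproduced from TensorFlow's published tested build configurations as best I recall them and should be verified against the official page. The function name `matches_tested_config` is hypothetical.

```python
# A few rows of TensorFlow's tested GPU build configurations.
# Assumption: values should be double-checked against the official table.
TESTED_CONFIGS = {
    "2.11": {"cuda": "11.2", "cudnn": "8.1"},
    "2.7": {"cuda": "11.2", "cudnn": "8.1"},
}


def matches_tested_config(tf_version, installed_cuda):
    """Return True/False if the release is in the table, None if it is not."""
    major_minor = ".".join(tf_version.split(".")[:2])
    expected = TESTED_CONFIGS.get(major_minor)
    if expected is None:
        return None
    installed_major_minor = ".".join(installed_cuda.split(".")[:2])
    return installed_major_minor == expected["cuda"]


print(matches_tested_config("2.11.0", "11.6"))  # False: differs from the tested 11.2
print(matches_tested_config("2.7.0", "11.2"))   # True
```

A `False` result does not prove the combination cannot work, only that it is not the one the release was tested with, which makes it the first thing to rule out.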
I had the same problem. I moved the code into a Python file, ran it there, and found the root cause. In my case the fix was to copy the cuDNN DLL files into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin. Check the following link as well:
Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
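A quick way to confirm whether the DLL is visible is to look for it in every directory on `PATH`. This is a minimal sketch, assuming the library is resolved through `PATH` (which is why copying it into the CUDA `bin` directory helps when that directory is on `PATH`). The helper name `find_on_path` is hypothetical.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in a PATH-style string that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# An empty list means the file was not found in any PATH directory.
print(find_on_path("cudnn64_8.dll"))
```

If the list is empty, either the cuDNN files were never copied or the directory they live in is not on `PATH`.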
TensorFlow Object Detection API
I am using the TensorFlow Object Detection API to retrain MobileNet on my own dataset. The issue occurs when I try to run my inference graph after it has been both frozen and quantized.
Ubuntu 16.04,
TensorFlow 1.2 (from source, CPU only),
Bazel 0.4.5
Use provided frozen_graph.pb from model zoo.
Quantize to 8-bit using
Run inference
This works, however,
Re-train and produce my own frozen_graph.pb using object_detection/export_inference_graph.py
Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph
Run inference <-- Produces error
Does NOT work, and the error I'm getting during the attempt to run the graph is:
line 1298, in _do_call
raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: The node
'Preprocessor/map/while/ResizeImage/ResizeBilinear/eightbit' has
inputs from different frames. The input
'Preprocessor/map/while/ResizeImage/size' is in frame
'Preprocessor/map/while/Preprocessor/map/while/'. The input
is in frame ''.
Since I can quantize and run the provided frozen_graph.pb the issue has to be with the export tool? Which export tool was used to create the frozen_graph.pb that are in the model zoo? Or how was the export tool called?
Quote from the comments in export_inference_graph.py, which assures me that it should produce a frozen graph if a checkpoint is provided:
"Optionally, one can freeze the graph by converting the weights in the provided
checkpoint as graph constants thereby eliminating the need to use a checkpoint
file during inference."
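The `/eightbit` suffix in the failing node name suggests the node was created by the `quantize_nodes` transform, which rewrites ops into 8-bit equivalents, including ops inside the preprocessor's while loop. One experiment, offered as a guess rather than a confirmed fix, is to quantize only the weights and leave the ops untouched. The sketch below builds the `transform_graph` command line for both variants. The transform list is a trimmed version of the eight-bit recipe in the Graph Transform Tool documentation, and the file names and input/output node names are assumptions to be adapted to the actual graph.

```python
import shlex


def build_quantize_command(in_graph, out_graph, inputs, outputs, quantize_nodes=False):
    """Build the argv list for transform_graph; does not execute anything."""
    transforms = [
        "add_default_attributes",
        "fold_constants(ignore_errors=true)",
        "fold_batch_norms",
        "fold_old_batch_norms",
        "quantize_weights",
    ]
    if quantize_nodes:
        # Rewrites ops to 8-bit versions; the likely source of '/eightbit' nodes.
        transforms.append("quantize_nodes")
    transforms.append("sort_by_execution_order")
    return [
        "bazel-bin/tensorflow/tools/graph_transforms/transform_graph",
        f"--in_graph={in_graph}",
        f"--out_graph={out_graph}",
        f"--inputs={inputs}",
        f"--outputs={outputs}",
        "--transforms=" + " ".join(transforms),
    ]


# File and node names below are assumptions for illustration only.
cmd = build_quantize_command(
    "frozen_graph.pb",
    "quantized_graph.pb",
    "image_tensor",
    "detection_boxes,detection_scores,detection_classes,num_detections",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

If the weights-only graph runs, that would point at the op rewriting inside the loop rather than at the export tool.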