CUDA Error when training YOLOv4-tiny on Colab: no kernel image is available for execution on the device - google-colaboratory

I was following this tutorial to train a YOLOv4-tiny model to detect custom objects: https://www.youtube.com/watch?v=NTnZgLsk_DA
However, when I attempt to train the model, I get this error message:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jan 7 2022 - 12:01:41
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
I was running the code on Colab, not locally. The GPU used for training is Tesla K80.
A common answer is to set the ARCH values to compute_37, code_37, but I've already set them this way and keep getting the same error! So what should I do to get this code running?
Link to my Colab notebook: https://colab.research.google.com/drive/16EQ6I67OOs1I7rF6PHgBHp1eVHMXXvyO#scrollTo=QyMBDkaL-Aep
Any help would be appreciated!

Related

model optimzer in intel open vino

I used
import tensorflow as tf
model = tf.keras.models.load_model('model.h5')
tf.saved_model.save(model,'model')
for saving my image classification model (tensorflow version on google colab = 2.9.2, intel open vino version[Development Tools] = 2021.4.2 LTS)
---------------------------------------------------------------------------------------
C:\Program Files (x86)\Intel\openvino_2021.4.752\deployment_tools\model_optimizer>python mo_tf.py --saved_model_dir C:\Users\dchoi\CNNProejct_Only_saved_English\saved_model --input_shape [1,32,320,240,3] --output_dir C:\Users\dchoi\CNNproject_only_output_English\output_model
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: None
- Path for generated IR: C:\Users\dchoi\CNNproject_only_output_English\output_model
- IR output name: saved_model
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: [1,32,320,240,3]
- Mean values: Not specified
- Scale values: Not specified
- Scale factor: Not specified
- Precision of IR: FP32
- Enable fusing: True
- Enable grouped convolutions fusing: True
- Move mean values to preprocess section: None
- Reverse input channels: False
TensorFlow specific parameters:
- Input model in text protobuf format: False
- Path to model dump for TensorBoard: None
- List of shared libraries with TensorFlow custom layers implementation: None
- Update the configuration file with input/output node names: None
- Use configuration file used to generate the model with Object Detection API: None
- Use the config file: None
- Inference Engine found in: C:\Users\dchoi\AppData\Local\Programs\Python\Python38\lib\site-packages\openvino
Inference Engine version: 2021.4.0-3839-cd81789d294-releases/2021/4
Model Optimizer version: 2021.4.2-3974-e2a469a3450-releases/2021/4
[ WARNING ] Model Optimizer and Inference Engine versions do no match.
[ WARNING ] Consider building the Inference Engine Python API from sources or reinstall OpenVINO (TM) toolkit using "pip install openvino==2021.4"
2022-11-19 01:34:44.207311: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-11-19 01:34:44.207542: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
C:\Users\dchoi\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\autograph\impl\api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
2022-11-19 01:34:46.961002: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-11-19 01:34:46.961949: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-11-19 01:34:46.962904: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-19 01:34:46.969471: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-SCBPOUA
2022-11-19 01:34:46.969727: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-SCBPOUA
2022-11-19 01:34:46.970663: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-19 01:34:46.971135: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
[ FRAMEWORK ERROR ] Cannot load input model: SavedModel format load failure: NodeDef mentions attr 'validate_shape' not in Op<name=AssignVariableOp; signature=resource:resource, value:dtype -> ; attr=dtype:type; is_stateful=true>; NodeDef: {{node AssignNewValue}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
------------------------------------------------------------------------------------------
I am getting this kind of error even after I downloaded
install_prerequirement/install_prerequisites_tf2.bat
need help
Your error seems to indicate the mismatch between the TensorFlow version used to load GraphDef file. From my replication, I am able to generate the Intermediate Representation (IR) files using TensorFlow 2.5.3 version. Here is the full Model Optimizer command used:
mo_tf.py --saved_model_dir <path_to_model\IMGC.h5_to_saved_model.pb> --input_shape [1,320,240,3] --output_dir <path_for_output_files>

Time consuming Tensorflow's CUDA driver check in AWS Lambda

I've been running an AWS Lambda and mounted an EFS, where I've installed Tensorflow 2.4. When I try to run the Lambda (and every Lambda that uses Tensorflow 2.4) it wastes a lot of time (about 4 minutes, or maybe more sometimes) on some Tensorflow's settings check. So I need to set a very wide timeout to overcome this issue.
These are the prints that the Lambda produces:
2022-05-17 06:33:21.917336: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:21.921992: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
2022-05-17 06:33:21.922025: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-17 06:33:21.922048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (169.254.137.137): /proc/driver/nvidia/version does not exist
2022-05-17 06:33:21.922460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:22.339905: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 06:33:22.340468: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500010000 Hz
[WARNING] 2022-05-17T06:33:22.436Z c4500036-5b77-4808-a062-f8ae820b0317 AutoGraph could not transform <function Model.make_predict_function..predict_function at 0x7f65bfb37280> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: unsupported operand type(s) for -: 'NoneType' and 'int'
To silence this warning, decorate the function with #tf.autograph.experimental.do_not_convert
What I need is to overcome this waste of time, and run a clean elaboration.

Tensorflow2 Unknown device in Tensorboard

I cannot display the device placement in tensorboard graph. The only device is "unknown device" (see screenshot below).
In the python script, I activated the following option:
tf.debugging.set_log_device_placement(True)
And indeed, the log device placement information is logged correctly in the python output:
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarIsInitializedOp in device /job:localhost/replica:0/task:0/device:GPU:0
...
The tracing is activated as follows:
tf.summary.trace_on(graph=True, profiler=True)
And the graph is displayed correctly in tensorboard but the device name is simply wrong:
OS: Windows 10 x64
Python: 3.8
Tensorflow: 2.2.0
GPU

Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

I got this when using keras with Tensorflow backend:
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Relevant code:
tfconfig = tf.ConfigProto()
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))
tensorflow version: 1.14.0
Chairman Guo's code:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
solved my problem of jupyter notebook kernel crashing at:
tf.keras.models.load_model(path/to/my/model)
The fatal message was:
2020-01-26 11:31:58.727326: F
tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch
value instead of handling error Internal: failed initializing
StreamExecutor for CUDA device ordinal 0: Internal: failed call to
cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
My TF's version is: 2.2.0-dev20200123. There are 2 GPUs on this system.
This could be due to your TF-default (i.e. 1st) GPU is running out of memory. If you have multiple GPUs, divert your Python program to run on other GPUs. In TF (suppose using TF-2.0-rc1), set the following:
# Specify which GPU(s) to use
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # Or 2, 3, etc. other than 0
# On CPU/GPU placement
config = tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config)
# Note that ConfigProto disappeared in TF-2.0
Suppose, however, your environment have only one GPU, then perhaps you have no choice but ask your buddy to stop his program, then treat him a cup of coffee.

Operation was explicitly assigned to /GPU:1 but available devices are

My system has 2 Nvidia GPUs, a GTX 750 Ti which is assigned to the OS for OS graphics and a GTX 1080Ti which is free to be used for Tensorflow. I use the call: tensorflow::graph::SetDefaultDevice("/GPU:1", &graph); to enable GPU1 .
I am running TF 1.10 hand compiled and configured for C++/CMake and Cuda compilation tools, release 9.1, V9.1.85. When I assign GPU 1 to execute my graph I get the following error:
"Invalid argument: Cannot assign a device for operation 'h1_w/read':
Operation was explicitly assigned to /GPU:1 but available devices are
[ /job:localhost/replica:0/task:0/device:CPU:0,
/job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device
specification refers to a valid device. [[{{node h1_w/read}} =
IdentityT=DT_FLOAT, _class=["loc:#h1_w"], _device="/GPU:1"]]"