Google Colab: error when training YOLOv4-tiny (Darknet, Roboflow)

I'm using Google Colab for object detection with YOLO.
In the "Train Custom YOLOv4 Detector" step, I get this error:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Nov 26 2020 - 16:49:52
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
Can you help me, please?
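
The `no kernel image is available` message generally means the darknet binary was compiled for a different GPU architecture than the one Colab assigned to the session. Below is a minimal sketch of how the matching build flag could be derived, assuming the build is controlled by the `ARCH=` line in darknet's Makefile and that the GPU name has been read from `nvidia-smi`. The helper names `gencode_flag` and `makefile_arch_line` and the small lookup table are illustrative, not part of darknet.

```python
# Compute capabilities of a few GPUs that Colab has commonly assigned
# (values from NVIDIA's published compute capability tables).
COMPUTE_CAPABILITY = {
    "Tesla K80": "3.7",
    "Tesla P4": "6.1",
    "Tesla P100": "6.0",
    "Tesla V100": "7.0",
    "Tesla T4": "7.5",
}


def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode flag for one compute capability, e.g. "7.5"."""
    major, minor = compute_capability.split(".")
    sm = f"{major}{minor}"
    return f"-gencode arch=compute_{sm},code=[sm_{sm},compute_{sm}]"


def makefile_arch_line(gpu_name: str) -> str:
    """Return an ARCH= line for a GPU name as reported by nvidia-smi.

    Hypothetical helper: it only knows the GPUs listed in the table above.
    """
    for known_name, capability in COMPUTE_CAPABILITY.items():
        if known_name in gpu_name:
            return "ARCH= " + gencode_flag(capability)
    raise KeyError(f"No compute capability recorded for {gpu_name!r}")


print(makefile_arch_line("Tesla T4"))
```

The printed line would replace the active `ARCH=` line in the Makefile before running `make` again, so that the rebuilt binary contains kernels for the assigned GPU.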

Related

In Google Colab I enabled the GPU but it isn't used

I enabled the GPU by going to Runtime > Change runtime type and then choosing GPU.
But when I run my code I get this error:
usage: train.py [-h] [--pre PRETRAINED] TRAIN TEST GPU TASK
train.py: error: the following arguments are required: GPU, TASK
This is the line that causes the error:
! python train.py part_A_train.json part_A_val.json
I also get this warning:
Warning: You are connected to a GPU runtime, but not utilising the GPU.
But running this code suggests that the GPU is active!
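
The usage message itself points at the cause: `train.py` declares four positional arguments and the command passes only two. The sketch below is a hypothetical reconstruction of such a parser, inferred only from the usage line above (the real `train.py` may define its arguments differently). The values `"0"` for `GPU` and `"0"` for `TASK` are placeholders, not values taken from the original script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: destination names are guesses, only the
    # metavars (TRAIN, TEST, GPU, TASK, PRETRAINED) come from the usage message.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--pre", metavar="PRETRAINED", default=None,
                        help="optional path to a pretrained checkpoint")
    parser.add_argument("train_json", metavar="TRAIN")
    parser.add_argument("test_json", metavar="TEST")
    parser.add_argument("gpu", metavar="GPU")
    parser.add_argument("task", metavar="TASK")
    return parser


parser = build_parser()

# Passing only the two JSON files would make argparse exit with
# "the following arguments are required: GPU, TASK".
# Supplying all four positionals parses cleanly:
args = parser.parse_args(["part_A_train.json", "part_A_val.json", "0", "0"])
print(args.gpu, args.task)
```

In the notebook this would correspond to appending the two missing values to the command, for example `! python train.py part_A_train.json part_A_val.json 0 0`, assuming the script treats the third argument as a GPU id and the fourth as a free-form task label.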

TensorFlow compatibility with A100 GPU

I am new to deep learning. I have an A100 GPU installed with CUDA 11.6. Using conda, I installed tensorflow 1.15, tensorflow-gpu 1.15, cudatoolkit 10.0, and Python 3.7. The code I am trying to run from GitHub comes with the note below, and it fails with errors that I find difficult to interpret, so I cannot tell where I went wrong. The error is displayed as:
failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-06-30 09:37:12.049400: I tensorflow/stream_executor/stream.cc:4925] [stream=0x55d668879990,impl=0x55d668878ac0] did not memcpy device-to-host; source: 0x7f2fe2d0d400
2022-06-30 09:37:12.056385: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at iterator_ops.cc:867 : Cancelled: Operation was cancelled
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25 [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
(1) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25 [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]] [[Hyperprior/truediv_3/_3633]]
NOTE: At the moment, we only support CUDA 10.0, Python 3.6-3.7, TensorFlow 1.15, and Tensorflow Compression 1.3. TensorFlow must be installed via pip, not conda. Unfortunately, newer versions of Tensorflow or Python will not work due to various constraints in the dependencies and in the TF binary API.
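
One thing worth checking with this setup is whether CUDA 10.0 can target the A100 at all. The A100 has compute capability 8.0 (Ampere), and native support for that architecture first appeared in CUDA 11.0, so libraries built against CUDA 10.0 may not ship kernels for it. That could explain the cuBLAS failure, though it is not confirmed here. Below is a minimal sketch of the check. The table covers only a few architectures, and the names `MIN_CUDA_FOR_CC` and `toolkit_targets` are illustrative.

```python
# First CUDA toolkit release that can compile natively for each
# compute capability (a small excerpt, not a complete table).
MIN_CUDA_FOR_CC = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. T4
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. GeForce RTX 30 series
}


def parse_version(text: str) -> tuple:
    """Turn "11.6" into (11, 6); extra components are ignored."""
    parts = tuple(int(piece) for piece in text.split("."))
    return parts[:2]


def toolkit_targets(cuda_version: str, compute_capability: str) -> bool:
    """Return True if the CUDA toolkit version can build for the given GPU."""
    required = MIN_CUDA_FOR_CC[parse_version(compute_capability)]
    return parse_version(cuda_version) >= required


print(toolkit_targets("10.0", "8.0"))  # cudatoolkit 10.0 vs. A100
print(toolkit_targets("11.6", "8.0"))  # system CUDA 11.6 vs. A100
```

The first call returns `False` and the second returns `True`, which matches the mismatch between the conda `cudatoolkit 10.0` package and the A100.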

tensorflow.python.framework.errors_impl.FailedPreconditionError

I am using tensorflow-gpu 2.3.0 with CUDA_VERSION=8.0.61 and CUDNN_VERSION=6.0.21.
I just ran some TensorFlow code and got a FailedPreconditionError:
'tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to allocate scratch buffer for device 0 [Op:VarHandleOp] name: Variable/'
What can I do to fix this?
Thanks

How to solve " *** stack smashing detected ***: <unknown> terminated " in colab?

!rm darknet
!make
!./darknet detector train yolo.data cfg/yolov4-custom.cfg yolov4.conv.137 -dont_show
When I train on my dataset with YOLOv4 using darknet in Colab, I get this error:
CUDA-version: 11000 (11020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
yolov4-custom
*** stack smashing detected ***: <unknown> terminated
I don't understand why this error occurs. Just a week ago, the same code ran without any problem.

Tensorflow Failed to get device attribute 13 for device 0

I know that there are similar questions on Stack Overflow, but those answers did not help in my case.
I have:
Tensorflow 2.1.0
CUDA 10.1
CUDNN 7.6.5
Nvidia GeForce 920M (driver upgraded to the latest version)
I tried some code snippets like:
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
I also tried other similar code, but nothing worked.
Full error: Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error