I enabled the GPU by going to Runtime > Change runtime type and choosing GPU.
But when I run my code I get this error:
usage: train.py [-h] [--pre PRETRAINED] TRAIN TEST GPU TASK
train.py: error: the following arguments are required: GPU, TASK
This is the line that produces the error:
! python train.py part_A_train.json part_A_val.json
I also have this warning:
Warning: You are connected to a GPU runtime, but not utilising the GPU.
But when I run this code, it looks like the GPU is active!
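The usage line in the error lists four positional arguments (TRAIN, TEST, GPU, TASK), while the command above passes only two. Below is a minimal sketch of a parser that would print that usage line, reconstructed from the error message alone. The destination names and the example values "0" are assumptions; the real train.py may expect something else for GPU and TASK.

```python
import argparse


def build_parser():
    # Reconstructed from the usage line in the error message, not from the real script.
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--pre", metavar="PRETRAINED", default=None)
    parser.add_argument("train_json", metavar="TRAIN")
    parser.add_argument("test_json", metavar="TEST")
    parser.add_argument("gpu", metavar="GPU")
    parser.add_argument("task", metavar="TASK")
    return parser


def try_parse(argv):
    # argparse raises SystemExit when required positionals are missing.
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# Two positionals, as in the failing command: parsing fails.
two_args = try_parse(["part_A_train.json", "part_A_val.json"])

# Four positionals: parsing succeeds. Both "0" values are placeholders.
four_args = try_parse(["part_A_train.json", "part_A_val.json", "0", "0"])
```

If the script is shaped like this, the command needs two more values, for example `! python train.py part_A_train.json part_A_val.json 0 0`, where both trailing values are placeholders to be replaced with whatever the script documents for GPU and TASK.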
I am new to deep learning. I have an A100 GPU with CUDA 11.6 installed. Using Conda, I installed tensorflow 1.15, tensorflow-gpu 1.15, cudatoolkit 10.0, and Python 3.7. The code I am trying to run from GitHub comes with the note below, and it produces errors that I find difficult to interpret, so I cannot tell where I went wrong. The error is:
failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-06-30 09:37:12.049400: I tensorflow/stream_executor/stream.cc:4925] [stream=0x55d668879990,impl=0x55d668878ac0] did not memcpy device-to-host; source: 0x7f2fe2d0d400
2022-06-30 09:37:12.056385: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at iterator_ops.cc:867 : Cancelled: Operation was cancelled
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
     [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
  (1) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
     [[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
     [[Hyperprior/truediv_3/_3633]]
NOTE: At the moment, we only support CUDA 10.0, Python 3.6-3.7, TensorFlow 1.15, and Tensorflow Compression 1.3. TensorFlow must be installed via pip, not conda. Unfortunately, newer versions of Tensorflow or Python will not work due to various constraints in the dependencies and in the TF binary API.
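Note that the requirement above asks for a pip install, whereas the setup described used Conda. As a quick sanity check, here is a minimal sketch that compares an environment against the versions listed in the note. The helper names are hypothetical, and the check covers only the Python and TensorFlow version numbers, not how the packages were installed.

```python
import sys

# Supported versions copied from the NOTE above.
SUPPORTED_PYTHON = {(3, 6), (3, 7)}
SUPPORTED_TF_PREFIX = "1.15"


def python_supported(version_info=sys.version_info):
    # Compare only major.minor, e.g. (3, 7).
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def tf_supported(tf_version):
    # tf_version is a string such as tf.__version__, e.g. "1.15.0".
    return (tf_version == SUPPORTED_TF_PREFIX
            or tf_version.startswith(SUPPORTED_TF_PREFIX + "."))


print("Python supported:", python_supported())
```

Inside the actual environment, `tf_supported(tf.__version__)` after `import tensorflow as tf` would cover the TensorFlow side.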
I am using tensorflow-gpu 2.3.0 with CUDA_VERSION=8.0.61 and CUDNN_VERSION=6.0.21.
I just ran some TensorFlow code and got a FailedPreconditionError:
'tensorflow.python.framework.errors_impl.FailedPreconditionError: Failed to allocate scratch buffer for device 0 [Op:VarHandleOp] name: Variable/'
What can I do to fix this?
Thanks
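One thing worth checking is whether the CUDA and cuDNN versions match what the TensorFlow build expects. According to TensorFlow's tested build configurations, the prebuilt tensorflow-gpu 2.3.0 targets CUDA 10.1 and cuDNN 7.6; treat those numbers as an assumption to verify. The sketch below compares the CUDA_VERSION and CUDNN_VERSION values quoted above against them. The helper names are hypothetical.

```python
import os

# Assumed expectations for tensorflow-gpu 2.3.0; verify before relying on them.
EXPECTED = {"CUDA_VERSION": (10, 1), "CUDNN_VERSION": (7, 6)}


def major_minor(version):
    # "8.0.61" -> (8, 0)
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def find_mismatches(env):
    # Return {name: (found, expected)} for every variable that is set but differs.
    mismatches = {}
    for name, expected in EXPECTED.items():
        value = env.get(name)
        if value is not None and major_minor(value) != expected:
            mismatches[name] = (major_minor(value), expected)
    return mismatches


# Values as reported in the question.
reported = {"CUDA_VERSION": "8.0.61", "CUDNN_VERSION": "6.0.21"}
for name, (found, expected) in find_mismatches(reported).items():
    print(f"{name}: found {found}, expected {expected}")

# To check the live environment instead: find_mismatches(os.environ)
```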
!rm darknet
!make
!./darknet detector train yolo.data cfg/yolov4-custom.cfg yolov4.conv.137 -dont_show
When I train YOLOv4 on my dataset using darknet in Colab, I get this error:
CUDA-version: 11000 (11020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 3.2.0
yolov4-custom
*** stack smashing detected ***: <unknown> terminated
I don't understand why this error occurs. Just a week ago, the same code ran without any problem.
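For reading the log above: CUDA encodes versions as integers of the form 1000 * major + 10 * minor, so 11000 is 11.0 and 11020 is 11.2. Below is a small sketch of that decoding. The function name is made up for illustration, and the log itself does not say which of the two numbers is the toolkit and which is the driver.

```python
def decode_cuda_version(value):
    # CUDA packs versions as 1000 * major + 10 * minor.
    major = value // 1000
    minor = (value % 1000) // 10
    return major, minor


for raw in (11000, 11020):
    print(raw, "->", "%d.%d" % decode_cuda_version(raw))
```

The two numbers differ, so comparing them against the versions from a week earlier (if still available) may help narrow down what changed.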
I know there are similar questions on Stack Overflow, but their answers did not help in my case.
I have:
Tensorflow 2.1.0
CUDA 10.1
CUDNN 7.6.5
Nvidia GeForce 920M (driver upgraded to the latest version)
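I tried some code snippets like:
import tensorflow as tf
# Let TensorFlow allocate GPU memory on demand instead of all at once.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
and other similar code, but nothing worked.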
Full error: Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error