I get a darknet error when running YOLO with a USB camera. How do I fix it?

Some info:
Ubuntu 18.04
CUDA 10.2
NVIDIA-SMI 440.33.01
Graphics card: GeForce GTX 1080
The error occurs when I run usb_cam and YOLO at the same time.
darknet_ros: /home/foscar/ISCC_2022/src/vision_team/darknet_ros/darknet/src/cuda.c:36: check_error: Assertion `0' failed.
[darknet_ros-2] process has died [pid 5975, exit code -6, cmd /home/foscar/ISCC_2022/devel/lib/darknet_ros/darknet_ros camera/rgb/image_raw:=/usb_cam/image_raw __name:=darknet_ros __log:=/home/foscar/.ros/log/456e13fe-0109-11ed-98f3-705dccfd163a/darknet_ros-2.log].
log file: /home/foscar/.ros/log/456e13fe-0109-11ed-98f3-705dccfd163a/darknet_ros-2*.log

Related

Jupyter kernel crashed only while training ConvNext model

I'm trying to run the tutorial code from Kaggle on my computer. However, the kernel crashes during model training, at history = ConvNeXt_model.fit().
Here is the Jupyter notebook log:
warn 16:45:23.988: StdErr from Kernel Process 2023-02-13 16:45:23.989108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neu
warn 16:45:23.988: StdErr from Kernel Process ral Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
warn 16:45:24.253: StdErr from Kernel Process 2023-02-13 16:45:24.253410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/repli
warn 16:45:24.253: StdErr from Kernel Process ca:0/task:0/device:GPU:0 with 21348 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
warn 16:45:44.398: StdErr from Kernel Process 2023-02-13 16:45:44.398973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100
warn 16:45:44.798: StdErr from Kernel Process 2023-02-13 16:45:44.799017: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logge
warn 16:45:44.799: StdErr from Kernel Process d once.
warn 16:45:45.140: StdErr from Kernel Process 2023-02-13 16:45:45.141061: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1e2b8a88750 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-02-13 16:45:45.141144: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0
warn 16:45:45.141: StdErr from Kernel Process ): NVIDIA GeForce RTX 4090, Compute Capability 8.9
warn 16:45:45.191: StdErr from Kernel Process 2023-02-13 16:45:45.191262: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient
warn 16:45:45.191: StdErr from Kernel Process filesystem space is provided.
error 16:45:45.530: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\Users\User\anaconda3\envs\tf\lib\site-packages\traitlets\traitlets.py:2548: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
warn(
c:\Users\User\anaconda3\envs\tf\lib\site-packages\traitlets\traitlets.py:2499: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '00cfbd3c-ac34-43be-a838-9653221d1a82' instead of 'b"00cfbd3c-ac34-43be-a838-9653221d1a82"'.
warn(
2023-02-13 16:45:23.989108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-13 16:45:24.253410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21348 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
2023-02-13 16:45:44.398973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100
2023-02-13 16:45:44.799017: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2023-02-13 16:45:45.141061: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1e2b8a88750 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-02-13 16:45:45.141144: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-02-13 16:45:45.191262: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
info 16:45:45.530: Dispose Kernel process 24268.
error 16:45:45.530: Raw kernel process exited code: 3221226505
error 16:45:45.531: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:33213)
at c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:52265
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:52250)
at y.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:45732)
at c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:139244
at Z (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:1608939)
at Kp.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:139221)
at qp.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:146518)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 16:45:45.531: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
Strangely, I can train other models (such as ResNet or EfficientNet) on the GPU without problems; only the ConvNeXt model fails. I followed the instructions to install TensorFlow.
I suspect the error comes from the XLA implementation, but I do not know how to fix it.
All the code runs in VS Code on Windows 10.
Device information:
Nvidia Driver 527.56
CUDA 11.2
cuDNN 8.1.0
Python 3.9.10
TensorFlow 2.10.1
GPU Nvidia RTX 4090
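The log reports ptxas failing on a device with compute capability 8.9, while the listed toolkit is CUDA 11.2. One thing worth checking (a possibility, not a confirmed diagnosis) is whether that toolkit's ptxas knows the architecture at all. The sketch below encodes, as a small lookup table, the CUDA release that first added each architecture; the table name and the helper `toolkit_supports` are hypothetical and only cover a few architectures.

```python
# Hypothetical lookup: first CUDA toolkit release able to compile for a
# given compute capability (subset only, for illustration).
FIRST_CUDA_FOR_CC = {
    (6, 1): (8, 0),   # Pascal, e.g. GeForce GTX 1080
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (GeForce RTX 30 series)
    (8, 9): (11, 8),  # Ada Lovelace, e.g. GeForce RTX 4090
}


def toolkit_supports(compute_capability, toolkit_version):
    """Return True/False if the table knows the architecture, else None."""
    first = FIRST_CUDA_FOR_CC.get(compute_capability)
    if first is None:
        return None  # architecture not in this small table
    return toolkit_version >= first


# Values taken from the question: RTX 4090 (8.9) with CUDA 11.2.
print(toolkit_supports((8, 9), (11, 2)))  # False
```

If the check comes back False, the toolkit predates the GPU, which would be consistent with ptxas exiting with an error; other models may still work because the driver can compile PTX itself, as the "Relying on driver to perform ptx compilation" line suggests.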

Google Colab: problem training YOLOv4-tiny-Darknet-Roboflow

I'm using Google Colab for object detection with YOLO.
In the "Train Custom YOLOv4 Detector" step, I get this error:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Nov 26 2020 - 16:49:52
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
Can you help me, please?
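"no kernel image is available for execution on the device" generally means the binary was not compiled for the compute capability of the GPU it is running on. Darknet's Makefile selects target architectures through nvcc `-gencode` flags, so one possible fix is to add the flag that matches the assigned Colab GPU and rebuild. A minimal sketch that formats such a flag (the function name `gencode_flag` is hypothetical, and which GPU Colab assigns is an assumption you need to verify yourself):

```python
def gencode_flag(major, minor):
    """Format an nvcc -gencode flag for one compute capability.

    The flag embeds both native code (sm_XY) and PTX (compute_XY).
    """
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=[sm_{cc},compute_{cc}]"


# Example: a Tesla T4 has compute capability 7.5.
print(gencode_flag(7, 5))
# -gencode arch=compute_75,code=[sm_75,compute_75]
```

The resulting string would go into the ARCH variable of the Makefile before running make again.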

tensorflow-gpu running failure on LINUX

I've installed CUDA and cuDNN on Ubuntu 16.04.
CUDA version: 9.0 (with driver version 390.87)
cuDNN version: 7.2 for CUDA 9.0
import tensorflow as tf
works fine, but
tf.Session()
renders the following error.
2018-09-15 16:43:23.281375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-15 16:43:23.281431: E tensorflow/core/common_runtime/direct_session.cc:158] Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/imhgchoi/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1494, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/imhgchoi/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 626, in __init__
self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
The error message implies that I've installed the wrong version of the CUDA driver, but I'm lost. I'm not sure what steps to take to remedy this.
AFTER ADDING ENVIRONMENT VARIABLES
That only added new messages:
2018-09-15 17:13:39.684390: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-15 17:13:39.767963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-15 17:13:39.768481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.506
pciBusID: 0000:09:00.0
totalMemory: 3.94GiB freeMemory: 3.41GiB
2018-09-15 17:13:39.768502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-15 17:13:39.768635: E tensorflow/core/common_runtime/direct_session.cc:158] Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
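"CUDA driver version is insufficient for CUDA runtime version" compares the installed driver against the CUDA runtime that TensorFlow actually loads, which is not necessarily the toolkit under /usr/local. A rough way to reason about it is to compare the driver version with the minimum Linux driver listed in NVIDIA's CUDA release notes. The sketch below hard-codes a subset of that table; the names `MIN_LINUX_DRIVER` and `driver_ok` are hypothetical.

```python
# Minimum Linux driver per CUDA runtime, as listed in NVIDIA's CUDA
# release notes (subset; check the notes for your exact release).
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_version(text):
    """Turn a dotted version string such as '390.87' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_ok(driver, runtime):
    """True if the driver meets the minimum for the given CUDA runtime."""
    return parse_version(driver) >= MIN_LINUX_DRIVER[runtime]


# Driver version from the question.
print(driver_ok("390.87", "9.0"))  # True
print(driver_ok("390.87", "9.2"))  # False
```

Since 390.87 passes for CUDA 9.0, one possibility (an assumption, not something the log proves) is that the runtime being loaded is newer than 9.0, or that the kernel module currently loaded is older than the installed driver package.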
Maybe your environment variables are causing this problem.
Try this:
Add the lines below to the end of your ~/.bashrc file, open a new terminal, start a Python session there, and import tensorflow (you should have tensorflow-gpu installed) to see if it works:
vim ~/.bashrc
Add these at the end of the file and restart your terminal:
export CUDA_HOME="/usr/local/cuda-9.0"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export PATH="${CUDA_HOME}/bin:${PATH}"
export DYLD_LIBRARY_PATH="${CUDA_HOME}/lib"  # macOS only; has no effect on Linux
Edit 1
Please make sure that "/usr/local/cuda-9.0" is the directory where you installed CUDA.
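To confirm that the new terminal actually picked up these settings, a small check along the following lines may help. It is only a sketch: the function name `check_cuda_env` is hypothetical, and the default path assumes CUDA lives in /usr/local/cuda-9.0 as in the lines above.

```python
import os


def check_cuda_env(env, cuda_home="/usr/local/cuda-9.0"):
    """Return a list of problems found in an environment mapping."""
    problems = []
    if env.get("CUDA_HOME") != cuda_home:
        problems.append(f"CUDA_HOME is not set to {cuda_home}")
    # Both variables are colon-separated directory lists on Linux.
    if f"{cuda_home}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append(f"{cuda_home}/lib64 missing from LD_LIBRARY_PATH")
    if f"{cuda_home}/bin" not in env.get("PATH", "").split(":"):
        problems.append(f"{cuda_home}/bin missing from PATH")
    return problems


# Check the environment of the current Python session.
for problem in check_cuda_env(os.environ):
    print(problem)
```

An empty result only shows that the variables are set; it does not prove the directory exists or that the driver matches the runtime.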

CUDA_ERROR_OUT_OF_MEMORY ubuntu 14.04 cuda8

I am using TensorFlow with CUDA 8 on Ubuntu 14.04.
My GPU: GeForce GT 740M
I am a newbie to GPUs.
Sometimes, after I have run the same script several times on the GPU, I get a memory error, which goes away the next time I reboot.
Thanks for sharing your expertise with me. I don't really know how to solve this problem.
Here is the error message:
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910]
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885]
Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:01:00.0
Total memory: 1.96GiB
Free memory: 118.75MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 118.75M (124518400 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
There are many possible reasons for this issue.
Check whether the GPU is also running the X server, because the crash happens right at startup. Use nvidia-smi to see how much memory you actually have to work with; the log above reports only 118.75MiB free out of 1.96GiB.
Make sure you have the appropriate CUDA driver and toolkit versions for the TensorFlow you are running (driver 367.35 or newer and toolkit 8.0).
Is your card supported? I think it should work (TensorFlow needs CUDA compute capability >= 3.0 and the log reports 3.5 for this card), but NVIDIA has a habit of dropping support for older hardware, so it is worth double-checking.
You can debug your code with the TensorFlow debugger.
Last but not least, as the comments have suggested, it seems your GPU resources aren't being freed after your program has ended. Make sure you kill any leftover process; the GPU frees its resources once the process exits.
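The nvidia-smi check can also be scripted, which is handy for comparing memory use before and after a run. The sketch below calls nvidia-smi with its CSV query options and parses the result; the helper names are hypothetical, and it simply returns an empty list when nvidia-smi is missing or its output cannot be parsed.

```python
import shutil
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines such as '1890, 2004' into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Return per-GPU (used, total) in MiB, or [] if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        return parse_memory_csv(result.stdout)
    except (subprocess.CalledProcessError, ValueError):
        return []


for index, (used, total) in enumerate(gpu_memory()):
    print(f"GPU {index}: {used} MiB used of {total} MiB")
```

If most of the memory is already in use before the script starts, a leftover process or the X server is a likely candidate, and plain nvidia-smi lists the processes holding it.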

Running Tensorflow on GeForce 940M (Ubuntu)

I'm running the CIFAR-10 classification example from TensorFlow for the very first time on my laptop with a GeForce 940M. I'm running the training with the predefined parameters as follows:
python cifar10_train.py
After step 1800 I get the following errors:
E tensorflow/stream_executor/cuda/cuda_event.cc:33] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
E tensorflow/stream_executor/cuda/cuda_driver.cc:1182] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS; host dst: 0x7ff8e9bf26c0; GPU src: 0x5011c0600; size: 16=0x10
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:105] Unexpected Event status: 1
I tensorflow/stream_executor/stream.cc:3304] stream 0x35e7190 did not block host until done; was already in an error state
Aborted (core dumped)
Does anybody have any idea?
Thanks a lot in advance for your help! Any advice is kindly appreciated!