I'm training YOLOv4-tiny on a custom dataset, and the loss and other metrics suddenly drop to -nan.
As you can see on the chart, all progress is lost after around 800 iterations.
[Image: YOLOv4 accuracy chart]
Training log for the chart above:
[Image: Darknet training log]
Any ideas about this problem? It is running on Ubuntu with 4 x GeForce GTX 1080 6GB.
When testing the same network on a PC with a single GeForce GTX 1060 6GB, it does not crash.
I am running a custom training job on Google Vertex AI, using NVIDIA Tesla V100 with 2 accelerators. I am training an ML model, but my GPU utilization is only 50% during training.
I am using Nvidia Transfer Learning Toolkit to train an object detection model, and I specified GPUs=2 on the TLT commands.
Any ideas how I can get higher GPU utilization?
I'm running a Mask R-CNN model on an edge device (with an NVIDIA GTX 1080). I am currently using the Detectron2 Mask R-CNN implementation and I achieve an inference speed of around 5 FPS.
To speed this up, I looked at other inference engines and model implementations, for example ONNX, but I have not been able to get faster inference.
TensorRT looks very promising to me, but I did not find a ready "out-of-the-box" implementation for it.
Are there any other mature and fast inference engines or other techniques to speed up the inference?
It's almost impossible to get higher inference speed for Mask R-CNN on GTX 1080. You may check detectron2 by Facebook AI Research.
Otherwise, I'd suggest using YOLACT (You Only Look At CoefficienTs), which can achieve real-time instance segmentation.
On the other hand, if you don't need instance segmentation, you can use YOLO, SSD, etc. for object detection.
OpenCV 4.5.0 with DNN_BACKEND_CUDA and DNN_TARGET_CUDA/DNN_TARGET_CUDA_FP16.
Mask RCNN with 1024 x 1024 input image
Device | FPS
------------------ | -------
GTX 1080 Ti (FP32) | 29
RTX 2080 Ti (FP16) | 60
FPS measured includes NMS but excludes other preprocessing and postprocessing. The network fully runs end-to-end on GPU.
Benchmark code: https://gist.github.com/YashasSamaga/48bdb167303e10f4d07b754888ddbdcf
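For anyone trying to reproduce numbers like these, here is a minimal sketch of a timing loop. It is not the code from the gist above. `measure_fps` and `run_once` are hypothetical names: `run_once` stands for whatever callable performs one forward pass (including NMS, if you want to match the measurement described above), and the warm-up and iteration counts are arbitrary defaults.

```python
import time


def measure_fps(run_once, warmup=5, iters=50):
    """Return average calls per second of run_once over `iters` timed calls.

    run_once is a hypothetical zero-argument callable that performs one
    inference; warm-up calls are executed first and are not timed.
    """
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    return iters / elapsed
```

With an OpenCV build that has CUDA support, `run_once` could wrap `net.forward()` on a network configured with `net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)` and `net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)`. The warm-up matters because the first few calls typically include one-time initialization.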
As @kkHarshit already mentioned, it is very hard to speed up Mask R-CNN any further.
The fastest instance segmentation model that I found is YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS).
Its performance is worse than Mask R-CNN, or even Yolact, but still very good.
I'm new to deep learning. I want to build an image classifier using a CNN to classify clothing images. I decided to train on the Fashion-MNIST dataset, which contains 60,000 images. But I'm aware that training is a very heavy task.
I would like to know how long my PC will take to train on this dataset, and whether I should use a pre-trained model instead, at some cost in accuracy.
My PC configurations are:
- Intel Core i5-6400 CPU @ 2.70 GHz
- 8GB RAM.
- NVIDIA GeForce GTX 1050 Ti.
It depends on the dataset size and the number of epochs (I tried 50 epochs), but the images here are small (32x32).
When I tried it on a machine with
Intel Core i7-6400 CPU @ 2.70 GHz
8GB RAM.
NVIDIA GeForce GTX 1050 Ti.
and the image size (28x28) provided in the MNIST dataset on Tensorflow.org, it took less than 5 minutes.
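As a rough way to reason about training time, the arithmetic can be sketched as below. The 60,000 images and 50 epochs come from the posts above; the batch size of 64 and the 5 ms per training step are purely hypothetical placeholders, so time a few steps on your own hardware and substitute the real value.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of optimizer steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


def estimate_training_seconds(num_images, batch_size, epochs, seconds_per_step):
    """Back-of-the-envelope estimate: total steps times measured time per step."""
    return steps_per_epoch(num_images, batch_size) * epochs * seconds_per_step


if __name__ == "__main__":
    # Assumed values: batch size 64 and 0.005 s/step are placeholders, not measurements.
    seconds = estimate_training_seconds(60_000, 64, 50, 0.005)
    print(f"Estimated training time: {seconds / 60:.1f} minutes")
```

The estimate ignores data loading, validation passes and start-up overhead, so treat it as a lower bound rather than a prediction.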
I was profiling the inference latency of a MobileNetV2 model (with a batch size of 20) on my GeForce GTX 1080 GPU.
The TensorFlow timeline looks as follows:
I notice that there is quite a lot of empty space in the "stream: all Compute" line, which I think means my GPU was not always busy. What could have been causing this idle time, and are there any ways to reduce it?