Our YOLOv4-tiny suddenly loses accuracy

I'm training YOLOv4-tiny on a custom dataset, and suddenly the loss and the other metrics drop to -nan.
As you can see in the chart, all progress is lost after a number of iterations (around iteration 800).
[YOLOv4 accuracy chart]
Training log for the chart above:
[Darknet training log]
Any ideas on this problem? It is running on Ubuntu with 4 x GeForce GTX 1080 6GB.
When testing the same network on a PC with a single GeForce GTX 1060 6GB, it does not crash.
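For what it's worth, the multi-GPU section of the AlexeyAB/darknet README (as I read it) suggests training the first ~1000 iterations on a single GPU before resuming with -gpus 0,1,2,3, and, if the loss goes to NaN, dividing learning_rate by the number of GPUs while multiplying burn_in by the same factor. Below is a minimal sketch of that scaling, assuming the usual key=value cfg layout; the file names are placeholders for your own cfg.

# Minimal sketch: scale learning_rate and burn_in in a darknet cfg for
# multi-GPU training, per the AlexeyAB/darknet README's multi-GPU advice.
# CFG_IN / CFG_OUT are placeholder file names.
import re

NUM_GPUS = 4
CFG_IN = "yolov4-tiny-custom.cfg"        # hypothetical path to your cfg
CFG_OUT = "yolov4-tiny-custom-4gpu.cfg"

def scale_param(line, key, factor):
    # If this line sets `key`, multiply its value by `factor`.
    m = re.match(rf"\s*{key}\s*=\s*([0-9.eE+-]+)", line)
    if not m:
        return line
    value = float(m.group(1)) * factor
    if key == "burn_in":                 # burn_in is an iteration count
        value = int(round(value))
    return f"{key}={value}\n"

with open(CFG_IN) as f:
    lines = f.readlines()

with open(CFG_OUT, "w") as f:
    for line in lines:
        line = scale_param(line, "learning_rate", 1.0 / NUM_GPUS)
        line = scale_param(line, "burn_in", NUM_GPUS)
        f.write(line)

Training the first ~1000 iterations on one GPU and only then switching to -gpus 0,1,2,3 would also be consistent with the divergence you see around iteration 800.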

Related

Yolov4 Training Time

Is anyone aware of how long training took to achieve the mAP and FPS of YOLOv4 on the MS COCO dataset, as referenced in https://github.com/AlexeyAB/darknet and the corresponding paper https://arxiv.org/abs/2004.10934?
I'm trying to estimate the training time and final mAP for an RTX 3050 without running the full training, as it is projected to take ~1500 hours.
I have not been able to find any stats on how long training took on the Tesla V100 and RTX 2070 referenced in the paper.
Ideally I could take the time the RTX 2070 took, scale it according to the difference in BFLOPS, and assume the accuracy would be roughly similar.
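Not an authoritative answer, but the back-of-the-envelope scaling described above is easy to write down. The sketch below just scales a known training time by a throughput ratio; the numbers are placeholders, not figures from the paper.

# Hedged estimate: assume training time scales inversely with GPU throughput.
# All numbers below are made up for illustration, not measurements.

def estimate_hours(baseline_hours, baseline_throughput, target_throughput):
    # Throughput can be any consistent measure, e.g. peak TFLOPS.
    return baseline_hours * (baseline_throughput / target_throughput)

# Hypothetical example: 300 h on a 7.5 TFLOPS card scaled to a 9.1 TFLOPS card.
print(f"~{estimate_hours(300, 7.5, 9.1):.0f} h")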

Why does an RTX 2070 Max-Q take longer to train a neural network than a GTX 960m?

I have two laptops, both with Windows 10, that I use for work:
MSI GE70: i7-4720, 12 GB RAM, GTX 960m 2 GB, 258 GB SSD.
Dell G7: i7-9750, 32 GB RAM, RTX 2070 Max-Q 8 GB, 500 GB SSD.
I made a 'mirror' installation of TensorFlow on both laptops, following the official TensorFlow page.
On both laptops I installed Python 3.6.8, TensorFlow 2.2, CUDA 10.1, cuDNN 7.6 and NVIDIA driver 456.71. When I run the following line in CMD, I can see that both GPUs are visible to TensorFlow and ready to use.
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
[MSI with 960m]
[Dell with 2070 Max-Q]
Then, when I train the same neural network on both laptops, the MSI takes 7 minutes per epoch, while the Dell G7 takes almost an hour per epoch. Why does the RTX 2070 Max-Q take so much longer to train the network than the 960m? Is there some problem with the Dell G7 that I can't see?
This is the structure of the NN:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dropout, Dense

# na, dim_entrada and the train/validation arrays are defined earlier in my script.
modelo = Sequential()
modelo.add(Bidirectional(LSTM(units=na, return_sequences=True), input_shape=dim_entrada))
modelo.add(Dropout(0.25))
modelo.add(Bidirectional(LSTM(units=na)))
modelo.add(Dropout(0.25))
modelo.add(Dense(units=3))
opt = tf.optimizers.Adam(learning_rate=0.0015)
modelo.compile(optimizer=opt, loss='mse', metrics=['accuracy'])
modelo.fit(X_train, Y_train, epochs=20, batch_size=32,
           validation_data=(X_validacion_imu12, Y_validacion_vi12))
I found the problem. I don't know why, but the Dell G7 must be plugged in to mains power. I think there is a power setting that prevents full use of the GPU when the laptop runs on battery.
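If anyone else hits something like this, a quick way to rule out a silent CPU fallback is to ask TensorFlow where it is placing ops. Note this only checks placement; for the power/throttling case above you would still want to watch the clock speeds in nvidia-smi while training. A minimal sketch using standard TF 2.x calls:

# Minimal check that ops actually land on the GPU (TF 2.x).
import tensorflow as tf

print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

# Log the device each op runs on; look for /device:GPU:0 in the output.
tf.debugging.set_log_device_placement(True)

a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)   # should report placement on GPU:0 if the GPU is in use
print(c.shape)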

How long does it take to train over the fashion-MNIST database?

I'm new to deep learning. I want to build an image classifier using a CNN to classify clothing images. I decided to train on the Fashion-MNIST dataset, which contains 60,000 images. But I'm aware that training is a very heavy task.
I wanted to know how long my PC will take to train on this dataset, and whether I should go for pre-trained models instead at the cost of some accuracy.
My PC configurations are:
- Intel Core i5-6400 CPU @ 2.70 GHz
- 8 GB RAM
- NVIDIA GeForce GTX 1050 Ti
Even though it depends on the dataset size and the number of epochs (I tried 50 epochs), the images here are small (32x32).
When I tried it on a machine with
- Intel Core i7-6400 CPU @ 2.70 GHz
- 8 GB RAM
- NVIDIA GeForce GTX 1050 Ti
using the 28x28 images as provided in the MNIST dataset on TensorFlow.org, it took less than 5 minutes.
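If it helps to get a concrete number on your own 1050 Ti, here is a minimal, untuned sketch that trains a small CNN on the Fashion-MNIST copy bundled with tf.keras and prints the wall-clock time; the architecture and epoch count are only illustrative.

# Minimal timing sketch: small, untuned CNN on the Fashion-MNIST data that
# ships with tf.keras.
import time
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = (x_train[..., None] / 255.0).astype("float32")   # (60000, 28, 28, 1)
x_test = (x_test[..., None] / 255.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

start = time.time()
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))
print(f"Total: {time.time() - start:.1f} s for 5 epochs")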

Why does my TensorFlow profiling timeline show idle times on GPU?

I was profiling the inference latency of a MobileNetV2 model (with a batch size of 20) on my GeForce GTX 1080 GPU.
The TensorFlow timeline looks as follows:
I notice that there is quite a lot of empty space in the "stream: all Compute" line, which I think means my GPU was not always busy. What do you think could be causing this idle time, and are there any ways to reduce it?
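One common cause (only a guess without seeing how your batches are fed) is that the GPU is waiting on the host between batches, e.g. on image decoding or per-batch Python overhead. If the input comes from files, overlapping preprocessing with execution via tf.data usually closes those gaps. A sketch under that assumption; the "images/*.jpg" path and the preprocessing are placeholders.

# Sketch assuming batches are built on the host from JPEG files. prefetch()
# overlaps CPU-side input work with GPU execution, which is what typically
# removes gaps in the "stream: all Compute" row.
import tensorflow as tf

BATCH_SIZE = 20

def load_and_preprocess(path):
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    img = tf.image.resize(img, (224, 224)) / 255.0   # MobileNetV2 default input size
    return img

files = tf.data.Dataset.list_files("images/*.jpg")   # hypothetical location
dataset = (files
           .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(BATCH_SIZE)
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.applications.MobileNetV2(weights=None)   # random weights, timing only
preds = model.predict(dataset)
print(preds.shape)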

Fine-tune faster_rcnn_resnet101_coco on a GTX 1080

Is it possible to fine-tune faster_rcnn_resnet101_coco on a GTX 1080 with the Object Detection API? Or faster_rcnn_nasnet?
I'm not sure how much VRAM a 1080 has, but you can train a faster_rcnn_resnet101 model on a 1080 Ti with 11 GB of VRAM. Eyeballing the GPU usage there, it should roughly fit in 8 GB with a batch size of 1, so I would say yes, you can fine-tune a Faster R-CNN ResNet-101 object detector using the Object Detection API.