Is there a way to reduce the RAM usage when training with "dlib.train_simple_object_detector"? - object-detection

I've been trying to use "dlib.train_simple_object_detector" to build a pedestrian detector. After a few hours of drawing rectangles in imglab, I turned the INRIA person dataset (about 600 images) into an XML file. But when I try to train on this data I get "MemoryError: bad allocation". I have 16 GB of RAM, but I guess that's not enough here.
So I reduced the number of training images and found that training only works once I'm down to about 100 images.
When I try this detector on test images the detection rate is pretty bad, so I would really like to train on all the images, or at least on more than 100 of them.
So my questions are:
Is there a way to reduce the RAM usage when training with "dlib.train_simple_object_detector"?
Is there an already-trained .svm file somewhere for pedestrian detection that I can use instead?
Thanks for any help you can give!
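In case it helps, here is a minimal sketch of how the training call is usually set up, with the options most often adjusted to keep memory down. The file names and option values are placeholders, and `upsample_limit` is only available in newer dlib builds; downscaling the training images (and their boxes) before building the XML also shrinks the image pyramid dlib keeps in RAM.

```python
# Sketch only: typical dlib training setup with options that tend to reduce
# RAM use. "pedestrians.xml" / "detector.svm" are placeholder file names.
import dlib

options = dlib.simple_object_detector_training_options()
options.add_left_right_image_flips = True  # more training signal without more images
options.C = 5
options.num_threads = 4
options.be_verbose = True
# dlib upsamples images whose boxes are smaller than the detection window,
# which is where much of the memory goes. If your dlib version exposes this
# option, capping the upsampling keeps the in-memory pyramid much smaller.
options.upsample_limit = 0

dlib.train_simple_object_detector("pedestrians.xml", "detector.svm", options)
```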

Related

How to train large datasets in Colab free

I have to train on 70,000 images for my face verification project on the free tier of Google Colab.
First, it gets stuck on the 1st epoch, and even when training does start, after some time it throws an out-of-RAM error.
The code I use is:
<https://nbviewer.org/github/nicknochnack/FaceRecognition/blob/main/Facial%20Verification%20with%20a%20Siamese%20Network%20-%20Final.ipynb>
If I have to split my dataset into mini-batches to fit it into Colab's GPU memory, how can I do that?
Also, I want to train on the whole dataset because it contains images of 5 different people as anchors and positives.
You can try the following options to train larger datasets.
Add more pooling layers to the model.
Lower the input size of your model.
Use a binary format for the images, with a lower image size, for image classification models.
Lower the batch size while training and validating your model.
You can also use the tf.data API to build a data pipeline with operations like batching, slicing, preprocessing, and shuffling (see the sketch below), and you can constrain GPU memory usage further to avoid out-of-memory issues.
A sample Colab notebook: https://colab.sandbox.google.com/github/tensorflow/docs/blob/master/site/en/guide/data.ipynb
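A rough sketch of such a pipeline, which streams images from disk in mini-batches instead of loading everything into RAM; the glob pattern, image size, and batch size are placeholders to adapt to the Siamese setup:

```python
# Minimal tf.data sketch: decode images lazily and feed the GPU in small
# batches. "data/anchor/*.jpg", IMG_SIZE and BATCH_SIZE are placeholders.
import tensorflow as tf

IMG_SIZE = (105, 105)
BATCH_SIZE = 16

def load_image(path):
    data = tf.io.read_file(path)
    img = tf.io.decode_jpeg(data, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    return tf.cast(img, tf.float32) / 255.0

paths = tf.data.Dataset.list_files("data/anchor/*.jpg", shuffle=True)
dataset = (paths
           .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(1000)
           .batch(BATCH_SIZE)
           .prefetch(tf.data.AUTOTUNE))

for batch in dataset.take(1):
    print(batch.shape)  # (BATCH_SIZE, 105, 105, 3)
```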

Training SSD-MobileNet V1 and the loss does not decrease

I'm new to everything about CNNs and TensorFlow. I'm training a pretrained SSD-MobileNet V1 model (ssd_mobilenet_v1_pets.config) to detect columns of buildings. It has been training for about a day, but the loss stays between 1 and 2 and has not decreased for the last 10 hours.
I realized that my input images are 128x128 and SSD resizes the images to 300x300.
Does the size of the input images affect the training?
If that is the case, should I retrain the network with larger input images? Or what else could I do to decrease the loss? My training set has 660 images and my test set has 166; I don't know if that is enough.
I really appreciate your help.
Loss values of ssd_mobilenet can be different from those of faster_rcnn. From EdjeElectronics' TensorFlow Object Detection Tutorial:
For my training on the Faster-RCNN-Inception-V2 model, it started at
about 3.0 and quickly dropped below 0.8. I recommend allowing your
model to train until the loss consistently drops below 0.05, which
will take about 40,000 steps, or about 2 hours (depending on how
powerful your CPU and GPU are). Note: The loss numbers will be
different if a different model is used. MobileNet-SSD starts with a
loss of about 20, and should be trained until the loss is consistently
under 2.
For more information: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#6-run-the-training
The SSD MobileNet architecture demands additional training to suffice
the loss accuracy values of the R-CNN model, however, offers
practicality, scalability, and easy accessibility on smaller devices
which reveals the SSD model as a promising candidate for further
assessment (Fleury and Fleury, 2018).
For more information: Fleury, D. & Fleury, A. (2018). Implementation of Regional-CNN and SSD machine learning object detection architectures for the real time analysis of blood borne pathogens in dark field microscopy. MDPI AG.
I would recommend setting aside 15%-20% of the images for testing, making sure they cover all the variety present in the data. As you said, you have 650+ images for training and 150+ for testing, which is roughly a 25% test split. It looks like you have enough images to start with. The more, the merrier, of course, but make sure your model also has sufficient data to learn from!
Resizing the images does not contribute to the loss. It ensures consistency across all images so the model can handle them without bias; the loss has nothing to do with resizing as long as every image is resized identically.
You have to stop and resume from checkpoints repeatedly if you want the model to fit well. Usually you can get good accuracy by training SSD-MobileNet until the loss is consistently under 1. Ideally we want the loss to be as low as possible, but we also want to make sure the model is not over-fitting. It is all about trial and error. (A loss between 0.5 and 1 usually does the job, but again it depends on your data.)
The reason I think your model is underperforming is that your test data contains variety that the training data does not cover.
The model has not seen enough variety during training to generalize to the new variety in the test data. (For example: your test data has images of buildings from angles that are not sufficiently represented in the training data.) In that case, I recommend putting the full variety of images into the training data and then picking test images while still keeping enough training examples of the new viewpoints; a simple stratified split does this for you (see the sketch below). That's why I recommend a 15%-20% test split.
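A rough, hedged sketch of such a split, assuming the image paths and a viewpoint tag per image are available as two parallel lists (both are made up here just to make the snippet runnable):

```python
# Hypothetical example: stratified 80/20 split so each viewpoint appears in
# both the training and the test set. Paths and tags are dummy placeholders.
from sklearn.model_selection import train_test_split

image_paths = [f"images/building_{i:03d}.jpg" for i in range(826)]
view_tags = (["front", "side", "corner", "aerial"] * 206) + ["front", "side"]

train_paths, test_paths, train_tags, test_tags = train_test_split(
    image_paths,
    view_tags,
    test_size=0.2,       # keep roughly 15-20% aside for testing
    stratify=view_tags,  # preserve the mix of viewpoints in both splits
    random_state=42,
)
print(len(train_paths), len(test_paths))  # roughly 660 train / 166 test
```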

What is the computational power required for training High resolution (4024 x 3036) images using VGG16-Net?

I am working on the classification of high-resolution images using VGG16 in Keras.
But I am unable to train on images beyond 600x600 resolution, even with a batch size of 1, on an NVIDIA GeForce GTX 1080 GPU.
I am getting a resource-exhaustion (OOM) error, i.e. it is unable to allocate a tensor of shape [18, 64, 600, 600].
Can anyone please suggest a solution?
I want to use the large images because I am labeling them as Good or Bad based on very small differences.
Thanks in advance!!
The whole network plus a batch of data needs to fit into VRAM. If you really do need to use high-resolution images, then you need to use a smaller network.
VGG-16 is old and inefficient anyway and not recommended for a new project. You could look at models like MobileNetV2 or MnasNet, but bear in mind that all of these commonly used models are generally optimized for inputs around 600x600 or much smaller. Out of interest, I have tried training CNNs on very high-resolution images just to see what would happen, and I found that they train and run painfully slowly, with much reduced accuracy: if all of the features in the images are very large with respect to the convolutional filters, then the filters won't be able to pick up on them.
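As a rough illustration of the "smaller network" suggestion, here is a minimal Keras sketch that swaps the VGG16 backbone for MobileNetV2 with a binary Good/Bad head; the 600x600 input size and the head layers are assumptions to adapt:

```python
# Sketch only: a lighter backbone whose activations are far cheaper than
# VGG16's, so a batch at 600x600 is more likely to fit in 8 GB of VRAM.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(600, 600, 3),  # assumed input size; shrink further if still OOM
    include_top=False,
    weights="imagenet",
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # Good / Bad binary head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```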

VGGNet fine-tuning: image size, time and epoch

I am new to deep learning, and I would like some clarification to help me understand it.
I want to fine-tune a CNN based on VGGNet: should I use the same 224x224 input size as VGGNet for fine-tuning?
How can I determine the number of epochs?
How long would it take to train the model if I have around 5000 images and an NVIDIA GTX 1070 GPU?
Please help me find answers.
Thank you
IT
The number of epochs is just how many times you loop through the entire dataset. The number of epochs itself does not tell you when or where the network will converge; that depends a lot on your algorithm and your data. So it is OK not to fix it in advance.
Your most important goal is to find the model with the best accuracy. In my experience, just let the machine run for about 20 epochs, then draw the loss/accuracy graph and select the model.
For example, in the graph below, I often choose a model saved from the red-colored range, which helps me avoid over-fitting.
(The image is copied from the internet.)
Hope that helps.
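A hedged sketch of what that workflow might look like in Keras: VGG16 kept at its native 224x224 input, the convolutional base frozen, a ~20-epoch budget, and a ModelCheckpoint callback that keeps the best epoch instead of guessing the epoch count up front. Directory names, class count, and head layers are placeholders:

```python
# Sketch only: fine-tune a new head on a frozen VGG16 base at 224x224.
# "data/train", "data/val" and the 5-class head are assumptions.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/val", image_size=(224, 224), batch_size=32)

preprocess = tf.keras.applications.vgg16.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y))
val_ds = val_ds.map(lambda x, y: (preprocess(x), y))

base = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # train only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # assumed number of classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keep only the epoch with the best validation accuracy -- the programmatic
# version of reading the best point off the loss/accuracy graph.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_vgg16.keras", monitor="val_accuracy", save_best_only=True)
model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[checkpoint])
```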

Does the amount of data affect the recognition speed? [tensorflow]

Hello, I have a TensorFlow model that I trained on about 300 pictures. The training took place in the Google cloud; I run recognition there and send a response to my server. The question is: does the amount of training data affect the recognition speed? That is, will recognition be faster if I train on 3,000 pictures?
No, your recognition speed depends only on your computation graph.
The accuracy of recognition depends on the training images used.
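For a quick sanity check, one could time a forward pass of a fixed graph: the timing is the same whether the weights came from 300 or 3,000 training images. The model and input size below are placeholders.

```python
# Sketch: inference time is a property of the graph (layers, input size),
# not of how many images were used to train the weights.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # any fixed graph
dummy = np.random.rand(1, 224, 224, 3).astype("float32")

model.predict(dummy)  # warm-up run
start = time.perf_counter()
for _ in range(20):
    model.predict(dummy)
elapsed = (time.perf_counter() - start) / 20
print(f"average forward pass: {elapsed * 1000:.1f} ms")
```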