number of images for training YOLOv3 on custom dataset - object-detection

I'm going to train YOLOv3 on my own custom dataset, following the instructions on the Darknet GitHub repo:
https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
So far I have manually labeled 300 images per object class, and I want to ask: what is the minimum number of labeled images per class needed for good model performance? (Manual labeling takes a lot of time.)

Related

Keras - Changing the way images are taken in a dataset in fit()

I'm studying machine learning and deep learning. I'm trying to customize this model from the Keras website https://keras.io/examples/generative/wgan_gp/
My model takes 3 512x512 images in each training iteration (from 10 different directories), which are then divided into patches used to train the generator and discriminator. These images must be consecutive and belong to the same directory. The directory can be chosen randomly in each iteration, and the 3 images must be taken from it.
In summary, for each training iteration, the algorithm must select a random directory, take 3 consecutive images and divide them into patches to train the two networks.
How can I customize the way I iterate over the dataset in fit() to achieve this?
Providing the solution here, from the answer given by Shubham Panchal in the comment section, for the benefit of the community.
You can do this using TensorFlow. See this tutorial on DCGAN. With the TensorFlow API, you can create a custom training loop with any existing Keras model. You may implement the custom training loop from the tutorial above and use it with the WGAN model you have.
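For illustration, here is a minimal sketch of that approach (assuming a recent TF 2.x and that each of the 10 directories holds its images as sorted files; the directory paths, file format, and the extract_patches/train_step helpers are hypothetical, not from the original post). A plain Python generator enforces the "random directory, 3 consecutive images" rule, and tf.data wraps it for a custom training loop:

```python
import os
import random
import tensorflow as tf

DIRS = [f"data/dir_{i}" for i in range(10)]  # hypothetical directory layout

def sample_triplet():
    """Pick a random directory and return 3 consecutive 512x512 images from it."""
    d = random.choice(DIRS)
    files = sorted(os.listdir(d))             # "consecutive" = adjacent in sorted order
    start = random.randint(0, len(files) - 3)
    imgs = []
    for name in files[start:start + 3]:
        raw = tf.io.read_file(os.path.join(d, name))
        img = tf.io.decode_image(raw, channels=3, expand_animations=False)
        img = tf.image.resize(img, (512, 512)) / 255.0
        imgs.append(img)
    return tf.stack(imgs)                      # shape (3, 512, 512, 3)

def generator():
    while True:
        yield sample_triplet()

dataset = tf.data.Dataset.from_generator(
    generator,
    output_signature=tf.TensorSpec(shape=(3, 512, 512, 3), dtype=tf.float32),
)

# Custom training loop instead of model.fit(): one triplet per iteration.
# for triplet in dataset.take(num_iterations):
#     patches = extract_patches(triplet)   # user-defined patch extraction
#     train_step(patches)                  # WGAN-GP generator/discriminator updates
```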

Variation in total loss while training the Faster RCNN model using customized data

I am working on an object detection model to identify two classes. I am using Faster RCNN on a customized dataset with the TensorFlow API. The dataset contains 20k (augmented) images across the two classes. While training the model, the loss does not decrease properly as it approaches 100k steps; it shows a lot of variation, as shown in the image. Can someone tell me where I am making a mistake?
[Image: training loss curve showing large variation]

Tensorflow real time object detection

I am making a real-time object detector as my project. I have the following doubts:
1) How many images of each item should I take to train accurately?
2) Will a model that was earlier trained on different objects still detect those objects if I use it to train on other objects?
3) Which object detector model should I use?
1) With TensorFlow, you can start with 150-200 images of each class to begin testing and get some decent initial results. You may have to increase the number of images based on the results.
2) Yes
3) You could start with any of the models, like ssd_mobilenet_v1_coco
Here are all of the available models, which are trained on the COCO dataset:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Each of the pre-trained models differs from the others in terms of detection speed, accuracy, etc. Pick one based on your needs.
Additionally, since you seem to be new to object detection, refer to the following articles if you need a starting point:
https://pythonprogramming.net/training-custom-objects-tensorflow-object-detection-api-tutorial/
https://towardsdatascience.com/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95
https://medium.com/#dana.yu/training-a-custom-object-detection-model-41093ddc5797
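For a rough idea of what using one of these pre-trained models looks like in practice, here is a sketch of loading an exported frozen graph (TF 1.x style; the graph path and test image are placeholder names) such as ssd_mobilenet_v1_coco and running a single detection. Exported graphs from the TF Object Detection API conventionally expose the image_tensor and detection_* tensors used below:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

GRAPH_PATH = "ssd_mobilenet_v1_coco/frozen_inference_graph.pb"  # placeholder path

# Load the frozen inference graph exported by the Object Detection API.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    image = np.array(Image.open("test.jpg"))[None, ...]   # shape (1, H, W, 3), uint8
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": image},
    )
    print(scores[0][:5])   # confidence of the top detections
```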

Tensorflow object detection: why is the location in image affecting detection accuracy when using ssd mobilnet v1?

I'm training a model to detect meteors within a picture of the night sky, and I have a fairly small dataset of about 85 images, each annotated with a bounding box. I'm using the transfer learning technique, starting from the ssd_mobilenet_v1_coco_11_06_2017 checkpoint, with Tensorflow 1.4. I'm resizing images to 600x600 pixels during training, and I'm using data augmentation in the pipeline configuration to randomly flip the images horizontally and vertically and rotate them 90 deg. After 5000 steps, the model converges to a loss of about 0.3 and will detect meteors, but it seems to matter where in the image the meteor is located. Do I have to train the model by giving examples of every possible location? I've attached a sample of a detection run where I tiled a meteor over the entire image and received various levels of detection (filtered to 50%). How can I improve this?
[Image: detected meteors in image example]
It could very well be your data and I think you are making a prudent move by improving the heterogeneity of your dataset, BUT it could also be your choice of model.
It is worth noting that ssd_mobilenet_v1_coco has the lowest COCO mAP relative to the other models in the TensorFlow Object Detection API model zoo. You aren't trying to detect a COCO object, but the mAP numbers are a reasonable approximation of generic model accuracy.
At the highest level, the choice of model is largely a tradeoff between speed and accuracy. The model you chose, ssd_mobilenet_v1_coco, favors speed over accuracy. Consequently, I would recommend you try one of the Faster RCNN models (e.g., faster_rcnn_inception_v2_coco) before you spend a significant amount of time preprocessing images.
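As a side note on the question's tiling experiment, that kind of positional-sensitivity probe is easy to script. The sketch below is only illustrative: the file names are placeholders and detect() is a stub standing in for a call into the trained detector (for example, the frozen-graph inference shown earlier):

```python
import numpy as np
from PIL import Image

background = Image.open("night_sky.jpg").convert("RGB")   # placeholder background frame
meteor = Image.open("meteor_crop.png").convert("RGB")     # placeholder object crop

def detect(image_array):
    # Stub: replace with a call into the trained model and return the
    # highest detection score for this image.
    return 0.0

scores = {}
step = 100  # pixels between paste positions
for x in range(0, background.width - meteor.width, step):
    for y in range(0, background.height - meteor.height, step):
        tile = background.copy()
        tile.paste(meteor, (x, y))
        scores[(x, y)] = detect(np.array(tile))

# A large spread of scores across (x, y) indicates location sensitivity.
print(min(scores.values()), max(scores.values()))
```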

Retraining Inception and Downsampling

I have followed this TensorFlow tutorial on transfer learning with the Inception model, using my own dataset of 640x360 images. My question comes in two parts:
1) My dataset contains 640x360 images. Is the first operation that happens a downsampling to 299x299? I ask because I have a higher-resolution version of the same dataset, and I am wondering if training with the higher-resolution images will result in different (hopefully better) performance.
2) When running the network (using tf.sess.run()), is my input image down-sampled to 299x299?
Note: I have seen the 299x299 resolution stat listed in many places online, like this one, and I am confused about exactly which images it's referring to: the initial training dataset images (for Inception I think it was ImageNet), the transfer learning dataset (my personal one), the input image when running the CNN, or a combination of the three.
Thanks in advance :)
The Inception model will resize your image to 299x299. This can be confirmed by visualizing the TensorFlow graph. If you have enough samples for the transfer learning, the accuracy will be good enough with the resize to 299x299. But if you really want to try training at the actual resolution, the size of the initial input layers of the graph needs to be changed.
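To make the resize step concrete, here is a small TF 1.x-style sketch (the file name is a placeholder) showing that a 640x360 image ends up at Inception's fixed 299x299 input size regardless of its stored resolution:

```python
import tensorflow as tf

# Decode one source image and resize it the way the Inception input pipeline does.
jpeg_bytes = tf.read_file("example_640x360.jpg")          # placeholder image file
image = tf.image.decode_jpeg(jpeg_bytes, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
resized = tf.image.resize_images(image, [299, 299])       # Inception's expected input size
batch = tf.expand_dims(resized, 0)                        # shape (1, 299, 299, 3)

with tf.Session() as sess:
    print(sess.run(tf.shape(batch)))                       # -> [  1 299 299   3]
```

So training with the higher-resolution version only changes anything if the input layer size is altered as described above; otherwise both versions are reduced to 299x299.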