TensorFlow Kitti-trained models: Detailed underlying training procedure - tensorflow

For my ML project I want to use the faster_rcnn_resnet101_kitti model from the TensorFlow model zoo. The Kitti dataset is very small for deep learning (about 7,000 images), so I was wondering how so little data leads to such decent inference performance (mAP@0.5 = 87). One explanation I can imagine is that the network was first trained on a different, richer dataset and then fine-tuned on Kitti, but I am not sure about it.
How can I find out the exact underlying training procedure (apart from pipeline.config) for the models published in the TF model zoo?
Thanks

Related

How to create a dataset for image classification

I trained a model using images I gathered from the web. Then, when inferences were made using images newly collected from the web, performance was poor.
I am wondering how I can improve my dataset using misclassified images. Can I add all the misclassified images to the training dataset? And then do I have to collect new images?
[Edit]
I added some of the misclassified images to the training dataset, and the performance evaluation got better.
It would help if you could provide more information on how you trained your model and on your network architecture.
However, here are some general guidelines:
You can diversify the images in your training set by, yes, adding new images. The more varied the examples you give the network, the higher the chance that they will resemble the images you want predictions for.
Do data augmentation. It is pretty straightforward and usually improves accuracy quite a bit. You can have a look at this TensorFlow tutorial on data augmentation. If you don't know what data augmentation is, it is a technique that applies minor changes to your images, such as rotating them slightly, resizing them, etc. This way the model learns to recognize your images even with slight changes, which usually makes it more robust to new images.
You could consider transfer learning. The main idea is to take a model that has been trained on a huge dataset and fine-tune it on your specific problem. The tutorial I linked shows the typical transfer learning workflow: taking a model pretrained on the ImageNet dataset (the huge dataset) and retraining it on the Kaggle "cats vs dogs" classification dataset (a smaller dataset, like the one you might have).
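To make the data augmentation guideline concrete, here is a minimal framework-free sketch. It assumes a grayscale image stored as a list of rows with 8-bit pixel values; the function names, the flip probability and the brightness range are illustrative choices, not something taken from the linked tutorial.

```python
import random

def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add delta to every pixel, clamping the result to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]

def augment(image, rng):
    """Return a randomly perturbed copy of the image; its label stays the same."""
    out = [row[:] for row in image]      # copy so the original is untouched
    if rng.random() < 0.5:               # flip half of the time (assumed rate)
        out = horizontal_flip(out)
    delta = rng.randint(-20, 20)         # small brightness shift (assumed range)
    return adjust_brightness(out, delta)

# Tiny hypothetical 2x3 grayscale image.
image = [[0, 50, 100],
         [150, 200, 250]]
rng = random.Random(0)
augmented_copies = [augment(image, rng) for _ in range(4)]
```

Each call produces a slightly different variant of the same picture, so the network sees more variety without any new images being collected. In a real pipeline the same idea would normally be applied with the framework's own image operations.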

Issues in padding(pre-processing) of huggingface gpt2 transformer model and issues with very large dataset during model training

Objective: I am trying to train a Tensorflow Huggingface GPT2 model (language model training from scratch)
Model Description:
Huggingface GPT2 Tensorflow Model
Attached a pic of config. Model Config
Dataset Description:
I have a large dataset (~20GB),
the data is separated into multiple text files with each new line as a training example.
I am facing two issues.
The examples have different lengths, and I am not sure how to make them all the same length to feed to the model.
Solutions tried: We could pad them, but I am not sure how to do that in batches in TensorFlow. I searched for information about data collators.
Doubt: Should padding make the examples equal in length within each batch or across the whole dataset? And should it be done with tokens or with some other information? (There are different data collators for language modelling, etc.)
Since the data is very large, it cannot be loaded into memory all at once during training (when calling model.fit). I am not sure how to proceed with that.
Solutions: I am thinking of training and saving the model on small files, but that would require manual intervention or a for loop, and the model would not be trained on the whole dataset in one go. Are there other alternatives? Help would be really appreciated.
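One possible way to address both issues at once is sketched below in plain Python: the files are read lazily, one line at a time, and each batch is padded only to the length of its own longest example. Everything here is an assumption for illustration: `toy_encode` is a stand-in for a real tokenizer, `PAD_ID` is a hypothetical padding id, and the file names are made up.

```python
import os
import tempfile

PAD_ID = 0  # hypothetical padding token id; a real tokenizer defines its own

def toy_encode(line):
    """Stand-in tokenizer (assumption): one positive integer id per word."""
    return [len(word) for word in line.split()]

def iter_examples(paths):
    """Yield one non-empty line at a time so no file is loaded in full."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield line

def pad_batch(sequences, pad_id=PAD_ID):
    """Pad to the longest sequence in this batch and build an attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq))
                      for seq in sequences]
    return input_ids, attention_mask

def iter_batches(paths, batch_size, encode=toy_encode):
    """Stream padded batches from the files, including a final partial batch."""
    batch = []
    for line in iter_examples(paths):
        batch.append(encode(line))
        if len(batch) == batch_size:
            yield pad_batch(batch)
            batch = []
    if batch:
        yield pad_batch(batch)

# Small demonstration with two temporary text files (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.txt", "one two three\nhi\n"), ("b.txt", "a bb\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        paths.append(path)
    batches = list(iter_batches(paths, batch_size=2))
```

Padding per batch keeps the amount of wasted padding small, and the attention mask marks which positions are real tokens. A generator of this kind could then be wrapped for training, for example with `tf.data.Dataset.from_generator`, so that the whole dataset is consumed in one run without fitting in memory.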

Variation in total loss while training the Faster RCNN model using customized data

I am working on an object detection model to identify two classes. I am using Faster RCNN on a custom dataset with the TensorFlow API. The dataset contains 20k images (augmented) with two classes. During training the loss does not decrease properly, even as it reaches 100k steps. It varies a lot, as shown in the image. Can someone tell me where I am making a mistake?

When should I stop the object detection model training while mAP are not stable?

I am re-training SSD MobileNet with 900 images from the Berkeley Deep Drive dataset and evaluating on 100 images from the same dataset.
The problem is that after about 24 hours of training, the TotalLoss seems unable to go below 2.0:
And the corresponding mAP score is quite unstable:
In fact, I have tried training for about 48 hours, and the TotalLoss just cannot go below 2.0; it stays somewhere in the range 2.5~3.0. During that time, the mAP was even lower.
So here is my question: given my situation (I really don't need a "high-precision" model; as you can see, I picked only 900 images for training and simply want to do a PoC model training/prediction), when should I stop training to obtain a reasonably well-performing model?
Indeed, for detection you need to fine-tune the network. Since you are using SSD, there are already some sources out there:
https://gluon-cv.mxnet.io/build/examples_detection/finetune_detection.html (This one specifically for an SSD Model, uses mxnet but you can use the same with TF)
You can watch a very nice finetuning intro here
This repo has a nice fine tuning option enabled as long as you write your dataloader, check it out here
In general, your error can be attributed to many factors: the learning rate you are using, or the characteristics of the images themselves (are they normalized?). If the SSD network you are using was trained with normalized data and you don't normalize your data when retraining, you will get stuck while learning. Also, what learning rate was used for the original model?
From the model zoo I can see that for SSD there are models trained on COCO
And models trained on Open Images:
If, for example, you are using ssd_inception_v2_coco, there is a truncated_normal_initializer in the input layers, so take that into consideration. Also make sure the input sizes you provide are the same as the ones the model expects.
You can get very good detections even with little data if you also include many augmentations and take into account the other things I mentioned. More details on your code would help to see where the problem lies.
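The normalization point above can be illustrated with a small sketch. It assumes 8-bit grayscale pixels and a target range of [-1, 1]; that range is only an example, and the correct one is whatever preprocessing the pretrained network was originally trained with.

```python
def normalize_pixels(image, low=-1.0, high=1.0):
    """Linearly map 8-bit pixel values (0-255) into the range [low, high]."""
    return [[low + (high - low) * px / 255.0 for px in row] for row in image]

# Tiny hypothetical 2x2 image.
image = [[0, 255],
         [51, 204]]
normalized = normalize_pixels(image)
```

The important part is consistency: the images used for retraining and for inference should go through the same mapping as the data the original checkpoint saw.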

Low validation accuracy after mobilenet transfer learning

I need a tensorflow model which recognizes a dog's breed. I downloaded the Stanford Dogs Dataset - 20,580 images in 120 categories (=breeds). I followed the procedure described in TensorFlow For Poets to retrain mobilenet_1.0_224. I used --how_many_training_steps=4000 and defaults for everything else. I got this tensorboard graph:
Training and validation accuracy
The validation accuracy is only about 80%.
What can I do to improve it?
In the research paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, the test accuracy using the 'MobileNet_1.0_224' architecture on the Stanford Dogs dataset is 83.3%, which seems in line with your results.
When you visually examine the Stanford Dogs Dataset, you will find that many of the breeds look similar, which makes it hard to reach a higher accuracy even with state-of-the-art image classifiers. You might improve your results by grouping similar-looking breeds into larger categories.
Alternatively, you might tweak the training settings of the retrain.py script in the TensorFlow for Poets tutorial, but the gains will likely be marginal.
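As a rough illustration of the grouping idea, the sketch below maps breed labels to coarser groups and compares accuracy before and after. The group names, the mapping and the predictions are all hypothetical; which breeds to merge is a design decision that depends on the application.

```python
# Hypothetical grouping of visually similar breeds into coarser categories.
BREED_TO_GROUP = {
    "siberian_husky": "sled_dog",
    "malamute": "sled_dog",
    "norfolk_terrier": "small_terrier",
    "norwich_terrier": "small_terrier",
    "pug": "pug",
}

def accuracy(true_labels, predicted_labels):
    """Fraction of positions where prediction and ground truth agree."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def to_groups(labels, mapping=BREED_TO_GROUP):
    """Replace each breed by its group; unmapped labels are kept as they are."""
    return [mapping.get(label, label) for label in labels]

# Made-up ground truth and predictions for four images.
true_breeds = ["siberian_husky", "malamute", "norfolk_terrier", "pug"]
pred_breeds = ["malamute", "malamute", "norwich_terrier", "siberian_husky"]

fine_accuracy = accuracy(true_breeds, pred_breeds)
coarse_accuracy = accuracy(to_groups(true_breeds), to_groups(pred_breeds))
```

Confusions between breeds inside the same group no longer count as errors, so the coarse accuracy can only be equal to or higher than the fine-grained one. The cost is that the model no longer distinguishes the merged breeds.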