mAP decreases when re-training the TensorFlow Object Detection API

I tried to train the TensorFlow Object Detection API on the PASCAL VOC dataset (all 20 classes), using faster_rcnn_resnet101 (pre-trained on the COCO dataset with 91 classes, provided in the repo) as the base model. I used the config provided in the repo.
The training generated .ckpt files at several iteration points, including iteration 0 (where training has had no effect yet).
Then I evaluated the .ckpt files on the PASCAL VOC validation data for all 20 classes.
Important note: the 20 PASCAL VOC categories are a subset of the 91 COCO categories.
My evaluation results show that performance on the 20 classes drops by 15% or more at every iteration during training. As per the config, I ran the training for 200K iterations and kept all hyperparameters the same as for the original model.
My question is: can anyone explain what is wrong here? Why does the performance (mAP) go down even though the PASCAL VOC categories are a subset of the COCO categories?
Also, how can I ensure that the performance for these 20 classes does not go down after subsequent training on datasets containing these classes?
Otherwise, please let me know if I am missing something.

Related

variational autoencoder with limited data

I'm working on a binary classification project, and I'm using a VAE (variational autoencoder) to handle the imbalance between the two classes by generating new samples for the minority class.
The first class (majority) contains 20000 samples, and the second (minority) contains 500 samples.
After training the VAE on the minority class, I generated new samples for that class and added them to the training set. Then I trained two classification models: one on the imbalanced data (training set only) and one on the training set plus the VAE-generated data. The problem is that the first model gives better results than the second (F1-score, ROC AUC, ...), and I thought the cause might be the limited amount of data the VAE was trained on.
Any help is appreciated.
Although 500 training images are not enough to generate diverse images from a VAE, you can still try producing some. It's better to take the mean of the latents of 10 (or even more) different images and pass it through the decoder (if you're already doing this, ignore this suggestion; if you're using some other method, try this).
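A minimal sketch of the latent-averaging idea, assuming you already have the encoder's latent means for the minority samples as plain lists of floats. The decoder is passed in as a callable; `mixed_latent_samples` and the toy data are hypothetical, and with a real VAE you would pass your own trained decoder instead of the identity function used below.

```python
import random


def mixed_latent_samples(latents, decode, n_samples, k=10, seed=0):
    """Generate new samples by decoding the mean latent of k random real samples.

    latents: list of latent vectors (lists of floats), e.g. the encoder means
             of the minority-class training images (assumption).
    decode:  callable mapping a latent vector to a sample; a stand-in for
             your trained VAE decoder (hypothetical).
    """
    if k > len(latents):
        raise ValueError("k cannot exceed the number of available latents")
    rng = random.Random(seed)
    dim = len(latents[0])
    samples = []
    for _ in range(n_samples):
        chosen = rng.sample(latents, k)  # k distinct real samples
        mean = [sum(z[d] for z in chosen) / k for d in range(dim)]
        samples.append(decode(mean))
    return samples


# Usage with toy 2-D latents and an identity "decoder".
toy_latents = [[float(i), 2.0 * i] for i in range(20)]
new_samples = mixed_latent_samples(toy_latents, decode=lambda z: z, n_samples=5)
print(new_samples)
```

Averaging tends to pull new points toward the interior of the minority class's latent region, so a larger k gives more typical but less varied samples.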
If it's still not working, I suggest building a Conditional VAE on your entire dataset. In a conditional VAE, you train the VAE using the labels, so that your model learns not only reconstruction but also which class of image it is reconstructing. This lets you generate an image of any particular class.
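One common way to condition a VAE on the class, sketched here under the assumption that inputs and latents are flat vectors, is to concatenate a one-hot label to the encoder input and to the latent code before decoding. At generation time you sample z from the standard normal prior and attach the label of the class you want. The helper names below are hypothetical, and the encoder/decoder networks themselves are left out.

```python
import random


def one_hot(label, num_classes):
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def cvae_inputs(x, z, label, num_classes):
    """Conditioned inputs for training: encoder sees [x, c], decoder sees [z, c]."""
    c = one_hot(label, num_classes)
    return list(x) + c, list(z) + c


def generation_input(label, num_classes, latent_dim, rng):
    """Decoder input for generating a new sample of class `label`:
    z ~ N(0, I) from the prior, concatenated with the one-hot label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return z + one_hot(label, num_classes)


# Usage: build a decoder input to generate a minority-class (label 1) sample.
rng = random.Random(0)
decoder_in = generation_input(label=1, num_classes=2, latent_dim=3, rng=rng)
print(decoder_in)
```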

Multiple BERT binary classifications on a single graph to save on inference time

I have five classes and I want to compare four of them against one and the same class. This isn't a One vs Rest classifier, as for each output I want to score them against one base class.
The four outputs should be: base class vs classA, base class vs classB, etc.
I could do this with multiple binary classification tasks, but that wastes computation time when the first layers are BERT preprocessing plus pretrained BERT layers, and the only differences between the four classifiers are the last few (fine-tuned) BERT layers and the Dense layer.
So why not merge the graphs to improve performance?
My inputs are four different datasets, each annotated with true/false for each class.
As I understand it, I can re-use most of the pipeline (BERT preprocessing and the first layers of BERT), as those have shared weights. I should then be able to train the last few layers of BERT and the Dense layer on top differently depending on the branch of the classifier (maybe using something like keras.switch?).
I have tried many alternatives, including multi-class and multi-label classifiers (with actual and generated, e.g. machine-annotated, labels in the case of multiple input labels) and different activation and loss functions, but none of the results were acceptable to me (none were as good as the four separate models).
Is there a way to merge the four models to improve performance, or am I stuck with four binary classifiers?
When you train a DNN for a specific task, it will (in the vast majority of cases) be better than a more general model that handles several tasks simultaneously. That said, in my experience a properly trained general model produces results very similar to the original binary ones. Anyway, here are a couple of suggestions for training strategies (assuming your training datasets for each task are completely different):
Weak supervision approach
Train your binary classifiers and use them to label the other datasets (e.g. label datasets 1, 3 and 4 with the binary classifier trained on dataset 2). Then train your joint model as a multi-label task on all the newly labeled datasets (don't forget to shuffle the samples before feeding them to the trainer ;) ). You will need to experiment with whether to apply a threshold and set each label to 0/1, or to use the binary classifiers' scores directly.
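A minimal sketch of how the multi-label target for one sample could be assembled, assuming each sample keeps its true label for its own task and gets the other tasks' labels from the corresponding binary classifiers' sigmoid scores. The function name and the `threshold` switch are hypothetical: `threshold=None` keeps soft scores, while a number turns them into hard 0/1 labels. Task indices are zero-based here.

```python
import random


def weak_multilabel_targets(own_task, own_label, scores, threshold=None):
    """Multi-label target for one sample from dataset `own_task`.

    scores: sigmoid scores of the binary classifiers for this sample, one per
            task; the entry for own_task is ignored in favour of the true label.
    """
    targets = []
    for task, score in enumerate(scores):
        if task == own_task:
            targets.append(float(own_label))  # ground truth for its own task
        elif threshold is None:
            targets.append(score)  # soft weak label
        else:
            targets.append(1.0 if score >= threshold else 0.0)  # hard weak label
    return targets


# Usage: a positive sample from the second dataset (index 1), scored by all four classifiers.
example = weak_multilabel_targets(1, 1, [0.9, 0.2, 0.4, 0.7])
hard = weak_multilabel_targets(1, 1, [0.9, 0.2, 0.4, 0.7], threshold=0.5)

# Shuffle the combined, relabeled rows before training.
training_rows = [("text a", example), ("text b", hard)]
random.Random(0).shuffle(training_rows)
print(example, hard)
```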
Masked loss approach
Create a custom loss function that does not penalize the model when no label is provided for a certain class. So when you feed a sample from (say) dataset 2, the loss is calculated only for the 2nd class.
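A minimal sketch of such a loss in plain Python, assuming the model outputs one logit per class and each sample carries a 0/1 mask saying which classes are labeled (a sample from dataset 2 would have mask [0, 1, 0, 0]). It uses the numerically stable form of binary cross-entropy on logits; in a real framework you would express the same masking with tensor operations. `masked_bce` is a hypothetical name.

```python
import math


def masked_bce(logits, targets, mask):
    """Mean binary cross-entropy over the labeled classes only.

    mask[i] is 1 if the sample has a label for class i, 0 otherwise.
    Unlabeled classes contribute nothing to the loss (or its gradient).
    """
    total, count = 0.0, 0
    for logit, y, m in zip(logits, targets, mask):
        if not m:
            continue
        # Stable BCE on a logit: max(l, 0) - l * y + log(1 + exp(-|l|))
        total += max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))
        count += 1
    return total / count if count else 0.0


# Only the first class is labeled, so the other logits are ignored.
print(masked_bce([0.0, 100.0, -3.0, 2.0], [1.0, 0.0, 0.0, 0.0], [1, 0, 0, 0]))  # ln 2, about 0.693
```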
Of course, you can apply both approaches simultaneously. For example, if you know that a binary classifier produces polarized scores (most results near 0 or 1), you can use weak labels and automatically label your data with the scores. Then, during the second stage, weight each sample's loss by x' = 4(x - 0.5)^2, where x is the binary classifier's score (note that you get logits from the model, so you will need to apply the sigmoid function to get x). This way you increase the contribution of samples the binary classifier is confident about and reduce that of the less certain ones.
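A sketch of the confidence weighting, assuming the first-stage binary classifier gives a logit per sample: x = sigmoid(logit) serves both as the soft target and as the input to the weight x' = 4(x - 0.5)^2, which is 0 for a score of 0.5 and approaches 1 as the score approaches 0 or 1. Using x as a soft target is one reading of "label your data with scores"; the function names are hypothetical.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def confidence_weight(teacher_logit):
    """x' = 4 * (x - 0.5)^2 with x = sigmoid(teacher_logit)."""
    x = sigmoid(teacher_logit)
    return 4.0 * (x - 0.5) ** 2


def weighted_soft_bce(student_logit, teacher_logit):
    """BCE of the joint model's logit against the weak score x,
    scaled by how confident the binary classifier was."""
    x = sigmoid(teacher_logit)
    bce = max(student_logit, 0.0) - student_logit * x + math.log1p(math.exp(-abs(student_logit)))
    return confidence_weight(teacher_logit) * bce


for logit in (-4.0, 0.0, 4.0):
    print(logit, round(confidence_weight(logit), 3))
```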
As for unfreezing the last layers of BERT, unfreezing the top 3-6 layers is usually enough. Unfreezing more layers improves results very little and increases time and memory requirements.

TensorFlow Object Detection API misclassifying objects

I followed a simple tutorial to train a custom object detector.
I got my loss to 0.6; however, my issue is that the detector classifies other objects as the ones I trained it on. For example, in my case it classifies a dog as macaroni and cheese.
What am I doing wrong?
I faced exactly the same issue: the model "remembered" the previous objects. There is a newer option in the config file that did not exist when the video was made.
Inside the ssd_mobilenet_v1_pet.config file, fine_tune_checkpoint specifies the checkpoint that training starts from, so the model starts with all the weights from the previous training. Below it is from_detection_checkpoint, which makes training use that specified checkpoint. After that comes load_all_detection_checkpoint_vars, which is set to true by default but must be false if you want the model to "forget" the objects it was trained on.
The problem is that load_all_detection_checkpoint_vars loads and fixes all the weights, including the ones in the final layers and not just the lower layers, so the model remembers the classification and detection of past objects and misclassifies the new ones, since your *.pbtxt label map has different classes. If you set it to false, the model loads the checkpoint and learns new weights for the final layers based only on your training set.
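The three fields named above sit together in the train_config block of the pipeline config. As a minimal sketch, assuming you want to flip the flag from a script rather than edit the file by hand (a text editor works just as well), the Python below patches a text-format excerpt. SAMPLE_CONFIG, its placeholder checkpoint path, and the helper set_bool_field are hypothetical illustrations, not part of the Object Detection API.

```python
import re

# Hypothetical excerpt of a train_config block; the path is a placeholder.
SAMPLE_CONFIG = """train_config: {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
}
"""


def set_bool_field(config_text, field, value):
    """Set a `field: true|false` line in text-format config.

    Raises ValueError if the field is absent, so a typo in the field
    name does not silently leave the config unchanged.
    """
    pattern = re.compile(r"^([ \t]*%s[ \t]*:[ \t]*)(?:true|false)\b" % re.escape(field), re.MULTILINE)
    new_text, n = pattern.subn(lambda m: m.group(1) + ("true" if value else "false"), config_text)
    if n == 0:
        raise ValueError("field not found: " + field)
    return new_text


patched = set_bool_field(SAMPLE_CONFIG, "load_all_detection_checkpoint_vars", False)
print(patched)
```

For a real pipeline config you would read the file, patch the text the same way, and write it back before starting training.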

subset of any pre-trained model in tensorflow

Can we take a subset of a pre-trained model in TensorFlow? For example, if we have a pre-trained model that can detect 545 objects, can we make a subset of it that detects only 20 objects, so that the time taken to load the model and run detection is reduced?
The best you can do is reduce the weights of the last (output) layer only. If your second-to-last layer has 1000 units, this reduces your parameter count by (1000 * 545 - 1000 * 20) = 525000.
But if your network is very deep, this won't give a great speedup, as you still need to compute all the other layers except the last one.
You could, but it's a non-negligible amount of work, and it won't substantially improve the speed.
Indeed, the only thing you'd need to change is the class prediction layer, which you'd have to reduce from n_features x 545 to n_features x 20. Typically at that stage you have n_features = 7*7 = 49 (although it depends on the method you're using; this holds for Faster R-CNN with usual settings), so you'd save approximately 26k parameters and 8 million operations per image (assuming 300 detections per image), which is negligible compared to the millions of parameters and billions of operations usually involved in object detection models.
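The arithmetic in the answers above can be reproduced with a small helper, assuming a dense prediction layer whose cost is one multiply-accumulate per weight per detection (biases ignored). The function name is hypothetical.

```python
def prediction_layer_savings(n_features, n_old, n_new, detections_per_image=1):
    """Weights and multiply-accumulates saved per image by shrinking a dense
    class-prediction layer from n_features x n_old to n_features x n_new."""
    params_saved = n_features * (n_old - n_new)
    macs_saved = params_saved * detections_per_image
    return params_saved, macs_saved


# Faster R-CNN figures: 49 features, 300 detections per image.
print(prediction_layer_savings(49, 545, 20, detections_per_image=300))  # (25725, 7717500)
# Dense-layer example: 1000 inputs to the output layer, one forward pass.
print(prediction_layer_savings(1000, 545, 20))  # (525000, 525000)
```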
And changing the prediction layer without retraining, while keeping the trained values, is not straightforward; you'd have to write code to modify your network manually.
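A minimal sketch of that manual modification, assuming you can read the class-prediction layer's weights out as an n_features x n_classes matrix plus a bias vector (plain lists here). It keeps only the columns for the classes you want, in the order of your new label map. In Faster R-CNN the classifier usually has an extra background output that you would keep as well, and depending on the architecture the box-regression layer may have per-class outputs needing the same slicing. `slice_class_layer` is a hypothetical helper, not a TensorFlow API.

```python
def slice_class_layer(weights, biases, keep):
    """Keep only the output columns of a dense class-prediction layer.

    weights: n_features rows x n_classes columns (list of lists)
    biases:  n_classes values
    keep:    original class indices to keep, in the new label-map order
    """
    new_weights = [[row[c] for c in keep] for row in weights]
    new_biases = [biases[c] for c in keep]
    return new_weights, new_biases


# Toy layer: 2 features x 3 classes; keep classes 0 and 2.
w, b = slice_class_layer([[1, 2, 3], [4, 5, 6]], [10, 20, 30], keep=[0, 2])
print(w, b)  # [[1, 3], [4, 6]] [10, 30]
```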

when to stop training object detection tensorflow

I am training a Faster R-CNN model on a fruit dataset, starting from a pre-trained model provided in the Google Object Detection API (faster_rcnn_inception_resnet_v2_atrous_coco).
I made a few changes to the default configuration (number of classes: 12, fine_tune_checkpoint: path to the pre-trained checkpoint, and from_detection_checkpoint: true). I have around 12000 annotated images in total.
After training for 9000 steps, the accuracy I get is below 1%, though I expected it to be at least 50% (in evaluation nothing gets detected, so accuracy is almost 0). The loss fluctuates between 0 and 4.
How many steps should I train it for? I read an article that says to run around 800k steps, but isn't that the number of steps when you train from scratch?
The FC layers of the model change because of the different number of classes, but that should not affect classes that are already present in the pre-trained model, like 'apple', right?
Any help would be much appreciated!
You shouldn't look at your training loss to determine when to stop. Instead, you should run your model through the evaluator periodically, and stop training when the evaluation mAP stops improving.
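A minimal sketch of that stopping rule, assuming you record the evaluation mAP after each periodic evaluation: stop once the best mAP of the last `patience` evaluations is no better than the best seen before them, and keep the checkpoint with the highest mAP. `patience`, `min_delta`, and the function names are hypothetical choices, not Object Detection API settings.

```python
def should_stop(map_history, patience=3, min_delta=0.0):
    """True when the last `patience` evaluations failed to beat the earlier
    best mAP by more than `min_delta`."""
    if len(map_history) <= patience:
        return False
    best_before = max(map_history[:-patience])
    recent_best = max(map_history[-patience:])
    return recent_best <= best_before + min_delta


def best_checkpoint_step(evals):
    """evals: list of (step, mAP) pairs; returns the step with the highest mAP."""
    return max(evals, key=lambda e: e[1])[0]


# Made-up numbers for illustration only.
evals = [(5000, 0.12), (10000, 0.25), (15000, 0.31), (20000, 0.30), (25000, 0.31), (30000, 0.29)]
maps = [m for _, m in evals]
print(should_stop(maps), best_checkpoint_step(evals))
```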