I am using the faster_rcnn_resnet50_coco model from the TensorFlow model zoo. I trained the model on my own data (solar images), and as I watch the training in TensorBoard, the total loss is constantly increasing.
This is surprising to me because the model seems to be pretty successful in detecting the objects.
I am wondering: in TensorFlow, if we do quantization-aware training (QAT) by introducing fake quant nodes (using the tf.contrib.quantize.create_training_graph() method), can we then, after training finishes, run inference on the quantized output without using tf.contrib.quantize.create_eval_graph()?
In other words, after introducing fake quantization nodes and training, is it necessary to call tf.contrib.quantize.create_eval_graph() before evaluating the trained computational graph? Can we query the TensorFlow graph (which has fake quantization nodes) in a TensorFlow session without calling tf.contrib.quantize.create_eval_graph()?
In short, what is the function of tf.contrib.quantize.create_eval_graph()?
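For reference, here is a minimal TF 1.x sketch of how the two rewrites are typically used: create_training_graph() inserts fake quant nodes into the training graph so quantization ranges are learned, while create_eval_graph() rewrites a separate inference graph into the form expected by the TFLite converter. The build_model function below is a placeholder, not your actual network.

```python
import tensorflow as tf  # TF 1.x, with tf.contrib available

def build_model(images):
    # Hypothetical model body; stands in for whatever network you trained.
    net = tf.layers.conv2d(images, 32, 3, activation=tf.nn.relu)
    net = tf.layers.flatten(net)
    return tf.layers.dense(net, 10)

# --- Training graph: fake quant nodes collect min/max statistics ---
train_graph = tf.Graph()
with train_graph.as_default():
    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.int64, [None])
    logits = build_model(images)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # Rewrite the graph in place, inserting fake quant ops before the optimizer.
    tf.contrib.quantize.create_training_graph(input_graph=train_graph,
                                              quant_delay=2000)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
    saver = tf.train.Saver()
    # ... run training in a session, then saver.save(sess, 'qat_ckpt') ...

# --- Eval graph: fake quant ops are rewritten into inference form ---
eval_graph = tf.Graph()
with eval_graph.as_default():
    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    logits = build_model(images)
    # Places quant ops so the graph can be converted to a fully
    # quantized model (e.g. by the TFLite converter).
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    saver = tf.train.Saver()
    # with tf.Session(graph=eval_graph) as sess:
    #     saver.restore(sess, 'qat_ckpt')
    #     ... export the GraphDef / convert with TFLite ...
```

You can still run the training graph (with its fake quant nodes) in a session for inference; the eval rewrite mainly matters when you want to export a graph suitable for conversion to a truly quantized model.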
I just finished training a YOLOv3-tiny model via Google Colab. I need information about its detection accuracy. How do I evaluate this model in terms of a confusion matrix?
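One common approach (not specific to YOLOv3) is to match predicted boxes to ground-truth boxes by IoU and then tally ground-truth class vs. predicted class, with an extra row/column for background to capture missed detections and false positives. A rough NumPy sketch, assuming boxes in [x1, y1, x2, y2] format:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_confusion_matrix(gt, preds, num_classes, iou_thresh=0.5):
    """gt and preds are lists of (box, class_id); the last row/column is 'background'."""
    cm = np.zeros((num_classes + 1, num_classes + 1), dtype=int)
    matched_preds = set()
    for gt_box, gt_cls in gt:
        best_iou, best_j = 0.0, None
        for j, (p_box, p_cls) in enumerate(preds):
            if j in matched_preds:
                continue
            overlap = iou(gt_box, p_box)
            if overlap >= iou_thresh and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is None:
            cm[gt_cls, num_classes] += 1       # missed detection
        else:
            matched_preds.add(best_j)
            cm[gt_cls, preds[best_j][1]] += 1  # correct or misclassified
    for j, (_, p_cls) in enumerate(preds):
        if j not in matched_preds:
            cm[num_classes, p_cls] += 1        # false positive
    return cm
```

Run your trained model over the validation set, collect (box, class) pairs per image, and accumulate the matrix across images; precision and recall per class then fall out of the rows and columns.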
How can I tell which layers are frozen when fine-tuning a detection model from the TensorFlow 2 Model Zoo?
I have already successfully set the path for fine_tune_checkpoint and fine_tune_checkpoint_type: detection, and in the proto file I have already read that "detection" means:
// 2. "detection": Restores the entire feature extractor.
The only parts of the full detection model that are not restored are the box and class prediction heads.
This option is typically used when you want to use a pre-trained detection model
and train on a new dataset or task which requires different box and class prediction heads.
I don't really understand what that means. Does "restored" mean "frozen" in this context?
As I understand it, the TensorFlow 2 Object Detection API currently does not freeze any layers when training from a fine-tune checkpoint. There is an issue reported here to add support for specifying which layers to freeze in the pipeline config. If you look at the training step function, you can see that all trainable variables are used when applying gradients during training.
Restored here means that the model weights are copied from the checkpoint to be used as a starting point for training. Frozen would mean that the weights are not changed (i.e. no gradient is applied) during training.
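Until the pipeline config supports freezing, one workaround is to filter the variable list yourself before applying gradients. Below is a minimal Keras sketch of that idea; the toy backbone/head model and the 'class_head' scope name are stand-ins for illustration, not the OD API's actual variable names. Only variables whose names match the chosen scopes are updated, so everything else is effectively frozen.

```python
import tensorflow as tf

# A minimal stand-in model: a "backbone" plus a "head", built with Keras.
# In the TF2 OD API the variables would come from the detection model instead.
backbone = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
], name='backbone')
head = tf.keras.layers.Dense(4, name='class_head')
model = tf.keras.Sequential([backbone, head])
model.build(input_shape=(None, 64, 64, 3))

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def variables_to_train(model, trainable_scopes=('class_head',)):
    # Keep only variables whose names match one of the given scopes;
    # everything else receives no gradient update, i.e. is frozen.
    return [v for v in model.trainable_variables
            if any(scope in v.name for scope in trainable_scopes)]

@tf.function
def train_step(images, labels):
    train_vars = variables_to_train(model)
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, train_vars)
    optimizer.apply_gradients(zip(grads, train_vars))
    return loss

# Example: one step on random data; only the head's variables are updated.
images = tf.random.uniform([2, 64, 64, 3])
labels = tf.constant([0, 2])
train_step(images, labels)
```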
I am trying to train a Faster R-CNN network with an Inception-v3 backbone (reference paper: Google's paper) as my fixed feature extractor, using Keras, on my own dataset (number of classes = 4), which is very different from ImageNet. I still initialized it with ImageNet weights because this paper gives evidence that initializing with pre-trained weights is always better than random initialization.
After training for 60 epochs, my training accuracy is 96% and my validation accuracy is 84%: overfitting (severe, maybe?). What is more worrying is that my loss did not converge at all. When I tested the network, it failed miserably; it didn't even detect anything.
Then I took a slightly different approach and did a two-step training. First I trained Inception-v3 on my dataset as a classification problem (still initialized with ImageNet weights), and it converged well. Then I used those weights to initialize the Faster R-CNN network. This worked! But I am confused about why this two-stage approach works while training from scratch didn't, given that I initialized both methods with the pre-trained ImageNet weights.
Is there a way to train Faster R-CNN from scratch?
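For illustration, here is a rough Keras sketch of the two-stage recipe described above; build_faster_rcnn is a hypothetical constructor standing in for however the detector is actually assembled, and the dataset/training calls are left as comments:

```python
import tensorflow as tf

NUM_CLASSES = 4

# --- Stage 1: fine-tune InceptionV3 as a plain classifier ---
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights='imagenet', input_shape=(299, 299, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
classifier = tf.keras.Model(backbone.input, outputs)
classifier.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# classifier.fit(train_ds, epochs=...)   # train on the 4-class dataset
backbone.save_weights('inception_v3_stage1.h5')  # feature extractor only

# --- Stage 2: initialize the detector's feature extractor from stage 1 ---
# The key point is loading the stage-1 weights into the same InceptionV3
# layers before detection training starts.
detector_backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, input_shape=(None, None, 3))
detector_backbone.load_weights('inception_v3_stage1.h5')
# detector = build_faster_rcnn(feature_extractor=detector_backbone, ...)
```

The intuition is that stage 1 adapts the backbone's features to your domain with a much simpler loss surface, so the detection heads in stage 2 start from features that already separate your classes.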
How long does it typically take to train a Faster R-CNN model on MS COCO on a single GPU (e.g. a K40 or a 1080) using the TensorFlow Object Detection API? It would be great if you could provide training times for other models (SSD and R-FCN) too.
I have never trained a model on COCO using a single GPU; we typically train using ~10 K40 GPUs with asynchronous SGD, which takes 3-4 days to converge on COCO. SSD and R-FCN take about the same amount of time.