Tensorflow object detection: Continue training - tensorflow

Let's say I fine-tune a pretrained network such as ResNet and set fine_tune_checkpoint_type to detection in the pipeline.config file. As far as I understand, this means we take the pretrained weights of the model, except for the classification and box prediction heads. It also means we can define our own labels, and the classification and box prediction heads of the model we want to train will be built for those labels.
Now, let's say I train this network for 25000 steps and want to continue training later without the model forgetting anything. Should I change fine_tune_checkpoint_type in pipeline.config to full to continue training (and, of course, load the correct checkpoint file), or should I leave it set to detection?
Edit:
This is based on the information found here https://github.com/tensorflow/models/blob/master/research/object_detection/protos/train.proto:
// 1. "classification": Restores only the classification backbone part of
// the feature extractor. This option is typically used when you want
// to train a detection model starting from a pre-trained image
// classification model, e.g. a ResNet model pre-trained on ImageNet.
// 2. "detection": Restores the entire feature extractor. The only parts
// of the full detection model that are not restored are the box and
// class prediction heads. This option is typically used when you want
// to use a pre-trained detection model and train on a new dataset or
// task which requires different box and class prediction heads.
// 3. "full": Restores the entire detection model, including the
// feature extractor, its classification backbone, and the prediction
// heads. This option should only be used when the pre-training and
// fine-tuning tasks are the same. Otherwise, the model's parameters
// may have incompatible shapes, which will cause errors when
// attempting to restore the checkpoint.
So, classification restores only the classification backbone part of the feature extractor. This means that many parts of the network will start from scratch.
detection restores the whole feature extractor, but the "end result" (the prediction heads) is discarded, which means we can add our own classes and learn them from scratch.
full restores everything, including the class and box prediction weights. This is fine as long as we do not add or remove any classes/labels.
Is this correct?

Yes, you have understood that correctly.
Set fine_tune_checkpoint_type: full in pipeline.config to retain everything the model has learned up to the last checkpoint.
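As a rough illustration of that change, the sketch below rewrites the two fine-tune fields in the text of a pipeline.config. The helper name set_fine_tune_fields, the sample config, and the checkpoint path my_model/ckpt-26 are all hypothetical, and the sketch treats the config as plain text rather than using the Object Detection API's own config utilities.

```python
import re

# The three values documented in train.proto for TF2 models.
ALLOWED_TYPES = {"classification", "detection", "full"}


def set_fine_tune_fields(config_text, checkpoint_path, checkpoint_type):
    """Return config_text with both fine-tune fields replaced (hypothetical helper)."""
    if checkpoint_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown fine_tune_checkpoint_type: {checkpoint_type!r}")
    # `fine_tune_checkpoint:` cannot match `fine_tune_checkpoint_type:` because
    # the colon must directly follow the field name.
    config_text = re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda _m: f'fine_tune_checkpoint: "{checkpoint_path}"',
        config_text,
    )
    config_text = re.sub(
        r'fine_tune_checkpoint_type:\s*"[^"]*"',
        lambda _m: f'fine_tune_checkpoint_type: "{checkpoint_type}"',
        config_text,
    )
    return config_text


# Made-up config fragment, for illustration only.
SAMPLE = '''train_config {
  batch_size: 8
  fine_tune_checkpoint: "pretrained/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
'''

updated = set_fine_tune_fields(SAMPLE, "my_model/ckpt-26", "full")
print(updated)
```

As the proto comment says, "full" only makes sense here because the label set is unchanged between the first training run and the continued one.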

Yes, you can configure which variables are restored by setting fine_tune_checkpoint_type; the options are detection and classification. Setting it to detection restores almost all variables from the checkpoint. Setting it to classification restores only the variables in the feature_extractor scope (all the layers of backbone networks such as VGG, ResNet, and MobileNet, which are called feature extractors).

Related

Which layers are frozen using Tensorflow 2 Object detection API?

How can I understand which layers are frozen fine-tuning a detection model from Tensorflow Model Zoo 2?
I have already successfully set the path for fine_tune_checkpoint and set fine_tune_checkpoint_type: detection, and in the proto file I have read that "detection" means:
// 2. "detection": Restores the entire feature extractor. The only parts
// of the full detection model that are not restored are the box and
// class prediction heads. This option is typically used when you want
// to use a pre-trained detection model and train on a new dataset or
// task which requires different box and class prediction heads.
I didn't really understand what that means. Does restored mean frozen in this context?
As I understand it, the TensorFlow 2 Object Detection API currently does not freeze any layers when training from a fine-tune checkpoint. There is an issue reported here to support specifying which layers to freeze in the pipeline config. If you look at the training step function, you can see that all trainable variables are used when applying gradients during training.
Restored here means that the model weights are copied from the checkpoint to be used as a starting point for training. Frozen would mean that the weights are not changed (i.e. no gradient is applied) during training.
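The distinction can be shown with a toy example that does not use TensorFlow at all. This is only a sketch: the functions restore and sgd_step, the variable names, and the numbers are invented for illustration and are not part of the Object Detection API.

```python
def restore(model, checkpoint):
    """Copy checkpoint values into the model for every name both share."""
    for name in model:
        if name in checkpoint:
            model[name] = checkpoint[name]


def sgd_step(model, grads, lr, frozen=()):
    """Apply one gradient step, skipping any variable listed in `frozen`."""
    for name, grad in grads.items():
        if name in frozen:
            continue  # frozen: no gradient is applied
        model[name] -= lr * grad


# Hypothetical variables: a backbone weight and a prediction-head weight.
checkpoint = {"backbone/w": 1.5}          # the checkpoint has no head weights
grads = {"backbone/w": 1.0, "head/w": 1.0}

# Restored but NOT frozen: the weight starts at 1.5 and then keeps training.
restored_only = {"backbone/w": 0.0, "head/w": 0.0}
restore(restored_only, checkpoint)
sgd_step(restored_only, grads, lr=0.5)

# Restored AND frozen: the weight starts at 1.5 and stays there.
restored_and_frozen = {"backbone/w": 0.0, "head/w": 0.0}
restore(restored_and_frozen, checkpoint)
sgd_step(restored_and_frozen, grads, lr=0.5, frozen={"backbone/w"})

print(restored_only)        # backbone weight has moved away from 1.5
print(restored_and_frozen)  # backbone weight is still 1.5
```

In both cases the head weight is trained from its initial value, which mirrors what "detection" does with the box and class prediction heads.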

Tensorflow Object Detection API - "fine_tune" vs "detection" vs "classification"

I am following this tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html
In it, it has the following snippet in the file pipeline.config:
fine_tune_checkpoint_type: "detection" # Set this to "detection" since we want to be training the full detection model
Further investigation leads to the following discoveries:
There are at least 3 options for the field fine_tune_checkpoint_type: fine_tune, detection and classification.
Not all models from the model zoo allow all options.
My questions are:
What does each of fine_tune, detection and classification mean in this context and, more importantly, when is it appropriate to use each one?
How do I tell which options are compatible with models in the model zoo?
Ultimately I wish to do transfer learning - e.g. take an existing trained model and train it to draw boxes for one or more novel classes.
Those options indicate how to restore checkpoints and come from here.
I copy here the interesting part:
This option controls how variables are restored from the (pre-trained) fine_tune_checkpoint. For TF2 models, 3 different types are supported:
"classification": Restores only the classification backbone part of the feature extractor. This option is typically used when you want to train a detection model starting from a pre-trained image classification model, e.g. a ResNet model pre-trained on ImageNet.
"detection": Restores the entire feature extractor. The only parts of the full detection model that are not restored are the box and class prediction heads. This option is typically used when you want to use a pre-trained detection model and train on a new dataset or task which requires different box and class prediction heads.
"full": Restores the entire detection model, including the feature extractor, its classification backbone, and the prediction heads. This option should only be used when the pre-training and fine-tuning tasks are the same. Otherwise, the model's parameters may have incompatible shapes, which will cause errors when attempting to restore the checkpoint.
For more details about this parameter, see the restore_map (TF1) or restore_from_object (TF2) function documentation in the /meta_architectures/*meta_arch.py files.
I guess fine_tune has since been replaced by "full". Based on your needs, the right choice appears to be "detection". To know which models support which options, as indicated above, you have to look at the restore_from_object function definition in the relevant /meta_architectures/*meta_arch.py files.

Why does the loss explode during training from scratch? - Tensorflow Object Detection Models

First of all, I want to state that I am familiar with the benefits of transfer learning. Moreover, I am able to train a pretrained model from the model zoo on my dataset. But for research purposes I want to train my model from scratch, without transfer learning.
I want to adapt the Faster R-CNN ResNet-101 implementation from TensorFlow's Object Detection API to my dataset. If I use one of the pretrained models, training goes as expected and the loss always stays in a 'normal' range (never above about 6). But if I do not use transfer learning, the loss very frequently jumps to extremely high values (about 80,000,000), while between those spikes it is in the normal range. In addition, I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing I change is to comment out these two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: I changed the optimizer, changed the learning rate, used gradient clipping, changed the initializer, and trained on different machines, but nothing helps. I also inspected my label_map as well as my record file. To make sure those files are correct, I repeated the steps above using the Pascal VOC dataset with the record-creation script and the label map from the API, but even with this unmodified code from the Object Detection API, the loss explodes (using the TensorFlow Object Detection API's own inputs).

How to modify frozen layers in training using Tensorflow's Object Detection API?

I am using Tensorflow's Object Detection API for training.
In which file are the frozen layers defined when fine-tuning a model?
I need to experiment with changing which layers are frozen during fine-tuning.
For example, if I use the Resnet50 configuration, where can I change the frozen layers?
You can certainly do that.
The proto file for training has a field called freeze_variables. It is a list of the variables (given as name patterns) that you want to freeze, i.e. exclude from training.
Suppose you want to freeze the weights from the first bottleneck in the first unit of the first block. You can do so by adding
freeze_variables: ["resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"]
so your config file looks like this:
train_config: {
  batch_size: 1
  freeze_variables: ["resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"]
  ...
You can verify that the weights are in fact frozen by checking the TensorFlow graph.
In the graph, the frozen weights no longer have a train operation attached.
By choosing specific patterns for freeze_variables, you can freeze variables very flexibly (you can get the layer names from the TensorFlow graph).
Btw, here is the actual filtering operation.
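To get a feel for how such patterns select variables, here is a minimal pure-Python sketch. It assumes the patterns are regular expressions applied with re.match (anchored at the start of the variable name), so check the linked filtering code for the exact behavior. The function split_frozen and the variable names are hypothetical and only mimic the naming style used above.

```python
import re


def split_frozen(variable_names, freeze_patterns):
    """Partition names into (trainable, frozen) by regex match (hypothetical helper)."""
    trainable, frozen = [], []
    for name in variable_names:
        # re.match anchors at the start of the name, so a prefix pattern
        # freezes everything underneath that scope.
        if any(re.match(pattern, name) for pattern in freeze_patterns):
            frozen.append(name)
        else:
            trainable.append(name)
    return trainable, frozen


# Made-up variable names in the style of a ResNet-50 backbone.
names = [
    "resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights",
    "resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/weights",
    "resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/weights",
    "BoxPredictor/weights",
]

# Freeze all of block1 with a single prefix pattern.
trainable, frozen = split_frozen(names, ["resnet_v1_50/block1/"])
print("frozen:", frozen)
print("trainable:", trainable)
```

A broader pattern freezes more of the backbone, while listing a full variable name, as in the config above, freezes just that one variable.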

How to run inference on inception v3 trained models?

I've successfully trained the Inception v3 model from scratch on 200 custom classes. Now I have ckpt files in my output dir. How do I use those models to run inference?
Preferably, I would load the model on the GPU and pass images whenever I want while the model persists on the GPU. Using TensorFlow Serving is not an option for me.
Note: I've tried to freeze these models but failed to specify output_nodes correctly while freezing. I used ImagenetV3/Predictions/Softmax but couldn't use it with feed_dict, as I couldn't get the required tensors from the frozen model.
The documentation on the TF site and repo is poor for this inference part.
It sounds like you're on the right track. You don't really do anything different at inference time than at training time, except that you don't ask the graph to compute the optimizer operation, so no weights are ever updated.
The save and restore guide in tensorflow documentation explains how to restore a model from checkpoint:
https://www.tensorflow.org/programmers_guide/saved_model
You have two options when restoring a model. Either you build the ops again from code (usually a build_graph() method) and then load the variables from the checkpoint, which is the method I use most commonly, or you load both the graph definition and the variables from the checkpoint, if the graph definition was saved with the checkpoint.
Once you've loaded the graph, you create a session and ask the graph to compute just the output. The tensor ImagenetV3/Predictions/Softmax looks right to me (I'm not immediately familiar with the particular model you're working with). You will need to pass in the appropriate inputs: your images, and possibly other parameters the graph requires (sometimes an is_train boolean is needed, and other such details).
Since you aren't asking tensorflow to compute the optimizer operation no weights will be updated. There's really no difference between training and inference other than what operations you request the graph to compute.
Tensorflow will use the GPU by default just as it did with training, so all of that is pretty much handled behind the scenes for you.