TensorFlow.Keras ModelCheckpoint: saving the model while training, why? - tensorflow

I was wondering why we need to save the model while training.
Isn't it enough to save it once at the beginning of training and then only save the weights during training?
I mean, the model isn't changing during training, so what is this boolean needed for?
class ModelCheckpoint(Callback):
...
save_weights_only: if True, then only the model's weights will be saved.
...
Thanks!

It's not a need or requirement, it's just convenience. In a typical DL/DS workflow, you train a lot of models with different configurations and it is quite easy to get lost. Maybe you have saved the weights for the best model but you don't remember which model configuration was used. That information is not part of the weights and has to be recorded separately.
Keras provides a simple solution: store the model configuration (which takes less than 10 KB) along with the weights, so if you ever lose the original model configuration, it is still saved in the same HDF5 file.
Also consider the case where you send the model weights to someone else without the model configuration: how could they load the weights without a model? Again, it's just convenience.
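As a minimal sketch of the two modes (the model, data shapes, and file names here are made up, not from the question):

import tensorflow as tf

# Hypothetical toy model, only to illustrate the two checkpoint modes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Saves the full model (architecture + weights + optimizer state) every epoch.
full_ckpt = tf.keras.callbacks.ModelCheckpoint(
    "full_model.h5", save_weights_only=False)

# Saves only the weights; loading them later requires rebuilding the model in code.
weights_ckpt = tf.keras.callbacks.ModelCheckpoint(
    "weights_only.h5", save_weights_only=True)

# model.fit(x_train, y_train, epochs=5, callbacks=[full_ckpt, weights_ckpt])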

Related

Saving and Loading CNN model

I have a CNN model and I want to save it and load it for prediction in a different tab. But I am confused about whether the model evaluation part (model.eval()) is included in the part I will save. And I don't know whether it would be better to use Model.checkpoint or model.save to save and load. Does anyone have an idea? Thank you in advance.
I'm in a dilemma about using both of them, so I've used both.
Calling model.eval() just tells the PyTorch model to use its running statistics (rather than per-batch statistics) for batch normalisation, and deactivates the dropout layers. You can save your model without calling model.eval(), as it does not change the saved weights.
When saving a model, saving the model's state dictionary is preferred. This can be done as shown below:
import torch

# Declare the model class and instantiate it here
model = NeuralNetwork()

# ... training code goes here ...

# Save the model's state dictionary; it will be written to intermediateWeightPath
intermediateWeightPath = "./bestmodel.pth"
torch.save(model.state_dict(), intermediateWeightPath)
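For completeness, a minimal loading sketch under the same assumptions (NeuralNetwork is the placeholder model class from above, and the path is hypothetical):

import torch

# Re-create the model architecture, then load the saved state dictionary into it
model = NeuralNetwork()
model.load_state_dict(torch.load("./bestmodel.pth"))
model.eval()  # switch to evaluation mode before running predictions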

How NOT to save model optimizer in Tensorflow Keras?

I'm reading the official tutorial on saving and loading in Keras, and it seems that whether I use the save or save_weights method, the optimizer parameters get saved either way. How can I save only the model's weights?
model.save('./savedmodel.h5', save_format='h5', include_optimizer=False)
If save_format='tf', setting include_optimizer to either False or True made no difference in my tests.
In Keras, to save model weights, do:
model.save_weights('my_model_weights.h5')
To load model weights:
model.load_weights('my_model_weights.h5')
Also see the additional example of saving/loading weights by layer name from here.
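As a rough sketch (assuming the HDF5 file from the snippet above and a model that shares only some layer names with it), loading by layer name could look like:

# Load weights only into layers whose names match entries in the HDF5 file;
# useful when the new model shares some, but not all, layers with the saved one.
model.load_weights('my_model_weights.h5', by_name=True)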

Tensorflow Object-Detection API - How does the Fine-Tuning of a model works?

This is a more general question about the Tensorflow Object-Detection API.
I am using this API; to be more concrete, I fine-tune a model on my dataset. According to the description of the API, I use the model_main.py function to retrain a model from a given checkpoint/frozen graph.
However, it is not clear to me how the fine-tuning works within the API. Does a re-initialization of the last layer happen automatically, or do I have to implement something like that myself?
In the README files I did not find any hint concerning this topic. Maybe somebody could help me.
Whether you are training from scratch or training from a checkpoint, model_main.py is the main program; besides this program, all you need is a correct pipeline config file.
Fine-tuning can be separated into two steps: restoring weights and updating weights. Both steps can be custom-configured according to the train proto file; this proto corresponds to train_config in the pipeline config file.
train_config: {
batch_size: 24
optimizer { }
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
fine_tune_checkpoint_type: "detection"
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
num_steps: 200000
data_augmentation_options {}
}
Step 1, restoring weights.
In this step, you can configure the variables to be restored by setting fine_tune_checkpoint_type; the options are detection and classification. By setting it to detection you can essentially restore almost all variables from the checkpoint, and by setting it to classification, only variables from the feature_extractor scope are restored (all the layers in backbone networks, like VGG, ResNet, and MobileNet, are called feature extractors).
Previously this was controlled by from_detection_checkpoint and load_all_detection_checkpoint_vars, but these two fields are deprecated.
Also notice that after you have configured fine_tune_checkpoint_type, the actual restoring operation checks whether each variable in the graph exists in the checkpoint, and if not, the variable is initialized with the routine initialization operation.
To give an example, suppose you want to fine-tune an ssd_mobilenet_v1_custom_data model and you downloaded the checkpoint ssd_mobilenet_v1_coco. When you set fine_tune_checkpoint_type: detection, all variables in the graph that are also available in the checkpoint file will be restored, and the box predictor (last layer) weights will also be restored. If you instead set fine_tune_checkpoint_type: classification, only the weights for the MobileNet layers are restored. But if you use a different model checkpoint, say faster_rcnn_resnet_xxx, then because variables in the graph are not available in the checkpoint, you will see output logs with Variable XXX is not available in checkpoint warnings, and they won't be restored.
Step 2, updating weights
Now you have all the weights restored and you want to keep training (fine-tuning) on your own dataset; normally this should be enough.
But if you want to experiment and freeze some layers during training, you can customize the training by setting freeze_variables. Say you want to freeze all the weights of the MobileNet and only update the weights of the box predictor: you can set freeze_variables: [feature_extractor] so that all variables that have feature_extractor in their names won't be updated, as in the fragment sketched below. For detailed info, please see another answer that I wrote.
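As a rough config sketch (following the answer's naming; the string is matched against variable names and depends on the model, so treat feature_extractor as a placeholder), the relevant train_config fragment could look like:

train_config: {
  ...
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  fine_tune_checkpoint_type: "detection"
  # Any variable whose name matches this pattern is excluded from weight updates.
  freeze_variables: "feature_extractor"
}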
So to fine-tune a model on your custom dataset, you should prepare a custom config file. You can start with the sample config files and then modify some fields to suit your needs.

How to Fix Weights of a Trained Model?

I trained my DNN, checked with my validation set that it learned pretty well, and saved its weights (ckpt). Now I would like to "visualize my classes" (like here: https://www.auduno.com/2015/07/29/visualizing-googlenet-classes/), where "Instead of using backpropagation to optimize the weights, which we do during training, we keep the weights fixed and instead optimize the input pixels."
Is it possible to restore my model as non-trainable?
If not, is there a way to circumvent this and use my model for it anyway?
Thank you :)

What is the difference between saving a summary and saving the model in the logdir?

Using TensorFlow (tf.contrib.slim in particular), we are required to calibrate a few parameters to produce the graphs that we want in TensorBoard.
What saving a summary at an interval does is clearer to us: it saves the value (or an average of values?) of a particular point in the graph at the interval provided.
Now, why should checkpoints saving the model itself be required during the training process? Does the model change? I'm not sure how this works.
You save the model to checkpoints because the Variables in the model, including neural network weights and biases and the global_step counter, keep changing during the training process. The structure of the model doesn't change. The saved checkpoints allow you to load the trained model for serving and to resume training later.
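As a minimal TF1-style sketch of that mechanism (the variable names, shapes, and logdir path here are made up; in TF2 these calls live under tf.compat.v1):

import tensorflow as tf

# A toy Variable standing in for the network weights, plus the global step counter.
weights = tf.Variable(tf.random_normal([10, 1]), name="weights")
global_step = tf.Variable(0, trainable=False, name="global_step")

saver = tf.train.Saver()  # saves all Variables by default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training ops here that update `weights` and `global_step` ...
    # Periodically write a checkpoint; the step number is appended to the file name.
    saver.save(sess, "./logdir/model.ckpt", global_step=global_step)

# Later: restore the latest checkpoint to serve the model or resume training.
# with tf.Session() as sess:
#     saver.restore(sess, tf.train.latest_checkpoint("./logdir"))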