What is the difference between a regular model checkpoint and a saved model in TensorFlow?

There's a fairly clear difference between a model and a frozen model. As described in the model_files docs, the relevant part is the "Freezing" section:
...so there's the freeze_graph.py script that takes a graph definition and a set of checkpoints and freezes them together into a single file.
Is a "saved_model" most similar to a "frozen_model" (and not a saved
"model_checkpoint")?
Is this defined somewhere in the docs that I'm missing?
In prior versions of TensorFlow we would save and restore model weights, but this seems to be in the context of a "model_checkpoint" rather than a "saved_model"; is that still correct?
I'm asking more for the design overview here, not implementation specifics.

A checkpoint file only contains the variables of a specific model and must be loaded either with the exact same, predefined graph, or with a specific assignment_map that loads only chosen variables. See https://www.tensorflow.org/api_docs/python/tf/train/init_from_checkpoint
A saved model is broader, because it contains the graph itself; the graph can be loaded within a session and training can be continued. A frozen graph, however, has its variables serialized as constants and cannot be used to continue training.
You can find all the info here: https://www.tensorflow.org/guide/saved_model
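To make the distinction concrete, here is a minimal sketch using the TF 1.x API; the paths, tensor names and shapes are made up for illustration:

    import os
    import tensorflow as tf

    os.makedirs("/tmp/ckpt_demo", exist_ok=True)

    # A tiny graph, saved both ways.
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.get_variable("w", shape=[3, 1])
    y = tf.identity(tf.matmul(x, w), name="y")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Checkpoint: variable values only, no graph structure.
        tf.train.Saver().save(sess, "/tmp/ckpt_demo/model.ckpt")
        # SavedModel: graph structure + variables in one directory
        # (the export directory must not already exist).
        tf.saved_model.simple_save(sess, "/tmp/savedmodel_demo",
                                   inputs={"x": x}, outputs={"y": y})

    # Restoring the checkpoint requires rebuilding the same (or a mapped) graph:
    tf.reset_default_graph()
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.get_variable("w", shape=[3, 1])
    with tf.Session() as sess:
        tf.train.Saver().restore(sess, "/tmp/ckpt_demo/model.ckpt")

    # Restoring the SavedModel needs no model-building code at all:
    with tf.Session(graph=tf.Graph()) as sess:
        tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],
                                   "/tmp/savedmodel_demo")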

Related

TensorFlow object detection API fine tune checkpoint

I am trying to train a model from scratch; in that case, what should I put as the fine_tune_checkpoint value in the protobuf config file?
Nothing. It should initialize your entire model.
If for some reason that doesn't work for you, you can give it a checkpoint of something else entirely. The API builds the graph of the model and compares its variables to the variables in the checkpoint (both names and shapes); every variable it doesn't find, it initializes.
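If you want to see what the API would actually find to match against, you can list the variable names and shapes stored in a checkpoint and compare them to your model's variables; a small sketch (the checkpoint path is a placeholder):

    import tensorflow as tf

    # Print the variable names and shapes stored in a checkpoint.
    for name, shape in tf.train.list_variables("/tmp/some_model/model.ckpt"):
        print(name, shape)

    # Variables in your detection graph whose name and shape match an entry here
    # are restored; every variable without a match is freshly initialized.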

Is it possible to load part of a trained model into part of a newly built model in TensorFlow?

For instance, suppose a previously trained model is no longer useful as a whole, but part of it still is, and that part could be reused in a newly built model. Apart from this shared part, the rest of the new model needs to be trained, but this part should not have to be retrained. The newly built model is quite different from the old one, except that this part is the same.
If this can be done, how would one write such code?
One of the possibilities:
Get the code for the model you are importing
Find the last operation of the part of the graph that you want to keep
Split the code into two parts (classes or functions) at this op
Build the shared part of the graph and load the weights
Proceed adding the rest of the new graph
Run the terminal op in the newly added part
Alternatively, you may build the entire old model, have a "fork" in the middle and, when running, ignore the terminal op in the old, unused part, and use only the terminal op in the newly added part. This way, only the new branch will be trained.
If you don't have the code but only have the saved graph definition, it is a bit more complicated: you will need to find the op you want to fork in the loaded graph by name.
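To make the first approach concrete, here is a rough TF 1.x sketch that restores only the shared part from the old checkpoint and initializes the rest; the scope names, layer sizes and checkpoint path are illustrative placeholders:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 128], name="x")

    with tf.variable_scope("shared"):      # same code/scope as in the old model
        h = tf.layers.dense(x, 64, activation=tf.nn.relu, name="fc1")

    with tf.variable_scope("new_head"):    # newly added part of the graph
        logits = tf.layers.dense(h, 10, name="fc2")

    shared_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="shared")
    new_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="new_head")

    with tf.Session() as sess:
        # Restore only the shared variables from the old model's checkpoint...
        tf.train.Saver(var_list=shared_vars).restore(sess, "/tmp/old_model/model.ckpt")
        # ...and initialize only the newly added variables.
        sess.run(tf.variables_initializer(new_vars))
        # To train only the new branch, pass var_list=new_vars to optimizer.minimize().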

What does freezing a graph in TensorFlow mean?

I am a beginner in NN APIs and TensorFlow.
I am trying to save my trained model in protobuf format (.pb); there are many blogs explaining how to save the model as a protobuf. One thing I did not understand is: what is the importance of freezing the graph before saving it as a protobuf? I read that freezing converts variables to constants; does that mean the model is not trainable anymore?
What else does freezing do to a model?
What does the model lose after freezing?
Can anyone please explain or give some pointers on details of freezing?
This is only a partial answer to your question.
A frozen graph is easily optimizable. When doing inference (forward propagation), for instance, you can fuse some of the layers together. You can't do this with a graph that is split between variables and operations (an unfrozen graph). Why would you want to fuse layers together? There are multiple reasons. Going hardware-specific: it might be easier to compute a number of operations together on a group of tensors in a way that matches the structure of your CPU or GPU. TensorRT, for instance, is a graph optimizer that works starting from a frozen graph (more info on the graph optimizations done by TensorRT here: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/ ). It performs general graph optimizations as well as hardware-specific ones.
As far as I understand, you can unfreeze a graph. I have only worked on optimizing them, so I haven't used this feature, but here is code for it: https://gist.github.com/tokestermw/795cc1fd6d0c9069b20204cbd133e36b
Here is another question that might be useful:
TensorFlow: Is there a way to convert a frozen graph into a checkpoint model?
It has not yet been answered though.
Freezing the model means producing a single file containing information about the graph and the checkpoint variables, with those parameters saved as constants within the graph structure. This eliminates the additional information kept in the checkpoint files, such as optimizer state, which is included so that the model can be reloaded and training continued from where you left off. As this is not needed when serving a model purely for inference, it is discarded during freezing. A frozen model is a single .pb (protocol buffer) file.
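For reference, a minimal sketch of how the freezing step is typically done with the TF 1.x API; the checkpoint path and output node name are placeholders you would replace with your own:

    import tensorflow as tf

    with tf.Session() as sess:
        # Load the graph definition and the trained variable values.
        saver = tf.train.import_meta_graph("/tmp/my_model/model.ckpt.meta")
        saver.restore(sess, "/tmp/my_model/model.ckpt")
        # Replace every variable reachable from the output node with a constant.
        frozen_graph_def = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names=["my_output_node"])
        with tf.gfile.GFile("/tmp/my_model/frozen.pb", "wb") as f:
            f.write(frozen_graph_def.SerializeToString())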

How to run inference on inception v3 trained models?

I've successfully trained the Inception v3 model from scratch on 200 custom classes. Now I have ckpt files in my output dir. How do I use those models to run inference?
Preferably, I'd like to load the model on the GPU once and pass images whenever I want while the model stays resident on the GPU. Using TensorFlow Serving is not an option for me.
Note: I've tried to freeze these models but failed to correctly specify output_nodes while freezing. I used ImagenetV3/Predictions/Softmax but couldn't use it with feed_dict, as I couldn't get the required tensors from the frozen model.
The documentation on the TF site & repo is poor on this inference part.
It sounds like you're on the right track. You don't really do anything different at inference time than at training time, except that you don't ask it to compute the optimizer op; by not doing so, no weights are ever updated.
The save and restore guide in the TensorFlow documentation explains how to restore a model from a checkpoint:
https://www.tensorflow.org/programmers_guide/saved_model
You have two options when restoring a model: either you build the ops again from code (usually a build_graph() method) and then load the variables in from the checkpoint, which is the method I use most commonly; or you can load the graph definition and variables in from the checkpoint, if the graph definition was saved with the checkpoint.
Once you've loaded the graph, you'll create a session and ask the graph to compute just the output. The tensor ImagenetV3/Predictions/Softmax looks right to me (I'm not immediately familiar with the particular model you're working with). You will need to feed in the appropriate inputs, your images, and possibly whatever other parameters the graph requires; sometimes an is_train boolean is needed, and other such details.
Since you aren't asking tensorflow to compute the optimizer operation no weights will be updated. There's really no difference between training and inference other than what operations you request the graph to compute.
Tensorflow will use the GPU by default just as it did with training, so all of that is pretty much handled behind the scenes for you.
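As a rough sketch of the checkpoint-based route (the checkpoint path and tensor names below are assumptions; list the graph's operations to find the real names in your model):

    import numpy as np
    import tensorflow as tf

    with tf.Session() as sess:
        # Rebuild the graph from the .meta file and load the trained weights.
        saver = tf.train.import_meta_graph("/tmp/inception_out/model.ckpt-50000.meta")
        saver.restore(sess, "/tmp/inception_out/model.ckpt-50000")
        graph = tf.get_default_graph()

        # Tensor names are assumptions; print [op.name for op in graph.get_operations()]
        # to find the actual input and softmax tensors in your graph.
        images = graph.get_tensor_by_name("input:0")
        probs = graph.get_tensor_by_name("InceptionV3/Predictions/Softmax:0")

        batch = np.zeros((1, 299, 299, 3), dtype=np.float32)  # replace with real images
        predictions = sess.run(probs, feed_dict={images: batch})
        print(predictions.argmax(axis=-1))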

Tensorflow model file size varies significantly

I am using a TensorFlow framework and I have noticed that there is major variance in the size of the TensorFlow model files.
For example the framework provides 2 models:
one pretrained model, to be used with fine-tuning for example,
and one which contains an untrained version.
They both have a size of 172.539 kb
When I apply fine-tuning to my model with some minor changes in the graph (there is a module in the framework for that) and save my model, the size remains essentially the same: 178.525 kb.
First, I am a bit surprised that my fine-tuned model is somewhat bigger, since I change just the last layer from 21 to 14 classes, so I would expect a somewhat smaller model file; but since the difference is so small I didn't pay attention.
But when I trained the same model, starting from the same model file (the pretrained one, I mean), and saved the model to disk, the file size is quite different: 340.097 kb. By "train" I mean that I allow the network to modify all parameters, not just the parameters of the last layer.
The model being implemented is a variation of ResNet for semantic image segmentation (in case someone can deduce the expected model file size from the model itself).
So, my questions are: why is there such variance in the model file sizes, and how come my saved fine-tuned model is larger than the original one? Is there a way to include/exclude parameters from the model being saved?
P.S.1 Some information that might be handy:
I am using the TensorFlow V2 checkpoint format for saving, while I think the framework's files use V1. I am not sure how to identify this, besides the fact that the former produces 3 files.
The framework is called tensorflow-deeplab-resnet and can be found here and the models are here.
P.S.2
I am not sure Stack Overflow is the right place for this question either.
That is because, when you train a model and then save it, TensorFlow also saves the optimizer's state (for example momentum or Adam accumulator variables) for each trainable variable, not just the variable values themselves.
So allowing training on only the last layer increases the size of your saved model a little, while allowing training on the whole model essentially doubles the size of the save file, because each trainable variable has this extra optimizer state saved alongside it.
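You can check this by listing what each checkpoint actually contains, and you can exclude the optimizer state by handing the Saver only the model variables; a short sketch (paths are placeholders):

    import tensorflow as tf

    # Inspect the variables stored in a checkpoint. A checkpoint written after
    # training typically also lists optimizer slots such as ".../Momentum".
    for name, shape in tf.train.list_variables("/tmp/trained_model/model.ckpt"):
        print(name, shape)

    # To write a smaller checkpoint without the optimizer state, save only the
    # model variables instead of every global variable:
    # saver = tf.train.Saver(var_list=tf.trainable_variables())
    # saver.save(sess, "/tmp/trained_model/weights_only.ckpt")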