How to run inference on inception v3 trained models? - tensorflow

I've successfully trained the inception v3 model on custom 200 classes from scratch. Now I have ckpt files in my output dir. How to use those models to run inference?
Preferably, load the model on GPU and pass images whenever I want while the model persists on GPU. Using TensorFlow serving is not an option for me.
Note: I've tried to freeze these models but failed to correctly put output_nodes while freezing. Used ImagenetV3/Predictions/Softmax but couldn't use it with feed_dict as I couldn't get required tensors from freezed model.
There is poor documentation on TF site & repo on this inference part.

It sounds like you're on the right track, you don't really do anything different at inference time as you do at training time except that you don't ask it to compute the optimizer at inference time, and by not doing so, no weights are ever updated.
The save and restore guide in tensorflow documentation explains how to restore a model from checkpoint:
https://www.tensorflow.org/programmers_guide/saved_model
You have two options when restoring a model, either you build the OPS again from code (usually a build_graph() method) then load the variables in from the checkpoint, I use this method most commonly. Or you can load the graph definition & variables in from the checkpoint if the graph definition was saved with the checkpoint.
Once you've loaded the graph you'll create a session and ask the graph to compute just the output. The tensor ImagenetV3/Predictions/Softmax looks right to me (I'm not immediately familiar with the particular model you're working with). You will need to pass in the appropriate inputs, your images, and possibly whatever parameters the graph requires, sometimes an is_train boolean is needed, and other such details.
Since you aren't asking tensorflow to compute the optimizer operation no weights will be updated. There's really no difference between training and inference other than what operations you request the graph to compute.
Tensorflow will use the GPU by default just as it did with training, so all of that is pretty much handled behind the scenes for you.

Related

Why does the loss explode during training from scratch? - Tensorflow Object Detection Models

First of all I want to state out that I am familiar with the benefits of transfer learning. Moreover I am able to train a pretrained model from 'modelzoo' on my dataset. But for research purposes I want to train my model from scratch without transferlearning.
I want to adopt the Faster-RCNN Resnet 101 implementation from tensorsflow's Object Detection API to my dataset. If I use one of the pretrained models the training goes as expected and the loss is always in 'normal' ranges (never above about 6). But if I do not use transferlearning the loss jumps very frequently to extrem high values (about 80,000,000), but between those values the loss is in normal ranges. In addition to this I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing which I change is to comment out those two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: Changed optimizer, changed the learning rate, used gradient clipping, changed the initializer used different machines to train on but nothing helps. Moreover I inspected my label_map as well as my record file. To ensure that those files are correct I redid the steps mentioned above by using the pascal voc dataset, the script to create records and the label map from the api, but even with this code from the Object Detection API without any code changes, the loss explodes (Tensorflow Object Detection API own inputs).

How to deploy a live learning tensor flow model in cloud?

How do I deploy a tensor flow model in cloud which can learn and update the weights when given as input . Since most of the deployment methods I saw involved model freezing which implied freezing of weights also . Is it possible or is the latter the only way ?
Freezing the model is the most compact form and lets you have a smaller inference node which you can call for just prediction and only has the necessary information to do just that.
If you want to have and model and make it available to learn online and also make inference you could have so it has all the graph loaded with the newest weights. For security save the weights from time to time. Of course you could have two programs one for inference with the latest frozen model and another one that you up from time to time to make a new training, using the last saved weights. I recommend you the second option. Hope it helps!

How to save Tf.contrib model pruning?

I have built a model and I am successfully able to prune it using tf.contrib's model pruning module with default params and sparsity as 90%, but the problem is when I run the model it still takes the same amount of execution time as of the original model, my guess is that instead of running only the pruned version, tensorflow is running the entire graph with masked weghts and thats why there is no improvement even after pruning.
So how to export the pruned model with subgraph and respective weights and use it?
The strip_pruning_vars utility might be what you're looking for.
From the read.me file: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning#adding-pruning-ops
Removing pruning ops from the trained graph
Once the model is trained, it is necessary to remove the auxiliary variables (mask, threshold) and pruning ops added to the graph in the steps above. This can be accomplished using the strip_pruning_vars utility.
Would you mind sharing your code?

Convert a model trained in CUDNNLSTM so that the trained model can be run on CPU also

I have a model trained in CUDNNLSTM, how can I use this in CPU? How can we export CUDNNLSTM variables to CPU bases weights so that the trained model can be run on CPU also.
One safe way, that always works, would be to simply save the variables to some file with appropriate format like hdf5 and manually load them again. This way you have complete control over what is happening and don't depend on the system you are using. You can even train a model in TensorFlow, save the variables and load them for use with a different library like PyTorch (assuming you defined the exact same model there).

What does freezing a graph in TensorFlow mean?

I am a beginner in NN APIs and TensorFlow.
I am trying to save my trained model in protobuff format (.pb), there are many blogs explaining how to save the model as protobuff. One thing I did not understand is what is the importance of freezing the graph before saving it as protobuff? I read that freezing coverts variable to constants, does that mean the model is not trainable anymore?
What else will freezing do on models?
What is that model loses after freezing?
Can anyone please explain or give some pointers on details of freezing?
This is only a partial answer to your question.
A freezed graph is easily optimizable. When doing inference (forward propagation) for instance you can fuse some of the layers together. This you can't do with a graph separated between variables and operations (a not frozen graph). Why would you want to fuse layers together? There are multiple reasons. Going hardware specific: it might be easier to compute a number of operations together in a group of tensors, specific to the structure of your cpu or gpu. TensorRT is a graph optimizer for instance that works starting from a frozen graph (here more info on graph optimizations done by tensorRT: https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/ ). This software does graph optimizations as well as hardware specific ones.
As far as I understand you can unfreeze a graph. I have only worked optimizing them, so I haven't use this feature. But here there is code for it: https://gist.github.com/tokestermw/795cc1fd6d0c9069b20204cbd133e36b
Here is another question that might be useful:
TensorFlow: Is there a way to convert a frozen graph into a checkpoint model?
It has not yet been answered though.
Freezing the model means producing a singular file containing information about the graph and checkpoint variables, but saving these hyperparameters as constants within the graph structure. This eliminates additional information saved in the checkpoint files such as the gradients at each point, which are included so that the model can be reloaded and training continued from where you left off. As this is not needed when serving a model purely for inference they are discarded in freezing. A frozen model is a file of the Google .pb file type.