How can I convert TRT optimized model to saved model? - tensorflow

I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.

Assuming you have a TRT optimized model (i.e., the model is represented already in UFF) you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to section 3.3 and 3.4, since in these sections you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (aka. a PLAN file) to do inference.
Basically, the workflow looks something like this:
Build/train model in TensorFlow.
Freeze model (you get a protobuf representation).
Convert model to UFF so TensorRT can understand it.
Use the UFF representation to build a TensorRT engine.
Serialize the engine and save it to a PLAN file.
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.

Related

PEGASUS From pytorch to tensorflow

I have fine-tuned PEGASUS model for abstractive summarization using this script which uses huggingface.
The output model is in pytorch.
Is there a way to transorm it into tensorflow model so I can use it in a javascript backend?
There are several ways in which you can potentially achieve a conversion, some of which might not even need Tensorflow at all.
Firstly, the way that does what you intend to do: PEGASUS seems to be completely based on the BartForConditionalGeneration model, according to the transformer implementation notes. This is important, because there exists a script to convert PyTorch checkpoints to TF2 checkpoints. While this script does not explicitly allow you to convert a PEGASUS model, it does have options available for BART. Running it with the respective parameters should give you the desired output.
Alternatively, you can potentially achieve the same by exporting the model into the ONNX format, which also has JS deployment options. Specific details for how to convert a Huggingface model to ONNX can be found here.

Tensorflow Extended: Is it possible to use pytorch training loop in Tensorflow extended flow

I have trained an image classification model using pytorch.
Now, I want to move it from research to production pipeline.
I am thinking of using TensorFlow extended. I have a very noob doubt that will I'll be able to use my PyTorch trained model in the TensorFlow extended pipeline(I can convert the trained model to ONNX and then to Tensorflow compatible format).
I don't want to rewrite and retrain the training part to TensorFlow as it'll be a great overhead.
Is it possible or Is there any better way to productionize the PyTorch trained models?
You should be able to convert your PyTorch image classification model to Tensorflow format using ONNX, as long as you are using standard layers. I would recommend doing the conversion and then look at both model summaries to make sure they are relatively similar. Also, do some tests to make sure your converted model handles any particular edge cases you have. Once you have confirmed that the converted model works, save your model as a TF SavedModel format and then you should be able to use it in Tensorflow Extended (TFX).
For more info on the conversion process, see this tutorial: https://learnopencv.com/pytorch-to-tensorflow-model-conversion/
You could considering using the torchX library. I haven't use it yet, but it seems to make it easier to deploy models by creating and running model pipelines. I don't think it has the same data validation functionality that Tensorflow Extended has, but maybe that will be added in the future.

How to do fine tuning on TFlite model

I would like to fine tune a model on my own data. However the model is distributed by tflite format. Is there anyway to extract the model architecture and parameters out of the tflite file?
One approach could be to convert the TFLite file to another format, and import into a deep learning framework that supports training.
Something like ONNX, using tflite2onnx, and then import into a framework of your choice. Not all frameworks can import from ONNX (e.g. PyTorch). I believe you can train with ONNXRuntime, and MXNet. Unsure if you can train using TensorFlow.
I'm not sure to understand what you need. But if you want to know the exact architecture of your model you can use neutron to find out.
You will get something like the this :
And for your information TensorFlow Lite is not meant to be finetuned. You need to finetune a classic TensorFlow model and then convert it to TensorFlow Lite.

Onnx format, how it works?

I have been studying about ONNX and I understand what it is for and basically how it works.
But would you like to understand how it works? I was unable to find information about that. I know that it is done through protobuf but how is the information stored on ONNX and how is it transformed into another model?
Is all the thick part made by using the frameworks functions?
Have you checked this page? https://github.com/onnx/onnx/blob/master/docs/IR.md
ONNX is a specification that defines how models should be constructed (Intermediate Representation) and the operators in the graph. Converters for various frameworks will convert the trained model into the ONNX representation - see https://github.com/onnx/tutorials#converting-to-onnx-format
ONNX and most all the converters are open source for investigating.
If you have further questions, you can also directly post in ONNX github issues.

TensorRT/TFlite sample implementation

Having a trained '.h5' Keras model file, I'm trying to optimize inference time:
Explored 2 options:
Accelerated inference via TensorRT
'int8' Quantization.
At this point I can convert the model file to TensorFlow protobuf '.pb' format, but as a sidenote, it also contains custom objects of few layers.
Saw a few articles on TensorRT conversion and TFLite conversion, but I don't seem to find a robust implementation that's legible. Can someone explain how that's done (TFLite/Keras Quantization or TensorRT) to use the same model for faster inference.
(Open for other suggestions to improve inference speed supported in TensorFlow and Keras)
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8-quantization (during training or post-training).
This blog post also has kind of the same content: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt