ONNX format, how does it work? - tensorflow

I have been studying ONNX and I understand what it is for and, broadly, how it works.
But I would like to understand how it works internally; I was unable to find information about that. I know that it is built on protobuf, but how is the information stored in ONNX, and how is it transformed into another model?
Is all the heavy lifting done by the frameworks' own functions?

Have you checked this page? https://github.com/onnx/onnx/blob/master/docs/IR.md
ONNX is a specification that defines how models should be constructed (Intermediate Representation) and the operators in the graph. Converters for various frameworks will convert the trained model into the ONNX representation - see https://github.com/onnx/tutorials#converting-to-onnx-format
ONNX and most of the converters are open source, so you can investigate them directly.
If you have further questions, you can also post directly in the ONNX GitHub issues.
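To see concretely how the information is stored, you can open an ONNX file with the official onnx Python package and walk the protobuf messages yourself. A minimal sketch (model.onnx is just a placeholder path for any exported model):

import onnx

model = onnx.load("model.onnx")          # parses the ModelProto protobuf
onnx.checker.check_model(model)          # validates it against the ONNX spec

graph = model.graph                      # GraphProto: the intermediate representation
print("inputs: ", [i.name for i in graph.input])
print("outputs:", [o.name for o in graph.output])

# Each node is an operator from the ONNX operator set (Conv, Relu, MatMul, ...).
for node in graph.node:
    print(node.op_type, list(node.input), "->", list(node.output))

# Trained weights live in the same file as TensorProto initializers.
for init in graph.initializer:
    print("weight:", init.name, "shape:", list(init.dims))

A converter's job is essentially to walk the source framework's graph and emit these NodeProto/TensorProto messages; the importing framework then maps each ONNX operator back onto its own ops.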

Related

Can't manage to open TensorFlow SavedModel for usage in Keras

I'm kinda new to TensorFlow and Keras, so please excuse any accidental stupidity, but I have an issue. I've been trying to load in models from the TensorFlow Detection Zoo, but haven't had much success.
I can't figure out how to read these saved_model folders (they contain a saved_model.pb file, plus assets and variables folders) so that they're accepted by Keras, nor can I figure out a way to convert these models so that they can be loaded. I've tried converting the SavedModel to ONNX and then converting the ONNX model to Keras, but that didn't work. Trying to load the original model as a saved_model and then saving this loaded model in another format gave me no success either.
Since you are new to TensorFlow (and, I guess, deep learning), I would suggest you stick with the Object Detection API, because the detection zoo models interface best with it. If you have already downloaded the model, you just need to export it using the exporter_main_v2.py script. This article explains it very well (link).
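If you only need to run a zoo model (rather than rebuild it as a Keras object), you can also load the SavedModel directory directly. A minimal sketch, assuming a TF2 detection-zoo model; the path and input size below are placeholders:

import tensorflow as tf

# Point at the folder that contains saved_model.pb plus the variables/ and assets/ subfolders.
detect_fn = tf.saved_model.load("path/to/saved_model")

# Zoo models expose a serving signature rather than a Keras model, so you call the
# loaded object as a function instead of using keras.models.load_model.
image = tf.zeros([1, 320, 320, 3], dtype=tf.uint8)  # dummy uint8 image batch
outputs = detect_fn(image)
print(list(outputs.keys()))  # detection_boxes, detection_scores, etc.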

PEGASUS From pytorch to tensorflow

I have fine-tuned a PEGASUS model for abstractive summarization using this script, which uses huggingface.
The output model is in PyTorch.
Is there a way to transform it into a TensorFlow model so I can use it in a JavaScript backend?
There are several ways in which you can potentially achieve a conversion, some of which might not even need Tensorflow at all.
Firstly, the way that does what you intend to do: PEGASUS seems to be completely based on the BartForConditionalGeneration model, according to the transformer implementation notes. This is important, because there exists a script to convert PyTorch checkpoints to TF2 checkpoints. While this script does not explicitly allow you to convert a PEGASUS model, it does have options available for BART. Running it with the respective parameters should give you the desired output.
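As a related sketch (not that script itself), transformers can also do the PyTorch-to-TF2 weight conversion in-process via from_pt=True, assuming your installed version ships TFPegasusForConditionalGeneration and the paths below point at your fine-tuned checkpoint:

from transformers import PegasusTokenizer, TFPegasusForConditionalGeneration

# Load the PyTorch fine-tuned weights directly into the TF2 model class;
# from_pt=True converts the checkpoint on the fly.
model = TFPegasusForConditionalGeneration.from_pretrained(
    "path/to/finetuned-pegasus", from_pt=True
)
tokenizer = PegasusTokenizer.from_pretrained("path/to/finetuned-pegasus")

# Save a pure TensorFlow checkpoint that can be reloaded without PyTorch installed.
model.save_pretrained("path/to/tf-pegasus")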
Alternatively, you can potentially achieve the same by exporting the model into the ONNX format, which also has JS deployment options. Specific details for how to convert a Huggingface model to ONNX can be found here.
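For the ONNX route, a rough sketch with torch.onnx.export (the paths, opset version and single-token decoder input are assumptions; this traces one forward pass only, not the autoregressive generation loop, which you would still need to implement on the JS side):

import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

tokenizer = PegasusTokenizer.from_pretrained("path/to/finetuned-pegasus")
model = PegasusForConditionalGeneration.from_pretrained("path/to/finetuned-pegasus")
model.eval()

# Dummy encoder input plus a one-token decoder input to trace the forward pass.
enc = tokenizer("An example document to summarize.", return_tensors="pt")
dec = torch.tensor([[model.config.decoder_start_token_id]])

torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"], dec),
    "pegasus.onnx",
    input_names=["input_ids", "attention_mask", "decoder_input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"},
                  "decoder_input_ids": {0: "batch", 1: "dec_seq"}},
    opset_version=13,
)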

How to do fine tuning on TFlite model

I would like to fine-tune a model on my own data. However, the model is distributed in tflite format. Is there any way to extract the model architecture and parameters from the tflite file?
One approach could be to convert the TFLite file to another format, and import into a deep learning framework that supports training.
For example, convert to ONNX using tflite2onnx, and then import into a framework of your choice. Note that not all frameworks can import from ONNX (PyTorch, for example, cannot). I believe you can train with ONNX Runtime and MXNet; I'm unsure whether you can train using TensorFlow.
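If you go that route, the conversion itself is short; a sketch assuming the tflite2onnx package is installed and the file names are placeholders:

import tflite2onnx

# Converts the TFLite flatbuffer (graph + weights) into an ONNX model.
tflite2onnx.convert("model.tflite", "model.onnx")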
I'm not sure I understand what you need, but if you want to know the exact architecture of your model, you can use Netron to find out.
You will get a visual rendering of the model graph, showing each layer and its parameters.
And for your information, TensorFlow Lite models are not meant to be fine-tuned. You need to fine-tune a regular TensorFlow model and then convert it to TensorFlow Lite.
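A sketch of that recommended path: fine-tune a regular Keras model, then convert the fine-tuned result to TFLite. The base model, class count, and training data below are placeholders:

import tensorflow as tf

# Fine-tune a regular Keras model (here a MobileNetV2 feature extractor with a new head).
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False
model = tf.keras.Sequential([base, tf.keras.layers.Dense(5, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=3)   # train on your own data here

# Then convert the fine-tuned model to TensorFlow Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # optional post-training quantization
with open("finetuned_model.tflite", "wb") as f:
    f.write(converter.convert())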

TensorRT/TFlite sample implementation

Having a trained '.h5' Keras model file, I'm trying to optimize inference time:
I've explored 2 options:
Accelerated inference via TensorRT
'int8' Quantization.
At this point I can convert the model file to the TensorFlow protobuf '.pb' format, but as a side note, it also contains custom objects for a few layers.
I've seen a few articles on TensorRT conversion and TFLite conversion, but I can't find a robust implementation that I can follow. Can someone explain how this is done (TFLite/Keras quantization or TensorRT) so the same model can be used for faster inference?
(Open for other suggestions to improve inference speed supported in TensorFlow and Keras)
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8-quantization (during training or post-training).
This blog post also has kind of the same content: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt
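For reference, the core TF-TRT flow from the user guide boils down to a few lines. A sketch assuming TF 2.x with TensorRT available; paths and the precision mode are placeholders, and the exact parameter API differs slightly between TF versions:

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# First export the .h5 Keras model as a SavedModel
# (pass custom_objects=... to load_model if you have custom layers).
keras_model = tf.keras.models.load_model("model.h5")
keras_model.save("saved_model_dir")

# Then let TF-TRT replace supported subgraphs with TensorRT engines.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model_dir",
                                    conversion_params=params)
converter.convert()
converter.save("saved_model_trt")   # load/serve this like any other SavedModel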

How can I convert TRT optimized model to saved model?

I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.
Assuming you have a TRT optimized model (i.e., the model is already represented in UFF), you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to sections 3.3 and 3.4, since those are where you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (a.k.a. a PLAN file) to do inference.
Basically, the workflow looks something like this:
Build/train model in TensorFlow.
Freeze model (you get a protobuf representation).
Convert model to UFF so TensorRT can understand it.
Use the UFF representation to build a TensorRT engine.
Serialize the engine and save it to a PLAN file.
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.
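To make the workflow above concrete, here is a rough sketch of steps 3-5 using the legacy UFF path (the same path as the end_to_end_tensorflow_mnist sample). The frozen-graph file, node names, and input shape are placeholders, and note that UFF and build_cuda_engine are deprecated in recent TensorRT releases:

import tensorrt as trt
import uff

# Step 3: convert the frozen TensorFlow graph to UFF.
uff.from_tensorflow_frozen_model("frozen_model.pb",
                                 output_nodes=["logits"],
                                 output_filename="model.uff")

# Steps 4-5: build a TensorRT engine from the UFF file and serialize it to a PLAN file.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    parser.register_input("input", (3, 224, 224))   # CHW shape of your input node
    parser.register_output("logits")
    parser.parse("model.uff", network)

    builder.max_workspace_size = 1 << 30             # 1 GiB of scratch space
    engine = builder.build_cuda_engine(network)

    with open("model.plan", "wb") as f:
        f.write(engine.serialize())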