I'm reading the documentation and source code of the TensorFlow.js converter, and it makes a clear distinction between a Keras SavedModel and a TensorFlow SavedModel.
What are the differences between the formats and what is the cross-format support story?
Please refer to:
https://www.tensorflow.org/tutorials/keras/save_and_load#:~:text=Saving%20custom%20objects,-If%20you%20are&text=The%20key%20difference%20between%20HDF5,without%20requiring%20the%20orginal%20code.
It discusses the differences.
Related: I am aware of the ONNX package, but I am not sure how to use it here. The tutorial on their GitHub page helps convert to TensorFlow, but I want to convert to Keras! Any help would be appreciated.
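For the Keras side of the conversion story, one option is a sketch like the one below, assuming the tensorflowjs Python package is installed; its converters.save_keras_model helper writes a TF.js Layers model directly from an in-memory Keras model (the output directory name is a placeholder):

import tensorflow as tf
import tensorflowjs as tfjs

# Any Keras model works; this tiny one is just for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])

# Writes model.json plus weight shards, loadable in the browser with
# tf.loadLayersModel in TensorFlow.js.
tfjs.converters.save_keras_model(model, "tfjs_model_dir")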
For a .pb SavedModel: model.save("my_model") saves to .pb by default.
For a .tf SavedModel: model.save("my_model", save_format='tf')
I would like to know the difference between these two formats. Are they both SavedModel? Are they the same? Which is better? Are both TensorFlow extensions?
See the documentation of tf.keras.Model.save. save_format can have one of two values:
tf (the default in TensorFlow 2.x) means the TensorFlow SavedModel format, a directory containing a protocol buffer graph plus variable checkpoints.
h5 (the default in TensorFlow 1.x) means the HDF5 Keras format, defined back when Keras was completely independent of TensorFlow and aimed to support multiple backends without being tied to any one in particular.
In TensorFlow 2.x you should never need h5, unless you want to produce a file compatible with older versions of Keras or tools that expect HDF5. SavedModel is also better integrated into the TensorFlow ecosystem, for example if you want to serve the model with TensorFlow Serving.
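For illustration, a minimal sketch of both save paths (the model architecture and file names here are arbitrary):

import tensorflow as tf

# A small example model; any Keras model behaves the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# TensorFlow SavedModel format (default in TF 2.x): writes a directory
# containing saved_model.pb plus a variables/ subdirectory.
model.save("my_model")  # equivalent to save_format="tf"

# HDF5 Keras format: writes a single .h5 file.
model.save("my_model.h5", save_format="h5")

# Both can be loaded back with the same API.
restored = tf.keras.models.load_model("my_model")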
I have a trained Keras model saved as an '.h5' file and I'm trying to optimize its inference time.
I've explored two options:
Accelerated inference via TensorRT
int8 quantization
At this point I can convert the model file to the TensorFlow protobuf '.pb' format; as a side note, the model also contains a few custom layers.
I've seen a few articles on TensorRT conversion and TFLite conversion, but I can't find a robust implementation that's legible. Can someone explain how that's done (TFLite/Keras quantization or TensorRT) so the same model can be used for faster inference?
(I'm open to other suggestions for improving inference speed that are supported in TensorFlow and Keras.)
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8 quantization (during training or post-training).
This blog post covers much of the same material: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt
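As a rough sketch of the TF 2.x TF-TRT flow described in those links: convert an exported SavedModel with TrtGraphConverterV2 and save the optimized model back out. The paths and the FP16 precision mode below are assumptions, and the exact way of passing conversion parameters varies slightly across TF versions; INT8 additionally requires a calibration input function.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths; point these at your own SavedModel directories.
# On older TF 2.x releases you may need
# trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=...) instead.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",
    conversion_params=params,
)
converter.convert()                    # replaces supported subgraphs with TensorRT ops
                                       # (INT8 would need convert(calibration_input_fn=...))
converter.save("trt_saved_model_dir")  # writes the optimized SavedModel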
I need to quantize my model for use with the full version of TensorFlow, and I cannot find out how to do this (in the official manual on model quantization, the model is saved in the tflite format).
AFAIK the only supported quantization scheme in TensorFlow is TFLite. What do you plan to do with a quantized TensorFlow graph? If it is inference only, why not simply use TFLite?
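For reference, a minimal post-training (dynamic-range) quantization sketch with the TFLite converter, assuming the model is already exported as a SavedModel (the paths are placeholders):

import tensorflow as tf

# Load a SavedModel and apply post-training dynamic-range quantization.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk for use with the TFLite interpreter.
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)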
I've been trying to use TensorFlow.js, but I need the model in the SavedModel format. So far I only have a frozen graph, since I followed the TensorFlow for Poets codelab.
How can I convert the frozen graph into a SavedModel?
I'm using the latest Python version and TensorFlow 1.8.
The SavedModel is really just a wrapper around a frozen graph that provides assets and the serving signature. For a code implementation, see this answer.
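Roughly, with the TF 1.x API the wrapping looks like the sketch below. The frozen-graph path and the input/output tensor names ("input:0", "final_result:0") are assumptions based on the TensorFlow for Poets retraining script; substitute your own.

import tensorflow as tf

# Placeholder paths -- replace with your own.
frozen_graph_path = "frozen_graph.pb"
export_dir = "saved_model"

# Load the frozen GraphDef from disk.
with tf.gfile.GFile(frozen_graph_path, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    inp = graph.get_tensor_by_name("input:0")         # assumed input tensor name
    out = graph.get_tensor_by_name("final_result:0")  # assumed output tensor name

    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    with tf.Session(graph=graph) as sess:
        # Attach a serving signature so the SavedModel is usable by tooling
        # such as the TensorFlow.js converter or TensorFlow Serving.
        signature = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={"image": inp}, outputs={"prediction": out})
        builder.add_meta_graph_and_variables(
            sess,
            [tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature
            },
        )
    builder.save()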