Jetson NX: optimize a TensorFlow model using TensorRT

I am trying to speed up a segmentation model (unet-mobilenet-512x512). I converted my TensorFlow model to TensorRT with FP16 precision mode, and the speed is lower than I expected.
Before the optimization I had 7 FPS on inference with the .pb frozen graph. After the TensorRT optimization I have 14 FPS.
Here are the benchmark results for the Jetson NX from their site.
You can see that the unet 256x256 segmentation speed is 146 FPS. I thought the speed of my unet 512x512 would be at most 4 times slower in the worst case.
Here is my code for optimizing the TensorFlow saved model using TensorRT:
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf

params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(max_workspace_size_bytes=(1 << 32))
params = params._replace(precision_mode="FP16")

converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir='./model1', conversion_params=params)
converter.convert()

def my_input_fn():
    inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir)  # Generated engines will be saved.

print("------------------------freezing the graph---------------------")

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

saved_model_loaded = tf.saved_model.load(
    output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
    tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(graph_func)
frozen_func.graph.as_graph_def()
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./",
                  name="unet_frozen_graphTensorRt.pb",
                  as_text=False)
I downloaded the repository that was used for the Jetson NX benchmarking (https://github.com/NVIDIA-AI-IOT/jetson_benchmarks), and the speed of unet 256x256 really is ~146 FPS. But there is no pipeline there for optimizing the model.
How can I get similar results? I am looking for solutions to get the speed of my model (unet-mobilenet-512x512) close to 30 FPS. Maybe I should run inference in another way (without TensorFlow) or change some conversion parameters?
Any suggestions, thanks.

As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
The downside of TRT is that the conversion needs to be done on the target device and that it can be quite difficult to implement successfully, as there are various TensorFlow operations that TRT does not support (in which case you need to write a custom plugin, pray to God that someone on the internet has already done so, or use TensorRT only for part of your model and do the pre-/postprocessing in TensorFlow).
There are basically two ways to convert TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)
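For example, a minimal onnx-graphsurgeon sketch for inspecting an exported graph might look like this ("model.onnx" is a placeholder path, and the actual surgery needed depends entirely on your model):
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))
# List the op types in the graph to spot candidates TensorRT might not support.
print(sorted({node.op for node in graph.nodes}))
# ... modify or replace unsupported nodes here ...
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_modified.onnx")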
There's also a third way where you do TF -> Caffe -> TRT and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter but I'm not familiar with it. The latter link does mention converting a UNet model, though.
Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code I've come across online still uses UFF, and TensorRT 9.0 is still some time away.
Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:
TF -> UFF -> TRT: jkjung-avt/tensorrt_demos, NVIDIA-AI-IOT/tf_to_trt_image_classification (the latter using a bit of C++)
TF -> ONNX -> TRT: tensorflow-onnx, onnx-tensorrt
Keras -> ONNX -> TRT: Nvidia blog post (This one mentions converting a Unet to TRT!)
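For orientation, the ONNX path typically boils down to two commands, roughly like this (paths, opset and flags are placeholders you would need to adapt; trtexec ships with TensorRT on JetPack):
python -m tf2onnx.convert --saved-model ./model1 --output unet.onnx --opset 11
/usr/src/tensorrt/bin/trtexec --onnx=unet.onnx --fp16 --saveEngine=unet_fp16.plan --workspace=4096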
Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
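As a rough sketch (assuming the Jetson build of onnxruntime-gpu and a single-input model; the input shape here just mirrors the question):
import numpy as np
import onnxruntime as ort

# The TensorRT/CUDA execution providers are picked in this order if the build supports them.
sess = ort.InferenceSession(
    "unet.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 512, 512, 3).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print([o.shape for o in outputs])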
Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:
PyTorch -> ONNX -> TRT (used by dusty_nv)
PyTorch -> TRT (direct conversion via torch2trt). It seems that quite a few Nvidia repositories use this.
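For illustration, the torch2trt route is roughly this (a sketch with a torchvision ResNet-18 stand-in, since I haven't tried it on a UNet myself):
import torch
from torch2trt import torch2trt
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()             # example input used to build the engine
model_trt = torch2trt(model, [x], fp16_mode=True)  # builds a TensorRT engine under the hood
y_trt = model_trt(x)                               # inference now runs through TensorRT
torch.save(model_trt.state_dict(), 'resnet18_trt.pth')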

Hi, can you share the errors you are getting? It should work with the following steps:
Convert the TensorFlow/Keras model to a .pb file.
Convert the .pb file to ONNX format.
Create a TensorRT engine.
Run inference from the TensorRT engine.
I am not sure about Unet (I will check), but you may have some operations not supported by ONNX (please share your errors).
Here is an example with Resnet-50.
Conversion to .pb:
import tensorflow as tf
import keras
from tensorflow.keras.models import Model
import keras.backend as K

K.set_learning_phase(0)

def keras_to_pb(model, output_filename, output_node_names):
    """
    This is the function to convert the Keras model to a .pb file.
    Args:
        model: The Keras model.
        output_filename: The output .pb file name.
        output_node_names: The output nodes of the network. If None, then
            the function gets the last layer name as the output node.
    """
    # Get the names of the input and output nodes.
    in_name = model.layers[0].get_output_at(0).name.split(':')[0]
    if output_node_names is None:
        output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]
    sess = keras.backend.get_session()
    # Freeze the graph: convert all variables to constants.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)
    sess.close()
    wkdir = ''
    tf.train.write_graph(frozen_graph_def, wkdir, output_filename, as_text=False)
    return in_name, output_node_names

# Load the ResNet-50 model pretrained on ImageNet.
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

# Convert the Keras ResNet-50 model to a .pb file.
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)
Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx.
Example:
python -m tf2onnx.convert --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx
Last step: create the TensorRT engine from the ONNX file:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

def build_engine(onnx_path, shape=[1, 224, 224, 3]):
    """
    This is the function to create the TensorRT engine.
    Args:
        onnx_path: Path to the ONNX file.
        shape: Shape of the input of the ONNX file.
    """
    # create_network(1) sets the EXPLICIT_BATCH flag, which the ONNX parser requires.
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(1) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = (256 << 20)
        with open(onnx_path, 'rb') as model:
            parser.parse(model.read())
        network.get_input(0).shape = shape
        engine = builder.build_cuda_engine(network)
        return engine

def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)

def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
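Typical usage of these helpers would then be something like this (file names are placeholders):
engine = build_engine("models/resnet50.onnx", shape=[1, 224, 224, 3])
save_engine(engine, "resnet50.plan")
engine = load_engine(trt_runtime, "resnet50.plan")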
I suggest you check this Pytorch TRT Unet implementation

Related

TF Yamnet Transfer Learning and Quantization

TLDR:
Short term: Trying to quantize a specific portion of a TF model (recreated from a TFLite model). Skip to the pictures below.
Long term: Transfer Learn on Yamnet and compile for Edge TPU.
Source code to follow along is here
I've been trying to transfer learn on Yamnet and compile for a Coral Edge TPU for a few weeks now.
I started here, but quickly realized that the model wouldn't quantize and compile for the Edge TPU because of the dynamic input, and out-of-the-box TFLite quantization doesn't work well with the audio preprocessing that runs before Yamnet's MobileNet.
After tinkering and learning for a few weeks, I found a Yamnet model compiled for the Edge TPU (sadly without source code) and figured my best shot would be to recreate it in TF, then quantize, then convert to TFLite, then compile for the Edge TPU. I'll also have to figure out how to set the weights; I'm not sure if I have to/can do that pre- or post-quantization. Anyway, I've effectively recreated the model, but am having a hard time quantizing it without a bunch of wacky behavior.
The model currently looks like this:
I want it to look like this:
For quantizing, I tried:
TFLite Model Optimization which puts tfl.quantize ops all over the place and fails to compile for the Edge TPU.
Quantization Aware Training which throws some annoying errors that I've been trying to work through.
If you know a better way to achieve the long-term goal than what I proposed, please (please please please) share! Otherwise, help on specific quant ops would be great! Also, reach out if anything needs clarifying.
I've run into the same issues trying to convert the TensorFlow Yamnet model to full integer in order to compile it for the Coral Edge TPU, and I think I've found a workaround.
I've been trying to stick to the tutorials posted in the tflite-model-maker section and to find a solution within this API because, from experience, I've found it to be a very powerful tool.
If your goal is to build a model which is fully compiled for the Edge TPU (meaning all layers, including the input and output ones, converted to int8 type), I'm afraid this solution won't work for you. But since you posted that you're trying to obtain a custom model with the same structure as:
Yamnet model compiled for the Edge TPU
then I think this workaround will help you.
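For context, training a custom model with the tflite-model-maker audio API follows the basic tutorial and looks roughly like this (a sketch; dataset paths and hyperparameters are placeholders, and the exact YamNetSpec arguments may differ for your data):
import tflite_model_maker as mm
from tflite_model_maker import audio_classifier

spec = audio_classifier.YamNetSpec()
train_data = audio_classifier.DataLoader.from_folder(spec, '/path/to/DATASET/training', cache=True)
model = audio_classifier.create(train_data, spec, batch_size=32, epochs=100)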
When you train your custom model following the basic tutorial, it is possible to export it both in .tflite format
model.export(models_path, tflite_filename='my_birds_model.tflite')
and as a full TensorFlow model:
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
Then it is possible to convert the full TensorFlow SavedModel to the TFLite format using the following script:
import tensorflow as tf
import numpy as np
import glob
from scipy.io import wavfile

dataset_path = '/path/to/DATASET/testing/*/*.wav'
representative_data = []
saved_model_path = './saved_model'
samples = glob.glob(dataset_path)
input_size = 15600  # Yamnet model's input size

def representative_data_gen():
    for input_value in samples:
        sample_rate, audio_data = wavfile.read(input_value, 'rb')
        audio_data = np.array(audio_data)
        splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0) / tf.int16.max  # normalization into the [-1, +1] range
        yield [np.float32(splitted_audio_data[0])]

tf.compat.v1.enable_eager_execution()

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.experimental_new_converter = True  # needed if you're using tensorflow <= 2.2
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.inference_input_type = tf.int8   # or tf.uint8
# converter.inference_output_type = tf.int8  # or tf.uint8
converter.representative_dataset = representative_data_gen

tflite_model = converter.convert()
open(saved_model_path + "/converted_model.tflite", "wb").write(tflite_model)
As you can see, the lines which tell the converter to change the input/output type are commented out. This is because the Yamnet model expects as input normalized audio sample values in the range [-1, +1], and the numerical representation must be float32. In fact, the compiled Yamnet model you posted uses the same dtype for the input and output layers (float32).
That being said, you will end up with a TFLite model converted from the full TensorFlow model produced by tflite-model-maker. The script will end with the following line:
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
and inference_type: 6 tells you that the inference operations are suitable for being compiled for the Coral Edge TPU.
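Before compiling, you can optionally sanity-check the converted file with the TFLite Interpreter to confirm the input/output dtypes stayed float32 (a quick sketch):
interpreter = tf.lite.Interpreter(model_path=saved_model_path + "/converted_model.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]['dtype'])   # expect <class 'numpy.float32'>
print(interpreter.get_output_details()[0]['dtype'])  # expect <class 'numpy.float32'>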
The last step is to compile the model. If you compile the model with the standard edgetpu_compiler command line:
edgetpu_compiler -s converted_model.tflite
the final model would have only 4 operations which run on the EdgeTPU:
Number of operations that will run on Edge TPU: 4
Number of operations that will run on CPU: 53
You have to add the optional flag -a, which enables multiple subgraphs (it is still in an experimental stage, though):
edgetpu_compiler -sa converted_model.tflite
After this you will have:
Number of operations that will run on Edge TPU: 44
Number of operations that will run on CPU: 13
And most of the model operations will be mapped to the Edge TPU, namely:
Operator              Count   Status
MUL                   1       Mapped to Edge TPU
DEQUANTIZE            4       Operation is working on an unsupported data type
SOFTMAX               1       Mapped to Edge TPU
GATHER                2       Operation not supported
COMPLEX_ABS           1       Operation is working on an unsupported data type
FULLY_CONNECTED       3       Mapped to Edge TPU
LOG                   1       Operation is working on an unsupported data type
CONV_2D               14      Mapped to Edge TPU
RFFT2D                1       Operation is working on an unsupported data type
LOGISTIC              1       Mapped to Edge TPU
QUANTIZE              3       Operation is otherwise supported, but not mapped due to some unspecified limitation
DEPTHWISE_CONV_2D     13      Mapped to Edge TPU
MEAN                  1       Mapped to Edge TPU
STRIDED_SLICE         2       Mapped to Edge TPU
PAD                   2       Mapped to Edge TPU
RESHAPE               1       Operation is working on an unsupported data type
RESHAPE               6       Mapped to Edge TPU

How to visualize feature maps of a TensorFlow Lite model?

I used Keract to visualize the feature maps of a TensorFlow/Keras model.
I have applied quantization with TensorFlow Lite. I would like to visualize the feature maps generated by the TensorFlow Lite model during an inference. Do you know a way to do this?
The reason is that I don't fully understand the interaction between weights, activations and scale/zero-point coefficients. So I would like to do the inference process step by step for a quantized network.
Thank you for your help
There are several ways to extract information about weights, scales and zero-point values.
Way one:
You can find additional information about the code below on the TensorFlow website.
import tensorflow as tf
import numpy as np

# Load your TFLite model.
TF_LITE_MODEL_FILE_NAME = "Your_TFLite_file.tflite"
interpreter = tf.lite.Interpreter(model_path=TF_LITE_MODEL_FILE_NAME)
interpreter.allocate_tensors()

# Gives you the input and output details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Gives you all tensor details index-wise; you will find all the quantization_parameters here.
interpreter.get_tensor_details()

# Get an individual tensor value with interpreter.get_tensor(<index>).
# You will find the index of each tensor in the output of get_tensor_details().
r = interpreter.get_tensor(12).astype(np.float32)
print('Tensors', r)
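To get at the scale/zero-point coefficients mentioned in the question, you can also run one inference and dequantize an intermediate tensor by hand, roughly like this (a sketch; whether intermediate tensors are still populated after invoke() can depend on your TF version, and newer releases expose an experimental_preserve_all_tensors option on the Interpreter for exactly this):
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))
interpreter.invoke()

t = interpreter.get_tensor_details()[12]          # pick the tensor/feature map you care about
scales = t['quantization_parameters']['scales']
zero_points = t['quantization_parameters']['zero_points']
raw = interpreter.get_tensor(t['index'])
print((raw.astype(np.float32) - zero_points) * scales)  # dequantized feature map values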
Way two (the easy way):
Upload your TFLite file to the Netron website. There you can get lots of information about your TFLite file. You can also install Netron on your PC; here is the Netron git link for installing it.

keras multi_gpu_model saved_model failed to load model in TF2 code

I have trained a multi_gpu_model using tensorflow 1.13/1.14 and saved them with keras.model.save('<.hdf5>').
Now, after migrating to tensorflow 2.4.1, in which Keras is integrated as tensorflow.keras, I cannot tensorflow.keras.models.load_model as I did before, due to the following error:
AttributeError: module 'tensorflow.python.keras.backend' has no attribute 'slice'
After trying to import keras.models.load_model, and trying different versions of keras (2.2.4 -> 2.4.1) and tensorflow (2.2 -> 2.4.1), I cannot load_model from my .hdf5 file using my TF 2.2+ code.
I do know that in TF 2.X we can train across distributed machines by using the "strategy" scope, and it does work, but I have a lot of "old" models that need to work in the same code base, which is now being migrated to TF 2.4.1.
Apparently the problem was not the TF version, but the way I was saving my models in my TF 1.X code.
I used the keras.multi_gpu_model class for both training and saving, but this practice is wrong, as clearly stated in the Keras documentation:
"To save the multi-gpu model, use .save(fname) or .save_weights(fname)
with the template model (the argument you passed to multi_gpu_model),
rather than the model returned by multi_gpu_model."
So, after figuring this out, a method for model conversion using TF 1.X code was adopted (a rough sketch follows the list):
Build your model from scratch, namely new_model.
Load your pre-trained weights from the multi_gpu_model, namely old_model.
Copy your old_model's weights, which live in old_model.layers[3] (due to the wrong usage of multi_gpu_model), to your new_model.
Save new_model as an .hdf5 file.
Use new_model.hdf5 everywhere, in both TF 1.X and TF 2.X.
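Sketched in TF 1.X code (build_model and the file names are placeholders; layers[3] matches step 3 above):
import keras

new_model = build_model()  # rebuild the original single-GPU architecture from scratch
old_model = keras.models.load_model('old_multi_gpu_model.hdf5', compile=False)
new_model.set_weights(old_model.layers[3].get_weights())  # the template model inside multi_gpu_model
new_model.save('new_model.hdf5')  # this file loads fine in both TF 1.X and TF 2.X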

How to convert from Tensorflow.js (.json) model into Tensorflow (SavedModel) or Tensorflow Lite (.tflite) model?

I have downloaded a pre-trained PoseNet model for Tensorflow.js (tfjs) from Google, so it's a .json file.
However, I want to use it on Android, so I need the .tflite model. Although someone has 'ported' a similar model from tfjs to tflite here, I have no idea what model (there are many variants of PoseNet) they converted. I want to do the steps myself. Also, I don't want to run some arbitrary code someone uploaded into a file on Stack Overflow:
Caution: Be careful with untrusted code—TensorFlow models are code. See Using TensorFlow Securely for details. Tensorflow docs
Does anyone know any convenient ways to do this?
You can find out what tfjs format you have by looking in the json file. It often says "graph-model". The difference between them is explained here.
From tfjs graph model to SavedModel (more common)
Use tfjs-to-tf by Patrick Levin.
import tensorflow as tf
import tfjs_graph_converter.api as tfjs

tfjs.graph_model_to_saved_model(
    "savedmodel/posenet/mobilenet/float/050/model-stride16.json",
    "realsavedmodel"
)

# Code below taken from https://www.tensorflow.org/lite/convert/python_api
converter = tf.lite.TFLiteConverter.from_saved_model("realsavedmodel")
tflite_model = converter.convert()

# Save the TF Lite model.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
From tfjs layers model to SavedModel
Note: This will only work for layers model format, not graph model format as in the question. I've written the difference between them here.
Install and use tensorflowjs-convert to convert the .json file into a Keras HDF5 file (from another SO thread).
On Mac, you'll face issues running pyenv (fix), and on Z-shell, pyenv won't load correctly (fix). Also, once pyenv is running, use python -m pip install tensorflowjs instead of pip install tensorflowjs, because pyenv did not change the python used by pip for me.
Once you've followed the tensorflowjs_converter guide, run tensorflowjs_converter to verify it works with no errors; it should just warn you about a missing input_path argument. Then:
tensorflowjs_converter --input_format=tfjs_layers_model --output_format=keras tfjs_model.json hdf5_keras_model.hdf5
Convert the Keras HDF5 file into a SavedModel (standard TensorFlow model file) or directly into a .tflite file using the TFLiteConverter. The following runs in a Python file:
import tensorflow as tf

# Convert the model.
model = tf.keras.models.load_model('hdf5_keras_model.hdf5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the TF Lite model.
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
or to save to a SavedModel:
# Convert the model.
model = tf.keras.models.load_model('hdf5_keras_model.hdf5')
tf.keras.models.save_model(
    model, filepath, overwrite=True, include_optimizer=True, save_format=None,
    signatures=None, options=None
)

Load and run test a .trt model

I need to run my model on an NVIDIA Jetson TX2, so I converted my working YoloV3 model into TensorRT (.trt format) (https://towardsdatascience.com/have-you-optimized-your-deep-learning-model-before-deployment-cdc3aa7f413d). The linked article helped me convert the Yolo model to .trt. But after converting the model to .trt I needed to test whether it works fine (i.e. whether the detection is good enough). I couldn't find any sample code for loading and testing a .trt model. If anybody can help me, please put sample code in the answer section or share a link for reference.
You can load and perform inference with your TRT model using this snippet of code.
This was executed with TensorFlow 2.1.0 in a Google Colab environment.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants

saved_model_loaded = tf.saved_model.load(output_saved_model_dir, tags=[tag_constants.SERVING])
signature_keys = list(saved_model_loaded.signatures.keys())
print(signature_keys)  # Outputs: ['serving_default']
graph_func = saved_model_loaded.signatures[signature_keys[0]]
graph_func(x_test)  # Use this to perform inference
output_saved_model_dir is the location of your TensorRT Optimized model in SavedModel format.
From here, you can add your testing methods to determine the performance of your pre- and post-processed model.
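For example, a minimal smoke test could look like this (the input shape here is an assumption; adjust it to whatever your YoloV3 export expects):
import numpy as np
import tensorflow as tf

x_test = tf.constant(np.random.normal(size=(1, 416, 416, 3)).astype(np.float32))
print(graph_func(x_test))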
EDIT:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1 << 32))
conversion_params = conversion_params._replace(precision_mode="FP16")
conversion_params = conversion_params._replace(maximum_cached_engines=100)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)
This is the code used for converting and saving the TensorRT-optimized model.