Load and test a .trt model - TensorFlow

I need to run my model on an NVIDIA Jetson T2, so I converted my working YOLOv3 model into TensorRT (.trt) format, following this guide: https://towardsdatascience.com/have-you-optimized-your-deep-learning-model-before-deployment-cdc3aa7f413d. After converting the model to .trt, I need to test whether it still works correctly, i.e. whether the detections are good enough. I couldn't find any sample code for loading and testing a .trt model. If anybody can help, please post sample code in the answer section or a link for reference.

You can load your TRT model and perform inference with it using the snippet of code below.
It was executed with TensorFlow 2.1.0 in a Google Colab environment.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants

# Load the TensorRT-optimized SavedModel
saved_model_loaded = tf.saved_model.load(output_saved_model_dir, tags=[tag_constants.SERVING])
signature_keys = list(saved_model_loaded.signatures.keys())
print(signature_keys)  # Outputs: ['serving_default']

# Grab the concrete function behind the serving signature
graph_func = saved_model_loaded.signatures[signature_keys[0]]
graph_func(x_test)  # Use this to perform inference
Here, output_saved_model_dir is the location of your TensorRT-optimized model in SavedModel format, and x_test is your batch of test inputs.
From here, you can add your own testing methods to compare the model's performance before and after optimization.
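For a quick smoke test, you can feed the loaded function a dummy tensor. A minimal sketch, assuming a 1x416x416x3 float32 input; replace the shape and preprocessing with whatever your YOLOv3 export actually expects:

import numpy as np
import tensorflow as tf

# Dummy input; swap in a real, preprocessed image batch with your model's input shape
x_test = tf.constant(np.random.random((1, 416, 416, 3)).astype(np.float32))

preds = graph_func(x_test)  # returns a dict of output tensors keyed by output name
print({name: tensor.shape for name, tensor in preds.items()})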
EDIT:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np

# Conversion parameters: 4 GB workspace, FP16 precision, up to 100 cached engines
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1 << 32))
conversion_params = conversion_params._replace(precision_mode="FP16")
conversion_params = conversion_params._replace(maximum_cached_engines=100)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)
Here is the code used for converting and saving the TensorRT-optimized model.
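If you also want the TRT engines to be built ahead of time (so the first inference on the device does not pay the engine-construction cost), you can call converter.build() with an input function between convert() and save(). A minimal sketch, assuming a 1x416x416x3 input shape that you should replace with your model's actual input shape:

import numpy as np

def my_input_fn():
    # Yield one example batch with your model's real input shape
    yield [np.random.normal(size=(1, 416, 416, 3)).astype(np.float32)]

converter.build(input_fn=my_input_fn)  # call after converter.convert() and before converter.save()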

Related

Convert a .pth PyTorch file to an ONNX model

I'm trying to convert a PyTorch model (a .pth file containing weights) to an ONNX file and then to a TensorFlow model, since I work in TensorFlow and want to fine-tune it there.
This is my attempt so far, but I keep getting errors.
I think the problem is that the weights are for a vision transformer, but I haven't figured out what type of model to use for the conversion. I'm assuming a CRNN, but if there is an easier way I would love to know.
PS: I did upload the .pth file to my Drive; the path is correct.
from torch.autograd import Variable
import torch.onnx
import torchvision
import torch
import onnx
import torch.nn as nn
dummy_input = torch.randn(1, 3, 224, 224)
file_path='/content/drive/MyDrive/VitSTR/vitstr_base_patch16_224_aug.pth'
model = torchvision.models.vgg16()
model.load_state_dict(torch.load(file_path))
model.eval()
torch.onnx.export(model, dummy_input, "vitstr.onnx")
Thank you all.
I used the same architecture as the one in the model and it worked.
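For reference, the general pattern is sketched below; build_matching_architecture() is a hypothetical placeholder for the actual model definition that produced the checkpoint (for vitstr_base_patch16_224 weights that would be the ViTSTR architecture, not torchvision's VGG16):

import torch

# Placeholder: instantiate the SAME architecture the .pth checkpoint was trained with
model = build_matching_architecture()  # hypothetical constructor, not a real API

state_dict = torch.load(file_path, map_location='cpu')
model.load_state_dict(state_dict)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # must match the model's expected input
torch.onnx.export(model, dummy_input, "vitstr.onnx", opset_version=11)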

TF.js import error with model created using TF Lite Model Maker

I've created a model using the tutorial at https://www.tensorflow.org/lite/tutorials/model_maker_image_classification and exported it in the TF.js format:
import os
import matplotlib.pyplot as plt
import tensorflow as tf
from tflite_model_maker import image_classifier, model_spec
from tflite_model_maker.config import ExportFormat, QuantizationConfig
from tflite_model_maker.image_classifier import DataLoader
image_path = tf.keras.utils.get_file(
    'flower_photos.tgz',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    extract=True)
image_path = os.path.join(os.path.dirname(image_path), 'flower_photos')
data = DataLoader.from_folder(image_path)
train_data, test_data = data.split(0.9)
model = image_classifier.create(train_data)
loss, accuracy = model.evaluate(test_data)
# Export model to TF.js format
model.export(export_dir='.', export_format=ExportFormat.TFJS)
When loading this model in TF.js using tf.loadLayersModel I get the following error:
Uncaught (in promise) Error: Unknown layer: HubKerasLayerV1V2.
This may be due to one of the following reasons:
1. The layer is defined in Python, in which case it needs to be
ported to TensorFlow.js or your JavaScript code.
2. The custom layer is defined in JavaScript, but is not registered
properly with tf.serialization.registerClass()
I guess the error is due to reason (1), but how can I port the HubKerasLayerV1V2 layer to TF.js?
I believe this is caused by the model converter having trouble with a partial Graph inside of a Layers model.
You can probably fix this by serializing the model to the normal SavedModel format or exporting it as HDF5. Once you have the .h5 (or SavedModel) output, use the TensorFlow.js converter (tensorflowjs_converter) to create a purely Graph model, and then try loading it with tf.loadGraphModel instead.
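A sketch of that export path, assuming tflite_model_maker's ExportFormat.SAVED_MODEL option and the standard tensorflowjs_converter CLI (the output paths are placeholders and may need adjusting to wherever Model Maker actually writes the SavedModel):

# Export the Model Maker model as a plain SavedModel instead of the TFJS layers format
model.export(export_dir='exported', export_format=ExportFormat.SAVED_MODEL)

# Then convert the exported SavedModel to a TF.js graph model from the shell, e.g.:
#   tensorflowjs_converter --input_format=tf_saved_model exported/saved_model tfjs_graph_model
# and load the result in the browser with tf.loadGraphModel('.../model.json').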

Is it impossible to quantize a .tflite file? (OSError occurred)

I am trying to quantize my model (a .tflite file). I want to change float32 to float16 through dynamic range quantization.
This is my code:
import tensorflow as tf
import json
import sys
import pprint
from tensorflow import keras
import numpy as np
converter = tf.lite.TFLiteConverter.from_saved_model('models')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
open("quant.tflite", "wb").write(tflite_quant_model)
On my MacBook there is a folder called 'models', which contains two .tflite files.
When I execute the code, the following error occurs:
converter = tf.lite.TFLiteConverter.from_saved_model('quantization')
OSError: SavedModel file does not exist at: models/{saved_model.pbtxt|saved_model.pb}
I checked most of the posts in stack overflow, but I couldn't find a solution.
Please review my code and give me some advice.
I uploaded my tflite file because I guess it would be necessary to check if there was a problem.
This is my model(download link):
https://drive.google.com/file/d/13gft7bREsv2vZYFvfoCiP5ndxHkfGKIM/view?usp=sharing
Thank you so much.
The tf.lite.TFLiteConverter.from_saved_model function expects a TensorFlow SavedModel directory (containing a saved_model.pb) as its argument. You are instead pointing it at TensorFlow Lite (.tflite) models, which necessarily leads to this error. If you want to convert your model to float16, the only way I know of is to start from the original model in SavedModel (.pb) format and convert it from there.
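A minimal sketch of that workflow, assuming you still have the original SavedModel directory or the original Keras model (the paths below are placeholders):

import tensorflow as tf

# Point the converter at the original SavedModel directory (must contain saved_model.pb),
# not at the already-converted .tflite files.
saved_model_dir = 'path/to/original_saved_model'  # placeholder
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# If you only have the original Keras model instead, use:
# converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # float16 dynamic-range quantization
tflite_fp16_model = converter.convert()

with open('quant_fp16.tflite', 'wb') as f:
    f.write(tflite_fp16_model)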

Jetson NX: optimizing a TensorFlow model using TensorRT

I am trying to speed up a segmentation model (unet-mobilenet-512x512). I converted my TensorFlow model to TensorRT with FP16 precision mode, and the speed is lower than I expected.
Before the optimization I had 7 FPS on inference with the .pb frozen graph; after the TensorRT optimization I have 14 FPS.
According to the Jetson NX benchmark results on NVIDIA's site, unet 256x256 segmentation runs at 146 FPS. I thought the speed of my unet 512x512 should be at most 4 times slower in the worst case.
Here is my code for optimizing the TensorFlow SavedModel using TensorRT:
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf

params = trt.DEFAULT_TRT_CONVERSION_PARAMS
params = params._replace(max_workspace_size_bytes=(1 << 32))
params = params._replace(precision_mode="FP16")

converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir='./model1', conversion_params=params)
converter.convert()

def my_input_fn():
    inp1 = np.random.normal(size=(1, 512, 512, 3)).astype(np.float32)
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
output_saved_model_dir = "trt_graph2"
converter.save(output_saved_model_dir)  # Generated engines will be saved.

print("------------------------freezing the graph---------------------")

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

saved_model_loaded = tf.saved_model.load(
    output_saved_model_dir, tags=[tf.compat.v1.saved_model.SERVING])
graph_func = saved_model_loaded.signatures[
    tf.compat.v1.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_variables_to_constants_v2(graph_func)
frozen_func.graph.as_graph_def()
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir="./",
                  name="unet_frozen_graphTensorRt.pb",
                  as_text=False)
I downloaded the repository that was used for the Jetson NX benchmarking (https://github.com/NVIDIA-AI-IOT/jetson_benchmarks), and the speed of unet256x256 really is ~146 FPS, but there is no pipeline for optimizing the model there.
How can I get similar results? I am looking for solutions to bring my model (unet-mobilenet-512x512) close to 30 FPS. Maybe I should run inference in another way (without TensorFlow) or change some conversion parameters?
Any suggestions are appreciated, thanks.
As far as I can see, the repository you linked to uses command line tools that use TensorRT (TRT) under the hood. Note that TensorRT is not the same as "TensorRT in TensorFlow" aka TensorFlow-TensorRT (TF-TRT) which is what you are using in your code. Both TF-TRT and TRT models run faster than regular TF models on a Jetson device but TF-TRT models still tend to be slower than TRT ones (source 1, source 2).
The downside of TRT is that the conversion to TRT needs to be done on the target device and that it can be quite difficult to implement it successfully as there are various TensorFlow operations that TRT does not support (in which case you need to write a custom plugin or pray to God that someone on the internet has already done so. …or use TensorRT only for part of your model and do pre-/postprocessing in TensorFlow).
There are basically two ways to convert models from TensorFlow models to TensorRT "engines" aka "plan files", both of which use intermediate formats:
TF -> UFF -> TRT
TF -> ONNX -> TRT
In both cases, the graphsurgeon/onnx-graphsurgeon libraries can be used to modify the TF/ONNX graph to achieve compatibility of graph operations. Unsupported operations can be added by means of TensorRT plugins, as mentioned above. (This is really the main challenge here: Different graph file formats and different target GPUs support different graph operations.)
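To give an idea of what that looks like on the ONNX side, here is a minimal onnx-graphsurgeon sketch (the file names are placeholders, and what you actually need to edit depends entirely on your model):

import onnx
import onnx_graphsurgeon as gs

# Load an ONNX model into an editable graphsurgeon graph
graph = gs.import_onnx(onnx.load("model.onnx"))  # placeholder file name

# List the op types present, which helps spot operations TRT may not support
print(sorted({node.op for node in graph.nodes}))

# ... modify graph.nodes / graph.inputs / graph.outputs here as needed ...

# Drop dangling nodes/tensors and write the edited model back out
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_modified.onnx")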
There's also a third way where you do TF -> Caffe -> TRT and apparently a fourth one where you use Nvidia's Transfer Learning Toolkit (TLT) (based upon TF/Keras) and a tool called tlt-converter but I'm not familiar with it. The latter link does mention converting a UNet model, though.
Note that the paths involving UFF and Caffe are now deprecated and support will be removed in TensorRT 9.0, so if you want something future-proof, you should probably go for ONNX. That being said, most sample code I've come across online still uses UFF, and TensorRT 9.0 is still some time away.
Anyway, I haven't tried converting a UNet to TensorRT yet, but the following repositories provide sample code which might give you an idea of how it works in principle:
TF -> UFF -> TRT: jkjung-avt/tensorrt_demos, NVIDIA-AI-IOT/tf_to_trt_image_classification (the latter using a bit of C++)
TF -> ONNX -> TRT: tensorflow-onnx, onnx-tensorrt
Keras -> ONNX -> TRT: Nvidia blog post (This one mentions converting a Unet to TRT!)
Note that even if you don't manage to pull off the conversion from ONNX to TRT for your model, using the ONNX runtime for inference could potentially still give you a performance gain, especially when you're using the CUDA or the TensorRT execution provider which will be enabled automatically provided you're on a Jetson device and running the correct ONNXRuntime build. (I'm not sure how it compares to TF-TRT or TRT, though, but it might still be worth a shot.)
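To illustrate the ONNX Runtime option, a minimal sketch using the TensorRT/CUDA execution providers (this assumes an onnxruntime-gpu build with those providers available; the model path and input shape are placeholders):

import numpy as np
import onnxruntime as ort

# Prefer TensorRT, fall back to CUDA, then CPU (only providers present in your build are used)
sess = ort.InferenceSession(
    "unet_mobilenet_512.onnx",  # placeholder path
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])

input_name = sess.get_inputs()[0].name
x = np.random.random((1, 512, 512, 3)).astype(np.float32)  # adjust to your model's input layout
outputs = sess.run(None, {input_name: x})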
Finally, for completeness's sake let me also mention that at least my team has been dabbling with the idea of switching from TF to PyTorch, partly because the Nvidia support has been getting a lot better lately and Nvidia employees seem to gravitate towards PyTorch, too. In particular, there are now two separate ways to convert models to TRT:
PyTorch -> ONNX -> TRT (used by dusty_nv)
PyTorch -> TRT (direct conversion via torch2trt). It seems that quite a few Nvidia repositories use this; see the sketch below.
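A minimal torch2trt sketch, assuming torch2trt is installed on the Jetson; the model class and input shape are placeholders:

import torch
from torch2trt import torch2trt

# Placeholder: any PyTorch model in eval mode on the GPU
model = MyUNet().eval().cuda()          # hypothetical model class
x = torch.randn(1, 3, 512, 512).cuda()  # example input matching the model

# Convert directly to a TensorRT-backed module (FP16 here)
model_trt = torch2trt(model, [x], fp16_mode=True)

# The converted module keeps the usual call convention
y_trt = model_trt(x)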
Hi, can you share the errors you are getting? It should work with the following steps:
Convert the TensorFlow/Keras model to a .pb file.
Convert the .pb file to ONNX format.
Create a TensorRT engine.
Run inference from the TensorRT engine.
I am not sure about U-Net (I will check), but you may have some operations not supported by ONNX (please share your errors).
Here is an example with Resnet-50.
Conversion to .pb:
import tensorflow as tf
import keras
from tensorflow.keras.models import Model
import keras.backend as K

K.set_learning_phase(0)

def keras_to_pb(model, output_filename, output_node_names):
    """
    This is the function to convert the Keras model to pb.

    Args:
       model: The Keras model.
       output_filename: The output .pb file name.
       output_node_names: The output nodes of the network. If None, then
           the function gets the last layer name as the output node.
    """
    # Get the names of the input and output nodes.
    in_name = model.layers[0].get_output_at(0).name.split(':')[0]

    if output_node_names is None:
        output_node_names = [model.layers[-1].get_output_at(0).name.split(':')[0]]

    sess = keras.backend.get_session()

    # The TensorFlow freeze_graph expects a comma-separated string of output node names.
    output_node_names_tf = ','.join(output_node_names)

    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)

    sess.close()
    wkdir = ''
    tf.train.write_graph(frozen_graph_def, wkdir, output_filename, as_text=False)

    return in_name, output_node_names

# Load the ResNet-50 model pretrained on ImageNet
model = keras.applications.resnet.ResNet50(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)

# Convert the Keras ResNet-50 model to a .pb file
in_tensor_name, out_tensor_names = keras_to_pb(model, "models/resnet50.pb", None)
Then you need to convert the .pb model to the ONNX format. To do this, you will need to install tf2onnx.
Example:
python -m tf2onnx.convert --input /Path/to/resnet50.pb --inputs input_1:0 --outputs probs/Softmax:0 --output resnet50.onnx
Last step: create the TensorRT engine from the ONNX file:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

def build_engine(onnx_path, shape=[1, 224, 224, 3]):
    """
    This is the function to create the TensorRT engine
    Args:
       onnx_path : Path to onnx_file.
       shape : Shape of the input of the ONNX file.
    """
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(1) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = (256 << 20)
        with open(onnx_path, 'rb') as model:
            parser.parse(model.read())
        network.get_input(0).shape = shape
        engine = builder.build_cuda_engine(network)
        return engine

def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)

def load_engine(trt_runtime, engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
I suggest you check this PyTorch TRT U-Net implementation.

Saving a Keras/Sklearn in python and loading the saved model in tensorflow.js

I have a trained sklearn SVM model in .pkl format and a Keras .h5 model. Can I load these models using tensorflow.js in a browser?
I do most of my coding in Python and am not sure how to work with tensorflow.js.
My model saving code looks like this:
from sklearn.externals import joblib
joblib.dump(svc,'model.pkl')
model = joblib.load('model.pkl')
prediction = model.predict(X_test)
#------------------------------------------------------------------
from keras.models import load_model
model.save('model.h5')
model = load_model('my_model.h5')
In order to deploy your model with tensorflow-js, you need to use the tensorflowjs_converter, so you also need to install the tensorflowjs dependency.
You can do that in python via pip install tensorflowjs.
Next, you convert your trained model via this operation, according to your custom names: tensorflowjs_converter --input_format=keras /tmp/model.h5 /tmp/tfjs_model, where the last path is the output path of the conversion result.
Note that after the conversion you will get a model.json (the architecture of your model) and a list of N shards (the weights split into N shards).
Then, in JavaScript, you need to use the function tf.loadLayersModel(MODEL_URL), where MODEL_URL is the URL pointing to your model.json. Ensure that the shards are located in the same place as the model.json.
Since this is an asynchronous operation (you do not want your web page to be blocked while your model is loading), you need to use the JavaScript await keyword; hence await tf.loadLayersModel(MODEL_URL).
Please have a look at the following link to see an example: https://www.tensorflow.org/js/guide/conversion
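As a Python-side alternative to the command-line converter, the tensorflowjs package also exposes a programmatic API for Keras models. A minimal sketch, assuming the tensorflowjs package is installed and that 'model.h5' is the Keras model saved above:

import tensorflowjs as tfjs
from keras.models import load_model

# Load the trained Keras model (path is a placeholder)
model = load_model('model.h5')

# Writes model.json plus the weight shards to the output directory;
# equivalent to running tensorflowjs_converter --input_format=keras
tfjs.converters.save_keras_model(model, 'tfjs_model')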