How to use TensorRT with YOLOv5? - tensorflow2.0

I want to optimize my trained YOLOv5 model using TensorRT, but I am unable to find a proper way of doing so. Can anybody show me how?

You can refer to this repository for YOLOv5; it has a section dedicated to TensorRT deployment.
You can also learn about TensorRT inference using C++ and Python.

https://github.com/ultralytics/yolov5/issues/251
You could try:
python benchmarks.py --weights yolov5s.pt --imgsz 640 --device 0

benchmarks.py exports and evaluates ALL export frameworks, even the ones that have nothing to do with TensorRT. Use:
python export.py --weights yolov5s.pt --include engine
to export your YOLOv5 model to TensorRT.
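If the export succeeds you should get a yolov5s.engine file next to the weights. As far as I know, the repo's detect.py can then run inference directly on that engine (the flags below are the usual YOLOv5 ones; adjust paths for your setup):
python detect.py --weights yolov5s.engine --imgsz 640 --device 0 --source data/images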

Related

How to convert original yolo weights to TensorRT model?

I have developed an improved version of the yolov4-tiny model.
I would like to convert this developed model to a TensorRT model, but after referring to the attached URL, I found that I can only convert the original v4-tiny model.
My question is, how are other people converting their original models to TensorRT?
Thank you in advance.
URL
I understand that you have a custom model you trained yourself and that you want to convert it to TensorRT.
There are many ways to convert a model to TensorRT. The process depends on which format your model is in, but here's one that works for all formats:
Convert your model to ONNX format
Convert the model from ONNX to TensorRT using trtexec
Detailed steps
I assume your model is in TensorFlow saved-model format; at least the train.py in the repository you linked saves models in that format. You can convert it to ONNX using tf2onnx.
Note that tf2onnx recommends Python 3.7. If you are on another version of Python, you can install 3.7 and create a virtual environment for it using conda or venv.
Then, install tf2onnx:
pip install git+https://github.com/onnx/tensorflow-onnx
Convert your model from saved-model to ONNX
python3 -m tf2onnx.convert --saved-model ./model --output model.onnx
If your model is in some other TF format, please see the tf2onnx README for help.
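Optionally, you can sanity-check the exported ONNX file before building the engine. This is just a quick sketch using the onnx and onnxruntime packages (model.onnx is the file produced above, and a float32 input is assumed):
import numpy as np
import onnx
import onnxruntime as ort

# verify that the ONNX graph is structurally valid
onnx.checker.check_model(onnx.load("model.onnx"))

# run one dummy inference to confirm the input/output shapes
sess = ort.InferenceSession("model.onnx")
inp = sess.get_inputs()[0]
dummy = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape], dtype=np.float32)
outputs = sess.run(None, {inp.name: dummy})
print(inp.name, inp.shape, [o.shape for o in outputs])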
Then install TensorRT and its dependencies using this guide if you haven't already installed it. Alternatively you can use Nvidia Containers (NGC).
After you have installed TensorRT, you can run this command to convert your model using FP16 precision:
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=3000 --buildOnly
You can check all CLI arguments by running
/usr/src/tensorrt/bin/trtexec --help
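To confirm the engine deserializes correctly, here is a minimal Python sketch using the TensorRT Python API (the binding-inspection calls differ a bit between TensorRT versions, so treat this as illustrative):
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# print the I/O bindings to check that names and shapes look right
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))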
For YOLO v3-v5 you can use a project that manually parses the cfg and weights files and builds and saves the TensorRT engine file itself, for example enazoe/yolo-tensorrt. I'm using this code in Multitarget-tracker as a fast object detector on Windows/Linux x86 and Nvidia Jetson.
In this case you don't need to install trtexec or other software from Nvidia.

How to Convert Yolov5 model to tensorflow.js

Is it possible to convert a YOLOv5 PyTorch model to a TensorFlow.js model?
I am developing an object detection web app, so I have trained a model using YOLOv5, but now I am looking for the correct way to convert that model to TF.js.
I think this is what you are looking for https://github.com/zldrobit/tfjs-yolov5-example
Inside the YoloV5 repo, run the export.py command.
python export.py --weights yolov5s.pt --include tfjs
Then cd into the above-linked repo and copy the exported web model folder into public/:
cp -r ./yolov5s_web_model public/web_model
Don't forget, you'll have to change the names array in src/index.js to match your custom model.
But unfortunately, inference seems painfully slow, at about 1-2 seconds per image; I don't think I was able to get the WebGL backend working.
It takes a few interim steps, but most of the time it works (see the command sketch after this list):
Export PyTorch to ONNX
Convert ONNX to TF Saved Model
Convert TF Saved Model to TFJS Graph Model
When converting from ONNX to TF, you might need to adjust the target version if you run into unsupported ops.
Also, make sure to set the input resolution to fixed values; any dynamic inputs get messed up in this multi-step conversion.
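A rough sketch of those three steps with commonly used tools (the onnx-tf and tensorflowjs packages; file names, image size, and opset below are just examples and may need tweaking for your model):
# 1. PyTorch -> ONNX, using the YOLOv5 repo's own exporter with a fixed 640x640 input
python export.py --weights yolov5s.pt --include onnx --imgsz 640 --opset 12
# 2. ONNX -> TensorFlow SavedModel (pip install onnx-tf)
onnx-tf convert -i yolov5s.onnx -o yolov5s_saved_model
# 3. SavedModel -> TF.js graph model (pip install tensorflowjs)
tensorflowjs_converter --input_format=tf_saved_model yolov5s_saved_model web_model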

What are the main differences between TensorFlow Lite, TensorFlow-TRT and TensorRT?

I am using the Coral devboard and the Nvidia Jetson TX2. And that is how I got to know about TensorFlow-Lite, TensorFlow-TRT and TensorRT.
I have some questions about them:
Between TensorFlow-TRT and TensorRT:
When using a fully optimised/compatible graph with TensorRT, which one is faster and why?
The pipeline to use TFlite in a Google Coral (When using TensorFlow 1.x...) is:
a. Use a model available in TensorFlow's zoo
b. Convert the model to frozen (.pb)
c. Use protobuff to serialize the graph
d. Convert to Tflite
e. Apply quantization (INT8)
f. Compile
What would be the pipeline when using TensorFlow-TRT and TensorRT?
Is there somewhere I can find good documentation about it?
So far I think TensorRT is closer to TensorFlow Lite because:
TFLite: after compilation you end up with a .quant.edgetpu.tflite file which can be used to run inference on the devboard.
TensorRT: you end up with a .plan file to run inference on the devboard.
Thank you for the answers, and if you can point me to documentation which compares them, that will be appreciated.
TensorRT is a very fast CUDA runtime for GPU only. I am using an Nvidia Jetson Xavier NX with TensorFlow models converted to TensorRT, running on the TensorFlow-TensorRT (TF-TRT) runtime. The benefit of the TF-TRT runtime is that any operations not supported by TensorRT fall back to TensorFlow.
I have not tried TensorFlow Lite, but I understand it as a reduced TF for inference only on "small devices". It can use the GPU, but only for a limited set of operations, and I think there are no Python bindings (currently).
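On the pipeline question: with TF-TRT the conversion happens inside TensorFlow itself. A minimal sketch, assuming a TF 2.x SavedModel at ./saved_model (path hypothetical; the parameter API shifts slightly between TF versions):
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# convert the SavedModel; ops TensorRT can't handle remain regular TF ops
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="./saved_model",
    conversion_params=params,
)
converter.convert()
converter.save("./saved_model_trt")  # reload with tf.saved_model.load() for inference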

Workflow for converting .pb files to .tflite

Overview
I know this subject has been discussed many times, but I am having a hard time understanding the workflow, or rather, the variations of the workflow.
For example, imagine you are installing TensorFlow on Windows 10. The main goal is to train a custom model, convert it to TensorFlow Lite, and copy the converted .tflite file to a Raspberry Pi running TensorFlow Lite.
The confusion for me starts with the conversion process. After following along with multiple guides, it seems TensorFlow is often installed with pip or Anaconda. But then I see detailed tutorials which indicate it needs to be built from source in order to convert TensorFlow models to TFLite models.
To make things more interesting, I've also seen models which are converted via Python scripts as seen here.
Question
So far I have seen 3 ways to do this conversion, and it could just be that I don't have a grasp on the full picture. Below are the abbreviated methods I have seen:
Build from source, and use the TensorFlow Lite Optimizing Converter (TOCO):
bazel run --config=opt tensorflow/lite/toco:toco -- --input_file=$OUTPUT_DIR/tflite_graph.pb --output_file=$OUTPUT_DIR/detect.tflite ...
Use the TensorFlow Lite Converter Python API:
import tensorflow as tf

# convert a SavedModel directory into a TFLite flatbuffer
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Use the tflite_convert CLI utilities:
tflite_convert --saved_model_dir=/tmp/mobilenet_saved_model --output_file=/tmp/mobilenet.tflite
I think I understand that options 2 and 3 are the same, in the sense that the tflite_convert utility is installed and can be invoked either from the command line or through a Python script. But is there a specific reason you should choose one over the other?
And lastly, what really gets me confused is option 1. And maybe it's a version thing (1.x vs 2.x)? But what's the difference between the TensorFlow Lite Optimizing Converter (TOCO) and the TensorFlow Lite Converter? It appears that in order to use TOCO you would have to build TensorFlow from source, so is there a reason you would use one over the other?
There is no difference in the output from different conversion methods, as long as the parameters remain the same. The Python API is better if you want to generate TFLite models in an automated way (e.g. a Python script that's run periodically).
The TensorFlow Lite Optimizing Converter (TOCO) was the first version of the TF->TFLite converter. It was recently deprecated and replaced with a new converter that can handle more ops/models. So I wouldn't recommend using toco:toco via bazel, but rather use tflite_convert as mentioned here.
You should never have to build the converter from source, unless you are making some changes to it and want to test them out.

TensorRT/TFlite sample implementation

Having a trained '.h5' Keras model file, I'm trying to optimize inference time.
I have explored two options:
Accelerated inference via TensorRT
'int8' Quantization.
At this point I can convert the model file to the TensorFlow protobuf '.pb' format, but as a side note, it also contains custom objects for a few layers.
I've seen a few articles on TensorRT conversion and TFLite conversion, but I can't find a robust, legible implementation. Can someone explain how that's done (TFLite/Keras quantization or TensorRT) so the same model can be used for faster inference?
(I'm open to other suggestions for improving inference speed that are supported in TensorFlow and Keras.)
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8-quantization (during training or post-training).
This blog post also has kind of the same content: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt
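For the TFLite/quantization route, here is a minimal post-training INT8 sketch, assuming the trained Keras model is in model.h5 and takes 224x224x3 float inputs (both the file name and the shape are placeholders; with custom layers you also need to pass custom_objects when loading):
import numpy as np
import tensorflow as tf

# load the trained Keras model (add custom_objects={...} here for any custom layers)
model = tf.keras.models.load_model("model.h5")

# a few representative batches drive the INT8 calibration;
# replace the random data with real preprocessed samples
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())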