Which object detection pre-trained models are available and convertible with TensorRT?

I'm looking into converting a pre-trained object detection model with TensorRT to try it out on my NVIDIA Jetson TX2, but every model I find has layers that are not yet supported by TensorRT. So far I have tried SSD with MobileNet and Faster R-CNN, but they both contain operations, such as Identity, that TensorRT does not support, and I can't find many other TensorFlow models out there.
Thank you

This repo has the pre-trained models you are looking for, along with instructions on how to take a model and build a TensorRT engine: https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification#download
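If you would rather keep the detection model in TensorFlow, TF-TRT can also help, since ops that TensorRT does not support fall back to TensorFlow. Below is only a rough sketch on the JetPack TensorFlow 1.x build; the frozen-graph path and output node names are examples for a TF Object Detection API SSD export, not something taken from the linked repo:

```python
# Hedged sketch: TF-TRT conversion of a frozen detection graph with TF 1.x on Jetson.
# Ops that TensorRT cannot handle stay in TensorFlow instead of failing the conversion.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT module bundled with TF 1.x

with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:  # placeholder path
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['detection_boxes', 'detection_scores',
             'detection_classes', 'num_detections'],  # typical SSD output names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16')  # FP16 is usually the sweet spot on the TX2

with tf.gfile.GFile('trt_graph.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())
```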

Related

How was the ssd_mobilenet_v1 tflite model in TFHub trained?

How do I find more info on how the ssd_mobilenet_v1 tflite model on TFHub was trained?
Was it trained in such a way that made it easy to convert it to TFLite by avoiding certain ops not supported by TFLite? Or was it trained normally and then converted using the TFLite converter with TF Select and the tips in this GitHub issue?
Also, does anyone know if there's an equivalent mobilenet tflite model trained on OpenImagesV6? If not, what's the best starting point for training one?
I am not sure about the exact origin of the model, but it looks like it does have TFLite-compatible ops. From my experience, the best place to start for TFLite-compatible SSD models is the TF2 Detection Zoo. You can convert any of the SSD models using these instructions.
To train your own model, you can follow these instructions that leverage Google Cloud.
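For reference, here is a hedged sketch of that conversion flow. The paths and model name are placeholders, and the export script and flags are from the TF2 Object Detection API as I recall them:

```python
# Step 1 (shell, run from models/research in the TF Object Detection API repo):
#
#   python object_detection/export_tflite_graph_tf2.py \
#       --pipeline_config_path ssd_mobilenet_v2/pipeline.config \
#       --trained_checkpoint_dir ssd_mobilenet_v2/checkpoint \
#       --output_directory tflite_export
#
# Step 2: convert the exported SavedModel with the TFLite converter.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('tflite_export/saved_model')
tflite_model = converter.convert()

with open('ssd_mobilenet_v2.tflite', 'wb') as f:
    f.write(tflite_model)
```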

What are the main differences between TensorFlow Lite, TensorFlow-TRT and TensorRT?

I am using the Coral Dev Board and the NVIDIA Jetson TX2, which is how I got to know about TensorFlow Lite, TensorFlow-TRT and TensorRT.
I have some questions about them:
Between TensorFlow-TRT and TensorRT:
When using a fully optimised/compatible graph with TensorRT, which one is faster and why?
The pipeline to use TFLite on a Google Coral (when using TensorFlow 1.x) is:
a. Use a model available in TensorFlow's zoo
b. Convert the model to a frozen graph (.pb)
c. Use protobuf to serialize the graph
d. Convert to TFLite
e. Apply quantization (INT8)
f. Compile
What would be the pipeline when using TensorFlow-TRT and TensorRT?
Is there somewhere where I can find a good documentation about it?
So far I think TensorRT is closer to TensorFlow Lite because:
TFLite: after compilation you end up with a .quant.edgetpu.tflite file which can be used to run inference on the Dev Board
TensorRT: you end up with a .plan file to run inference on the board.
Thank you for the answers, and if you can point me to documentation that compares them, that would be appreciated.
TensorRT is a very fast, GPU-only CUDA runtime. I am using an NVIDIA Jetson Xavier NX with TensorFlow models converted to TensorRT, running on the TensorFlow-TensorRT (TF-TRT) runtime. The benefit of the TF-TRT runtime is that any operations not supported by TensorRT fall back to TensorFlow.
I have not tried TensorFlow Lite, but I understand it as a reduced TF for inference only on "small devices". It can support GPU, but only for a limited set of operations, and I think there are no Python bindings (currently).
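To make the standalone-TensorRT pipeline concrete, here is only a rough sketch using the TensorRT 7.x-era Python API. It assumes the TensorFlow model has already been exported to ONNX (e.g. with tf2onnx), and the file names are placeholders; the serialized output is the .plan file you mention:

```python
# Sketch of the standalone TensorRT pipeline:
# TF model -> ONNX (e.g. via tf2onnx) -> TensorRT engine -> serialized .plan file.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open('model.onnx', 'rb') as f:          # placeholder file name
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('ONNX parse failed')

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28          # 256 MB of scratch space
config.set_flag(trt.BuilderFlag.FP16)        # FP16 helps on Jetson GPUs

engine = builder.build_engine(network, config)
with open('model.plan', 'wb') as f:          # the .plan file used for inference
    f.write(engine.serialize())
```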

Re-Quantize Already Quantized Models

Is it possible to re-quantize already quantized models?
I have some models that I trained with Quantization Aware Training (QAT) and full-integer quantization. However, I am failing to do GPU delegation with those models. Is there a way to convert the models I already have to float16 quantization so that I can run them with the GPU delegate?
Are you looking for a way to convert an integer-quantized model to a float16-quantized model?
Which version of TFLite are you using? TFLite 2.3 supports running quantized models with the GPU delegate. However, as the GPU only supports float operations, it internally dequantizes the integer weights into float weights.
Please see the doc for how to enable (experimental) quantized model support. https://www.tensorflow.org/lite/performance/gpu_advanced#running_quantized_models_experimental
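If you still have the original float (pre-quantization) model, one option is to re-run the converter with float16 quantization rather than trying to rewrite the int8 .tflite file. A minimal sketch, assuming a Keras model on disk (the path is a placeholder):

```python
import tensorflow as tf

# Start again from the original float model (placeholder path) rather than the
# already-int8 .tflite, and ask the converter for float16 weight quantization.
model = tf.keras.models.load_model('original_float_model.h5')

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # float16 quantization
tflite_fp16 = converter.convert()

with open('model_fp16.tflite', 'wb') as f:
    f.write(tflite_fp16)
```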

Is it possible to run a YOLO model on a Jetson without optimizing it?

I had a few issues converting a yolo.weight model to TensorRT. So, is it possible to run a YOLO model on a Jetson without optimizing it with TensorRT? Will detection run at the same speed? (Training won't be done on the Jetson anyway.)
Or is there any other suggested alternative to TensorRT?
Yes, it is possible to run a YOLO model on a Jetson without optimizing it with TensorRT; TensorRT only optimizes the inference time of your model.
You could try converting it to TF Lite format and work from there, but you might need to handle most of the back-end operations yourself.
Also, for both methods, the training is done on the PC and not on the edge device.
You can read more in their documentation at these links:
TensorFlow-TRT GitHub
TensorFlow Lite
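As one concrete alternative that skips TensorRT entirely, OpenCV's DNN module can run the original Darknet .cfg/.weights pair directly. This is only a sketch and assumes your OpenCV build on the Jetson has CUDA/cuDNN support; file names are placeholders:

```python
# Run Darknet YOLO weights on a Jetson without any TensorRT conversion step,
# using OpenCV's DNN module (requires an OpenCV build with CUDA/cuDNN).
import cv2

net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')  # placeholder files
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

img = cv2.imread('test.jpg')
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())  # raw YOLO detections
```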

TensorRT/TFLite sample implementation

Having a trained Keras model file ('.h5'), I'm trying to optimize inference time.
I have explored 2 options:
Accelerated inference via TensorRT
'int8' quantization
At this point I can convert the model file to the TensorFlow protobuf '.pb' format; as a side note, it also contains custom objects for a few layers.
I have seen a few articles on TensorRT conversion and TFLite conversion, but I can't find a robust implementation that's legible. Can someone explain how that's done (TFLite/Keras quantization or TensorRT) so the same model can be used for faster inference?
(I'm open to other suggestions for improving inference speed that are supported in TensorFlow and Keras.)
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8-quantization (during training or post-training).
This blog post also has kind of the same content: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt
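Since the question also mentions 'int8' quantization of a Keras model with custom layers, here is a hedged sketch of the TFLite post-training full-integer path. The model path, custom_objects contents, input shape, and calibration data are all placeholders; for TensorRT, INT8 calibration happens during engine building instead:

```python
import numpy as np
import tensorflow as tf

# Load the trained Keras model; pass your custom layer classes via custom_objects.
model = tf.keras.models.load_model('model.h5', custom_objects={})  # placeholders

def representative_dataset():
    # Replace the random tensors with ~100 real, preprocessed input samples so the
    # converter can calibrate INT8 ranges; shape (1, 224, 224, 3) is a placeholder.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8    # optional: full-integer I/O
converter.inference_output_type = tf.int8
tflite_int8 = converter.convert()

with open('model_int8.tflite', 'wb') as f:
    f.write(tflite_int8)
```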