Is it possible to run yolo model in Jetson without optimizing it? - tensorflow

I had few issues converting yolo.weight model to tensorRT. So, Is it possible to run a yolo model in Jetson with optimizing it to TensorRT ? Will there be the same speed for detection? (Training won't be done in Jetson anyway).
Or is there any other suggestion alternative to TensorRT?

Yes, It is possible to run YOLO model in Jetson without optimizing it with TensorRT. The TensorRT only optimizes the inference time of your model.
You could try converting it to TF Lite format and work from there, but you might need to handle most of the back-end operations yourself.
Also, for both methods, the training is done on the PC and not on the edge device.
You could read more about their documentations here in these links.
Tensorflow RT Github
Tensorflow Lite

Related

Optimize Tensorflow Object Detection Model V2 Centernet Model for Evaluation

I am using the tensorflow centernet_resnet50_v2_512x512_kpts_coco17_tpu-8 object detection model on a Nvidia Tesla P100 to extract bounding boxes and keypoints for detecting people in a video. Using the pre-trained from tensorflow.org, I am able to process about 16 frames per second. Is there any way I can imporve the evaluation speed for this model? Here are some ideas I have been looking into:
Pruning the model graph since I am only detecting 1 type of object (people)
Have not been successful in doing this. Changing the label_map when building the model does not seem to improve performance.
Hard coding the input size
Have not found a good way to do this.
Compiling the model to an optimized form using something like TensorRT
Initial attempts to convert to TensorRT did not have any performance improvements.
Batching predictions
It looks like the pre-trained model has the batch size hard coded to 1, and so far when I try to change this using the model_builder I see a drop in performance.
My GPU utilization is about ~75% so I don't know if there is much to gain here.
TensorRT should in most cases give a large increase in frames per second compared to Tensorflow.
centernet_resnet50_v2_512x512_kpts_coco17_tpu-8 can be found in the TensorFlow Model Zoo.
Nvidia has released a blog post describing how to optimize models from the TensorFlow Model Zoo using Deepstream and TensorRT:
https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/
Now regarding your suggestions:
Pruning the model graph: Pruning the model graph can be done by converting your tensorflow model to a TF-TRT model.
Hardcoding the input size: Use the static mode in TF-TRT. This is the default mode and enabled by: is_dynamic_op=False
Compiling the model: My advise would be to convert you model to TF-TRT or first to ONNX and then to TensorRT.
Batching: Specifying the batch size is also covered in the NVIDIA blog post.
Lastly, for my model a big increase in performance came from using FP16 in my inference engine. (mixed precision) You could even try INT8 although then you first have to callibrate.

What are the main differences between TensorFlowLite, TendorFlow-TRT and TensorRT?

I am using the Coral devboard and the Nvidia Jetson TX2. And that is how I got to know about TensorFlow-Lite, TensorFlow-TRT and TensorRT.
I have some questions about them:
Between TensorFlow-TRT and TensorRT:
When using a fully optimised/compatible graph with TensorRT, which one is faster and why?
The pipeline to use TFlite in a Google Coral (When using TensorFlow 1.x...) is:
a. Use a model available in TensorFlow's zoo
b. Convert the model to frozen (.pb)
c. Use protobuff to serialize the graph
d. Convert to Tflite
e. Apply quantization (INT8)
f. Compile
what would be the pipeline when using TensorFlow-TRT and TensorRT?
Is there somewhere where I can find a good documentation about it?
So far I think TensorRT is closer to TensorFlow Lite because:
TFlite: after compilation you end up with a .quant.edtpu.tflite file which can be used to make inference in the devboard
TensorRT: you will end up with a .plan file to make inference in the devboard.
Thank you for the answers, and if you can point me to documentation which compares them, that will be appreciated.
TensorRT is a very fast CUDA runtime for GPU only. I am using an Nvidia Jetson Xavier NX with Tensorflow models converted to TensorRT, running on the Tensorflow-RT (TRT) runtime. The benefit of TRT runtime is any unsupported operations on TensorRT will fall back to using Tensorflow.
Have not tried Tensorflow-Lite, but I understand it as a reduced TF for inference-only on "small devices". It can support GPU but only limited operations and I think there are no python bindings (currently).

How to optimize your tensorflow model by using TensorRT?

These are the instruction to solve the assignments?
Convert your TensorFlow model to UFF
Use TensorRT’s C++ API to parse your model to convert it to a CUDA engine.
TensorRT engine would automatically optimize your model and perform steps
like fusing layers, converting the weights to FP16 (or INT8 if you prefer) and
optimize to run on Tensor Cores, and so on.
Can anyone tell me how to proceed with this assignment because I don't have GPU in my laptop and is it possible to do this in google colab or AWS free account.
And what are the things or packages I have to install for running TensorRT in my laptop or google colab?
so I haven't used .uff but I used .onnx but from what I've seen the process is similar.
According to the documentation, with TensorFlow you can do something like:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(
input_graph_def=frozen_graph,
nodes_blacklist=['logits', 'classes'])
frozen_graph = converter.convert()
In TensorFlow1.0, so they have it pretty straight forward, TrtGraphConverter has the option to serialized for FP16 like:
converter = trt.TrtGraphConverter(
input_saved_model_dir=input_saved_model_dir,
max_workspace_size_bytes=(11<32),
precision_mode=”FP16”,
maximum_cached_engines=100)
See the preciosion_mode part, once you have serialized you can load the networks easily on TensorRT, some good examples using cpp are here.
Unfortunately, you'll need a nvidia gpu with FP16 support, check this support matrix.
If I'm correct, Google Colab offered a Tesla K80 GPU which does not have FP16 support. I'm not sure about AWS but I'm certain the free tier does not have gpus.
Your cheapest option could be buying a Jetson Nano which is around ~90$, it's a very powerful board and I'm sure you'll use it in the future. Or you could rent some AWS gpu server, but that is a bit expensive and the setup progress is a pain.
Best of luck!
Export and convert your TensorFlow model into .onnx file.
Then, use this onnx-tensorrt tool to do the CUDA engine file conversion.

How to optimize a trained Tensorflow graph for execution speedup?

in order to do fast CPU inference of a frozen Tensorflow graph (.pb) I am currently using Tensorflow's C API. The inference speed is already fairly good, however (compared to CPU-specific tools like Intel's OpenVINO) I have so far no possibility to somehow optimize the graph before running it. I am interested in any sort of optimization that is suitable:
- device-specific optimization for CPU
- graph-specific optimization (fusing operations, dropping out nodes, ...)
- ... and everything else lowering the time required for inference.
Therefore I am looking for a way to optimize graphs after training and before execution. As mentioned, Tools like Intel's OpenVINO (for CPUs) and NVIDIA's TensorRT (for GPUs) do stuff like that. I am also working with OpenVINO but currently waiting for a bug fix so that I would like to try an additional way.
I thought about trying Tensorflow XLA, but I have no experience using it. Moreover I have to make sure to either get a frozen graph (.pb) or something that I can convert to a frozen graph (e.g. .h5) in the end.
I would be grateful for recommendations!
Greets
follow these steps:
freeze tensorflow trained model (frozen_graph.pb) - for that you may required trained model .pb, checkpoints & output node names
optimize your frozen model with Intel OpenVINO model optimizer -
python3 mo.py --input_model frozen_graph.pb
Additionally you may required input_shape
you will get .xml & .bin files as result. with the help of benchmark_app, you can check inference optimisation .

Which object detection pre-trained models are available and convertible with TensorRT?

I'm looking into converting a pre-trained object detection model with TensorRT to try it out on my NVIDIA Jetson TX2 but every model I find has layers that are not yet supported by TensorRT. So far I tried SSD with MobileNet and Faster R-CNN but they both have operations such as Identity that are not supported by TensorRT and I can't find many other TensorFlow models out there.
Thank you
This repo has the pre-trained models you are looking for, along with instructions on how to take a model and build a TensorRT engine: https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification#download