OpenVINO Toolkit with YoloV4

I am currently working with the YoloV3-tiny.
Repository: https://github.com/AlexeyAB/darknet
To import the network into a C++ project I use the OpenVINO Toolkit. In more detail, I use the following procedure to convert the network:
Converting YOLO* Models to the Intermediate Representation (IR)
This procedure carries out the conversion and the optimization needed to proceed with inference.
Now I would like to try YoloV4, because it seems to be more effective for the purpose of the project. The problem is that the OpenVINO Toolkit does not yet support this version: it only provides the .json configuration file (needed for the conversion) up to version 3, not for version 4.
What has changed in terms of structure between version 3 and version 4 of Yolo?
Can I hope that the conversion procedure for YoloV3-tiny (or YoloV3) also works for YoloV4?
Is YoloV4 much slower than YoloV3-tiny when using only the CPU for inference?
When will the YoloV4-tiny be available?
Does anyone have information about it?

The difference between YoloV4 and YoloV3 is the backbone: YoloV4 uses CSPDarknet53, whereas YoloV3 uses Darknet53.
See https://arxiv.org/pdf/2004.10934.pdf.
Also, YoloV4 is not officially supported by OpenVINO. However, you can still test and validate YoloV4 on your end with a workaround. For now, there is one way to run YoloV4: through OpenCV, which builds the network using the nGraph API and then passes it to the Inference Engine. See https://github.com/opencv/opencv/pull/17185.
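For illustration, that OpenCV route looks roughly like the sketch below, assuming an OpenCV build that includes the Inference Engine backend and using hypothetical file names (yolov4.cfg / yolov4.weights, any test image):

import cv2

# Build the network directly from the Darknet files
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
# Hand the graph to the Inference Engine backend and run on CPU
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

image = cv2.imread("dog.jpg")
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())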
The key problem is the Mish activation function: there is no optimized implementation yet, which is why it has to be implemented by its definition using tanh and exponential functions. Unfortunately, a one-to-one topology comparison shows significant performance degradation. The performance results are also available in the GitHub link above.
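For reference, the by-definition form is simply mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x)); a minimal NumPy sketch of that fallback (not the missing optimized kernel):

import numpy as np

def mish(x):
    # Mish by definition: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
    return x * np.tanh(np.log1p(np.exp(x)))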

https://github.com/TNTWEN/OpenVINO-YOLOV4
This is my project, based on V3's converter (darknet -> tensorflow -> IR), and I have finished the adaptation of OpenVINO YoloV4, V4-relu and V4-tiny.
You could give it a try. You can use the V4 IR model and run it on V3's C++ demo directly.
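If it helps, loading such an IR model with the (2020-era) OpenVINO Inference Engine Python API looks roughly like this sketch; the file names are hypothetical and the C++ demo follows the same pattern:

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# Read the converted IR: .xml topology plus .bin weights
net = ie.read_network(model="yolov4.xml", weights="yolov4.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
n, c, h, w = net.input_info[input_name].input_data.shape
frame = np.zeros((n, c, h, w), dtype=np.float32)  # stand-in for a preprocessed image
results = exec_net.infer({input_name: frame})      # dict: output name -> ndarray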

Related

Freeze Saved_Model.pb created from converted Keras H5 model

I am currently trying to train a custom model for use in Unity (Barracuda) for object detection, and I am struggling with what I believe to be the last part of the pipeline. Following various tutorials and git repos I have done the following...
Using Darknet, I have trained a custom model based on Tiny-YoloV2 (tested successfully with a webcam Python script).
I have taken the final weights from that training and converted them to a Keras (.h5) file (also tested successfully with a webcam Python script).
From Keras, I then use tf.saved_model to turn it into a saved_model.pb.
From the saved_model.pb I then convert it using tf2onnx.convert to change it to an ONNX file.
Supposedly from there it can then work in one of a few Unity sample projects...
...however, this model fails to load in the Unity sample projects I've tried to use. From various posts it seems that I may need to use a 'frozen' saved_model.pb before converting it to ONNX. However, all the guides and Python functions that seem to be used for freezing saved models require a lot more arguments than I have awareness of, or data for, after going through so many systems. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py - for example, after converting to Keras I am only left with an .h5 file, with no knowledge of what an input_graph_def or output_node_names might refer to.
Additionally, for whatever reason, I cannot find any TF version (1 or 2) that can successfully run this Python script using 'from tensorflow.python.checkpoint import checkpoint_management'; it genuinely seems like it no longer exists.
I am not sure why I am going through all of these conversions and steps, but every attempt to find a cleaner process between training and Unity seemed to lead only to dead ends.
Any help or guidance on this topic would be sincerely appreciated, thank you.
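For what it's worth, a minimal sketch of the Keras-to-ONNX path described above, assuming tf2onnx is installed; the file names and input shape are hypothetical, and recent tf2onnx versions can also convert a Keras model directly, which sidesteps the freezing question:

import tensorflow as tf
import tf2onnx

# Load the converted Keras model (custom layers may need custom_objects=...)
model = tf.keras.models.load_model("tiny_yolov2_custom.h5")

# Option A: export a SavedModel, then convert it from disk:
tf.saved_model.save(model, "saved_model_dir")
#   python -m tf2onnx.convert --saved-model saved_model_dir --output model.onnx

# Option B: convert the in-memory Keras model directly
spec = (tf.TensorSpec((None, 416, 416, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec,
                                           output_path="model.onnx")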

Workflow for converting .pb files to .tflite

Overview
I know this subject has been discussed many times, but I am having a hard time understanding the workflow, or rather, the variations of the workflow.
For example, imagine you are installing TensorFlow on Windows 10. The main goal is to train a custom model, convert it to TensorFlow Lite, and copy the converted .tflite file to a Raspberry Pi running TensorFlow Lite.
The confusion for me starts with the conversion process. After following along with multiple guides, it seems TensorFlow is often installed with pip or Anaconda. But then I see detailed tutorials which indicate it needs to be built from source in order to convert from TensorFlow models to TFLite models.
To make things more interesting, I've also seen models which are converted via Python scripts as seen here.
Question
So far I have seen 3 ways to do this conversion, and it could just be that I don't have a grasp on the full picture. Below are the abbreviated methods I have seen:
Build from source, and use the TensorFlow Lite Optimizing Converter (TOCO):
bazel run --config=opt tensorflow/lite/toco:toco -- --input_file=$OUTPUT_DIR/tflite_graph.pb --output_file=$OUTPUT_DIR/detect.tflite ...
Use the TensorFlow Lite Converter Python API:
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Use the tflite_convert CLI utilities:
tflite_convert --saved_model_dir=/tmp/mobilenet_saved_model --output_file=/tmp/mobilenet.tflite
I *think* I understand that options 2 and 3 are the same, in the sense that the tflite_convert utility is installed and can be invoked either from the command line or through a Python script. But is there a specific reason you should choose one over the other?
And lastly, what really confuses me is option 1. Maybe it's a version thing (1.x vs 2.x)? But what is the difference between the TensorFlow Lite Optimizing Converter (TOCO) and the TensorFlow Lite Converter? It appears that in order to use TOCO you would have to build TensorFlow from source, so is there a reason you would use one over the other?
There is no difference in the output from the different conversion methods, as long as the parameters remain the same. The Python API is better if you want to generate TFLite models in an automated way (e.g. a Python script that is run periodically).
The TensorFlow Lite Optimizing Converter (TOCO) was the first version of the TF-to-TFLite converter. It was recently deprecated and replaced with a new converter that can handle more ops/models. So I wouldn't recommend using toco:toco via bazel, but rather using tflite_convert as mentioned here.
You should never have to build the converter from source, unless you are making some changes to it and want to test them out.
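As an illustration of that automated use of the Python API (option 2), a small sketch that could be run on a schedule; the function name and paths are hypothetical:

import tensorflow as tf

def export_tflite(saved_model_dir, tflite_path):
    # Same conversion as the snippet in the question, wrapped for reuse
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()
    with tf.io.gfile.GFile(tflite_path, 'wb') as f:
        f.write(tflite_model)

if __name__ == '__main__':
    export_tflite('/tmp/mobilenet_saved_model', '/tmp/mobilenet.tflite')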

TensorRT/TFlite sample implementation

Having a trained '.h5' Keras model file, I'm trying to optimize inference time:
I have explored two options:
Accelerated inference via TensorRT
'int8' Quantization.
At this point I can convert the model file to the TensorFlow protobuf '.pb' format, but as a side note, it also contains custom objects for a few layers.
I have seen a few articles on TensorRT conversion and TFLite conversion, but I can't seem to find a robust implementation that's legible. Can someone explain how that's done (TFLite/Keras quantization or TensorRT), so that the same model can be used for faster inference?
(I am open to other suggestions to improve inference speed that are supported in TensorFlow and Keras.)
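For the quantization option, a minimal sketch of TensorFlow Lite post-training int8 quantization; it assumes the .h5 model can be loaded (custom layers may need custom_objects) and that the representative dataset yields typical preprocessed inputs, with hypothetical shapes and file names:

import tensorflow as tf

model = tf.keras.models.load_model('model.h5')  # may need custom_objects for custom layers

def representative_data_gen():
    for _ in range(100):
        # yield one preprocessed sample per call, shaped like the model input
        yield [tf.random.uniform((1, 224, 224, 3))]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
with open('model_int8.tflite', 'wb') as f:
    f.write(tflite_model)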
This is the user guide on how to use TensorRT in TF: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html
This talk explains how TensorRT works in TF: https://developer.nvidia.com/gtc/2019/video/S9431
Note that TensorRT also supports INT8-quantization (during training or post-training).
This blog post also has kind of the same content: https://medium.com/tensorflow/high-performance-inference-with-tensorrt-integration-c4d78795fbfe
This repository has a bunch of examples showing how to use it: https://github.com/tensorflow/tensorrt
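For the TF-TRT route described in those links, the basic TF 2.x conversion is roughly the sketch below (hypothetical directory names; precision mode and INT8 calibration are set through the conversion parameters, as covered in the user guide):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert an existing SavedModel; TF-TRT replaces supported subgraphs with TRT engines
converter = trt.TrtGraphConverterV2(input_saved_model_dir='saved_model_dir')
converter.convert()
converter.save('saved_model_trt')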

How can I convert TRT optimized model to saved model?

I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.
Assuming you have a TRT optimized model (i.e., the model is already represented in UFF) you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to sections 3.3 and 3.4, since in these sections you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (aka a PLAN file) to do inference.
Basically, the workflow looks something like this:
Build/train model in TensorFlow.
Freeze model (you get a protobuf representation).
Convert model to UFF so TensorRT can understand it.
Use the UFF representation to build a TensorRT engine.
Serialize the engine and save it to a PLAN file.
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.
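As a rough sketch of steps 4 and 5 above (building the engine from UFF and serializing it to a PLAN file), using the TensorRT Python API's UFF parser, the same route as the MNIST sample; the input/output names and shapes are hypothetical, and note that the UFF path has been deprecated in newer TensorRT releases:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.UffParser() as parser:
    # Register the graph input (CHW shape) and output by name, then parse the UFF file
    parser.register_input("input_1", (3, 416, 416))
    parser.register_output("predictions")
    parser.parse("model.uff", network)

    builder.max_workspace_size = 1 << 30  # scratch space for the builder
    engine = builder.build_cuda_engine(network)

    # Serialize the engine to a PLAN file for later reuse
    with open("model.plan", "wb") as f:
        f.write(engine.serialize())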

How to use Tensorflow model comparison with tflite_diff_example_test

I have trained a model for detection, which is doing great when embedded in the TensorFlow sample app.
After freezing with export_tflite_ssd_graph and converting to tflite using toco, the results perform rather badly and have a huge "variety".
Reading this answer on a similar problem with loss of accuracy, I wanted to try tflite_diff_example_test on a TensorFlow Docker machine.
As the documentation is not that evolved right now, I built the tool by referencing this SO post,
using:
bazel build tensorflow/contrib/lite/testing/tflite_diff_example_test.cc, which ran smoothly.
After figuring out all my needed input parameters I tried the testscript with following commands:
~/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh tensorflow/contrib/lite/testing/tflite_diff_example_test '--tensorflow_model=/tensorflow/shared/exported/tflite_graph.pb' '--tflite_model=/tensorflow/shared/exported/detect.tflite' '--input_layer=a,b,c,d' '--input_layer_type=float,float,float,float' '--input_layer_shape=1,3,4,3:1,3,4,3:1,3,4,3:1,3,4,3' '--output_layer=x,y'
and
bazel-bin/tensorflow/contrib/lite/testing/tflite_diff_example_test --tensorflow_model="/tensorflow/shared/exported/tflite_graph.pb" --tflite_model="/tensorflow/shared/exported/detect.tflite" --input_layer=a,b,c,d --input_layer_type=float,float,float,float --input_layer_shape=1,3,4,3:1,3,4,3:1,3,4,3:1,3,4,3 --output_layer=x,y
Both ways are failing. Errors:
First way:
tflite_diff_example_test.cc:line 1: /bazel: Is a directory
tflite_diff_example_test.cc: line 3: syntax error near unexpected token '('
tflite_diff_example_test.cc: line 3: 'Licensed under the Apache License, Version 2.0 (the "License");'
/root/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh: line 184: /tensorflow/: Is a directory
/root/.cache/bazel/_bazel_root/68a62076e91007a7908bc42a32e4cff9/external/bazel_tools/tools/test/test-setup.sh: line 276: /tensorflow/: Is a directory
Second way:
2018-09-10 09:34:27.650473: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Failed to create session. Op type not registered 'TFLite_Detection_PostProcess' in binary running on d36de5b65187. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
I would really appreciate any help, that enables me to compare the output of two graphs using tensorflows given tests.
The second way you mentioned is the correct way to use tflite_diff. However, the object detection model containing the TFLite_Detection_PostProcess op cannot be run via tflite_diff.
tflite_diff runs the provided TensorFlow (.pb) model in the TensorFlow runtime and runs the provided TensorFlow Lite (.tflite) model in the TensorFlow Lite runtime. In order to run the .pb model in the TensorFlow runtime, all of the operations must be implemented in TensorFlow.
However, in the model you provided, the TFLite_Detection_PostProcess op is not implemented in the TensorFlow runtime; it is only available in the TensorFlow Lite runtime. TensorFlow therefore cannot resolve the op, and you unfortunately cannot use the tflite_diff tool with this model.
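Although tflite_diff cannot be used here, the .tflite side alone can still be exercised with the TensorFlow Lite Python Interpreter to inspect its outputs manually; a minimal sketch using the path from the question:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='/tensorflow/shared/exported/detect.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one dummy (or real, preprocessed) image matching the input shape and dtype
dummy = np.random.random_sample(input_details[0]['shape']).astype(input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

for out in output_details:
    print(out['name'], interpreter.get_tensor(out['index']).shape)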