Following the guide at https://www.tensorflow.org/lite/guide/ops_select#using_bazel_xcode I'm building the TensorFlowLiteSelectTfOps_framework; however, this builds with all of the TensorFlow ops, whereas I only need two: FlexAudioSpectrogram and FlexMfcc.
How do I build the TensorFlowLiteSelectTfOps_framework with only the extra ops I need?
Note:
I'm aware that I can selectively build the TFLite framework for my model, but that approach seems to trim away all unused operations. I would like to keep all the base TFLite ops and just add FlexAudioSpectrogram and FlexMfcc.
The reason for this is that I'm experimenting with new model architectures for my iOS and Android apps and would like to keep the TFLite framework broadly compatible so I can deploy new models over the air as long as I only use ops from the base TFLite framework.
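For context, a model that pulls in FlexAudioSpectrogram and FlexMfcc is the result of converting with select TF ops enabled alongside the builtins; a minimal sketch of that conversion, assuming a placeholder SavedModel path:
import tensorflow as tf

# Placeholder path; point this at your own SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
# Keep the builtin TFLite ops and fall back to select TF ops
# (here, AudioSpectrogram and Mfcc) only where no builtin exists.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
with open('model_with_flex_ops.tflite', 'wb') as f:
    f.write(tflite_model)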
Related
Dear Google MediaPipe team,
Could you offer quantized versions of MediaPipe's .tflite models for pose, face, iris, and hand?
I have used MediaPipe's Holistic on Android with a Qualcomm device.
I want to improve performance by using Qualcomm's SNPE SDK.
The SDK requires quantized models.
If you can offer quantized models of Holistic, my plan is to try to replace the TFLite-related code with DLC code.
DLC (Deep Learning Container) is SNPE's format for running inference on the Qualcomm DSP, and the SDK provides a conversion tool for quantized .tflite files.
thanks,
Hoyeon
I have developed body gesture recognition using MediaPipe,
but its throughput couldn't meet our spec.
If I can use the SNPE SDK, I can achieve my goal.
I checked Stack Overflow and the TensorFlow pages for how to convert a .tflite file to a quantized .tflite file, and I found that it requires a SavedModel.
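For reference, post-training full-integer quantization with the TFLite converter looks roughly like the sketch below; the SavedModel path, input shape, and representative dataset are placeholders, not anything MediaPipe ships:
import tensorflow as tf

def representative_dataset():
    # Placeholder: yield a handful of real input samples with the model's input shape.
    for _ in range(100):
        yield [tf.random.uniform([1, 256, 256, 3], dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization, which DSP backends typically require.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('model_quant.tflite', 'wb') as f:
    f.write(converter.convert())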
I'm trying to build a TFLite program in C++ that runs inference with a model that uses TF Select ops, without building the entire TFLite delegate library, i.e. without adding the flex delegate as a dependency in the BUILD file (using Bazel here). Keeping the flex delegate in allows the program to build and run on x86_64, but cross-compilation for Raspberry Pi fails, and furthermore, the binary is nearly an order of magnitude larger than expected. Is it possible to use ops that are not natively supported by TFLite in a TFLite C++ program without building the entire delegate library?
I think selective build is what you are looking for: https://www.tensorflow.org/lite/guide/reduce_binary_size
It only links the ops that are used in your models, so it vastly reduces the library size.
Follow the instructions on that page and you can produce .aar files; extracting those, you will find the .so libraries.
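If it helps to check what a selective build would need to link, recent TensorFlow releases can list the operators a model actually uses; a small sketch, assuming a recent TF version and a placeholder model path:
import tensorflow as tf

# Prints the model structure, including the builtin (and any Flex) ops it references.
tf.lite.experimental.Analyzer.analyze(model_path='model.tflite')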
Overview
I know this subject has been discussed many times, but I am having a hard time understanding the workflow, or rather, the variations of the workflow.
For example, imagine you are installing TensorFlow on Windows 10. The main goal is to train a custom model, convert it to TensorFlow Lite, and copy the converted .tflite file to a Raspberry Pi running TensorFlow Lite.
The confusion for me starts with the conversion process. After following along with multiple guides, it seems TensorFlow is often installed with pip or Anaconda. But then I see detailed tutorials which indicate it needs to be built from source in order to convert TensorFlow models to TFLite models.
To make things more interesting, I've also seen models which are converted via Python scripts as seen here.
Question
So far I have seen 3 ways to do this conversion, and it could just be that I don't have a grasp on the full picture. Below are the abbreviated methods I have seen:
Build from source, and use the TensorFlow Lite Optimizing Converter (TOCO):
bazel run --config=opt tensorflow/lite/toco:toco -- --input_file=$OUTPUT_DIR/tflite_graph.pb --output_file=$OUTPUT_DIR/detect.tflite ...
Use the TensorFlow Lite Converter Python API:
import tensorflow as tf

# export_dir points at a SavedModel directory.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()
with tf.io.gfile.GFile('model.tflite', 'wb') as f:
    f.write(tflite_model)
Use the tflite_convert CLI utilities:
tflite_convert --saved_model_dir=/tmp/mobilenet_saved_model --output_file=/tmp/mobilenet.tflite
I think I understand that options 2 and 3 are the same, in the sense that the tflite_convert utility is installed and can be invoked either from the command line or through a Python script. But is there a specific reason you should choose one over the other?
And lastly, what really gets me confused is option 1. And maybe it's a version thing (1.x vs 2.x)? But what's the difference between the TensorFlow Lite Optimizing Converter (TOCO) and the TensorFlow Lite Converter? It appears that in order to use TOCO you would have to build TensorFlow from source, so is there a reason you would use one over the other?
There is no difference in the output from the different conversion methods, as long as the parameters remain the same. The Python API is better if you want to generate TFLite models in an automated way (e.g., a Python script that runs periodically).
The TensorFlow Lite Optimizing Converter (TOCO) was the first version of the TF-to-TFLite converter. It was recently deprecated and replaced with a new converter that can handle more ops/models. So I wouldn't recommend using toco:toco via bazel; use tflite_convert as mentioned here instead.
You should never have to build the converter from source, unless you are making some changes to it and want to test them out.
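To illustrate the "automated" case the Python API is suited for, here is a rough sketch that converts every SavedModel found under a directory; the directory layout and paths are placeholders:
import pathlib
import tensorflow as tf

export_root = pathlib.Path('/tmp/exported_models')  # placeholder: one SavedModel per subdirectory
for saved_model_dir in export_root.iterdir():
    if not saved_model_dir.is_dir():
        continue
    converter = tf.lite.TFLiteConverter.from_saved_model(str(saved_model_dir))
    tflite_model = converter.convert()
    # Write <name>.tflite next to each SavedModel directory.
    (export_root / (saved_model_dir.name + '.tflite')).write_bytes(tflite_model)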
I am currently working with the YoloV3-tiny.
Repository: https://github.com/AlexeyAB/darknet
To import the network into a C++ project I use the OpenVINO Toolkit. In more detail, I use the following procedure to convert the network:
Converting YOLO* Models to the Intermediate Representation (IR)
This procedure carries out both conversion and optimization before inference.
Now, I would like to try YoloV4 because it seems to be more effective for the purposes of the project. The problem is that the OpenVINO Toolkit does not yet support this version and does not provide the .json file (needed for the conversion) for version 4, only up to version 3.
What has changed in terms of structure between version 3 and version 4 of Yolo?
Can I hope that the conversion process for YoloV3-tiny (or YoloV3) is the same as for YoloV4?
Is the YoloV4 much slower than the YoloV3-tiny using only the CPU for inference?
When will the YoloV4-tiny be available?
Does anyone have information about it?
The difference between YoloV4 and YoloV3 is the backbone: YoloV4 has a CSPDarknet53 backbone, whilst YoloV3 has a Darknet53 backbone.
See https://arxiv.org/pdf/2004.10934.pdf.
Also, YoloV4 is not officially supported by OpenVINO. However, you can still test and validate YoloV4 on your end with a workaround. For now there is one way to run YoloV4: through OpenCV, which will build the network using the nGraph API and then pass it to the Inference Engine. See https://github.com/opencv/opencv/pull/17185.
The key problem is the Mish activation function: there is no optimized implementation yet, which is why we have to implement it by definition with tanh and exponential functions. Unfortunately, a one-to-one topology comparison shows significant performance degradation. The performance results are also available in the GitHub link above.
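For reference, the "by definition" implementation mentioned above is Mish(x) = x * tanh(softplus(x)); a minimal NumPy sketch:
import numpy as np

def mish(x):
    # Mish(x) = x * tanh(ln(1 + e^x))
    return x * np.tanh(np.log1p(np.exp(x)))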
https://github.com/TNTWEN/OpenVINO-YOLOV4
This is my project, based on v3's converter (darknet -> tensorflow -> IR), and I have finished the adaptation of OpenVINO YoloV4, v4-relu, and v4-tiny.
You could give it a try. You can use V4's IR model and run it on v3's C++ demo directly.
I would like to convert a TRT-optimized frozen model to a SavedModel for TensorFlow Serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT-optimized model in TensorFlow Serving?
Thanks.
Assuming you have a TRT-optimized model (i.e., the model is already represented in UFF), you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to sections 3.3 and 3.4, since in these sections you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (a.k.a. a PLAN file) to do inference.
Basically, the workflow looks something like this:
Build/train model in TensorFlow.
Freeze model (you get a protobuf representation).
Convert model to UFF so TensorRT can understand it.
Use the UFF representation to build a TensorRT engine.
Serialize the engine and save it to a PLAN file.
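A rough sketch of steps 3-5 with the legacy UFF tooling; the node names, shapes, and file paths are placeholders, and the exact API differs between TensorRT versions:
import tensorrt as trt
import uff

# Step 3: convert the frozen graph to UFF ('output' is a placeholder node name).
uff.from_tensorflow_frozen_model('frozen_model.pb', ['output'], output_filename='model.uff')

# Steps 4-5: parse the UFF model, build a TensorRT engine, and serialize it to a PLAN file.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
    parser.register_input('input', (3, 224, 224))  # placeholder input name and shape
    parser.register_output('output')
    parser.parse('model.uff', network)
    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30
    engine = builder.build_cuda_engine(network)
    with open('model.plan', 'wb') as f:
        f.write(engine.serialize())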
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.
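And for later runs, loading the serialized PLAN back is roughly as follows (placeholder path; inference buffer handling omitted):
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
with open('model.plan', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())
# Create an execution context from the engine and bind input/output buffers
# (e.g. with pycuda) to run inference, as in the end_to_end_tensorflow_mnist sample.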