"Model not quantized" even after post-training quantization - tensorflow

I downloaded a tensorflow model from Custom Vision and want to run it on a Coral TPU. I therefore converted it to tensorflow-lite and applied hybrid post-training quantization (as far as I know, that's the only option because I do not have access to the training data).
You can see the code here: https://colab.research.google.com/drive/1uc2-Yb9Ths6lEPw6ngRpfdLAgBHMxICk
When I then try to compile it for the edge tpu, I get the following:
Edge TPU Compiler version 2.0.258810407
INFO: Initialized TensorFlow Lite runtime.
Invalid model: model.tflite
Model not quantized
Any idea what my problem might be?

tflite models are not fully quantized by converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]. You might have a look at post-training full integer quantization using a representative dataset: https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization_of_weights_and_activations Simply adapt your generator function to yield representative samples (e.g. images similar to what your image classification network should predict). Very few images are enough for the converter to identify min and max values and quantize your model. However, accuracy is typically lower than with quantization-aware training.
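For reference, a minimal sketch of full integer post-training quantization with the TF 2.x converter might look like this (the saved-model path, input shape, and generator contents below are placeholders you would replace with your own model and data):

import numpy as np
import tensorflow as tf

# Hypothetical generator; yield a handful of real images preprocessed the same
# way as at inference time (shape and dtype must match the model's input).
def representative_dataset():
    for _ in range(100):
        sample = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder data
        yield [sample]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full integer quantization so the Edge TPU compiler accepts the model.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())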

I can't find the source, but I believe the Edge TPU currently only supports fully 8-bit quantized models and no hybrid operators.
EDIT: Coral's FAQ mentions that the model needs to be fully quantized.
You need to convert your model to TensorFlow Lite and it must be quantized using either quantization-aware training (recommended) or full integer post-training quantization.

Related

Accuracy, precision and recall of a tflite model

I trained a model with yolov4. The inference is perfect and so are the metrics. I converted the model to tensorflow lite to be able to use it on android.
I would like to view the accuracy, precision, and recall values of the converted model.
How can I do that?
There is no direct API that can be used to measure the accuracy, precision, and recall of the tflite model on Android, but you can always create a TfLite Interpreter instance from the TfLite flatbuffer model, run inference on the testing data, and measure the accuracy/precision/recall on your own.
Here is the link to the official TensorFlow Lite Colab with Java/Android sample code: https://www.tensorflow.org/lite/examples/on_device_training/overview#run_inference_using_trained_weights.
The Java code snippet shows how to create an interpreter instance and run inference on the test data. Once the list of predicted labels is compiled, you can compare it with the list of true labels and come up with precision/recall after computing True/False Positives/Negatives.
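The linked sample is Java, but the same idea as a rough Python sketch (assuming a simple image classifier, a list of test images already preprocessed to the model's input shape, and integer true labels; model path and variable names are placeholders) might look like this:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # assumed path
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

def predict(image):
    # image must already match the model's expected shape and dtype
    interpreter.set_tensor(input_index, image[np.newaxis, ...])
    interpreter.invoke()
    return int(np.argmax(interpreter.get_tensor(output_index)))

predicted = [predict(img) for img in test_images]  # test_images/test_labels are placeholders
accuracy = np.mean(np.array(predicted) == np.array(test_labels))

# Per-class precision and recall from true/false positives and false negatives.
for cls in sorted(set(test_labels)):
    tp = sum(p == cls and t == cls for p, t in zip(predicted, test_labels))
    fp = sum(p == cls and t != cls for p, t in zip(predicted, test_labels))
    fn = sum(p != cls and t == cls for p, t in zip(predicted, test_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"class {cls}: precision={precision:.3f} recall={recall:.3f}")

Note that for a detector like YOLOv4 you would additionally need to match predicted boxes to ground-truth boxes (e.g. by IoU) before counting true/false positives.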

Tensorflow Extended: Is it possible to use pytorch training loop in Tensorflow extended flow

I have trained an image classification model using pytorch.
Now, I want to move it from research to production pipeline.
I am thinking of using TensorFlow Extended. I have a very newbie doubt: will I be able to use my PyTorch-trained model in the TensorFlow Extended pipeline (I can convert the trained model to ONNX and then to a TensorFlow-compatible format)?
I don't want to rewrite and retrain the training part in TensorFlow, as it would be a big overhead.
Is it possible, or is there a better way to productionize PyTorch-trained models?
You should be able to convert your PyTorch image classification model to Tensorflow format using ONNX, as long as you are using standard layers. I would recommend doing the conversion and then looking at both model summaries to make sure they are relatively similar. Also, do some tests to make sure your converted model handles any particular edge cases you have. Once you have confirmed that the converted model works, save it in the TF SavedModel format and then you should be able to use it in Tensorflow Extended (TFX).
For more info on the conversion process, see this tutorial: https://learnopencv.com/pytorch-to-tensorflow-model-conversion/
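A rough sketch of that conversion path, using the onnx-tf package (the model, input shape, opset, and file names below are placeholders; substitute your own trained module):

import torch
import torchvision
import onnx
from onnx_tf.backend import prepare  # pip install onnx onnx-tf

# Stand-in for your trained classifier; substitute your own nn.Module.
model = torchvision.models.resnet18(weights=None)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # assumed input shape
torch.onnx.export(model, dummy_input, "classifier.onnx", opset_version=11,
                  input_names=["input"], output_names=["logits"])

# Convert the ONNX graph to a TensorFlow SavedModel that TFX can consume.
tf_rep = prepare(onnx.load("classifier.onnx"))
tf_rep.export_graph("classifier_savedmodel")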
You could consider using the TorchX library. I haven't used it yet, but it seems to make it easier to deploy models by creating and running model pipelines. I don't think it has the same data validation functionality that Tensorflow Extended has, but maybe that will be added in the future.

Optimize Tensorflow Object Detection Model V2 Centernet Model for Evaluation

I am using the tensorflow centernet_resnet50_v2_512x512_kpts_coco17_tpu-8 object detection model on an Nvidia Tesla P100 to extract bounding boxes and keypoints for detecting people in a video. Using the pre-trained model from tensorflow.org, I am able to process about 16 frames per second. Is there any way I can improve the evaluation speed for this model? Here are some ideas I have been looking into:
Pruning the model graph since I am only detecting 1 type of object (people)
Have not been successful in doing this. Changing the label_map when building the model does not seem to improve performance.
Hard coding the input size
Have not found a good way to do this.
Compiling the model to an optimized form using something like TensorRT
Initial attempts to convert to TensorRT did not have any performance improvements.
Batching predictions
It looks like the pre-trained model has the batch size hard coded to 1, and so far when I try to change this using the model_builder I see a drop in performance.
My GPU utilization is about 75%, so I don't know if there is much to gain here.
TensorRT should in most cases give a large increase in frames per second compared to Tensorflow.
centernet_resnet50_v2_512x512_kpts_coco17_tpu-8 can be found in the TensorFlow Model Zoo.
Nvidia has released a blog post describing how to optimize models from the TensorFlow Model Zoo using Deepstream and TensorRT:
https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/
Now regarding your suggestions:
Pruning the model graph: Pruning the model graph can be done by converting your tensorflow model to a TF-TRT model.
Hardcoding the input size: Use the static mode in TF-TRT. This is the default mode and enabled by: is_dynamic_op=False
Compiling the model: My advice would be to convert your model to TF-TRT, or first to ONNX and then to TensorRT.
Batching: Specifying the batch size is also covered in the NVIDIA blog post.
Lastly, for my model a big increase in performance came from using FP16 (mixed precision) in my inference engine. You could even try INT8, although then you first have to calibrate.
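As an illustration, a minimal TF-TRT conversion with FP16 precision might look like the following (TF 2.x API; the SavedModel paths are placeholders, and parameter names vary slightly between TensorFlow versions):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Assumed: a SavedModel export of the CenterNet detector at this path.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="centernet_savedmodel",
    conversion_params=params)
converter.convert()
converter.save("centernet_trt_fp16")  # load with tf.saved_model.load for inference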

How to convert model trained on custom data-set for the Edge TPU board?

I have trained a model on my custom data-set using the TensorFlow Object Detection API. I run my "prediction" script and it works fine on the GPU. Now, I want to convert the model to TFLite and run it on the Google Coral Edge TPU Board to detect my custom objects. I have gone through the documentation that the Google Coral Board website provides, but I found it very confusing.
How to convert and run it on the Google Coral Edge TPU Board?
Thanks
Without reading the documentation, it will be very hard to continue. I'm not sure what your "prediction script" means, but I'm assuming that the script loaded a .pb tensorflow model, loaded some image data, and ran inference on it to produce prediction results. That means you have a .pb tensorflow model at the "Frozen graph" stage of the following pipeline:
[Pipeline diagram; image taken from coral.ai.]
The next step would be to convert your .pb model to a "fully quantized .tflite model" using the post-training quantization technique. The documentation to do that is given here. I also created a github gist containing an example of post-training quantization here. Once you have produced the .tflite model, you'll need to compile the model via the edgetpu_compiler. Although everything you need to know about the edgetpu compiler is in that link, for your purpose, compiling a model is as simple as:
$ edgetpu_compiler your_model_name.tflite
which creates a your_model_name_edgetpu.tflite model that is compatible with the EdgeTPU. Now, if at this stage, instead of getting an edgetpu-compatible model, you are getting some type of error, then that means your model did not meet the requirements posted in the model-requirements section.
Once you have produced a compiled model, you can then deploy it on an edgetpu device. Currently there are 2 main APIs that can be used to run inference with the model:
EdgeTPU API (Python API, C++ API)
tflite API (C++ API, Python API)
Ultimately, there are many demo examples to run inference on the model here.
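To give an idea of the tflite-API Python route mentioned above, a minimal inference sketch with the Edge TPU delegate might look like this (the model path is a placeholder; libedgetpu.so.1 is the standard delegate library name on Linux):

import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="your_model_name_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Placeholder input; replace with a real image resized/cast to the model's input spec.
image = np.zeros(input_details["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details["index"], image)
interpreter.invoke()
print(interpreter.get_tensor(output_details["index"]))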
The previous answer works with general classification models, but not with TF object detection API trained models.
You cannot do post-training quantization with TF Lite converter on TF object detection API models.
In order to run object detection models on EdgeTPU-s:
You must train the models in quantization-aware training mode with this addition in the model config:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
This might not work with all the models provided in the model zoo; try a quantized model first.
After training, export the frozen graph with: object_detection/export_tflite_ssd_graph.py
Run the tensorflow/lite/toco tool on the frozen graph to make it TFLite-compatible
And finally run edgetpu_compiler on the .tflite file
You can find more in-depth guide here:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md

Can I add Tensorflow Fake Quantization in a Keras sequential model?

I have searched this for a while, but it seems Keras only has quantization feature after the model is trained. I wish to add Tensorflow fake quantization to my Keras sequential model. According to Tensorflow's doc, I need these two functions to do fake quantization: tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph().
My question is: has anyone managed to add these two functions to a Keras model? If yes, where should these two functions be added? For example, before model.compile, after model.fit, or somewhere else? Thanks in advance.
I worked around this with post-training quantization. Since my final goal is to train a model for a mobile device, instead of fake quantization during training, I exported the Keras .h5 file and converted it to a TensorFlow Lite .tflite file directly (with the post_training_quantize flag set to true). I tested this on a simple cifar-10 model. The original Keras model and the quantized tflite model have very close accuracy (the quantized one a bit lower).
Post-training quantization: https://www.tensorflow.org/performance/post_training_quantization
Convert Keras model to tensorflow lite: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md
I used the tf-nightly tensorflow build: https://pypi.org/project/tf-nightly/
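For reference, a sketch of what that conversion looked like with the TF 1.x-era converter (file names are placeholders; the post_training_quantize attribute was later replaced by converter.optimizations in TF 2.x):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file("cifar10_model.h5")
converter.post_training_quantize = True  # weight-only (hybrid) quantization
tflite_model = converter.convert()

with open("cifar10_model_quant.tflite", "wb") as f:
    f.write(tflite_model)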
If you still want to do fake quantization (because for some models, post-training quantization may give poor accuracy, according to Google), note that the original webpage went down last week, but you can find it on GitHub: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
Update: It turns out post-training quantization does not really quantize the model; during inference, it still uses float32 kernels for the calculations. Thus, I've switched to quantization-aware training. The accuracy is pretty good for my cifar10 model.