Re-Quantize Already Quantized Models - tensorflow

Is it possible to re-quantize already quantized models?
I have some models that I trained with Quantization Aware Training (QAT) and full integer quantization. However, GPU delegation fails with those models. Is there a way to convert the models I already have to float16 quantization so that I can run them with the GPU delegate?

Are you looking for a way to convert an integer-quantized model to a float16-quantized model?
Which version of TFLite are you using? TFLite 2.3 supports running quantized models with the GPU delegate. However, since the GPU only supports float operations, it internally dequantizes the integer weights into float weights.
Please see the doc for how to enable the (experimental) quantized model support: https://www.tensorflow.org/lite/performance/gpu_advanced#running_quantized_models_experimental
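
A minimal sketch of that direction, assuming you still have the original QAT (or float) SavedModel; the TFLite converter does not take an already-quantized .tflite file as input, so the float16 variant has to be produced from the source model. Paths are placeholders.

import tensorflow as tf

# Convert the original SavedModel to a float16-quantized TFLite model,
# which the GPU delegate can run without the (experimental) quantized-model path.
converter = tf.lite.TFLiteConverter.from_saved_model("original_saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

with open("model_fp16.tflite", "wb") as f:
    f.write(converter.convert())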

Related

Quantized models in Object Detection API for TF2

I want to migrate my code for fine-tuning an object detection model for inference on Coral devices to TensorFlow 2, but I don't see quantized models in the TF2 model zoo.
Is it possible to fine-tune a model in TF2 for this purpose and use a technique like quantization-aware training or post-training quantization? I haven't seen any related tutorials or issues. I've also seen some reports of issues with quantization in the TFLite converter for TF2, so I'm not even sure it's possible.
Even for the TF1.x models, the conversions are not that straightforward, and I am struggling with them at the moment. I am pretty disappointed, as I thought Coral would have better support for TensorFlow than the Intel NCS USB stick, since they come from the same family. However, it seems that I was wrong.
Quantization-aware training is possible by adding a graph_rewriter block at the end of the pipeline config file before fine-tuning the pretrained model:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
Source: https://neuralet.com/article/quantization-of-tensorflow-object-detection-api-models/
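
After fine-tuning with that graph_rewriter, the usual next steps are to export the frozen TFLite graph with the OD API's export_tflite_ssd_graph.py and convert it to a fully quantized uint8 model for the Edge TPU. A rough sketch (the tensor names below are the common SSD defaults and the file names are assumptions):

import tensorflow as tf  # TF 1.x-style converter API

# tflite_graph.pb is the assumed output of object_detection/export_tflite_ssd_graph.py
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.inference_type = tf.uint8
converter.quantized_input_stats = {"normalized_input_image_tensor": (128, 128)}  # (mean, std_dev)
converter.allow_custom_ops = True  # TFLite_Detection_PostProcess is a custom op

with open("detect_quant_uint8.tflite", "wb") as f:
    f.write(converter.convert())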

tflite quantized inference very slow

I am trying to convert a trained model from a checkpoint file to tflite. I am using tf.lite.LiteConverter. The float conversion went fine, with reasonable inference speed. But the inference speed of the INT8 conversion is very slow. I tried to debug by feeding in a very small network, and found that the inference speed of the INT8 model is generally slower than that of the float model.
In the INT8 tflite file, I found some tensors called ReadVariableOp, which don't exist in TensorFlow's official MobileNet tflite model.
I wonder what causes the slowness of INT8 inference.
You possibly used an x86 CPU instead of one with ARM instructions; the optimized quantized kernels target ARM, so INT8 can be slower than float on x86. You can refer to https://github.com/tensorflow/tensorflow/issues/21698#issuecomment-414764709
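
A quick way to confirm this is to time both models on the actual target device. A minimal sketch, assuming the two model files below exist:

import time
import numpy as np
import tensorflow as tf

def benchmark(model_path, runs=50):
    # Load the TFLite model and time repeated invocations with a dummy input.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs

print("float avg latency:", benchmark("model_float.tflite"))
print("int8 avg latency :", benchmark("model_int8.tflite"))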

"Model not quantized" even after post-training quantization

I downloaded a TensorFlow model from Custom Vision and want to run it on a Coral TPU. I therefore converted it to TensorFlow Lite and applied hybrid post-training quantization (as far as I know that's the only way, because I do not have access to the training data).
You can see the code here: https://colab.research.google.com/drive/1uc2-Yb9Ths6lEPw6ngRpfdLAgBHMxICk
When I then try to compile it for the edge tpu, I get the following:
Edge TPU Compiler version 2.0.258810407
INFO: Initialized TensorFlow Lite runtime.
Invalid model: model.tflite
Model not quantized
Any idea what my problem might be?
tflite models are not fully quantized using converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]. You might have a look at post-training full integer quantization using a representative dataset: https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization_of_weights_and_activations Simply adapt your generator function to yield representative samples (e.g. images similar to what your image classification network should predict). Very few images are enough for the converter to identify min and max values and quantize your model. However, accuracy is typically lower than with quantization-aware training.
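
A minimal sketch of that suggestion, assuming the Custom Vision model is available as a SavedModel and that a few sample images of the right shape can be provided (paths, input shape, and image source are placeholders):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Replace with real preprocessed images of shape [1, H, W, 3].
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops and integer I/O so no hybrid/float ops remain for the Edge TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_full_int8.tflite", "wb") as f:
    f.write(converter.convert())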
I can't find the source, but I believe the Edge TPU currently only supports 8-bit quantized models and no hybrid operators.
EDIT: Coral's FAQ mentions that the model needs to be fully quantized:
You need to convert your model to TensorFlow Lite and it must be quantized using either quantization-aware training (recommended) or full integer post-training quantization.

Can I quantize my tensorflow graph for the full version of TF, not tflite?

I need to quantize my model for use in the full version of TensorFlow, and I cannot find out how to do this (in the official manual for model quantization, the model is saved in the tflite format).
AFAIK the only supported quantization scheme in tensorflow is tflite. What do you plan to do with a quantized tensorflow graph? If it is inference only, why not simply use tflite?

Which object detection pre-trained models are available and convertible with TensorRT?

I'm looking into converting a pre-trained object detection model with TensorRT to try it out on my NVIDIA Jetson TX2, but every model I find has layers that are not yet supported by TensorRT. So far I have tried SSD with MobileNet and Faster R-CNN, but they both have operations, such as Identity, that are not supported by TensorRT, and I can't find many other TensorFlow models out there.
Thank you
This repo has the pre-trained models you are looking for, along with instructions on how to take a model and build a TensorRT engine: https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification#download
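
Not from the answer above, but one alternative worth mentioning: TF-TRT converts only the TensorRT-supported subgraphs and leaves unsupported ops as regular TensorFlow ops, so a standard SavedModel can often be accelerated without hand-building a full engine. A minimal sketch, assuming a TensorRT-enabled TensorFlow build (such as NVIDIA's Jetson builds) and placeholder paths:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert the TensorRT-compatible subgraphs of a SavedModel; anything TensorRT
# cannot handle stays as a TensorFlow op and runs through the normal runtime.
converter = trt.TrtGraphConverterV2(input_saved_model_dir="ssd_mobilenet_saved_model")
converter.convert()
converter.save("ssd_mobilenet_trt_saved_model")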