Can I add Tensorflow Fake Quantization in a Keras sequential model? - tensorflow

I have searched this for a while, but it seems Keras only has quantization feature after the model is trained. I wish to add Tensorflow fake quantization to my Keras sequential model. According to Tensorflow's doc, I need these two functions to do fake quantization: tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph().
My question is has anyone managed to add these two functions in a Keras model? If yes, where should these two function be added? For example, before model.compile or after model.fit or somewhere else? Thanks in advance.

I worked around by post-training quantization. Since my final goal is to train a mdoel for mobile device, instead of fake quantization during training, I exported keras .h5 file and converted to Tenforflow lite .tflite file directly (with post_training_quantize flag set to true). I tested this on a simple cifar-10 model. The original keras model and the quantized tflite model have very close accuracy (the quantized one a bit lower).
Post-training quantization: https://www.tensorflow.org/performance/post_training_quantization
Convert Keras model to tensorflow lite: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md
Used the tf-nightly tensorflow here: https://pypi.org/project/tf-nightly/
If you still want to do fake quantization (because for some model, post-training quantization may give poor accuracy according to Google), the original webpage is down last week. But you can find it from github: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
Update: Turns out post-quantization does not really quantize the model. During inference, it still uses float32 kernels to do calculations. Thus, I've switched to quantization-aware training. The accuracy is pretty good for my cifar10 model.

Related

How was the ssd_mobilenet_v1 tflite model in TFHub trained?

How do I find more info on how the ssd_mobilenet_v1 tflite model on TFHub was trained?
Was it trained in such a way that made it easy to convert it to tflite by avoiding certain ops not supported by tflite? Or was it trained normally, and then converted using the tflite converter with TF Select and the tips on this github issue?
Also, does anyone know if there's an equivalent mobilenet tflite model trained on OpenImagesV6? If not, what's the best starting point for training one?
I am not sure about about the exact origin of the model, but looks like it does have TFLite-compatible ops. From my experience, the best place to start for TFLite-compatible SSD models is with the TF2 Detection Zoo. You can convert any of the SSD models using these instructions.
To train your own model, you can follow these instructions that leverage Google Cloud.

Can we use TF-lite to do retrain?

I converted a pretrained model to TF-lite and would like to deploy to the edge device.
If we got new training data and would like to improve the pretrained model, is it possible to do on the edge device?
Ex. Is there any method to train the model and save to TF-lite(FlatBuffer) again on edge device?
Thanks for any inputs!
On-device training is not fully supported yet on TF Lite but you can refer to this blog post to see how it can be done.
https://blog.tensorflow.org/2019/12/example-on-device-model-personalization.html
The basic idea is:
Split your model to a base subgraph (e.g. feature extractor in an image classification model) and a trainable head.
Convert the base subgraph to TF Lite as normal. Convert the trainable head to TF Lite using the experimental tflite-transfer-convert tool.
Retrain the trainable head on-device as you wish.

"Model not quantized" even after post-training quantization

I downloaded a tensorflow model from Custom Vision and want to run it on a coral tpu. I therefore converted it to tensorflow-lite and applying hybrid post-training quantization (as far as I know that's the only way because I do not have access to the training data).
You can see the code here: https://colab.research.google.com/drive/1uc2-Yb9Ths6lEPw6ngRpfdLAgBHMxICk
When I then try to compile it for the edge tpu, I get the following:
Edge TPU Compiler version 2.0.258810407
INFO: Initialized TensorFlow Lite runtime.
Invalid model: model.tflite
Model not quantized
Any idea what my problem might be?
tflite models are not fully quantized using converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]. You might have a look on post training full integer quantization using the representation dataset: https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization_of_weights_and_activations Simply adapt your generator function to yield representative samples (e.g. similar images, to what your image classification network should predict). Very few images are enough for the converter to identify min and max values and quantize your model. However, typically your accuracy is less in comparison to quantization aware learning.
I can't find the source but I believe the edge TPU currently only supports 8bit-quantized models, and no hybrid operators.
EDIT: On Corals FAQ they mention that the model needs to be fully quantized.
You need to convert your model to TensorFlow Lite and it must be
quantized using either quantization-aware training (recommended) or
full integer post-training quantization.

Can I quantize my tensorflow graph for the full version of TF, not tflite?

I need to quantify my model for use in the full version of tensorflow. And I do not find how to do this (in the official manual for quantization of the model, the model is saved in the format tflite)
AFAIK the only supported quantization scheme in tensorflow is tflite. What do you plan to do with a quantized tensorflow graph? If it is inference only, why not simply use tflite?

Tensorflow SSD-Mobilenet model accuracy drop after quantization using transform_graph

I am working on the recently released "SSD-Mobilenet" model by google for object detection.
Model downloaded from following location: https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md
The frozen graph file downloaded from the site is working as expected, however after quantization the accuracy drops significantly (mostly random predictions).
I built tensorflow r1.2 from source, and used following method to quantize:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=frozen_inference_graph.pb --out_graph=optimized_graph.pb --inputs='image_tensor' --outputs='detection_boxes','detection_scores','detection_classes','num_detections' --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,224,224,3") fold_constants(ignore_errors=true) fold_batch_norms fold_old_batch_norms quantize_weights strip_unused_nodes sort_by_execution_order'
I tried various combinations in the "transforms" part, and the transforms mentioned above gave sometimes correct predictions, however no where close to the original model.
Is there any other way to improve performance of the quantized model?
In this case SSD uses mobilenet as it's feature extractor . In-order to increase the speed. If you read the mobilenet paper , it's a lightweight convolutional neural nets specially using separable convolution inroder to reduce parameters .
As I understood separable convolution can loose information because of the channel wise convolution.
So when quantifying a graph according to TF implementation it makes 16 bits ops and weights to 8bits . If you read the tutorial in TF for quantization they clearly have mentioned how this operation is more like adding some noise in to already trained net hoping our model has well generalized .
So this will work really well and almost lossless interms of accuracy for a heavy model like inception , resnet etc. But with the lightness and simplicity of ssd with mobilenet it really can make a accuracy loss .
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
How to Quantize Neural Networks with TensorFlow