Tensorflow Object Detection API quantization with output tensor as TFLite_Detection_PostProcess - tensorflow

I am in the process of performing a full 8-bit quantization of ssd_mobilenet_v2_320x320_coco17 using Object detection API. For this I run the following notebook.
In my case my final hardware requires that the output tensors are TFLite_Detection_PostProcess (they actually can be float32). However, when performing post-training quantization, with the Object detection API, the final tflite output tensors are StatefulPartitonedCall, which are not accepted by the application running on the mcu. So somehow I need to convert to TFLite_Detection_PostProcess.
Some searching reveales that some are using converter = tf.lite.TFLiteConverter.from_frozen_graph(... output_array=output_array...) for convertion where they use
output_arrays = ['TFLite_Detection_PostProcess',
'TFLite_Detection_PostProcess:1',
'TFLite_Detection_PostProcess:2',
'TFLite_Detection_PostProcess:3']
In my case the notebook is using converter = tf.lite.TFLiteConverter.from_saved_model(...) which does not accept an parameter such as output_array.
So, can I quantize using tf.lite.TFLiteConverter.from_saved_model and define my output tensors to TFLite_Detection_PostProcess?
See also TF Object detection API issue tracker TF2 Detection API Models on mobile TFLite_Detection_PostProcess #10130

Related

TF Yamnet Transfer Learning and Quantization

TLDR:
Short term: Trying to quantize a specific portion of a TF model (recreated from a TFLite model). Skip to pictures below. \
Long term: Transfer Learn on Yamnet and compile for Edge TPU.
Source code to follow along is here
I've been trying to transfer learn on Yamnet and compile for a Coral Edge TPU for a few weeks now.
Started here, but quickly realized that model wouldn't quantize and compile for the Edge TPU because of the dynamic input and out of the box TFLite quantization doesn't work well with the preprocessing of audio before Yamnet's MobileNet.
After tinkering and learning for a few weeks, I found a Yamnet model compiled for the Edge TPU (sadly without source code) and figured my best shot would be to try to recreate it in TF, then quantize, then compile to TFLite, then compile for the edge TPU. I'll also have to figure out how to set the weights - not sure if I have to/can do that pre or post quantization. Anyway, I've effectively recreated the model, but am having a hard time quantizing without a bunch of wacky behavior.
The model currently looks like this:
I want it to look like this:
For quantizing, I tried:
TFLite Model Optimization which puts tfl.quantize ops all over the place and fails to compile for the Edge TPU.
Quantization Aware Training which throws some annoying errors that I've been trying to work through.
If you know a better way to achieve the long term goal than what I proposed, please (please please please) share! Otherwise, help on specific quant ops would be great! Also, reach out for clarity
I've ran into your same issues trying to convert the Yamnet model by tensorflow into full integers in order to compile it for Coral edgetpu and I think I've found a workaround for that.
I've been trying to stick to the tutorials posted in the section tflite-model-maker and finding a solution within this API because, for experience, I found it to be a very powerful tool.
If your goal is to build a model which is fully compiled for the edgetpu (meaning all layers, including input and output ones, being converted to int8 type) I'm afraid this solution won't fit for you. But since you posted you're trying to obtain a custom model with the same structure of:
Yamnet model compiled for the Edge TPU
then I think this workaround would help you.
When you train your custom model following the basic tutorial it is possible to export the custom model both in .tflite format
model.export(models_path, tflite_filename='my_birds_model.tflite')
and full tensorflow model:
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
Then it is possible to convert the full tensorflow saved model to tflite format by using the following script:
import tensorflow as tf
import numpy as np
import glob
from scipy.io import wavfile
dataset_path = '/path/to/DATASET/testing/*/*.wav'
representative_data = []
saved_model_path = './saved_model'
samples = glob.glob(dataset_path)
input_size = 15600 #Yamnet model's input size
def representative_data_gen():
for input_value in samples:
sample_rate, audio_data = wavfile.read(input_value, 'rb')
audio_data = np.array(audio_data)
splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0) / tf.int16.max #normalization in [-1,+1] range
yield [np.float32(splitted_audio_data[0])]
tf.compat.v1.enable_eager_execution()
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.experimental_new_converter = True #if you're using tensorflow<=2.2
converter.optimizations = [tf.lite.Optimize.DEFAULT]
#converter.inference_input_type = tf.uint8 # or tf.uint8
#converter.inference_output_type = tf.uint8 # or tf.uint8
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
open(saved_model_path + "converted_model.tflite", "wb").write(tflite_model)
As you can see, the lines which tell the converter to change input/output type are commented. This is because Yamnet model expects in input normalized values of audio sample in the range [-1,+1] and the numerical representation must be float32 type. In fact the compiled model of Yamnet you posted uses the same dtype for input and output layers (float32).
That being said you will end up with a tflite model converted from the full tensorflow model produced by tflite-model-maker. The script will end with the following line:
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
and the inference_type: 6 tells you the inference operations are suitable for being compiled to coral edgetpu.
The last step is to compile the model. If you compile the model with the standard edgetpu_compiler command line :
edgetpu_compiler -s converted_model.tflite
the final model would have only 4 operations which run on the EdgeTPU:
Number of operations that will run on Edge TPU: 4
Number of operations that will run on CPU: 53
You have to add the optional flag -a which enables multiple subgraphs (it is in experimental stage though)
edgetpu_compiler -sa converted_model.tflite
After this you will have:
Number of operations that will run on Edge TPU: 44
Number of operations that will run on CPU: 13
And most of the model operations will be mapped to edgetpu, namely:
Operator Count Status
MUL 1 Mapped to Edge TPU
DEQUANTIZE 4 Operation is working on an unsupported data type
SOFTMAX 1 Mapped to Edge TPU
GATHER 2 Operation not supported
COMPLEX_ABS 1 Operation is working on an unsupported data type
FULLY_CONNECTED 3 Mapped to Edge TPU
LOG 1 Operation is working on an unsupported data type
CONV_2D 14 Mapped to Edge TPU
RFFT2D 1 Operation is working on an unsupported data type
LOGISTIC 1 Mapped to Edge TPU
QUANTIZE 3 Operation is otherwise supported, but not mapped due to some unspecified limitation
DEPTHWISE_CONV_2D 13 Mapped to Edge TPU
MEAN 1 Mapped to Edge TPU
STRIDED_SLICE 2 Mapped to Edge TPU
PAD 2 Mapped to Edge TPU
RESHAPE 1 Operation is working on an unsupported data type
RESHAPE 6 Mapped to Edge TPU

Converting DeepLab to TensorFlow Lite

I am trying to convert DeepLab trained on the Cityscapes dataset from here to TFLite. From viewing the frozen graph in Netron, the input and output tensors both are of type uint8. I was able to use the default DeepLab model provided for the TFLite GPU delegate, which had float32 input and output tensors. I didn't think the model was supposed to be quantized, so when trying the following code without the commented lines, I got this error:
F tensorflow/lite/toco/tooling_util.cc:2241] Check failed: array.data_type == array.final_data_type Array "ImageTensor" has mis-matching actual and final data types (data_type=uint8, final_data_type=float).
After this, I found that I should try to quantize the model. I inserted the commented lines to use uint8 instead of float32, but I got this error, which seems like an unsupported op.
F ./tensorflow/lite/toco/toco_tooling.h:38] Check failed: s.ok() Unimplemented: this graph contains anoperator of type Cast for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).
Is it right to use the quantized script? The off-the-shelf TFLite DeepLab model provided uses float32. Thanks!

Mask R-CNN for TPU on Google Colab

We are trying to build an image segmentation deep learning model using Google Colab TPU. Our model is Mask R-CNN.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
import tensorflow as tf
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
model.keras_model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
However I am running into issues while converting our Mask R-CNN model to TPU model as pasted below.
ValueError:
Layer <keras.engine.topology.InputLayer object at 0x7f58574f1940> has a
variable shape in a non-batch dimension. TPU models must
have constant shapes for all operations.
You may have to specify `input_length` for RNN/TimeDistributed layers.
Layer: <keras.engine.topology.InputLayer object at 0x7f58574f1940>
Input shape: (None, None, None, 3)
Output shape: (None, None, None, 3)
Appreciate any help.
Google recently released a tutorial on getting Mask R-CNN going on their TPUs. For this, they are using an experimental model for Mask RCNN on Google's TPU github repository (under models/experimental/mask_rcnn). Looking through the code, it looks like they define the model with a fixed input size to overcome the issue you are seeing.
See below for more explanation:
As #aman2930 points out, the shape of your input tensor is not static. This won't work because Tensorflow compiles models with XLA to use a TPU and XLA must have all tensor shapes defined at compile time. In the link above, the website specifically calls this out:
Static shapes
During regular usage TensorFlow attempts to determine the shapes of each tf.Tensor during graph construction. During
execution any unknown shape dimensions are determined dynamically, see
Tensor Shapes for more details.
To run on Cloud TPUs TensorFlow models are compiled using XLA. XLA
uses a similar system for determining shapes at compile time. XLA
requires that all tensor dimensions be statically defined at compile
time. All shapes must evaluate to a constant, and not depend on
external data, or stateful operations like variables or a random
number generator.
That side, further down the document, they mention that the input function is run on the CPU, so isn't limited by static XLA sizes. They point to batch size being the issue, not image size:
Static shapes and batch size
The input pipeline generated by your
input_fn is run on CPU. So it is mostly free from the strict static
shape requirements imposed by the XLA/TPU environment. The one
requirement is that the batches of data fed from your input pipeline
to the TPU have a static shape, as determined by the standard
TensorFlow shape inference algorithm. Intermediate tensors are free to
have a dynamic shapes. If shape inference has failed, but the shape is
known it is possible to impose the correct shape using tf.set_shape().
So you could fix this by reformulating your model to have fixed batch size or to use
tf.contrib.data.batch_and_drop_remainder as they suggest.
Could you please share the input data function. It is hard to tell the exact issue, but it seems that the shape of tensor representing input sample is not static.

How to get access to specific layer using tensorflow estimator and dataset API?

I am using tensorflow 1.3.0 to train a CNN classification model. However I need to get access to the prelogits layer to evaluate my method (i.e. while this is casted as a classification problem, the method is not a classification problem but is used to extract CNN features, i.e. to produce a point in an N-dimensional vector space for an input image test)
I am using both the dataset API (with TFRecord files) and the estimator API to train the model. However, I don't see how I can get access/return the prelogits value using the Estimator API, i.e. estimator.train(), .evaluate() or .predict() since model_fn() needs to return a specific tf.estimator.EstimatorSpec object.
Previously (i.e. using the standard sess=tf.Session() method) I could train the model and get access to the prelogits layer while training (or by loading the model after training) and feed the network with a specific input to get the specific layer output with a sess.run(specific_layer) as long as the layer was named specific_layer.
I have tried to use the prediction output of EstimatorSpec but it did not work. Any ideas/suggestions?

Error with 8-bit Quantization in Tensorflow

I have been experimenting with the new 8-bit quantization feature available in TensorFlow. I could run the example given in the blog post (quantization of googlenet) without any issue and it works fine for me !!!
Now, I would like to apply the same for a simpler network. So I used a pre-trained network for CIFAR-10 (which is trained on Caffe), extracted its parameters, created corresponding graph in tensorflow, initialized the weights with this pre-trained weights and finally saved it as a GraphDef object. See this IPython Notebook for full procedure.
Now I applied the 8-bit quantization with the tensorflow script as mentioned in the Pete Warden's blog:
bazel-bin/tensorflow/contrib/quantization/tools/quantize_graph --input=cifar.pb --output=qcifar.pb --mode=eightbit --bitdepth=8 --output_node_names="ArgMax"
Now I wanted to run the classification on this quantized network. So I loaded the new qcifar.pb to a tensorflow session and passed the image (the same way I passed it to original version). Full code can be found in this IPython Notebook.
But as you can see at the end, I am getting following error:
NotFoundError: Op type not registered 'QuantizeV2'
Can anybody suggest what am I missing here?
Because the quantized ops and kernels are in contrib, you'll need to explicitly load them in your python script. There's an example of that in the quantize_graph.py script itself:
from tensorflow.contrib.quantization import load_quantized_ops_so
from tensorflow.contrib.quantization.kernels import load_quantized_kernels_so
This is something that we should update the documentation to mention!