Conversion of Tensorflow-lite model to F16 and INT8 - tensorflow

I need to evaluate the performance of a CNN (Convolutional Neural Network) on an edge device. I started by understanding what quantization is and how to run it in Colab using the interpreter (emulator). The full code is here -> https://github.com/aravindchakravarti/OptimizeNetworks/blob/main/Quantization_Aware_Training.ipynb
I was trying to convert the CNN to Float-16 (F16) and Int-8 (T8) quantization levels and wanted to see the difference in
Inference time
Model size
I did the F16 model conversion using:
converter_fl16 = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter_fl16.optimizations = [tf.lite.Optimize.DEFAULT]
converter_fl16.target_spec.supported_types = [tf.float16]
quantized_tflite_model_f16 = converter_fl16.convert()
And I converted to T8 with:
converter_t8 = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter_t8.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model_t8 = converter_t8.convert()
interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model_t8)
interpreter.allocate_tensors()
When I evaluated with respect to Inference Time,
F32 (Without Quantization) = 1.3s
F16 (With Quantization) = 0.6s
T8 (With Quantization) = 0.59s
When I evaluated with respect to Model Size,
F32 (Without Quantization) = 83KB
F16 (With Quantization) = 25KB
T8 (With Quantization) = 25KB
My question is:
Why am I getting the same model size and inference time for both F16 and T8? Am I not quantizing properly?

You are trying to convert the int8 (quantization-aware) model to fp16, and the converter just keeps everything as int8. That's why both models are the same.
The issue is in the converter line; it should be:
converter_fl16 = tf.lite.TFLiteConverter.from_keras_model(model)
After updating you should see
FP32 83k
FP16 44k
I8 25k
You can also inspect the generated model with, for example, Netron or the TFLite visualize tool to verify this.
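For completeness, a sketch of the two conversions side by side, assuming model is the original float Keras model and q_aware_model is the quantization-aware one from the notebook:

import tensorflow as tf

# FP16 post-training quantization: start from the float model, not the QAT model.
converter_fl16 = tf.lite.TFLiteConverter.from_keras_model(model)
converter_fl16.optimizations = [tf.lite.Optimize.DEFAULT]
converter_fl16.target_spec.supported_types = [tf.float16]
tflite_f16 = converter_fl16.convert()

# Int8: converting the quantization-aware model with the default optimization keeps it int8.
converter_t8 = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter_t8.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_t8 = converter_t8.convert()

# The converted models are byte strings, so their sizes are easy to compare directly.
print('FP16 size (KB):', len(tflite_f16) / 1024)
print('Int8 size (KB):', len(tflite_t8) / 1024)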

Related

Tensorflow 2 SSD MobileNet model breaks during conversion to tflite

I've been trying to follow this process to run an object detector (SSD MobileNet) on the Google Coral Edge TPU:
Edge TPU model workflow
I've successfully trained and evaluated my model with the Object Detection API. I have the model in both checkpoint format and TF SavedModel format. As per the documentation, the next step is to convert it to .tflite format using post-training quantization.
I am attempting to follow this example. The export_tflite_graph_tf2.py script and the conversion code that comes after it run without errors, but I see some weird behavior when I try to actually use the model to run inference.
I am unable to use the saved_model generated by export_tflite_graph_tf2.py. When running the following code, I get an error:
print('loading model...')
model = tf.saved_model.load(tflite_base)
print('model loaded!')
results = model(image_np)
TypeError: '_UserObject' object is not callable --> results = model(image_np)
As a result, I have no way to tell if the script broke my model or not before I even convert it to tflite. Why would model not be callable in this way? I have even verified that the type returned by tf.saved_model.load() is the same when I pass in a saved_model before it went through the export_tflite_graph_tf2.py script and after. The only possible explanation I can think of is that the script alters the object in some way that causes it to break.
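A side note on the '_UserObject' error (a minimal sketch, not a confirmed fix): SavedModels exported this way are often not directly callable and have to be invoked through a serving signature, assuming the export attached one:

import tensorflow as tf

model = tf.saved_model.load(tflite_base)
print(list(model.signatures.keys()))  # e.g. ['serving_default'], if a signature was attached

# Call the concrete signature instead of the loaded object itself;
# the expected input dtype/shape depends on what was exported.
detect_fn = model.signatures['serving_default']
results = detect_fn(tf.convert_to_tensor(image_np))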
I convert to tflite with post-training quantization using the following code:
def representative_data_gen():
    dataset_list = tf.data.Dataset.list_files(images_dir + '/*')
    for i in range(100):
        image = next(iter(dataset_list))
        image = tf.io.read_file(image)
        # supports PNG as well
        image = tf.io.decode_image(image, channels=3)
        image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
        image = tf.cast(image / 255., tf.float32)
        image = tf.expand_dims(image, 0)
        if i == 0:
            print(image.dtype)
        yield [image]

converter = tf.lite.TFLiteConverter.from_saved_model(base_saved_model)
# converter = tf.lite.TFLiteConverter.from_keras(model)
# This enables quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # issue here?
# This sets the representative dataset for quantization
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [
    # tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    # tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8  # This ensures that if any ops can't be quantized, the converter throws an error
]
# For full integer quantization, though supported types defaults to int8 only, we explicitly declare it for clarity.
converter.target_spec.supported_types = [tf.int8]
converter.target_spec.supported_ops += [tf.lite.OpsSet.TFLITE_BUILTINS]
# These set the input and output tensors to uint8 (added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quantized = converter.convert()
Everything runs with no errors, but when I try to actually run an image through the model, it returns garbage. I tried removing the quantization to see if that was the issue, but even without quantization it returns seemingly random bounding boxes that are completely off from the model's performance prior to conversion. The shape of the output tensors looks fine; it's just that the content is all wrong.
What's the right way to convert this model to a quantized tflite form? I should note that I can't use the tflite_convert utility because I need to quantize the model, and according to the source code the quantize_weights flag appears to be deprecated. There are a bunch of conflicting resources from TF1 and TF2 about this conversion process, so I'm pretty confused.
Note: I'm using a retrained SSD MobileNet from the model zoo. I have not made any changes to the architecture in my training workflow. I've confirmed that the errors persist even on the base model pulled directly from the object detection model zoo.
I'm having a very similar problem with post-training quantization and asked about it on GitHub.
I managed to get results from the TFLite model, but they were not good enough. Here is the notebook showing how I did it. Maybe it helps you get a step further.
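Not a confirmed fix, but when debugging this kind of garbage output it can help to print exactly what the converted model expects and produces; a minimal sketch with the tf.lite.Interpreter (the file name is illustrative):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model_quantized.tflite')
interpreter.allocate_tensors()

# Check the expected input dtype/shape/quantization so the preprocessing can match it exactly.
for inp in interpreter.get_input_details():
    print('input:', inp['name'], inp['shape'], inp['dtype'], inp['quantization'])

# SSD exports usually emit boxes, classes, scores and a detection count,
# but the output order can vary, so check the names instead of assuming indices.
for out in interpreter.get_output_details():
    print('output:', out['name'], out['shape'], out['dtype'], out['quantization'])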

About the input of quantum neural network in tensorflow quantum

I created a quantum neural network using TensorFlow Quantum. Its input is a tensor converted from a circuit. Regarding this input circuit, I found that if the parameters of the circuit are also specified by tensors, the quantum neural network cannot be trained.
When the circuit uses normal parameters, the network trains normally:
theta_g=1
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
angle = theta_g + spread_y
cir=cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
But when I use the following code, the quantum neural network cannot be trained
theta_g=tf.constant([1])
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)
spred_x = tf.constant(spread_x)
spred_y = tf.constant(spread_y)
angle = theta_g + spread_y
cir=cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))
The discriminator_network:
def discriminator():
    theta = sympy.Symbol('theta')
    q_model = cirq.Circuit(cirq.ry(theta)(qubit))
    q_data_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)
    expectation = tfq.layers.PQC(q_model, cirq.Z(qubit))
    expectation_output = expectation(q_data_input)
    classifier = tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
    classifier_output = classifier(expectation_output)
    model = tf.keras.Model(inputs=q_data_input, outputs=classifier_output)
    return model
Without being able to see the trace of the error you are getting, I would say the problem you are running into in the second snippet is that you have placed tf.constant objects into the placeholders of the cirq.Circuit. The reason your first example works is that cirq.Circuit knows how to interpret values with np.float32 datatypes. Cirq does not know how to interpret values from tf.float32 (or any tf.dtypes.* for that matter).
TensorFlow Quantum's entry point to interface tensorflow datatypes with cirq.Circuit objects is via resolving the sympy.Symbol values inside of the circuits in tfq native operations (which you have done in creating the tfq.layers.PQC).
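One concrete workaround along these lines (a minimal sketch, assuming eager execution and reusing qubit, blob_size and discriminator_network from the snippets above) is to pull plain Python floats back out of the tensors before building the cirq.Circuit:

import numpy as np
import tensorflow as tf
import cirq
import tensorflow_quantum as tfq

theta_g = tf.constant([1.0])
blob_size = abs(1 - 4) / 5
spread_x = np.random.uniform(-blob_size, blob_size)
spread_y = np.random.uniform(-blob_size, blob_size)

# Convert the TensorFlow value back to a plain Python float before handing it to Cirq;
# Cirq understands Python/NumPy numbers, not tf.Tensor objects.
angle = float(theta_g.numpy()[0]) + spread_y
cir = cirq.Circuit(cirq.ry(-angle)(qubit), cirq.rx(-spread_x)(qubit))
discriminator_network(tfq.convert_to_tensor([cir]))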
Does this help clear things up?
-Michael

TensorFlow Lite Full-Integer Quantization fails in TF 2

I've trained ResNet50V2 and DenseNet169 models with TensorFlow nightly 2.3.0-dev20200608. The models work fine, and I tried some optimizations such as "simple" TF Lite, TF Lite dynamic range, and TF Lite float16; they all work fine (the accuracy is either identical to the original or slightly lower, as expected).
I want to convert my model to use full-integer post-training quantization with uint8. I converted my model from SavedModel format with:
converter = tf.lite.TFLiteConverter.from_saved_model('/path/to/my/saved_models')
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset_gen():
    for i in range(100):
        yield [x_train[i].astype(np.float32)]

converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

with open('resnet.tflite', 'wb') as f:
    f.write(tflite_model)
as written on the TensorFlow Lite website. I then compiled the model for the Edge TPU. It works, in the sense that the Edge TPU lets me run it without errors, but the results are gibberish: it always predicts the same value. I then tried on CPU with the TF Lite interpreter. The input/output tensors are correctly uint8, but again it predicts the same value. On CPU with TF Lite the issue persists when moving to int8. Has anyone else experienced the same issue?
Please find here a Google Drive folder with the code I use for the conversion, as well as the model before and after conversion:
https://drive.google.com/drive/folders/11XruNeJzdIm9DTn7FnuIWYaSalqg2F0B?usp=sharing
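For anyone sanity-checking a full-integer model like this on CPU, a minimal sketch (reusing x_train and the resnet.tflite file from above): the float input has to be quantized with the input tensor's scale/zero point, and the uint8 output de-quantized before comparing with the float model's predictions.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='resnet.tflite')
interpreter.allocate_tensors()
in_d = interpreter.get_input_details()[0]
out_d = interpreter.get_output_details()[0]

# Quantize the float image with the input tensor's scale / zero point.
scale, zero_point = in_d['quantization']
x = x_train[0].astype(np.float32)
x_q = np.clip(x / scale + zero_point, 0, 255).astype(np.uint8)
interpreter.set_tensor(in_d['index'], x_q[np.newaxis, ...])
interpreter.invoke()

# De-quantize the uint8 output before comparing it with the float model.
scale, zero_point = out_d['quantization']
y_q = interpreter.get_tensor(out_d['index'])
print((y_q.astype(np.float32) - zero_point) * scale)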

Unable to properly convert tf.keras model to quantized format for coral TPU

I'm trying to convert a tf.keras model based on MobileNetV2 with transpose convolution using the latest tf-nightly. Here is the conversion code:
#saved_model_dir='/content/ksaved' # tried from saved model also
#converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter = tf.lite.TFLiteConverter.from_keras_model(reshape_model)
converter.experimental_new_converter=True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
with open("/content/model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
The conversion was successful (in Google Colab), but the model has quantize and dequantize operators at the ends (as seen using Netron). All operators seem to be supported. The representative dataset images are float32 in the generator, and the model has a 4-channel float32 input by default. It looks like we need a uint8 input and output inside the model for the Coral TPU. How can we properly carry out this conversion?
Ref:-
Full integer quantization of weights and activations
How to quantize inputs and outputs of optimized tflite model
Coral Edge TPU Compiler cannot convert tflite model: Model not quantized
I tried with 'tf.compat.v1.lite.TFLiteConverter.from_keras_model_file' instead of the v2 version. I got the error "Quantization not yet supported for op: TRANSPOSE_CONV" while trying to quantize the model in the latest tf 1.15 (using a representative dataset), and "Internal compiler error. Aborting!" from the Coral TPU compiler when using the tf2.0 quantized tflite.
Tflite model # https://github.com/tensorflow/tensorflow/issues/31368
It seems to work until the last convolutional block (1x7x7x160).
The compiler error (Aborting) does not give any information about the potential cause, and all types of convolutional layers seem to be supported as per the Coral docs.
Coral doc: https://coral.ai/docs/edgetpu/models-intro/#quantization
Here is a dummy model example of quantizing a keras model. Notice I'm using strictly tf1.15 for the example, because tf2.0 deprecated:
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
with the from_keras_model API. I think the most confusing thing about this is that you can still set it, but nothing happens, which means the model will still take float inputs. I notice that you are using tf2.0, since from_keras_model is a tf2.0 API. Coral still suggests using tf1.15 for converting models for now. I suggest downgrading TensorFlow, or maybe even just using this (while keeping tf2.0; it may or may not work):
tf.compat.v1.lite.TFLiteConverter.from_keras_model_file
More on it here.
I always make sure not to use the experimental converter:
converter.experimental_new_converter = False
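Putting that advice together, a rough sketch of the tf1.15-style conversion (the .h5 path is illustrative and representative_dataset_gen is the generator from the question; whether TRANSPOSE_CONV quantizes cleanly still depends on the TF version):

import tensorflow as tf  # tf 1.15

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file('/content/model.h5')
converter.experimental_new_converter = False
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_quant_model = converter.convert()
with open('/content/model_quant.tflite', 'wb') as f:
    f.write(tflite_quant_model)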

Tensorflow lite model is slower than its regular model during inference. Why?

I have a regular model, and I use tf.lite.TFLiteConverter.from_keras_model_file to convert it to a .tflite model. I then use the interpreter to run inference on images.
tf.logging.set_verbosity(tf.logging.DEBUG)
interpreter = tf.lite.Interpreter(model_path)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]
for loop:
    (read image)
    interpreter.set_tensor(input_index, image)
    interpreter.invoke()
    result = interpreter.get_tensor(output_index)
With the regular model, I use the following to do the prediction:
model = keras.models.load_model({h5 model path}, custom_objects={'loss':loss})
for loop:
    (read image)
    result = model.predict(image)
However, the inference time of the .tflite model is much longer than that of the regular model. I also tried post-training quantization on the .tflite model, but that one is the slowest of the three. Does that make sense? Why does this happen? Is there any way to make the TensorFlow Lite model faster than the regular one? Thanks.
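For what it's worth, a sketch of a more controlled comparison (an assumption about methodology, not a diagnosis), reusing model_path and image from the snippets above: time only interpreter.invoke() after a warm-up call, and give the interpreter more than one thread (the num_threads argument is available in newer TF releases):

import time
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=model_path, num_threads=4)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, image)
interpreter.invoke()  # warm-up run, excluded from the timing

runs = 100
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(input_index, image)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_index)
print("avg TFLite latency (s):", (time.perf_counter() - start) / runs)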