Inference of Audio data on HuggingFace Wav2Vec TFlite Model - tensorflow

I've been trying to run inference on an audio sample with a TFLite version of the Wav2Vec 2.0 model that I got from HuggingFace. When I run inference on the model, this is the error I get:
RuntimeError: Index out of range using input dim 1; input has only 1 dims
(while executing 'StridedSlice' via Eager)Node number 1491 (TfLiteFlexDelegate) failed to invoke.
A Colab Notebook with the TFLite conversion code and the used inference code can be found here
Thank You!
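One thing worth checking, offered only as a guess since the notebook contents are not shown here: the message "input has only 1 dims" would be consistent with a 1-D waveform reaching an op that slices along dimension 1, which happens when the batch axis is missing. The sketch below reproduces that kind of failure with plain NumPy. The clip length and the helper slice_second_axis are hypothetical stand-ins, not part of the model or of TFLite.

```python
import numpy as np

# Hypothetical 1-second mono clip at 16 kHz; stands in for the real audio sample.
audio = np.zeros(16000, dtype=np.float32)

def slice_second_axis(x):
    """Mimic an op that slices along dimension 1 (illustration only)."""
    return x[:, :10]

try:
    slice_second_axis(audio)       # 1-D input: there is no dimension 1 to slice
    failed = False
except IndexError:
    failed = True

batched = audio[np.newaxis, :]     # shape (1, 16000), i.e. [batch, samples]
ok = slice_second_axis(batched)    # dimension 1 now exists, so the slice works
```

If the model's input details report a 2-D shape, comparing that against the shape of the array actually passed in is a quick way to confirm or rule this out.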


TF Yamnet Transfer Learning and Quantization

Short term: Trying to quantize a specific portion of a TF model (recreated from a TFLite model). Skip to pictures below.
Long term: Transfer Learn on Yamnet and compile for Edge TPU.
Source code to follow along is here
I've been trying to transfer learn on Yamnet and compile for a Coral Edge TPU for a few weeks now.
I started here, but quickly realized that the model wouldn't quantize and compile for the Edge TPU: it has a dynamic input, and out-of-the-box TFLite quantization doesn't work well with the audio preprocessing that runs before Yamnet's MobileNet.
After tinkering and learning for a few weeks, I found a Yamnet model compiled for the Edge TPU (sadly without source code) and figured my best shot would be to try to recreate it in TF, then quantize, then compile to TFLite, then compile for the edge TPU. I'll also have to figure out how to set the weights - not sure if I have to/can do that pre or post quantization. Anyway, I've effectively recreated the model, but am having a hard time quantizing without a bunch of wacky behavior.
The model currently looks like this:
I want it to look like this:
For quantizing, I tried:
TFLite Model Optimization which puts tfl.quantize ops all over the place and fails to compile for the Edge TPU.
Quantization Aware Training which throws some annoying errors that I've been trying to work through.
If you know a better way to achieve the long term goal than what I proposed, please (please please please) share! Otherwise, help on specific quant ops would be great! Also, reach out if anything needs clarifying.
I ran into the same issues trying to convert TensorFlow's Yamnet model to full integer in order to compile it for the Coral Edge TPU, and I think I've found a workaround.
I've been trying to stick to the tutorials in the tflite-model-maker section and to find a solution within this API because, from experience, I have found it to be a very powerful tool.
If your goal is to build a model that is fully compiled for the Edge TPU (meaning all layers, including the input and output ones, are converted to int8), I'm afraid this solution won't work for you. But since you posted that you're trying to obtain a custom model with the same structure as the:
Yamnet model compiled for the Edge TPU
then I think this workaround would help you.
When you train your custom model following the basic tutorial it is possible to export the custom model both in .tflite format
model.export(models_path, tflite_filename='my_birds_model.tflite')
and full tensorflow model:
model.export(models_path, export_format=[mm.ExportFormat.SAVED_MODEL, mm.ExportFormat.LABEL])
Then it is possible to convert the full tensorflow saved model to tflite format by using the following script:
import glob
import os

import numpy as np
import tensorflow as tf
from scipy.io import wavfile

dataset_path = '/path/to/DATASET/testing/*/*.wav'
saved_model_path = './saved_model'
samples = glob.glob(dataset_path)
input_size = 15600  # Yamnet model's input size

def representative_data_gen():
    # Yield the first 15600-sample frame of each test clip as calibration data
    for input_value in samples:
        sample_rate, audio_data = wavfile.read(input_value)
        audio_data = np.array(audio_data)
        # Split into 15600-sample frames and normalize int16 PCM to the [-1,+1] range
        splitted_audio_data = tf.signal.frame(audio_data, input_size, input_size, pad_end=True, pad_value=0) / tf.int16.max
        yield [np.float32(splitted_audio_data[0])]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_path)
converter.experimental_new_converter = True  # if you're using tensorflow<=2.2
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Left commented out on purpose: the model keeps float32 input and output
#converter.inference_input_type = tf.uint8  # or tf.int8
#converter.inference_output_type = tf.uint8  # or tf.int8
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
# Write the converted model next to the saved model files
with open(os.path.join(saved_model_path, "converted_model.tflite"), "wb") as f:
    f.write(tflite_model)
As you can see, the lines that tell the converter to change the input/output type are commented out. This is because the Yamnet model expects normalized audio samples in the range [-1,+1] as input, and their numerical representation must be float32. In fact, the compiled Yamnet model you posted uses the same dtype (float32) for its input and output layers.
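The framing and normalization step in the script can also be sketched with NumPy alone, which makes it easier to see what the representative dataset actually contains. This is a minimal sketch that assumes non-overlapping frames (frame length equal to frame step, as in the script) and int16 PCM input. The function name frame_and_normalize is made up for illustration.

```python
import numpy as np

INPUT_SIZE = 15600  # Yamnet input length in samples, as in the script above

def frame_and_normalize(pcm_int16, frame_len=INPUT_SIZE):
    """Split int16 PCM into zero-padded, non-overlapping float32 frames.

    Samples are scaled by the int16 maximum, so values land in roughly [-1, +1].
    """
    pcm = np.asarray(pcm_int16, dtype=np.int16)
    n_frames = -(-len(pcm) // frame_len)           # ceiling division
    padded = np.zeros(n_frames * frame_len, dtype=np.float32)
    padded[:len(pcm)] = pcm                        # the tail stays zero (padding)
    frames = padded.reshape(n_frames, frame_len)
    return frames / np.float32(np.iinfo(np.int16).max)

# Hypothetical clip of 20000 full-scale samples: two frames, the second padded.
frames = frame_and_normalize(np.full(20000, 32767, dtype=np.int16))
```

Only the first frame of each clip is fed to the converter in the script, so clips shorter than 15600 samples contribute a partly zero-padded calibration example.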
That being said you will end up with a tflite model converted from the full tensorflow model produced by tflite-model-maker. The script will end with the following line:
fully_quantize: 0, inference_type: 6, input_inference_type: 0, output_inference_type: 0
and the inference_type: 6 tells you the inference operations are suitable for being compiled to coral edgetpu.
The last step is to compile the model. If you compile the model with the standard edgetpu_compiler command line :
edgetpu_compiler -s converted_model.tflite
the final model would have only 4 operations which run on the EdgeTPU:
Number of operations that will run on Edge TPU: 4
Number of operations that will run on CPU: 53
You have to add the optional flag -a which enables multiple subgraphs (it is in experimental stage though)
edgetpu_compiler -sa converted_model.tflite
After this you will have:
Number of operations that will run on Edge TPU: 44
Number of operations that will run on CPU: 13
And most of the model operations will be mapped to edgetpu, namely:
Operator           Count  Status
MUL                1      Mapped to Edge TPU
DEQUANTIZE         4      Operation is working on an unsupported data type
SOFTMAX            1      Mapped to Edge TPU
GATHER             2      Operation not supported
COMPLEX_ABS        1      Operation is working on an unsupported data type
LOG                1      Operation is working on an unsupported data type
CONV_2D            14     Mapped to Edge TPU
RFFT2D             1      Operation is working on an unsupported data type
LOGISTIC           1      Mapped to Edge TPU
QUANTIZE           3      Operation is otherwise supported, but not mapped due to some unspecified limitation
DEPTHWISE_CONV_2D  13     Mapped to Edge TPU
MEAN               1      Mapped to Edge TPU
STRIDED_SLICE      2      Mapped to Edge TPU
PAD                2      Mapped to Edge TPU
RESHAPE            1      Operation is working on an unsupported data type
RESHAPE            6      Mapped to Edge TPU
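To put the two compiler summaries side by side, the share of operations mapped to the Edge TPU can be computed from the two "Number of operations" lines. This is a small sketch that only parses the text quoted above. The constant and function names are made up for illustration.

```python
import re

# Summary lines copied from the two compiler runs quoted above.
SUMMARY_WITHOUT_A = """Number of operations that will run on Edge TPU: 4
Number of operations that will run on CPU: 53"""
SUMMARY_WITH_A = """Number of operations that will run on Edge TPU: 44
Number of operations that will run on CPU: 13"""

def edgetpu_fraction(summary):
    """Return (tpu_ops, cpu_ops, fraction of ops mapped to the Edge TPU)."""
    tpu = int(re.search(r"run on Edge TPU: (\d+)", summary).group(1))
    cpu = int(re.search(r"run on CPU: (\d+)", summary).group(1))
    return tpu, cpu, tpu / (tpu + cpu)

for name, text in [("-s", SUMMARY_WITHOUT_A), ("-sa", SUMMARY_WITH_A)]:
    tpu, cpu, frac = edgetpu_fraction(text)
    print(f"{name}: {tpu} of {tpu + cpu} ops on Edge TPU ({frac:.1%})")
```

Both runs report 57 operations in total. The -a flag changes only how many of them are delegated.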

Determine tag-sets in Tensorflow Hub saved model

I'm trying to determine if this Tensorflow Hub model can be converted to TFLITE format (and eventually compiled for the TPU/Coral Board), by doing something like this.
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model("./")
tflite_model = converter.convert()
However, I need to specify the model tag-sets and this command gives no results (in both TF 1.13.1 and 2.0):
% saved_model_cli show --dir .
The given SavedModel contains the following tag-sets:
The saved_model.pb file is in this directory, and Netron is too unwieldy - given the size of the model it barely opens - so it's difficult to inspect. The model can be opened for inference: detector = hub.load(module_url).signatures['default'] so perhaps I can show the model summary from the detector object (?).
Any ideas how I can determine the model structure?
Any insight into the practicality of converting this model to TFLITE and then compiling for the TPU would be appreciated.

Converting DeepLab to TensorFlow Lite

I am trying to convert DeepLab trained on the Cityscapes dataset from here to TFLite. From viewing the frozen graph in Netron, the input and output tensors both are of type uint8. I was able to use the default DeepLab model provided for the TFLite GPU delegate, which had float32 input and output tensors. I didn't think the model was supposed to be quantized, so when trying the following code without the commented lines, I got this error:
F tensorflow/lite/toco/] Check failed: array.data_type == array.final_data_type Array "ImageTensor" has mis-matching actual and final data types (data_type=uint8, final_data_type=float).
After this, I found that I should try to quantize the model. I inserted the commented lines to use uint8 instead of float32, but I got this error, which seems like an unsupported op.
F ./tensorflow/lite/toco/toco_tooling.h:38] Check failed: s.ok() Unimplemented: this graph contains anoperator of type Cast for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).
Is it right to use the quantized script? The off-the-shelf TFLite DeepLab model provided uses float32. Thanks!

Using model optimizer for tensorflow slim models

I am aiming to run inference on a TensorFlow-Slim model with the Intel OpenVINO toolkit, using the OpenVINO docs and slides for inference and the TF-Slim docs for training the model.
It's a multi-class classification problem. I have trained a TF-Slim mobilenet_v2 model from scratch using the training script. Evaluating the trained model on the test set with the evaluation script gives relatively good results to begin with.
However, single .ckpt file is not saved (even though at the end of run there is a message like "model.ckpt is saved to checkpoint_dir"), there are 3 files (, .ckpt-180000.index, .ckpt-180000.meta) instead.
OpenVINO model optimizer requires a single checkpoint file.
According to docs I call with following params:
python --input_model D:/model/mobilenet_v2_224.pb --input_checkpoint D:/model/model.ckpt-180000 -b 1
It gives this error (the same happens if I pass --input_checkpoint D:/model/model.ckpt):
[ ERROR ] The value for command line parameter "input_checkpoint" must be existing file/directory, but "D:/model/model.ckpt-180000" does not exist.
The error message is clear: there is no such file on disk. But as far as I know, most TF utilities resolve .ckpt-????.meta to .ckpt under the hood.
Trying to call:
python --input_model D:/model/mobilenet_v2_224.pb --input_meta_graph D:/model/model.ckpt-180000.meta -b 1
[ ERROR ] Unknown configuration of input model parameters
It doesn't matter to me how the graph gets converted to the OpenVINO intermediate representation; I just need to reach that result.
Thanks a lot.
I managed to run the OpenVINO model optimizer on a frozen graph of the TF-Slim model. However, I still have no idea why my previous attempts (based on the docs) failed.
You can try converting the model to frozen format (.pb) and then converting it with OpenVINO.
The .ckpt-meta file holds the metagraph: the computation graph structure without variable values,
the one you can observe in TensorBoard.
The .ckpt-data file holds the variable values, without the skeleton or structure. To restore a model, we need both the meta and data files.
A frozen .pb file saves the whole graph (meta + data).
As per the OpenVINO documentation:
When a network is defined in Python* code, you have to create an inference graph file. Usually, graphs are built in a form that allows model training. That means that all trainable parameters are represented as variables in the graph. To use the graph with the Model Optimizer, it should be frozen.
OpenVINO optimizes the model by converting the weighted graph passed in frozen form.
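The checkpoint file layout described above also explains the "does not exist" error in the question: a name such as model.ckpt-180000 is a prefix shared by several files, not a file in its own right. The sketch below lists what sits behind a prefix using only the standard library. The helper names are made up, and the ".data-" shard naming is an assumption based on TensorFlow's usual checkpoint format.

```python
import glob
import os

def checkpoint_parts(prefix):
    """Return the files on disk whose names start with the checkpoint prefix."""
    return sorted(glob.glob(glob.escape(prefix) + ".*"))

def describe(prefix):
    """Summarize which checkpoint pieces exist for a given prefix."""
    parts = checkpoint_parts(prefix)
    return {
        # Usually False: the bare prefix is not a file, which is why a tool
        # that requires an existing path rejects it.
        "prefix_is_a_file": os.path.isfile(prefix),
        "has_index": prefix + ".index" in parts,
        "has_meta": prefix + ".meta" in parts,
        # Assumed naming: variable values live in ".data-XXXXX-of-YYYYY" shards.
        "data_shards": [p for p in parts if ".data-" in os.path.basename(p)],
    }
```

Calling describe("D:/model/model.ckpt-180000") on the directory from the question would show whether all the pieces needed for freezing are present.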

TF Object Detection API - Trouble running quantized network after freezing and quantizing my fine-tuned network

TensorFlow Object Detection API
I am using the TensorFlow Object Detection API to retrain MobileNet on my own dataset. The issue occurs when I try to run my inference graph after it has been both frozen and quantized.
Ubuntu 16.04,
TensorFlow 1.2 (from source, CPU only),
Bazel 0.4.5
Use provided frozen_graph.pb from model zoo.
Quantize to 8-bit using
Run inference
This works, however,
Re-train and produce my own frozen_graph.pb using object_detection/
Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph
Run inference <-- Produces error
Does NOT work, and the error I'm getting during the attempt to run the graph is:
line 1298, in _do_call
raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: The node
'Preprocessor/map/while/ResizeImage/ResizeBilinear/eightbit' has
inputs from different frames. The input
'Preprocessor/map/while/ResizeImage/size' is in frame
'Preprocessor/map/while/Preprocessor/map/while/'. The input
is in frame ''.
Since I can quantize and run the provided frozen_graph.pb, the issue has to be with the export tool, right? Which export tool was used to create the frozen_graph.pb files in the model zoo, and how was it called?
Here is a quote from the comments in export_inference_graph.py, assuring me that it should produce a frozen graph if a checkpoint is provided:
"Optionally, one can freeze the graph by converting the weights in the provided
checkpoint as graph constants thereby eliminating the need to use a checkpoint
file during inference."