Export MXNet model to ONNX with _contrib_MultiBoxPrior Error - mxnet

I created an object detection model in AWS SageMaker, based on SSD/ResNet50 and in MXNet.
Now I would like to optimize it in TensorRT, for which I need to export to ONNX as a first step.
Searching for any recommendation on converting _contrib_MultiBoxPrior to a supported symbol didn't yield any results for me.
Basic code:
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

input_shape = (1, 3, 512, 512)
converted_model_path = onnx_mxnet.export_model(sym_file, params_file, [input_shape], np.float32, onnx_file)
The exact error message is
"AttributeError: No conversion function registered for op type _contrib_MultiBoxPrior yet."
What is the recommended way to solve this error?

Converter support for the MultiBoxPrior operator depends on ONNX supporting it. You can track the issue here: https://github.com/apache/incubator-mxnet/issues/15181
Alternatively, you can try using mxnet-tensorrt. It uses the subgraph API, which means that the symbols that can be executed in TensorRT are executed in the TensorRT runtime, and the ones that cannot are executed in the MXNet runtime.
https://mxnet.incubator.apache.org/versions/master/tutorials/tensorrt/inference_with_trt.html
Note that the current version of this tutorial targets MXNet 1.3.0, I believe. An update is coming in the next release with a simpler API and better performance.
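For reference, here is a rough sketch of what enabling the TensorRT subgraph backend looks like, loosely based on the 1.3-era tutorial linked above (the checkpoint prefix and batch shape are placeholders, and the exact API has changed in later MXNet releases, so treat this as an outline rather than a definitive recipe):

import os
import mxnet as mx

# Opt in to the TensorRT subgraph backend before binding the symbol.
os.environ['MXNET_USE_TENSORRT'] = '1'

# Load the trained SSD symbol and parameters (prefix/epoch are placeholders).
sym, arg_params, aux_params = mx.model.load_checkpoint('ssd_resnet50', 0)
arg_params.update(aux_params)
all_params = {k: v.as_in_context(mx.gpu(0)) for k, v in arg_params.items()}

# Bind with TensorRT: supported subgraphs run in TensorRT, the rest stay in MXNet.
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0), all_params=all_params,
                                             data=(1, 3, 512, 512), grad_req='null',
                                             force_rebind=True)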

Related

How to use legacy_seq2seq for TensorFlow 2?

I am new to TensorFlow and I want to use tensorflow.contrib.legacy_seq2seq, specifically embedding_rnn_seq2seq(), and I can't figure out how to use it (or whether there is an equivalent method) in TensorFlow 2.
I know that TensorFlow 2 removed contrib, and according to this document
tf.contrib.legacy_seq2seq has been deleted and replaced with tf.seq2seq in TensorFlow 2, but I can't find embedding_rnn_seq2seq() in the tf.seq2seq documentation I have seen.
The reason I want to use it is that I am trying to implement something similar to what is done with embedding_rnn_seq2seq() in this article. So is there an equivalent in TensorFlow 2, or is there a different way to achieve the same goal?
According to https://docs.w3cub.com/tensorflow~python/tf/contrib/legacy_seq2seq/embedding_rnn_seq2seq , contrib.legacy_seq2seq.embedding_rnn_seq2seq creates an embedding of an argument that you pass, encoder_inputs (the shape is num_encoder_symbols x input_size). It then runs an RNN to encode the embedded encoder_inputs into a state vector. Then it embeds another argument you pass, decoder_inputs (the shape is num_decoder_symbols x input_size). Next, it runs an RNN decoder, initialized with the last encoder state, on the embedded decoder_inputs.
Contrib was a community-maintained part of TensorFlow, and seq2seq was part of it. It was removed in TensorFlow 2.
You could use TensorFlow Addons, which contains community-made add-ons, including seq2seq, I believe.
You can import TensorFlow Addons via
import tensorflow_addons as tfa
Or you could use a TensorFlow version that still has seq2seq in contrib (I believe 1.15 is the latest 1.x release).
There are also things like bi-directional recurrent neural networks and dynamic RNNs (they are basically a new version of seq2seq) that may work.
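If you mainly need the same behaviour rather than the exact API, the embed-encode-decode pattern of embedding_rnn_seq2seq can be sketched with plain TF 2 Keras layers. This is only an outline, and the vocabulary sizes, embedding size, and unit counts below are made up:

import tensorflow as tf

num_encoder_symbols = 10000   # hypothetical vocabulary sizes
num_decoder_symbols = 10000
embedding_size = 128

# Encoder: embed the input tokens and encode them into a state vector.
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(num_encoder_symbols, embedding_size)(encoder_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(256, return_state=True)(enc_emb)

# Decoder: embed the target tokens and run an RNN initialized with the encoder state.
decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(num_decoder_symbols, embedding_size)(decoder_inputs)
dec_out = tf.keras.layers.LSTM(256, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
outputs = tf.keras.layers.Dense(num_decoder_symbols, activation="softmax")(dec_out)

model = tf.keras.Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")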

TF-Lite Non-Max-Suppression

I am attempting to convert a graph with tf.image.non_max_suppression or tf.image.combined_non_max_suppression, but both API calls yield an error like "tf.CombinedNonMaxSuppression op is neither a custom op nor a flex op." My setup is TF 2.3.1, Python 3.7, Windows 10.
I understand that some tf functions are not supported for conversion to TF-Lite, but the link below shows a tfl op for non-max-suppression.
https://tensorflow.google.cn/mlir/tfl_ops#tflnon_max_suppression_v4_tflnonmaxsuppressionv4op
What do I need to do to be able to run the converter on my function in order to use the tfl.non_max_suppression_vx function?
The non-max-suppression ops are not supported as TF Lite builtins. If you want to use them, you have to fall back to the TensorFlow (Flex) ops by adding these lines when converting:
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
Another option is to rewrite the NMS op using only ops that are supported in TF Lite, or something along those lines. If you manage to rewrite it successfully, please tell me. Thanks.
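For context, here is how the Flex fallback from the snippet above fits into a complete conversion script (the saved-model path and output file name are placeholders; TF 2.3-style API):

import tensorflow as tf

# Load the model to convert (directory name is a placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("my_detection_model")
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use TFLite builtins where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF (Flex) ops, e.g. CombinedNonMaxSuppression
]
tflite_model = converter.convert()
with open("model_with_flex_ops.tflite", "wb") as f:
    f.write(tflite_model)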

How to use feature_column v2 in Tensorflow (TF-Ranking)

I'm using TF-Ranking to train a recommendation engine. I have encountered a problem that seems to be a version incompatibility issue concerning tf.feature_column API.
The short version of my question is: what is a v2 feature column (TF 2.0?) (see this for instance), and how can I ensure that my feature columns are treated as v2 while I'm still using TF 1.14?
Here are the details:
I'm unable to shorten my code sufficiently to provide a reproducible example. But I will try to describe the problem in words.
TF Version: 1.14
OS: Ubuntu 18.04
I initially had two features in my model, user and item, both sparse categorical features, each wrapped in its own tf.feature_column.embedding_column. I was able to use the train_and_evaluate method of the Estimator and export the model for serving.
Then I added a new feature, curr_item, which is only present during prediction (as a context feature). It shares its embeddings with item, so now I have a tf.feature_column.shared_embedding_columns which wraps both item and curr_item.
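For reference, a minimal sketch of the column setup just described (the column names match the description, but the bucket sizes and dimensions are made up):

import tensorflow as tf

user = tf.feature_column.categorical_column_with_hash_bucket("user", hash_bucket_size=10000)
item = tf.feature_column.categorical_column_with_hash_bucket("item", hash_bucket_size=50000)
curr_item = tf.feature_column.categorical_column_with_hash_bucket("curr_item", hash_bucket_size=50000)

# user gets its own embedding; item and curr_item share one embedding table.
user_emb = tf.feature_column.embedding_column(user, dimension=32)
item_emb, curr_item_emb = tf.feature_column.shared_embedding_columns([item, curr_item], dimension=32)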
Now calling train_and_evaluate results in the following error (shortened messages):
ValueError: Could not load all requested variables from checkpoint. Please make sure your model_fn does not expect variables that were not saved in the checkpoint.
Key input_layer/user_embedding/embedding_weights not found in checkpoint
Note that calling the train method alone works fine. My understanding is that once it gets to evaluation, it tries to load the variables from the checkpoint, but that variable doesn't exist. I did a little debugging and found the reason:
When encode_listwise_features is called during training (which in turn calls encode_features), all features (user and item) are "V2" (I'm not sure what that means exactly), so the following if statement holds:
https://github.com/tensorflow/ranking/blob/31fc134816cc4974a46a11e7bb2df0066d0a88f0/tensorflow_ranking/python/feature.py#L92
and both variables are named with an encoding_layer prefix (scope name?):
encoding_layer/user_embedding/embedding_weights
encoding_layer/item_embedding/embedding_weights
But when I call the same function for all three features (I'm a little confused whether this happens in eval or predict mode), some of them are not "V2" and we end up in the else branch of the above condition, which calls input_layer directly, and the variables are named with an input_layer prefix. Now TF is trying to restore
input_layer/user_embedding/embedding_weights
from the checkpoint, but that name doesn't exist in the checkpoint, because it was called
encoding_layer/user_embedding/embedding_weights
in training.
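One way to confirm this (and a handy debugging step in general) is to list the variable names that actually exist in the checkpoint; the model directory below is a placeholder:

import tensorflow as tf

# Print every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables("/path/to/model_dir"):
    print(name, shape)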
So:
1) How can I ensure that all my features are treated as v2 at all stages? I tried using tf.compat.v2.feature_column, but that didn't help. There is already a TODO note above that if statement about this.
2) Can encode_features be modified to avoid this situation, e.g. by raising an exception with a helpful message?

"Unkown (custom) loss function" when using tflite_convert on a {TF 2.0.0-beta1 ; Keras} model

Summary
My question is composed of:
A context in which I present my project, my working environment and my workflow
The detailed problem
The concerned parts of my code
The solutions I tried to solve my problem
A restatement of the question
Context
I've written a Python Keras implementation of a downgraded version of the original Super-Resolution GAN. Now I want to test it using Google Firebase Machine Learning Kit, by hosting it in the Google servers. That's why I have to convert my Keras program to a TensorFlow Lite one.
Environment and workflow (with the problem)
I'm training my program in the Google Colab working environment: there, I've installed TF 2.0.0-beta1 (this choice is motivated by this incorrect answer: https://datascience.stackexchange.com/a/57408/78409).
Workflow (and problem):
I write my Python Keras program locally, keeping in mind that it will run on TF 2. So I use TF 2 imports, for example: from tensorflow.keras.optimizers import Adam and also from tensorflow.keras.layers import Conv2D, BatchNormalization
I send my code to my Drive
I run my Google Colab notebook without any problem: TF 2 is used.
I get the output model in my Drive, and I download it.
I try to convert this model to the TFLite format by executing the following CLI: tflite_convert --output_file=srgan.tflite --keras_model_file=srgan.h5: here the problem appears.
The problem
Instead of outputting the TF Lite converted model from the TF (Keras) model, the previous CLI outputs this error:
ValueError: Unknown loss function:build_vgg19_loss_network
The function build_vgg19_loss_network is a custom loss function that I've implemented and that must be used by the GAN.
Parts of the code that raise this problem
Presenting the custom loss function
The custom loss function is implemented like this:
def build_vgg19_loss_network(ground_truth_image, predicted_image):
    loss_model = Vgg19Loss.define_loss_model(high_resolution_shape)
    return mean(square(loss_model(ground_truth_image) - loss_model(predicted_image)))
Compiling the generator network with my custom loss function
generator_model.compile(optimizer=the_optimizer, loss=build_vgg19_loss_network)
What I've tried to do in order to solve the problem
As I read on StackOverflow (link at the beginning of this question), TF 2 was supposed to be sufficient to output a Keras model that would be correctly processed by my tflite_convert CLI. But obviously it isn't.
As I read on GitHub, I tried to manually register my custom loss function among Keras' loss functions by adding these lines:
import tensorflow.keras.losses
tensorflow.keras.losses.build_vgg19_loss_network = build_vgg19_loss_network
It didn't work.
I read on GitHub that I could use custom objects with the load_model Keras function, but I only want to use the compile Keras function, not load_model.
My final question
I want to make only minor changes to my code, since it works fine. So I don't want, for example, to replace compile with load_model. With this constraint, could you please help me make my tflite_convert CLI work with my custom loss function?
Since you are claiming that the TFLite conversion is failing due to a custom loss function, you can save the model file without keeping the optimizer details. To do that, set the include_optimizer parameter to False as shown below:
model.save('model.h5', include_optimizer=False)
Now, if all the layers inside your model are convertible, they should get converted into a TFLite file.
Edit:
You can then convert the h5 file like this:
import tensorflow as tf
model = tf.keras.models.load_model('model.h5') # srgan.h5 for you
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
The usual practice for overcoming unsupported operators in TFLite conversion is documented here.
I had the same error. I recommend changing the loss to "mse" since you already have a well-trained model and you don't need to train with the .tflite file.
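If you go that route, a minimal sketch is to recompile the already-trained generator with a built-in loss right before saving (recompiling does not change the trained weights; the variable names follow the question's code):

# Swap the custom loss for a built-in one so the saved .h5 carries no custom objects.
generator_model.compile(optimizer=the_optimizer, loss="mse")
generator_model.save("srgan.h5", include_optimizer=False)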

Error with 8-bit Quantization in Tensorflow

I have been experimenting with the new 8-bit quantization feature available in TensorFlow. I could run the example given in the blog post (quantization of GoogLeNet) without any issue and it works fine for me!
Now, I would like to apply the same to a simpler network. So I used a pre-trained network for CIFAR-10 (which was trained in Caffe), extracted its parameters, created the corresponding graph in TensorFlow, initialized the weights with the pre-trained weights, and finally saved it as a GraphDef object. See this IPython Notebook for the full procedure.
Now I applied the 8-bit quantization with the TensorFlow script mentioned in Pete Warden's blog:
bazel-bin/tensorflow/contrib/quantization/tools/quantize_graph --input=cifar.pb --output=qcifar.pb --mode=eightbit --bitdepth=8 --output_node_names="ArgMax"
Now I wanted to run classification on this quantized network. So I loaded the new qcifar.pb into a TensorFlow session and passed in the image (the same way I passed it to the original version). The full code can be found in this IPython Notebook.
But as you can see at the end, I am getting the following error:
NotFoundError: Op type not registered 'QuantizeV2'
Can anybody suggest what I am missing here?
Because the quantized ops and kernels are in contrib, you'll need to explicitly load them in your Python script. There's an example of that in the quantize_graph.py script itself:
from tensorflow.contrib.quantization import load_quantized_ops_so
from tensorflow.contrib.quantization.kernels import load_quantized_kernels_so
This is something that we should update the documentation to mention!
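As an illustration only, here is a TF 0.x/1.x-era sketch of how those imports might sit in the inference script that loads the quantized graph (the file name follows the question; depending on your TF version you may also need to call the loader helpers those modules expose):

import tensorflow as tf

# Importing these contrib modules is what makes the quantized ops/kernels
# available to the runtime, per the answer above.
from tensorflow.contrib.quantization import load_quantized_ops_so
from tensorflow.contrib.quantization.kernels import load_quantized_kernels_so

# Load the quantized GraphDef and import it into a session.
with open("qcifar.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Session() as sess:
    tf.import_graph_def(graph_def, name="")
    # Run classification on the quantized graph as before, e.g.:
    # predictions = sess.run("ArgMax:0", feed_dict={"input:0": image})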