Float ops found in quantized TensorFlow MobileNet model - tensorflow

As you can see in the screenshot of a quantized MobileNet model implemented in TensorFlow, there are still some float operations. The quantization is done in TensorFlow via the graph_transform tools.
The red ellipse in the image has its description in the right-hand-side text box. The "depthwise" node is a "DepthwiseConv2dNative" operation that expects "DT_FLOAT" inputs.
Although the lower Relu6 performs an 8-bit quantized operation, its result has to go through "(Relu6)", which is a "Dequantize" op, in order to produce "DT_FLOAT" inputs for the depthwise convolution.
Why are depthwise conv operations left out by the TF graph_transform tools? Thank you.

Unfortunately there isn't a quantized version of depthwise conv in standard TensorFlow, so it falls back to the float implementation with conversions before and after. For a full eight-bit implementation of MobileNet, you'll need to look at TensorFlow Lite, which you can learn more about here:
https://www.tensorflow.org/mobile/tflite/

Related

Is it possible to set the scale to 16 bit while converting a tensorflow model to tflite (8-bit quantization)?

I have a TensorFlow model which needs to be quantized to 8-bit. According to the quantization spec, float values are approximated by integer values using the formula
real_value = (int8_value - zero_point) x scale
After quantization, when I run inference, I see that in the convolution or depthwise convolution op the scale is quantized to 32-bit. Is there an option, either during training or post-training, to limit the scale to 16-bit?
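For readers unfamiliar with the formula above, here is a minimal sketch of the affine mapping in plain Python. The function names and the scale and zero_point values are hypothetical, chosen only for illustration. It does not address how a converter stores the scale internally.

```python
def quantize(real_value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an int8 value: q = round(real / scale) + zero_point."""
    q = round(real_value / scale) + zero_point
    # Clamp to the representable int8 range.
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping from the spec: real_value = (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Hypothetical quantization parameters, not taken from any real model.
scale, zero_point = 0.25, -10

q = quantize(1.3, scale, zero_point)
approx = dequantize(q, scale, zero_point)
print(q, approx)
```

The reconstructed value differs from the input by at most half a scale step, as long as the input falls inside the representable range.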

Upsampling Tensor for TensorRT

I am converting a TensorFlow model to TensorRT, and TensorFlow's ResizeArea op (the upsample node in the picture) needs to be implemented as a plugin.
So I implemented ResizeArea in CUDA.
My TensorRT input is in NCHW format:
uff_path = model_to_uff(model_path)
parser.register_input(ModelData.INPUT_NAME, (3, height, width), trt.UffInputOrder.NCHW)
parser.register_output(ModelData.OUTPUT_NAME)
parser.parse(uff_path, network)
So my CUDA code implements the resampling in NCHW layout.
I would like to make sure my resampling layout is correct.
Method_1
NCHW ResizeArea (4x upsampling) sample.
Input (3 rows x 4 columns per channel):
channel_1 channel_2 channel_3
3,1,2,0, 0,4,3,1, 2,0,2,3,
3,0,1,2, 0,1,2,1, 2,0,4,2,
4,1,2,2, 1,3,2,4, 2,3,4,2,
Output (12 rows x 16 columns per channel):
channel_1 channel_2 channel_3
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,1,1,1,1,2,2,2,2,0,0,0,0, 0,0,0,0,4,4,4,4,3,3,3,3,1,1,1,1, 2,2,2,2,0,0,0,0,2,2,2,2,3,3,3,3,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
3,3,3,3,0,0,0,0,1,1,1,1,2,2,2,2, 0,0,0,0,1,1,1,1,2,2,2,2,1,1,1,1, 2,2,2,2,0,0,0,0,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
4,4,4,4,1,1,1,1,2,2,2,2,2,2,2,2, 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4,4, 2,2,2,2,3,3,3,3,4,4,4,4,2,2,2,2,
Each pixel is upsampled 4 times (for example, the first pixel, 3, is repeated 4 times horizontally and 4 times vertically). I consider this NCHW-format upsampling.
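The Method_1 listing can be reproduced with a small plain-Python sketch that replicates each pixel within its own channel plane. The function name upsample_chw is hypothetical, and this is only a CPU reference for checking a plugin's output, not the CUDA implementation itself.

```python
def upsample_chw(image, factor):
    """Pixel-replication upsample of a [C][H][W] nested list by an integer factor."""
    out = []
    for channel in image:
        up_channel = []
        for row in channel:
            # Repeat every pixel `factor` times along the width ...
            wide_row = [pixel for pixel in row for _ in range(factor)]
            # ... then repeat the widened row `factor` times along the height.
            up_channel.extend(list(wide_row) for _ in range(factor))
        out.append(up_channel)
    return out


# The 3-channel, 3 x 4 input from the Method_1 sample.
image = [
    [[3, 1, 2, 0], [3, 0, 1, 2], [4, 1, 2, 2]],  # channel_1
    [[0, 4, 3, 1], [0, 1, 2, 1], [1, 3, 2, 4]],  # channel_2
    [[2, 0, 2, 3], [2, 0, 4, 2], [2, 3, 4, 2]],  # channel_3
]

up = upsample_chw(image, 4)
for row in up[0]:
    print(",".join(str(v) for v in row))
```

Each channel is processed independently, which is what makes this a CHW-layout operation: neighbouring values in a row belong to the same channel.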
Method_2
The other way I implemented it treats the data as NHWC.
Each 3-channel pixel, e.g. (83,86,77), is upsampled horizontally and vertically as a unit.
Is Method_1 the correct way of doing NCHW upsampling?
It seems Method_1 is OK, since TensorRT expects CHW as stated in the docs, while NHWC is a TF format. Are you considering AlignCorners in your plugin layer? Also note that this resizing is nearest neighbor. In PyTorch I used onnx-trt to do bilinear interpolation, which gave better results (that was for segmentation; for your case nearest neighbor may be OK).
After spending some time on it, I solved the issue. TensorRT works in NCHW format, while the TensorFlow model is in NHWC format. So my plugin has to do the upsampling in NCHW format, but its output has to be converted to NHWC so that it can feed the subsequent TensorFlow operations.
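The layout conversion described above comes down to an index remapping on the flat buffer. Below is a minimal plain-Python sketch for a single image. The function name chw_to_hwc_flat is hypothetical, and a real plugin would do the same index arithmetic inside a CUDA kernel.

```python
def chw_to_hwc_flat(src, channels, height, width):
    """Reorder a flat CHW buffer into a flat HWC buffer."""
    dst = [0] * len(src)
    for c in range(channels):
        for h in range(height):
            for w in range(width):
                chw_index = (c * height + h) * width + w    # channel-major offset
                hwc_index = (h * width + w) * channels + c  # channel-last offset
                dst[hwc_index] = src[chw_index]
    return dst


# Two channels of a 2 x 3 image, stored channel after channel.
src = list(range(12))
dst = chw_to_hwc_flat(src, channels=2, height=2, width=3)
print(dst)
```

In the result, the values of all channels for one pixel sit next to each other, which is the ordering NHWC consumers expect.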

Fused activation functions of conv ops should be NONE but they are RELU6

I am trying to convert a frozen mobilenet_v1 graph to tflite. I'm interested in the conversion process and not just the quantized tflite model, so I'm comparing my converted model to one already made by tensorflow. The difference between them is that my conv2d nodes have a relu6 activation function while the correct model's conv2d nodes don't have any activation function. How should I go about fixing this problem?
I should also mention that I'm converting to a fully quantized tflite model by using tensorflow's quantization-aware training and activating the quantization flags upon conversion to tflite.

What is the difference between tensorflow inception and mobilenet

Recently I have been working with TensorFlow Inception V3 and MobileNet to deploy them on Android. When converting a retrained Inception V3 model to "tflite" there were some issues, as the resulting "tflite" model was empty. But when I tried a retrained MobileNet model, it was successfully converted to "tflite". So basically I have two questions:
Is it possible to convert inception V3 retrained model to "tflite"?
What is the difference between inception V3 and MobileNet?
PS. I have gone through the official documentation link, which only hinted at mobileNet only being
https://www.tensorflow.org/tutorials/image_retraining#other_model_architectures
Yes, both models can be converted to tflite format. For a step-by-step procedure, please go through this link: Convert to tflite.
The major difference between InceptionV3 and MobileNet is that MobileNet uses
depthwise separable convolutions while Inception V3 uses standard convolutions.
This results in fewer parameters in MobileNet compared to InceptionV3. However, it also results in a slight decrease in accuracy.
In a standard convolution, each filter operates on all M channels of the input together, and N such filters output N feature maps. To make it clear, take each filter as a cube of size Dk x Dk x M: every element of the cube is multiplied by the corresponding element of the input feature map, the products are summed into one output value, and repeating this for N filters gives N feature maps.
In a depthwise separable convolution, however, M single-channel filters of size Dk x Dk x 1 each operate on one channel of the input (the depthwise step). Once the M filter outputs are obtained, N pointwise filters of size 1 x 1 x M operate on them to give N output feature maps. This is illustrated by a figure in the MobileNet paper.
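The effect on the parameter count can be sketched with a little arithmetic. The layer sizes below (Dk = 3, M = 32, N = 64) are hypothetical values chosen for illustration, biases are ignored, and the function names are not from any library.

```python
def standard_conv_params(dk, m, n):
    """N filters, each a Dk x Dk x M cube."""
    return dk * dk * m * n


def separable_conv_params(dk, m, n):
    """M depthwise Dk x Dk filters plus N pointwise 1 x 1 x M filters."""
    depthwise = dk * dk * m
    pointwise = m * n
    return depthwise + pointwise


dk, m, n = 3, 32, 64  # hypothetical layer sizes
standard = standard_conv_params(dk, m, n)
separable = separable_conv_params(dk, m, n)
print(standard, separable, separable / standard)
```

The ratio of the two counts equals 1/N + 1/Dk^2, so for 3 x 3 kernels the separable layer needs roughly eight to nine times fewer weights.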
For more detail, please go through the DataScienceLink.
They have a concrete example of how it reduces the parameter count, which I am simply pasting here.

Training quantized models in TensorFlow

I would like to train a quantized network, i.e. use quantized weights during the forward pass to calculate the loss and then update the underlying full-precision floating point weights during the backward pass.
Note that in my case "fake quantization" is sufficient. That means that the weights can still be stored as 32-bit floating point values, as long as they represent a low bitwidth quantized value.
In a blog post from Pete Warden he states:
[...] we do have support for “fake quantization” operators. If you include these in your graphs at the points where quantization is expected to occur (for example after convolutions), then in the forward pass the float values will be rounded to the specified number of levels (typically 256) to simulate the effects of quantization.
The mentioned operators can be found in the TensorFlow API.
Can anybody point out to me how to use these functions?
If I call them after e.g. a conv layer in my model definition, why would this quantize the weights in the layer instead of the outputs (activations) of this layer?
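As a rough illustration of what a fake quantization step computes in the forward pass, here is a plain-Python sketch. It is not TensorFlow's implementation: the function name fake_quant, the range [-1, 1] and the sample weights are all assumptions made for the example, and the backward pass is not modelled.

```python
def fake_quant(x, min_val, max_val, num_bits=8):
    """Round x to one of 2**num_bits evenly spaced levels in [min_val, max_val].

    The result is still a float, but it can only take quantized values.
    """
    levels = 2 ** num_bits - 1
    scale = (max_val - min_val) / levels
    clamped = max(min_val, min(max_val, x))
    q = round((clamped - min_val) / scale)  # integer level index, 0..levels
    return min_val + q * scale


# Hypothetical full-precision weights; the last one lies outside the range.
weights = [0.1234, -0.5, 0.9, 1.7]
quantized_weights = [fake_quant(w, -1.0, 1.0) for w in weights]
print(quantized_weights)
```

In this sketch the function quantizes whatever values it is applied to. Applying it to a layer's weights before the convolution simulates weight quantization, while applying it to the layer's output simulates activation quantization.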