Is it possible to set the scale to 16 bit while converting a tensorflow model to tflite (8-bit quantization)? - tensorflow

I have a TensorFlow model which needs to be quantized to 8 bits. According to the quantization spec, float values are approximated by integer values using the formula
real_value = (int8_value - zero_point) x scale
After quantization, when inference is run, I see that in the convolution (or depthwise convolution) op the scale is quantized to 32 bits. Is there an option, during training or post-training, to limit the scale to 16 bits?
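For illustration, a small worked example of that formula; the scale and zero-point values below are made up, not taken from a real model:
# real_value = (int8_value - zero_point) * scale, with made-up example numbers
scale = 0.02
zero_point = -3

def quantize(real_value):
    q = int(round(real_value / scale)) + zero_point
    return max(-128, min(127, q))      # clamp to the int8 range

def dequantize(int8_value):
    return (int8_value - zero_point) * scale

q = quantize(0.5)                      # -> 22
print(q, dequantize(q))                # -> 22 0.5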

Related

How to use the quantized Tensorflow MobileNet v1 floating point scaling values

There are quantized MobileNet v1 models available at https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
I see floating point scaling values associated with the weights and biases in the model, but it isn't evident how these should be used in the operation's scaling.
The GEMMLOWP quantization info describes scaling values associated with the input, the weights, and the operation's accumulator downscale.
Should the bias scaling value be used alone for down-scaling the accumulator, or is the weight scaling value required?
In short, I'm trying to determine how the two provided scaling values should be used.
Thanks.
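For background, this is how those scales are typically combined in the gemmlowp / TFLite convolution scheme; the numbers below are made up for illustration:
# Per-tensor scales from the quantized model (example values are made up).
input_scale  = 0.0078
weight_scale = 0.0039
output_scale = 0.023

# The bias is normally quantized to int32 with scale = input_scale * weight_scale,
# so it can be added directly to the int32 accumulator of the convolution.
bias_scale = input_scale * weight_scale

# The accumulator is then rescaled to the 8-bit output with a single real-valued
# multiplier (implemented in gemmlowp as a fixed-point multiply plus a shift):
downscale_multiplier = (input_scale * weight_scale) / output_scale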

Expected validation accuracy for Keras MobileNet V1 for CIFAR-10 (training from scratch)

Has anybody trained MobileNet V1 from scratch using CIFAR-10? What was the maximum accuracy you got? I am getting stuck at 70% after 110 epochs, even though my training accuracy is above 99%. Here is how I am creating the model.
import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras.models import Model

# create the MobileNet base (no top, random weights)
MobileNet_model = tf.keras.applications.MobileNet(include_top=False, weights=None)
# Must define the input shape in the first layer of the neural network
x = Input(shape=(32, 32, 3), name='input')
# Create the custom model on top of the MobileNet features
features = MobileNet_model(x)
features = Flatten(name='flatten')(features)
features = Dense(1024, activation='relu', name='dense_1')(features)
output = Dense(10, activation=tf.nn.softmax, name='output')(features)
model_regular = Model(x, output, name='model_regular')
I used the Adam optimizer with lr = 0.001, amsgrad = True and a batch size of 64. I also normalized the pixel data by dividing by 255.0. I am not using any data augmentation.
optimizer1 = tf.keras.optimizers.Adam(lr=0.001, amsgrad=True)
model_regular.compile(optimizer=optimizer1, loss='categorical_crossentropy', metrics=['accuracy'])
history = model_regular.fit(x_train, y_train_one_hot,validation_data=(x_test,y_test_one_hot),batch_size=64, epochs=100) # train the model
I think I am supposed to get at least 75% according to https://arxiv.org/abs/1712.04698
Am I doing anything wrong, or is this the expected accuracy after 100 epochs? Here is a plot of my validation accuracy.
MobileNet was designed for ImageNet, which is much larger, so training it on CIFAR-10 will inevitably result in overfitting. I would suggest you plot the loss (not accuracy) from both training and validation/evaluation, try to train it hard enough to reach 99% training accuracy, and then observe the validation loss. If it is overfitting, you will see the validation loss actually increase after reaching its minimum.
A few things to try to reduce overfitting (a sketch combining these suggestions follows below):
add dropout before the fully connected layer
data augmentation - random shift, crop and rotation should be enough
use a smaller width multiplier (read the original paper; it basically just reduces the number of filters per layer), e.g. 0.75 or 0.5, to make the layers thinner
use L2 weight regularization and weight decay
Then there are some usual training tricks:
use learning rate decay e.g. reduce the learning rate from 1e-2 to 1e-4 stepwise or exponentially
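A minimal Keras sketch combining these suggestions, assuming the same CIFAR-10 setup as in the question; the dropout rate, weight decay, width multiplier and learning rate schedule below are illustrative values, not tuned ones:
import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# thinner MobileNet via the width multiplier (alpha=0.5)
base = tf.keras.applications.MobileNet(include_top=False, weights=None,
                                       input_shape=(32, 32, 3), alpha=0.5)
x = GlobalAveragePooling2D()(base.output)
x = Dropout(0.5)(x)                                   # dropout before the FC layer
output = Dense(10, activation='softmax',
               kernel_regularizer=l2(1e-4))(x)        # L2 weight regularization
model = Model(base.input, output)

# data augmentation: random shifts and horizontal flips
datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                             horizontal_flip=True)

# stepwise learning rate decay from 1e-2 towards 1e-4
def lr_schedule(epoch):
    return 1e-2 * (0.1 ** (epoch // 40))

model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(datagen.flow(x_train, y_train_one_hot, batch_size=64),
          validation_data=(x_test, y_test_one_hot), epochs=100,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])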
With some hyperparameter search, I got an evaluation loss of 0.85. I didn't use Keras; I wrote the MobileNet myself in TensorFlow.
The OP asked about MobileNetV1. Since MobileNetV2 has been published, here is an update on training MobileNetV2 on CIFAR-10:
1) MobileNetv2 is tuned primarily to work on ImageNet with an initial image resolution of 224x224. It has 5 convolution operations with stride 2. Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx7x7, where C is the number of filters (1280 for MobileNetV2).
2) For CIFAR-10, I changed the stride in the first three of these layers to 1, so the GlobalAvgPool2D gets a feature map of Cx8x8. Secondly, I trained with a width multiplier of 0.25 (this scales the number of channels per layer). I trained with mixup in mxnet (https://gluon-cv.mxnet.io/model_zoo/classification.html). This gets me a validation accuracy of 93.27%.
3) Another MobileNetV2 implementation that seems to work well for CIFAR-10 is available here - PyTorch-CIFAR
The reported accuracy is 94.43%. This implementation changes the first two of the original stride-2 layers to stride 1, and it uses the full channel width used for ImageNet.
4) Further, I trained a MobileNetV2 on CIFAR-10 with mixup while only altering the stride in the first conv layer from 2 to 1 and using the full width (width multiplier == 1.0). Thus the GlobalAvgPool2D (penultimate layer) gets a feature map of Cx2x2. This gets me an accuracy of 92.31%.
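For reference, a minimal Keras sketch of the configuration in point 2 (width multiplier 0.25 on 32x32 inputs). Note that the stride changes described above are not exposed by the Keras constructor and would require editing the architecture definition itself:
import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

# MobileNetV2 with width multiplier 0.25, built for CIFAR-10-sized inputs.
# The stride modifications discussed above cannot be set here and would
# have to be made by editing the architecture definition directly.
base = tf.keras.applications.MobileNetV2(input_shape=(32, 32, 3), alpha=0.25,
                                         include_top=False, weights=None)
x = GlobalAveragePooling2D()(base.output)
output = Dense(10, activation='softmax')(x)
model = Model(base.input, output)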

Float ops found in quantized TensorFlow MobileNet model

As you can see in the screenshot of a quantized MobileNet model implemented in TensorFlow, there are still some float operations. The quantization is done in TensorFlow via the graph_transform tools.
The red ellipse in the image has its description in the right-hand-side text box. The "depthwise" node is a "DepthwiseConv2dNative" operation that expects "DT_FLOAT" inputs.
Although the lower Relu6 performs an 8-bit quantized operation, the result has to go through "(Relu6)", which is a "Dequantize" op, in order to produce "DT_FLOAT" inputs for the depthwise convolution.
Why are depthwise conv operations left out by the TF graph_transform tools? Thank you.
Unfortunately there isn't a quantized version of depthwise conv in standard TensorFlow, so it falls back to the float implementation with conversions before and after. For a full eight-bit implementation of MobileNet, you'll need to look at TensorFlow Lite, which you can learn more about here:
https://www.tensorflow.org/mobile/tflite/
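As a starting point, here is a sketch of a full-integer post-training conversion with the current TFLite converter API; the SavedModel path and the representative data are placeholders for your own model and calibration inputs:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # yield a few hundred typical input samples for calibration (placeholder data)
    for sample in representative_data:
        yield [sample.astype("float32")]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
open("mobilenet_int8.tflite", "wb").write(tflite_model)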

Training quantized models in TensorFlow

I would like to train a quantized network, i.e. use quantized weights during the forward pass to calculate the loss and then update the underlying full-precision floating point weights during the backward pass.
Note that in my case "fake quantization" is sufficient. That means that the weights can still be stored as 32-bit floating point values, as long as they represent a low bitwidth quantized value.
In a blog post, Pete Warden states:
[...] we do have support for “fake quantization” operators. If you include these in your graphs at the points where quantization is expected to occur (for example after convolutions), then in the forward pass the float values will be rounded to the specified number of levels (typically 256) to simulate the effects of quantization.
The mentioned operators can be found in the TensorFlow API.
Can anybody point out to me how to use these functions?
If I call them after e.g. a conv layer in my model definition, why would this quantize the weights in the layer instead of the outputs (activations) of this layer?
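To illustrate the distinction, here is a minimal TF 1.x-style sketch using tf.quantization.fake_quant_with_min_max_args; the shapes and the min/max ranges are illustrative, not calibrated values:
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 32, 32, 64])
weights = tf.get_variable("conv_w", shape=[3, 3, 64, 64])

# Fake-quantize the *weights*: wrap the variable before the conv consumes it, so the
# forward pass sees rounded weights while the underlying float variable gets updated.
weights_q = tf.quantization.fake_quant_with_min_max_args(weights, min=-1.0, max=1.0, num_bits=8)
conv = tf.nn.conv2d(inputs, weights_q, strides=[1, 1, 1, 1], padding="SAME")

# Fake-quantize the *activations*: apply the same op to the layer's output instead.
act_q = tf.quantization.fake_quant_with_min_max_args(tf.nn.relu(conv), min=0.0, max=6.0, num_bits=8)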

Quantize Embeddings

I would like to quantize embeddings to a single signed byte in each dimension. If I try to do this by scaling the values to [-128, 127], then casting to tf.int8, recasting to tf.float32 and rescaling to the original [-1, 1] range, I get the following error:
ValueError: No gradients provided for any variable
The same training script works fine without the quantization step.
According to this thread, quantized ops will be a future feature of TensorFlow. In the meantime, is there a good work-around for this simple quantization scenario?
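One common work-around (a standard trick, not something taken from the linked thread) is a straight-through estimator: do the rounding in the forward pass but route the gradient around it with tf.stop_gradient, so the embedding variables still receive gradients:
import tensorflow as tf

def quantize_embeddings_ste(emb):
    # Round embeddings in [-1, 1] to signed-byte granularity in the forward pass.
    rounded = tf.round(emb * 127.0) / 127.0
    # tf.stop_gradient hides the rounding from backprop, so the gradient
    # w.r.t. emb is the identity (straight-through estimator).
    return emb + tf.stop_gradient(rounded - emb)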