TensorFlow Serving returns incorrect results - tensorflow

I have two different models (Model A is a Keras .h5 and Model B is a PyTorch .pth). Both need to be served with TensorFlow Serving, so I converted them to TensorFlow format (.pb) for serving.
Serving works and I get outputs, but when I compared the serving results with the original models' outputs (from Keras and PyTorch directly), I found they were wrong: the prediction score for the same image is less reliable on the serving side than from the original model. I can't tell whether this comes from faults in the model conversion or from something else.
How can I fix it?

The different results come from different default parameters in the layers and the optimizer. For example, PyTorch's batch norm uses a decay rate of 0.9, whereas in Keras it is 0.99. There may be other differences in default parameters like this.
I would also recommend checking the weight initializations, as the defaults may differ between the two frameworks. Thank you!
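As an illustration (a minimal sketch, not tied to the poster's actual models), the batch-norm defaults of the two frameworks can be aligned explicitly instead of relying on each framework's defaults:
import torch.nn as nn
from tensorflow.keras import layers

# PyTorch: momentum is the weight given to the *new* batch statistics,
# so the default momentum=0.1 corresponds to a running-average decay of 0.9.
torch_bn = nn.BatchNorm2d(64, momentum=0.1, eps=1e-5)

# Keras: momentum *is* the decay of the moving average (default 0.99),
# and the default epsilon is 1e-3. Matching the PyTorch behaviour:
keras_bn = layers.BatchNormalization(momentum=0.9, epsilon=1e-5)
Going through every layer type like this (and the optimizer defaults) usually explains most of the gap between a converted model and the original.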

Dealing with missing values in tensorflow

I need some guidance on the approach to imputation in tensorflow/deep learning. I am familiar with how scikit-learn handles imputation, and when I map it to the tensorflow ecosystem, I would expect to use preprocessing layers in keras or functions in tensorflow transform to do the imputation. However, at least to my knowledge, these functions do not exist. So I have a few questions:
1. Is there a reason, tied to how deep learning works, that these functions do not exist? (For example, dense sampling needs to be as accurate as possible and you have a large amount of data, hence imputation is never required.)
2. If it is not #1, how should one handle imputation in tensorflow? For example, during serving your input could be missing data, and there's nothing you can do about that. I would think integrating it into the preprocessing_fn would be the thing to do.
3. Is it possible to have the graph do different things during training and serving? For example, train on data with no missing values, and if you encounter missing values during serving, do something like ignore the value or set it to a specified default.
Thank you!
Please refer to Mean imputation for missing data for imputing missing values in your data with the mean.
In the example below, x is a feature, represented as a tf.SparseTensor in the preprocessing_fn. In order to convert it to a dense tensor, we compute its mean, and set the mean to be the default value when it is missing from an instance.
Answering your third question, TensorFlow Transform builds the transformations into the TensorFlow graph for your model, so the same transformations are performed at training and inference time.
For the use case you mention, the imputation example below would work, because the default_value parameter supplies the value for any indices that are not present. If default_value is not set, it defaults to zero.
Example Code:
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  # inputs['x'] is a tf.SparseTensor; missing entries are filled with its mean.
  return {
      'x_out': tft.sparse_tensor_to_dense_with_shape(
          inputs['x'],
          default_value=tft.mean(inputs['x']),
          shape=[None, 1]),
  }
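Outside of tf.Transform, the effect of default_value can be illustrated with plain TensorFlow ops. This is a minimal sketch with made-up values, where the training-set mean is assumed to have been computed beforehand (e.g. by tft.mean over the training data):
import tensorflow as tf

# Hypothetical sparse feature with a missing entry at row 1;
# in practice this would come from parsing a tf.Example.
x = tf.SparseTensor(indices=[[0, 0], [2, 0]],
                    values=[3.0, 5.0],
                    dense_shape=[3, 1])

training_mean = 4.0  # assumed precomputed over the training data

# Missing positions are filled with the training-set mean instead of zero.
x_dense = tf.sparse.to_dense(x, default_value=training_mean)
# x_dense -> [[3.], [4.], [5.]]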

Different optimization with different TF versions

I'm trying to train a convolutional neural network with Keras and TensorFlow 2.6; I also did it with TensorFlow 1.11. I think the migration went okay (both networks converge), but the results are very different, and worse in TF 2.6. I used the Adam optimizer in both cases with the same hyperparameters (learning_rate = 0.001), but the loss is optimized better in TF 1.11 than in TF 2.6.
I'm trying to find out where the differences could come from. What should be taken into account when working with different TF versions? Can there be significant numerical differences? I know that in TF 1.x the default mode is graph execution and in TF 2 it is eager; I don't know whether this could lead to different training behaviour.
It surprises me how much the loss drops in the first epochs, reaching a lower value at the end of training.
You are right that the two versions run in different execution modes, eager and graph, but the loss itself is defined by the model and the optimization method you configure, not by the execution mode.
You cannot directly compare one model's training history to another. Running the experiment several times, you may find that TF 1.x converges faster and reaches a lower loss; it is worth reviewing the release changelog, since loss-function and optimizer implementations have been updated between versions.
Graph mode is a powerful technique, but TF 2.x also gives you access to values at runtime, which is why you get convenient mechanisms such as callbacks, dynamic functions, and runtime value updates. It is a useful exercise to run both versions on the same task and compare them.
Equivalent methods by themselves should not produce different results; the differences come from changed defaults and implementations.
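If the suspicion falls on eager vs. graph execution, one way to narrow it down is to run the TF 2.x training step inside tf.function so both versions execute a compiled graph (Keras model.fit already does this by default, so this mainly matters for custom loops). It is also worth comparing optimizer defaults: for example, Adam's epsilon is 1e-08 in tf.compat.v1.train.AdamOptimizer but 1e-07 in tf.keras.optimizers.Adam. A minimal sketch with a placeholder model:
import tensorflow as tf

# Placeholder model and loss; substitute your own CNN here.
model = tf.keras.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, epsilon=1e-08)  # match TF1 default
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function  # compiles the step into a graph, as TF 1.x did by default
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
If the loss curves still differ after pinning the execution mode and the optimizer settings, the remaining gap is likely in changed layer defaults or initializers rather than eager execution itself.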

Set batch size of trained keras model to 1

I have a Keras model trained on my own dataset. However, after loading the weights, the summary shows None as the first dimension (the batch size).
I want to know how to fix the shape to a batch size of 1, as I need a fixed batch size to convert the model to tflite with GPU support.
What worked for me was to specify the batch size on the Input layer, like this:
input = layers.Input(shape=input_shape, batch_size=1, dtype='float32', name='images')
This then carried through the rest of the layers.
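As a sketch of the whole round trip (build_model is a hypothetical helper standing in for the original architecture): rebuild the model with a fixed batch size, copy the trained weights across, and convert to TFLite.
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape, batch_size=None):
    # Hypothetical stand-in for the original architecture.
    inputs = layers.Input(shape=input_shape, batch_size=batch_size,
                          dtype='float32', name='images')
    outputs = layers.Dense(10)(layers.Flatten()(inputs))
    return tf.keras.Model(inputs, outputs)

# The model trained with a dynamic batch dimension...
trained = build_model(input_shape=(224, 224, 3))
# trained.load_weights('weights.h5')  # hypothetical weights file

# ...is rebuilt with batch_size=1 and the weights are copied over.
fixed = build_model(input_shape=(224, 224, 3), batch_size=1)
fixed.set_weights(trained.get_weights())

# Convert the fixed-batch model to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(fixed)
with open('model.tflite', 'wb') as f:
    f.write(converter.convert())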
The bad news is that despite this "fix" the TFLite runtime still complains about dynamic tensors. I get these non-fatal errors in logcat when it runs:
E/tflite: third_party/tensorflow/lite/core/subgraph.cc:801 tensor.data.raw != nullptr was not true.
E/tflite: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#26 is a dynamic-sized tensor).
E/tflite: Ignoring failed application of the default TensorFlow Lite delegate indexed at 0.
The good news is that despite these errors it seems to be using the GPU anyway, based on performance testing.
I'm using:
tensorflow-lite-support:0.2.0
tensorflow-lite-metadata:0.2.1
tensorflow-lite:2.6.0
tensorflow-lite-gpu:2.3.0
Hopefully, they'll fix the runtime so it doesn't matter whether the batch size is 'None'. It shouldn't matter for doing inference.

Wildly different quantization performance on tensorflow-lite conversion of keras-trained DenseNet models

I have two models that I have trained using Keras. The two models use the same architecture (the DenseNet169 implementation from keras_applications.densenet package), however they each have a different number of target classes (80 in one case, 200 in the other case).
Converting both models to .pb format works just fine (identical performance in inference). I use the keras_to_tensorflow utility found at https://github.com/amir-abdi/keras_to_tensorflow
Converting both models to .tflite format using TOCO works just fine (again, identical performance in inference).
Converting the 80-class model to .tflite using quantization in TOCO works reasonably well (<1% drop in top 3 accuracy).
Converting the 200-class model to .tflite using quantization in TOCO goes off the rails (~30% drop in top 3 accuracy).
I'm using an identical command-line to TOCO for both of the models:
toco --graph_def_file frozen_graph.pb \
--output_file quantized_graph.tflite \
--inference_type FLOAT \
--inference_input_type FLOAT \
--output_format TFLITE \
--input_arrays input_1 \
--output_arrays output_node0 \
--quantize True
My tensorflow version is 1.11.0 (installed via pip on macOS Mojave, although I have also tried the same command/environment on the Ubuntu machine I use for training with identical results).
I'm at a complete loss as to why the accuracy of inference is so drastically affected for one model and not the other. This holds true for many different trainings of the same two architecture/target class combinations. I feel like I must be missing something, but I'm baffled.
This was intended to be just a small sneaky comment, since I'm not sure whether it will help, but it got so long that I decided to make it an answer...
My wild guess is that the accuracy drop may be caused by the variance of the output of your network. After quantization (btw, tensorflow uses fixed-point quantization), you are playing with only 256 points (8 bit) instead of the full dense range of float32.
On most blogs around the web, it is stated that the main assumption of quantization is that weights and activations tend to lie in a small range of values. However, there is a second, implicit assumption that is less talked about in blogs and the literature: the activations of the network on a single sample should be decently spread across the quantized range.
Consider the following scenario where the assumption holds (a histogram of activations for a single sample at a specific layer, with the vertical lines being quantization points):
Now consider the scenario where the second assumption is not true, but the first assumption still holds (blue is the overall value distribution, gray is the distribution for a given sample, vertical lines are quantization points):
In the first scenario, the distribution for the given sample is covered well (by a lot of quantization points). In the second, only two. A similar thing can happen to your network: maybe with 80 classes it still has enough quantization points to distinguish, but with 200 classes it might not...
Hey, but why doesn't it affect MobileNet with 1000 classes, or even MobileNetV2, which is residual?
That's why I called it "a wild guess". Maybe MobileNet and MobileNetV2 do not have as wide an output variance as DenseNet. The former only has one input at each layer (which is already normalized by BN), while DenseNet has connections all over the place, so it can have larger variance as well as sensitivity to small changes, and BN might not help as much.
Now, try this checklist:
Manually collect activation statistics of both the 80- and 200-class models in TensorFlow, not only the outputs but inner layers as well (see the sketch after this list). Are the values focused in one area, or do they spread out widely?
See whether the single-input activations of the TensorFlow model spread out nicely, or whether they concentrate in one place.
Most importantly, look at the outputs of the quantized TF-Lite model. If there are problems with the variance as described above, this is where they will show up the most.
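A rough sketch for these checks, assuming a hypothetical trained Keras model and sample image (the file paths are placeholders): build a probe model per layer and print how the activations for one sample spread; the commented part shows how to inspect the quantized TF-Lite outputs on the same sample.
import numpy as np
import tensorflow as tf

def activation_stats(model, sample):
    """Print per-layer activation spread for a single input sample."""
    for layer in model.layers:
        try:
            probe = tf.keras.Model(model.inputs, layer.output)
        except (AttributeError, ValueError):
            continue  # skip layers without a single output tensor
        act = probe.predict(sample, verbose=0)
        print(f'{layer.name:30s} min={act.min():+.4f} max={act.max():+.4f} '
              f'mean={act.mean():+.4f} std={act.std():+.4f}')

# model = tf.keras.models.load_model('densenet_200_classes.h5')  # hypothetical path
# sample = np.expand_dims(some_image, axis=0).astype(np.float32)
# activation_stats(model, sample)

# For the quantized TF-Lite model, compare its raw outputs on the same sample:
# interpreter = tf.lite.Interpreter(model_path='quantized_graph.tflite')
# interpreter.allocate_tensors()
# interpreter.set_tensor(interpreter.get_input_details()[0]['index'], sample)
# interpreter.invoke()
# tflite_out = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])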
PS: please share your results as well, I think many will be interested in troubleshooting quantization issues :)

Tensorflow serving returns a different outcome each time

I am using tensorflow serving to serve a pre-trained model.
The strange thing is that when I input the same data to this model, I get a different outcome each time.
I thought it might be a problem with variable initialization. Is there any way to debug my model, or how can I find the cause? Thanks.
Two common problems:
There's a known issue with main_op in which variables are re-initialized to random.
You left dropout layers in your prediction graph.
To address (1), use this instead:
from tensorflow.python.ops import control_flow_ops, lookup_ops, variables

def main_op():
    # Initialize only local variables and lookup tables; do NOT re-run the
    # global variable initializer, which would overwrite the restored weights.
    init_local = variables.local_variables_initializer()
    init_tables = lookup_ops.tables_initializer()
    return control_flow_ops.group(init_local, init_tables)
To address (2), be sure that you aren't directly exporting your training graph. You need to build a new graph for prediction/serving. If you are using the tf.estimator framework, you should only add dropout layers when mode is tf.estimator.ModeKeys.TRAIN.
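A minimal sketch of that second point with the tf.estimator API (the layer sizes, feature key 'x', and params['n_classes'] are placeholders, not from the original question): dropout is only added to the graph when mode is TRAIN, so the exported serving graph never contains it.
import tensorflow as tf

def model_fn(features, labels, mode, params):
    net = tf.compat.v1.layers.dense(features['x'], units=128, activation=tf.nn.relu)

    # Dropout is only inserted into the training graph; the prediction/serving
    # graph exported for TensorFlow Serving never contains it.
    if mode == tf.estimator.ModeKeys.TRAIN:
        net = tf.compat.v1.layers.dropout(net, rate=0.5, training=True)

    logits = tf.compat.v1.layers.dense(net, units=params['n_classes'])

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={'logits': logits})

    loss = tf.compat.v1.losses.sparse_softmax_cross_entropy(labels, logits)
    optimizer = tf.compat.v1.train.AdamOptimizer()
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)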