Tensorflow serving, get different outcome - tensorflow-serving

I am using tensorflow serving to serve a pre-trained model.
The strange thing is when I input same data for this model, I got different outcome each time.
I thought it might be my problem at variable initialize, I am wondering is there any clue I debug my model, or how can I find the cause, thanks.

Two common problems:
There's a known issue with main_op in which variables are re-initialized to random.
You left dropout layers in your prediction graph.
To address (1), use this instead:
def main_op():
init_local = variables.local_variables_initializer()
init_tables = lookup_ops.tables_initializer()
return control_flow_ops.group(init_local, init_tables)
To address (2), be sure that you aren't directly exporting your training graph. You need to build a new graph for prediction/serving. If you are using the tf.estimator framework, then you will only conditionally add dropout layers when mode is tf.estimator.ModeKeys.TRAIN.

Related

TensorFlow Servering returns incorrect results

I have 2 different models (Model A is Keras .h5 and Model B is Torch .pth). They need to be served with TFServing. I converted both of these models to Tensorflow (with index .pb) for serving.
I succeeded to serve and get the outputs, but when I compared serving results with the straight model's outputs (on Keras and Torch model), I found that it had made wrong results. The prediction's score for the same image on the server-side is more unreliable than in model output. I could not understand, whether it raises from faults in model converting or anything else?
How could I fix it?
The reason of different results is due to different default parameters of layers and optimizer. For example in pytorch decay-rate of batch-norm is considered as 0.9, whereas in keras it is 0.99. Like that, there may be other variation in default parameters.
I would also recommend to check the weight initializations, as they might be different between both frameworks. Thank you!

Function CNN model written in Keras 2.2.4 not learning in TensorFlow or Keras 2.4

I am dealing with an object detection problem and using a model which is actually functioning (its results have been published on a paper and I have the original code). Originally, the code was written with Keras 2.2.4 without importing TensorFlow and trained and tested on the same dataset that I am using at the moment. However, when I try to run the same model with TensorFlow 2.x it just won't learn a thing.
I have tried importing everything from TensorFlow 2.4, but I have the same problem if I import everything (layers, models, optimizers...) from Keras 2.4. And I have tried to do so on two different devices, both using a GPU. Namely, what is happening is that the loss function decreases ridiculously fast, but the accuracy won't increase a bit (or, if it does, it gets stuck around 10% or smth). Also, every now and then this happens from an epoch to the next one:
Loss undergoes HUGE jumps between consecutive epochs, and all this without any changes in accuracy
I have tried to train the network on another dataset (had to change the last layers in order to match the required dimensions) and the model seemed to be learning in a normal way, i.e. the accuracy actually increases and the loss doesn't reach 0.0x in one epoch.
I can't post the script, but the model is an Encoder-Decoder network: consecutive Convolutions with increasing number of filters reduce the dimensions of the image, and a specular path of Transposed Convolutions restores the original dimensions. So basically the network only contains:
Conv2D
Conv2DTranspose
BatchNormalization
Activation("relu")
Activation("sigmoid")
concatenate
6 is used to put together outputs from parallel paths or distant layers; 3 and 4 are used after every Conv or ConvTranspose; 5 is only used as final activation function, i.e. as output layer.
I think the problem is pretty generic and I am honestly surprised that I couldn't find a single question about it. What could be happening here? The problem must have something to do with TF/Keras versions, but I can't find any documentation about it and I have been trying to change so many things but nothing changes. It's crazy because if I didn't know that the model works I would try to rewrite it from scratch so I am afraid that this problem may occurr with a new network and I won't be able to understand whether it's the libraries or the model itself.
Thank you in advance! :)
EDIT
Code snippets:
Convolutional block:
encoder1 = Conv2D(filters=first_layer_channels, kernel_size=2, strides=2)(input)
encoder1 = BatchNormalization()(encoder1)
encoder1 = Activation('relu')(encoder1)
Decoder
decoder1 = Conv2DTranspose(filters=first_layer_channels, kernel_size=2, strides=2)(encoder4)
decoder1 = BatchNormalization()(decoder1)
decoder1 = Activation('relu')(decoder1)
Final layers:
final = Conv2D(filters=total, kernel_size=1)(decoder4)
final = BatchNormalization()(final)
Last_Conv = Activation('sigmoid')(final)
The task is human pose estimation: the network (which, I recall, works on this specific task with Keras 2.2.4) has to predict twenty binary maps containing the positions of specific keypoints.

Training Tensorflow only one object

Corresponding Tensorflow documentation I trained 3 objects and get result (It can recognize these objects). When I show other objects (not the 3 ones) it doesn't work correctly.
I want to train only one object (example: a cup) and recognize only this object. Is it possible to do via Tensorflow ?
Your question doesn't provide enough details, but as I can guess your trained the network with softmax activation and Categorical or SparseCategorical cross entropy loss. If my guess is right, such network always generates prediction to one of three classess, regardless to actual data, i.e. there is no option of "no-one".
In order to train network to recognize only one class of objects, make the only one output with only one channel and sigmoid activation. Use BinaryCrossEntropy loss to train your model for the specific object. Provide dataset that includes examples with this object and without it.

Using different optimizers to train the same layer in tensorflow

I have a model which consists of convolutional layers followed by fully connected layers. I trained this model on the fer dataset. This is considered a classification problem where the number of output is equal to 8.
After training this model, I kept the fully connected layer, and replaced only the last layer with a new one that has 3 outputs. Therefore, the purpose was to fine tune the fully connected layers along with training the output layer.
Therefore, I have used an optimizer at the beginning to train the whole model. Then I created a new optimizer to fine tune the fully connected layer along with training the last layer.
As a result, I got the following error:
ValueError: Variable Dense/dense/bias/Adam/ already exists,
I know the reason for getting error. The second optimizer was trying to create a kernel for updating the weights using the same name; because a kernel with the same name was created by the first optimizer.
Hence, I would like to know how to fix this problem. Is there a way to delete the kernels associated with the first optimizer?
Any help is much appreciated!!
This is probably caused by both optimizers using the (same) default name 'Adam'. To avoid this clash, you can give the second optimizer a different name, e.g.
opt_finetune = tf.train.AdamOptimizer(name='Adam_finetune')
This should make opt_finetune create its variables under different names. Please let us know whether this works!

How to structure the model for training and evaluation on the test set

I want to train a model. Every 1000 steps, I want to evaluate it on the test set and write it to the tensorboard log. However, there's a problem. I have a code like this:
image_b_train, label_b_train = tf.train.shuffle_batch(...)
out_train = model.inference(image_b_train)
accuracy_train = tf.reduce_mean(...)
image_b_test, label_b_test = tf.train.shuffle_batch(...)
out_test = model.inference(image_b_test)
accuracy_test = tf.reduce_mean(...)
where model inference declares the variables in the model. However, there's a problem. For the test set I have a separate queue, and I can't swap one queue for another with tensorflow.
Currently I solved the problem by creating 2 graphs, one for training and the other for testing. I copy from one graph to the other with tf.train.Saver. Another solution might be to use tf.get_variable, but this is a global variable, and I don't like it because my code becomes less reusable.
Yes, you need two graphs. These graphs can share variables. This can be done by:
Using Keras layers (from tf.contrib.keras) which let you define the model once and use it to compute two inference graphs
Using slim-style layers (from tf.layers) with tf.get_variable and reuse
Using tf.make_template to make your own model-like object which can be called once to build the training graph and once to build the inference graph
Using tf.estimator.Estimator which lets you define a model function once and runs it automatically for training and evaluation for you
There are other options, but any of these is well-supported and should unblock you.