tf.keras Functional model gives different results on the same data - tensorflow

I have defined my Functional model like this:
base_model = VGG16(include_top=False, input_shape=(224,224,3), pooling='avg')
inputs = tf.keras.Input(shape=(224,224,3))
x = preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.Dropout(0.2)(x, training=True)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
The problem is that when I call .evaluate() or .predict() I get slightly different results every time, even on the exact same batch (with shuffle=False in my dataset and all random seeds initialized).
I tried reconstructing the model without some of the layers, and I found the culprit to be the two layers constructed by the line x = preprocess_input(inputs), which seemed to introduce randomness into the results:
[model summary screenshot]
Note: preprocess_input is a vgg16 preprocessing function at tf.keras.applications.vgg16.preprocess_input.
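As a side note (not part of the original question), vgg16.preprocess_input is deterministic: it expects RGB pixels in [0, 255], converts them to BGR and subtracts the ImageNet channel means, so by itself it cannot produce run-to-run differences. A quick check:
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input

img = np.zeros((1, 224, 224, 3), dtype=np.float32)
# An all-zero RGB input becomes the negated BGR ImageNet means,
# roughly [-103.94, -116.78, -123.68], identically on every call.
print(preprocess_input(img.copy())[0, 0, 0])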
However, if I redefine my Functional model as Sequential:
new_model = tf.keras.Sequential()
new_model.add(model.layers[0]) #input layer
new_model.add(tf.keras.layers.Lambda(preprocess_input))
new_model.add(model.layers[3]) #vgg16
new_model.add(model.layers[4]) #dropout
new_model.add(model.layers[5]) #dense
The problem is gone and I get consistent results from .evaluate() or .predict().
What could potentially cause the Functional model to behave like this?
EDIT
As xdurch0 pointed out, the dropout layer was at fault for the differing results: because it was built with training=True, the Functional model applied dropout even during the .predict() and .evaluate() methods.
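A minimal sketch of the corrected definition (same architecture as above, just without the hard-coded training=True on the Dropout call), so dropout stays active during .fit() but is disabled in .evaluate() and .predict():
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

base_model = VGG16(include_top=False, input_shape=(224, 224, 3), pooling='avg')

inputs = tf.keras.Input(shape=(224, 224, 3))
x = preprocess_input(inputs)
x = base_model(x, training=False)    # keep the base model in inference mode
x = tf.keras.layers.Dropout(0.2)(x)  # no training=True: inactive at inference time
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)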

Related

Different results when using Manual KFold-Cross validation vs. KerasClassifier-KFold Cross Validation

I've been struggling to understand why two similar KFold cross-validations result in two different averages.
When I use a manual KFold approach (with Tensorflow and Keras)
import numpy as np
from sklearn.model_selection import StratifiedKFold

cvscores = []
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)
for train, test in kfold.split(X, y):
    model = create_baseline()
    model.fit(X[train], y[train], epochs=50, batch_size=32, verbose=0)
    scores = model.evaluate(X[test], y[test], verbose=0)
    # print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
I get
65.89% (+/- 3.77%)
When I use the KerasClassifier wrapper from scikit
from sklearn.model_selection import StratifiedKFold, cross_val_score
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

estimator = KerasClassifier(build_fn=create_baseline, epochs=50, batch_size=32, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=3)
results = cross_val_score(estimator, X, y, cv=kfold, scoring='accuracy')
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
I get
63.82% (5.37%)
Additionally, when using KerasClassifier the following warning appears
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/wrappers/scikit_learn.py:241: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:
* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).
* `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
Do the results differ because KerasClassifier uses predict_classes() while the manual Tensorflow/Keras approach uses just predict()? If so, which approach is more reasonable?
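For reference, with a single sigmoid output the two prediction paths should agree up to the 0.5 threshold; a hedged illustration of the warning's suggested replacement (variable names taken from the snippets above):
probs = model.predict(X[test])            # probabilities in [0, 1] from the sigmoid output
classes = (probs > 0.5).astype("int32")   # same classes the deprecated predict_classes() returns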
My model looks like this
import tensorflow as tf
from tensorflow.keras.layers import Dense

def create_baseline():
    model = tf.keras.models.Sequential()
    model.add(Dense(8, activation='relu', input_shape=(12,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
The two CV results do not look too different; they are both within each other's standard deviation.
You fixed the seed for the StratifiedKFold class, which is good. However, there is additional randomness you should take control of, and it comes from the weight initialization. Make sure you initialize your model for each CV run with different weights, but use the same 10 initializations for both cross-validations, manual and automatic. You can pass an initializer to each layer; they have a seed argument as well. In general you should fix all possible seeds (np.random.seed(3), tf.set_random_seed(3)).
What happens if you run cross_val_score() or your manual version twice? Do you get the same results / numbers?
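A minimal sketch of the initializer suggestion above (the seed value is illustrative): with fixed kernel-initializer seeds every fold starts from identical weights in both the manual loop and the KerasClassifier run, which removes initialization as a source of disagreement; varying the seed per fold would need a bit more plumbing.
import tensorflow as tf
from tensorflow.keras.layers import Dense

def create_baseline(seed=3):
    model = tf.keras.models.Sequential()
    model.add(Dense(8, activation='relu', input_shape=(12,),
                    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed)))
    model.add(Dense(1, activation='sigmoid',
                    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed)))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model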

Change training and testing behavior of custom Layer Keras

I am trying to change my Keras model's behavior during training and testing.
To be more precise, I want to simply evaluate the predictions during training, and add another Lambda layer (as a kind of postprocessing) for testing.
I've found a solution here, where K.function receives the specified K.learning_phase() and returns an output. As I understand it, K.in_test_phase() or K.in_train_phase() will return either the first or the second parameter (as described in the docs), based on the training parameter passed.
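A tiny illustration of that behaviour (a sketch assuming the tf.keras backend, not from the original post): in_test_phase(x, alt, training=...) yields x in the test phase and alt during training.
import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.constant([1.0, 2.0])
alt = tf.constant([10.0, 20.0])

print(K.in_test_phase(x, alt, training=False))  # test phase  -> [ 1.  2.]
print(K.in_test_phase(x, alt, training=True))   # train phase -> [10. 20.]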
I'm running TF 2.0 as the backend, so eager execution is enabled by default. That being said, passing K.learning_phase() results in an error, which has been described here. Hence, I'm using tensorflow.python.keras.backend.symbolic_learning_phase(), which seems to work.
I'm currently able to get the output with K.function(), but my aim is to perform model.fit() in order to train my model, and later call model.evaluate() (I'm using the Keras' functional API).
What would be the proper way to train and test my model based on the learning flag?
Currently my MWE is:
def build_model(images, training=None):
    input_layer = Input(shape=(256, 256, 3), dtype="float32", batch_size=80)
    ...
    # performing some factor disentanglement here
    angle_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(angle)
    radius_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(radius)
    angle_radius_stack = tf.stack([angle_pred, radius_pred], 0)
    hough_voting = Lambda(hough_vote)((angle_radius_stack, images))
    train_test = K.in_test_phase(hough_voting, angle_radius_stack, training=training)
    angle_v, radius_v = tf.unstack(train_test)
    model = Model(inputs=input_layer, outputs=[angle_v, radius_v])
    adam = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam, loss='mean_absolute_error', metrics=['accuracy'], run_eagerly=True)
    return model
And then using training:
def train_model(model, patches, radii, angles):
    fun = K.function([model.layers[0].input, B.symbolic_learning_phase()], [model.layers[-1].output])
    print(fun([patches[0:80, :, :, :], True]))
    model.fit(patches, [radii, angles], batch_size=80, epochs=1)
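For context, a common alternative pattern (a sketch, not necessarily what the original poster used) is to wrap the phase-dependent branch in a custom layer whose call() receives the training flag; model.fit(), model.evaluate() and model.predict() then set that flag automatically, with no explicit K.function needed:
import tensorflow as tf
from tensorflow.keras import backend as K

class PostprocessInTest(tf.keras.layers.Layer):
    """Hypothetical helper: applies `postprocess_fn` only outside of training."""
    def __init__(self, postprocess_fn, **kwargs):
        super().__init__(**kwargs)
        self.postprocess_fn = postprocess_fn

    def call(self, inputs, training=None):
        # in the test phase return the postprocessed branch, otherwise pass the input through
        return K.in_test_phase(self.postprocess_fn(inputs), inputs, training=training)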

Inference / Prediction from frozen model: Obtain value after activation through the graph

I have a simple frozen TensorFlow model (frozen in Keras) that I load and then try to use for prediction. I do this first in Python (code below), and then using C and libtensorflow (and get the same results). The examples I have found provide the logits (before activation) as the final output, rather than the class label after activation. Is there a way to obtain the label through the graph itself?
I understand I can operate the sigmoid / softmax operators on the logits, but that's not what I want to do. (I'm porting the code to use the libtensorflow C api, and would prefer to let the graph do the math.)
My understanding is that the session runs the graph to the operation / tensor, and stops before that operation. Is there a way to get the operation after the activation?
Keras Model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(21,)))
model.add(Dense(1, activation='sigmoid'))
Tensorflow code to load frozen model and predict:
import tensorflow as tf
from tensorflow.python.platform import gfile

with tf.Session() as sess:
    with gfile.FastGFile('slopemodel/slopemodel.pb', 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        sess.graph.as_default()
        g_in = tf.import_graph_def(graph_def)
    tensor_output = sess.graph.get_tensor_by_name('import/dense_2/Sigmoid:0')
    tensor_input = sess.graph.get_tensor_by_name('import/dense_1_input:0')
    predictions = sess.run(tensor_output, {tensor_input: sample})
    print(predictions)
Truncated list of important nodes in the graph:
['import/dense_1_input',
'import/dense_1/kernel',
'import/dense_1/kernel/read',
'import/dense_1/bias',
'import/dense_1/bias/read',
'import/dense_1/MatMul',
'import/dense_1/BiasAdd',
'import/dense_1/Relu',
'import/dense_2/kernel',
'import/dense_2/kernel/read',
'import/dense_2/bias',
'import/dense_2/bias/read',
'import/dense_2/MatMul',
'import/dense_2/BiasAdd',
'import/dense_2/Sigmoid',
'import/Adam/iterations',
.
.
.]
Yes, you simply need to change the tensor_output to the one you want to obtain. Note that you will not receive the labels themselves but a one-hot vector from which you'll need to find the corresponding label yourself.
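If you are unsure which name to pass to get_tensor_by_name(), a small sketch (reusing sess from the snippet above) that lists candidate post-activation operations in the imported graph:
# List operation names that look like activation outputs; append ':0' to get
# the corresponding tensor name for get_tensor_by_name().
op_names = [op.name for op in sess.graph.get_operations()]
print([name for name in op_names if 'Sigmoid' in name or 'Softmax' in name or 'Relu' in name])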

How can I use TensorFlow's sampled softmax loss function in a Keras model?

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:
import keras.backend as K

def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
    return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:
model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(output_dim=len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or does this need to be written as a layer that comes after the final fully-connected layer? Any guidance here would be greatly appreciated. Thanks.
The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.
First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax function to the original dense layer which has the default activation function applied.
There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.
from keras.models import Model
from keras.layers import Input, Dense, Layer
from keras import backend as K

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)

    def call(self, inputs):
        """
        The first input should be the model as it were, and the second the
        target (i.e., a repeat of the training data) to compute the labels
        argument.
        """
        # the labels input to this function is batch size by 1, where the
        # value at position (i, 1) is the index that is true (not zero)
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        return K.tf.nn.sampled_softmax_loss(
            weights=inputs[0]._keras_history[0].weights[0],
            biases=inputs[0]._keras_history[0].bias,
            inputs=inputs[0],
            labels=K.tf.reshape(K.tf.argmax(inputs[1], 1), [-1, 1]),
            num_sampled=1000,
            num_classes=200000)

def custom_loss(y_true, y_pred):
    return K.tf.reduce_mean(y_pred)

num_classes = 200000
input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))
dense = Dense(num_classes)
outputs = dense(input)
outputs = SampledSoftmax()([outputs, target_input])
model = Model([input, target_input], outputs)
model.compile(optimizer=u'adam', loss=custom_loss)
# train as desired
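And a hedged sketch of the "inference version" mentioned above: reuse the trained dense layer and apply a standard softmax so the model returns class probabilities instead of losses (names refer to the snippet above).
from keras.layers import Activation

# Reuse the trained `dense` layer; no second (target) input is needed at inference time.
inference_outputs = Activation('softmax')(dense(input))
inference_model = Model(input, inference_outputs)
# inference_model.predict(...) now returns probabilities over the full vocabulary.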

Why did the Keras Sequential model give a different result compared to Model model?

I've tried a simple LSTM model in Keras to do simple sentiment analysis on the IMDB dataset, using both the Sequential model and the Model (functional) API, and it turns out the latter gives a worse result. Here's my code:
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
It gives a result around 0.6 of accuracy in the first epoch, while the other code that use Model :
_input = Input(shape=[max_review_length], dtype='int32')
embedded = Embedding(
    input_dim=top_words,
    output_dim=embedding_size,
    input_length=max_review_length,
    trainable=False,
    mask_zero=False
)(_input)
lstm = LSTM(100, return_sequences=True)(embedded)
probabilities = Dense(2, activation='softmax')(lstm)
model = Model(_input, probabilities)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
and it gives 0.5 accuracy in the first epoch and never changes afterwards.
Any reason for that, or am I doing something wrong? Thanks in advance.
I see two main differences between your two models:
You have set the embeddings of the second model to trainable=False, so you probably have far fewer parameters to optimize in the second model than in the first one.
The LSTM in the second model returns the whole sequence (return_sequences=True), so the output shapes are different; the two models are not doing the same thing, so I don't see how you can compare them.
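A hedged sketch of a functional definition that mirrors the Sequential model above (trainable embeddings, LSTM returning only its last output); variable names are taken from the question's snippets:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

_input = Input(shape=[max_review_length], dtype='int32')
embedded = Embedding(input_dim=top_words,
                     output_dim=embedding_vector_length,
                     input_length=max_review_length)(_input)  # trainable by default
lstm = LSTM(100)(embedded)                                    # return_sequences=False by default
probabilities = Dense(2, activation='softmax')(lstm)
model = Model(_input, probabilities)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])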