Change training and testing behavior of custom Layer Keras - tensorflow

I am trying to change my Keras model's behavior during training and testing.
To be more precise, I want to simply evaluate the predictions during training, and add another Lambda layer (as a kind of postprocessing) for testing.
I've found a solution here, where K.function receives the specified K.learning_phase() and returns an output. As I understand it, K.in_test_phase() or K.in_train_phase() will return either the first or the second parameter (as described in the docs), based on the training parameter passed.
I'm running on TF 2.0 as the backend, so eager execution is enabled by default. That being said, passing K.learning_phase() results in an error, which has been described here. Hence, I'm using tensorflow.python.keras.symbolic_learning_phase(), which seems to work.
I'm currently able to get the output with K.function(), but my aim is to perform model.fit() in order to train my model, and later call model.evaluate() (I'm using the Keras' functional API).
What would be the proper way to train and test my model based on the learning flag?
Currently my MWE is:
def build_model(images, training=None):
    input_layer = Input(shape=(256,256,3), dtype="float32", batch_size=80)
    ...
    # performing some factor disentanglement here
    angle_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(angle)
    radius_pred = UpSampling2D(size=(8, 8), interpolation='bilinear')(radius)
    angle_radius_stack = tf.stack([angle_pred, radius_pred], 0)
    hough_voting = Lambda(hough_vote)((angle_radius_stack, images))
    train_test = K.in_test_phase(hough_voting, angle_radius_stack, training=training)
    angle_v, radius_v = tf.unstack(train_test)
    model = Model(inputs=input_layer, outputs=[angle_v, radius_v])
    adam = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model.compile(optimizer=adam, loss='mean_absolute_error', metrics=['accuracy'], run_eagerly=True)
    return model
And then, for training, I use:
def train_model(model, patches, radii, angles):
    fun = K.function([model.layers[0].input, B.symbolic_learning_phase()], [model.layers[-1].output])
    print(fun([patches[0:80,:,:,:], True]))
    model.fit(patches, [radii, angles], batch_size=80, epochs=1)

Related

tf.keras Functional model gives different results on the same data

I have defined my Functional model like this:
base_model = VGG16(include_top=False, input_shape=(224,224,3), pooling='avg')
inputs = tf.keras.Input(shape=(224,224,3))
x = preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.Dropout(0.2)(x, training=True)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
The problem is that when I call .evaluate() or .predict(), I get slightly different results every time when using the exact same batch (with shuffle=False in my dataset, and all the random seeds initialized).
I tried reconstructing the model without some of the layers and I found the culprit to be these 2 layers constructed by the line x=preprocess_input(inputs), which give randomness to the results:
[Image: model summary]
Note: preprocess_input is a vgg16 preprocessing function at tf.keras.applications.vgg16.preprocess_input.
However, if I redefine my Functional model as Sequential:
new_model = tf.keras.Sequential()
new_model.add(model.layers[0]) #input layer
new_model.add(tf.keras.layers.Lambda(preprocess_input))
new_model.add(model.layers[3]) #vgg16
new_model.add(model.layers[4]) #dropout
new_model.add(model.layers[5]) #dense
The problem is gone and I get consistent results from .evaluate() or .predict().
What could potentially cause the Functional model to behave like this?
EDIT
As xdurch0 pointed out, it was the dropout layer at fault for different results. The functional model applied dropout during .predict() and .evaluate() methods.
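To illustrate why a dropout layer that is pinned to training mode makes repeated predictions on the same batch disagree, here is a minimal pure-Python sketch of inverted dropout. It is a toy model and not the Keras implementation; the function `dropout` and all variable names are made up for this example.

```python
import random


def dropout(values, rate, training, rng):
    """Toy inverted dropout: zero each value with probability `rate` when training."""
    if not training:
        # Inference mode: the layer is the identity.
        return list(values)
    keep = 1.0 - rate
    # Training mode: dropped values become 0, kept values are scaled by 1/keep.
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# training=True draws a fresh random mask on every call, so repeated calls disagree.
stochastic_runs = [dropout(features, 0.2, True, rng) for _ in range(20)]

# training=False leaves the input untouched, so repeated calls agree.
deterministic_runs = [dropout(features, 0.2, False, rng) for _ in range(20)]
```

In the functional model above, the Dropout layer was called with training=True, which corresponds to the first case even inside .predict() and .evaluate().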

Better understanding of training parameter for Keras-Model call method needed

I'd like to get a better understanding of the parameter training, when calling a Keras model.
In all tutorials (like here) it is explained that when you are doing a custom train step, you should call the model like this (because some layers may behave differently depending on whether you want to do training or inference):
pred = model(x, training=True)
and when you want to do inference, you should set training to false:
pred = model(x, training=False)
What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, model_base and model_head, and I want to create a new model out of those two, where I want model_base to always be called with training=False (because I plan on freezing it like in this tutorial here):
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False for the base_model be overruled? Or will training now always be set to False for the base_model, regardless of what I pass to the new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(inputs, training=True), this part of the new model would always run in training mode? And how would it work out if I don't give any specific value for training when I run the new_model like this: new_model(x_new)?
Thanks in advance!
training is a boolean argument that determines whether the call function runs in training mode or inference mode. For example, the Dropout layer is primarily used as a regularizer during model training, where it randomly drops units; at inference or prediction time we don't want that to happen.
y = Dropout(0.5)(x, training=True)
Here we set training=True for the Dropout layer explicitly. When we call .fit(), Keras sets this flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. In a custom training loop nothing sets it for us: when we pass the input tensor to the model inside the GradientTape scope, we should pass this argument ourselves, because a call without it generally falls back to inference mode. The same goes for inference time. So the training argument is set to True or False depending on whether we want the layers to operate in training mode or inference mode, respectively.
# training mode
with tf.GradientTape() as tape:
    logits = model(x, training=True)  # forward pass
# inference mode
val_logits = model(x, training=False)
Now coming to your question. After defining the model
# Freeze the base_model
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
Now if you run this new model, whether with .fit() or in a custom training loop, the base_model will always run in inference mode, because it was called with training=False when the model was built.
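The behaviour described above can be modelled with a small pure-Python sketch. `ToyLayer` and `ToyFunctionalModel` are hypothetical stand-ins written for this illustration, not Keras classes, and the sketch only mirrors what the answer states: a training value fixed when the graph is built wins over the value passed to the outer call, while a layer without a fixed value inherits the outer one.

```python
class ToyLayer:
    """Hypothetical stand-in for a layer; records the mode it last ran in."""

    def __init__(self, name):
        self.name = name
        self.last_mode = None

    def __call__(self, x, training=None):
        self.last_mode = "training" if training else "inference"
        return x


class ToyFunctionalModel:
    """Chains (layer, pinned_training) pairs; None means 'inherit from the outer call'."""

    def __init__(self, steps):
        self.steps = steps

    def __call__(self, x, training=None):
        for layer, pinned in self.steps:
            # A value pinned at construction time takes precedence.
            mode = training if pinned is None else pinned
            x = layer(x, training=mode)
        return x


base = ToyLayer("base_model")
head = ToyLayer("head_model")

# Equivalent in spirit to: x = base_model(inputs, training=False); outputs = head_model(x)
new_model = ToyFunctionalModel([(base, False), (head, None)])

new_model([1.0, 2.0], training=True)
modes_when_training = (base.last_mode, head.last_mode)

new_model([1.0, 2.0], training=False)
modes_when_inferring = (base.last_mode, head.last_mode)
```

In this toy, `modes_when_training` shows the base in inference mode and the head in training mode, which is the combination the transfer-learning setup relies on.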

What is the correct way to implement a 'useless loss' with Keras?

I have a Keras model that has two outputs:
output is the true output of the network on which the loss is going to be computed
additional is used to make an external task during inference (no loss should be computed with this output)
When I build the model, I write something like that:
model = Model(inputs=inp, outputs=[output, additional])
Since my Model has two outputs, I need to provide two losses when compiling the model so I created a useless loss like this:
class NoopLoss(object):
    def __call__(self, y_true, y_pred, **kwargs):
        return self.compute_loss(y_true, y_pred)

    def compute_loss(self, y_true, y_pred):
        return tf.math.square(0.0)
Which I integrate in the compile step like this:
loss = UsefulLoss() # the real loss I'm using
noop_loss = NoopLoss()
model.compile(loss=[loss, noop_loss], optimizer=optimizer, metrics=['binary_accuracy'])
It works, but I feel it is a bit hackish. Is there a correct way to implement this behavior? I didn't find any official useless loss in the Keras documentation.
In my opinion, Keras was not designed with cases like this in mind.
I often use these hacks myself too.
I'm not sure it's a better solution (it might not be), but you can create a training model and an inference model that share the trainable part:
inputs = Input(...)
trainable_out = SomeLayer(...)(inputs)
....
trainable_out = ....
extra_output = SomeLayer(...)(something)
training_model = Model(inputs, trainable_out)
inference_model = Model(inputs, [trainable_out, extra_output])
You can train training_model, and the other model will automatically be trained as well, because both models share the same layers.
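Why training one model also updates the other can be seen in a toy sketch: both models hold a reference to the same layer object, so a weight update made through one is visible through the other. `SharedDense` and `ToyModel` are hypothetical classes written for this illustration and are not part of Keras.

```python
class SharedDense:
    """Hypothetical one-weight layer computing y = w * x."""

    def __init__(self, w):
        self.w = w

    def __call__(self, x):
        return self.w * x


class ToyModel:
    """Wraps a shared layer; optionally adds an extra, untrained output."""

    def __init__(self, layer, extra=None):
        self.layer = layer
        self.extra = extra

    def predict(self, x):
        out = self.layer(x)
        return out if self.extra is None else (out, self.extra(out))

    def train_step(self, x, y, lr=0.1):
        # One gradient-descent step on the squared error (w * x - y) ** 2.
        grad = 2.0 * (self.layer(x) - y) * x
        self.layer.w -= lr * grad


shared = SharedDense(w=0.0)
training_model = ToyModel(shared)                               # single output, gets the loss
inference_model = ToyModel(shared, extra=lambda out: out > 0.5)  # extra output, never trained

before = inference_model.predict(1.0)
for _ in range(50):
    training_model.train_step(x=1.0, y=1.0)
after = inference_model.predict(1.0)
```

Only `training_model` is ever trained here, yet the prediction of `inference_model` changes, because the weight lives in the shared layer rather than in either model.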

How to evaluate the value of a tensor, from inside the model function of a custom tf.estimator

I am implementing an NLP model based on BERT, using tf.TPUEstimator(). I want to implement layer-wise training, where I need to select only one layer of the model to train for each epoch. In order to do this I wanted to change my model_fn and get the value of current_epoch.
I know how to compute the value of current_epoch as a tensor using tf.train.get_or_create_global_step() inside the model_fn, BUT I need to evaluate the value of this tensor to select which layer to train and to return the correct train_op to the tf.estimator (the train_op pertaining to a single layer chosen according to the value of current_epoch).
I am unable to evaluate this tensor (current_epoch / global_step) from inside the model_fn. I tried the following, but the training hangs at the step my_sess.run(my_global_step.initializer):
global_step = tf.train.get_or_create_global_step()
graph = tf.get_default_graph()
my_sess = tf.Session(graph=graph)
current_epoch = (global_step * full_bs) // train_size
my_sess.run(my_global_step.initializer)
current_epoch = sess.run(current_epoch)
# My program hangs at the initialising step: my_sess.run(my_global_step.initializer)
Is there any way to evaluate a tensor using the tf.Estimator's default session? How do I get the default session/graph?
Most importantly, what is wrong in my code, and why does the training hang when using TPUs and TPUEstimator?
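For reference, the epoch arithmetic from the snippet above, (global_step * full_bs) // train_size, can be checked with plain Python integers. This is only a sketch of the formula; it does not evaluate a tensor inside model_fn, and the batch size and dataset size below are assumed numbers, not values from the question.

```python
def current_epoch(global_step, full_bs, train_size):
    """Zero-based epoch index reached after `global_step` optimizer steps."""
    return (global_step * full_bs) // train_size


# Assumed numbers: 32 examples per step and 8000 examples per epoch,
# which gives 250 steps per epoch.
epochs = [current_epoch(step, full_bs=32, train_size=8000) for step in (0, 249, 250, 1000)]
```

With these numbers the epoch index first changes at step 250, which is where a layer-wise schedule would switch layers.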
This is not a direct answer to the OP's second question; it is an answer to the title.
I managed to print a variable's value with get_variable_value, but I'm not sure if this is the optimal way.
with
estimator = tf.contrib.tpu.TPUEstimator(
# ...
)
out = estimator.get_variable_value('output_bias')
print(type(out))
print(out)
I got
<class 'numpy.ndarray'>
[-0.00107745 0.00107744]

tf.Estimator.predict() issue when using a Tensorflow Hub module as the basis of a custom tf.Estimator

I am trying to create a custom tensorflow tf.Estimator. In the model_fn passed to the tf.Estimator, I am importing the Inception_V3 module from Tensorflow Hub.
Problem: After fine-tuning the model (using tf.Estimator.train), the results obtained using tf.Estimator.predict are not as good as expected based on tf.Estimator.evaluate (This is for a regression problem.)
I am new to Tensorflow and Tensorflow Hub, so I could be making lots of rookie mistakes.
When I run tf.Estimator.evaluate() on my validation data, the reported loss is in the same ballpark as the loss after tf.Estimator.train() was used to train the model. The problem comes in when I try to use tf.Estimator.predict() on the same validation data.
tf.Estimator.predict() returns predictions which I then use to calculate the same loss metric (mean_squared_error) which is computed by tf.Estimator.evaluate(). I am using the same set of data to feed to the predict function as the evaluate function. But I do not get the same result for the mean_squared_error -- not remotely close! (The mse I calculate from predict is much worse.)
Here is what I have done (edited out some details)...
Define a model_fn with Tensorflow Hub module. Then call the tf.Estimator functions to train, evaluate and predict.
def my_model_fun(features, labels, mode, params):
    # Load InceptionV3 Module from Tensorflow Hub
    iv3_module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1", trainable=True, tags={'train'})
    # Gather the variables for fine-tuning
    var_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='CustomeLayer')
    var_list.extend(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='module/InceptionV3/Mixed_5b'))
    predictions = {"the_prediction": final_output}
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    # Define loss, optimizer, and evaluation metrics
    loss = tf.losses.mean_squared_error(labels=labels, predictions=final_output)
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=learn_rate).minimize(
        loss, var_list=var_list, global_step=tf.train.get_global_step())
    rms_error = tf.metrics.root_mean_squared_error(labels=labels, predictions=predictions["the_prediction"])
    eval_metric_ops = {"rms_error": rms_error}
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=optimizer)
    if mode == tf.estimator.ModeKeys.EVAL:
        tf.summary.scalar('rms_error', rms_error)
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
iv3_estimator = tf.estimator.Estimator(model_fn=iv3_model_fn)
iv3_estimator.train(input_fn=train_input_fn, steps=TRAIN_STEPS)
iv3_estimator.evaluate(input_fn=val_input_fn)
ii = 0
totalSqErr = 0.0  # accumulator must be initialized before the loop
for ans in iv3_estimator.predict(input_fn=test_input_fn):
    sqErr = np.square(label[ii] - ans['the_prediction'][0])
    totalSqErr += sqErr
    ii += 1
mse = totalSqErr/ii
I expect that the mse loss reported by tf.Estimator.evaluate() should be the same as the mse I calculate from the known labels and the output of tf.Estimator.predict().
Do I need to import the Tensorflow Hub model differently when I use predict (use trainable=False in the call to hub.Module())?
Are the weights obtained from training being used when tf.Estimator.evaluate() runs, but not when tf.Estimator.predict() runs?
Other?
There are a few things that seem to be missing from the code snippet. How is final_output computed from iv3_module? Also, mean squared error is an unusual choice of loss function for a classification problem; the common approach is to pass image features from the module into a linear output layer with scores for each class ("logits") and a "softmax cross-entropy loss". For an explanation of these terms, you can review online tutorials like https://developers.google.com/machine-learning/crash-course/ (all the way to multi-class neural nets).
Regarding TF-Hub technicalities:
The variables of a Hub module are automatically added to the GLOBAL_VARIABLES and TRAINABLE_VARIABLES collections (if trainable=True, as you already do). No manual extension of those collections should be needed.
hub.Module(..., tags=...) should be set to {"train"} for mode==TRAIN and set to None or the empty set otherwise.
In general, it's useful to get a solution working end-to-end for your problem without fine-tuning as a baseline, and then add fine-tuning.
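The advice on tags above can be sketched as a small helper that picks the tags from the estimator mode. `hub_module_tags` is a hypothetical name, and the three string constants are plain stand-ins so the sketch runs without TensorFlow; in a real model_fn you would compare against tf.estimator.ModeKeys.TRAIN instead.

```python
# Plain-string stand-ins for tf.estimator.ModeKeys (assumption for this sketch).
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


def hub_module_tags(mode):
    """Return {"train"} only in training mode, and None for evaluation and prediction."""
    return {"train"} if mode == TRAIN else None


# Inside model_fn the result would be passed on, for example:
#   iv3_module = hub.Module(module_url, trainable=True, tags=hub_module_tags(mode))
# where module_url is the TF-Hub URL already used in the question.
tags_by_mode = {mode: hub_module_tags(mode) for mode in (TRAIN, EVAL, PREDICT)}
```

With the tags chosen per mode, the module graph built for evaluate and predict is the inference version, instead of the training version that the question's code requests in every mode.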