How to perform early stopping when writing our own custom training loops in TensorFlow 2.0?

To perform early stopping in TensorFlow, tf.keras provides a very convenient mechanism, tf.keras.callbacks (specifically tf.keras.callbacks.EarlyStopping), which is passed to model.fit() to execute it. When we write a custom training loop, I couldn't understand how to make use of tf.keras.callbacks. Can someone provide a basic tutorial on how to do it?
https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/

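You have two approaches to creating custom training loops.
One is the common pattern of two nested for loops (one over epochs, one over batches).
The other is to override train_step() in a keras.Model subclass and keep calling fit(), which is what the code below does. With this approach, all the callbacks and other fit() features are still available.
Tip: the code below is just a slice of code, and the model structure is not implemented. You should do that on your own.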
More info? check here
import tensorflow as tf
from tensorflow import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred,
                                      regularization_losses=self.losses)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

# Construct and compile an instance of CustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
# A single linear output is a regression head, so use a regression loss
model.compile(optimizer='adam', loss='mse', metrics=['...'])
# EarlyStopping monitors 'val_loss' by default, so fit() needs validation data,
# and patience must be smaller than the number of epochs for it to ever trigger
earlystopping_cb = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# Just use `fit` as usual (train_ds and val_ds are your own tf.data datasets)
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[earlystopping_cb])
More info: https://keras.io/getting_started/intro_to_keras_for_engineers/#using-fit-with-a-custom-training-step
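The first approach uses two nested for loops and never calls fit(), so nothing invokes the callbacks for you. One option is to drive the stock tf.keras.callbacks.EarlyStopping yourself through tf.keras.callbacks.CallbackList. You compute a val_loss value at the end of every epoch, pass it to on_epoch_end, and break once the callback sets model.stop_training. The sketch below makes several assumptions: synthetic regression data, a one-layer model, and the names train_ds and val_ds, all of which you should replace with your own.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data (assumption: replace with your own datasets)
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 32)).astype("float32")
y = (x @ rng.normal(size=(32, 1))).astype("float32")
train_ds = tf.data.Dataset.from_tensor_slices((x[:448], y[:448])).shuffle(448).batch(32)
val_ds = tf.data.Dataset.from_tensor_slices((x[448:], y[448:])).batch(32)

model = tf.keras.Sequential([tf.keras.Input(shape=(32,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
loss_fn = tf.keras.losses.MeanSquaredError()
val_loss_metric = tf.keras.metrics.Mean()

@tf.function
def train_step(xb, yb):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# The stock EarlyStopping callback, driven by hand instead of by fit()
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", mode="min", patience=5, restore_best_weights=True)
callbacks = tf.keras.callbacks.CallbackList([early_stop], model=model)

max_epochs = 200
model.stop_training = False
callbacks.on_train_begin()
for epoch in range(max_epochs):  # outer loop: epochs
    for xb, yb in train_ds:  # inner loop: batches
        train_step(xb, yb)
    val_loss_metric.reset_state()
    for xb, yb in val_ds:  # mean validation loss for this epoch
        val_loss_metric.update_state(loss_fn(yb, model(xb, training=False)))
    logs = {"val_loss": float(val_loss_metric.result())}
    callbacks.on_epoch_end(epoch, logs)  # EarlyStopping reads logs["val_loss"]
    if model.stop_training:  # set to True by EarlyStopping once patience runs out
        break
# Depending on the Keras version, best weights are restored either when the
# callback stops training or here in on_train_end
callbacks.on_train_end()
epochs_run = epoch + 1
print(f"ran {epochs_run} epochs, best val_loss={early_stop.best:.4f}")
```

If you prefer no Keras machinery at all, the same logic fits in a hand-written patience counter. Keep the best validation loss and a copy of model.get_weights(), and break after `patience` epochs without improvement.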

Related

Can I use the output of tf.keras.utils.image_dataset_from_directory to train an autoencoder?

To put it simply, I'd like to be able to use a Keras dataset created from a local image directory to train an autoencoder. To clarify, this is a model that approximates the identity function for images: ideally, the output is exactly equal to the input.
The dataset is too large to fit in memory, so converting the dataset to a numpy array with np.concatenate will not help me here.
In other words, I'd like an identity image dataset, where the label for each image in the dataset is exactly equal to the image itself.
Here's my (non-working) sample code:
train_ds, validate_ds = tf.keras.utils.image_dataset_from_directory(
data_dir,
labels=None,
validation_split=0.1,
subset="both",
shuffle=True,
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size,
crop_to_aspect_ratio=True)
history = autoencoder.fit(
x=train_ds,
y=train_ds,
validation_data=(validate_ds, validate_ds),
epochs=epochs,
batch_size=16
)
The image_dataset_from_directory function gives me a dataset of images with no labels. So far so good.
The second command fails with the error message:
ValueError: `y` argument is not supported when using dataset as input.
On the other hand, if I exclude the y variable I get this error:
ValueError: Target data is missing. Your model was compiled with loss=binary_crossentropy, and therefore expects target data to be provided in `fit()`.
This is not at all surprising, because there are no labels, as I requested none. Yet it won't let me use the dataset as the labels, which is what I need to do.
Any help would be appreciated.
While there are ways to modify the dataset, I think the best option is to write a custom model class. This is modified from the official tutorial:
import tensorflow as tf

class Autoencoder(tf.keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x = data  # CHANGE 1: changed from x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(x, y_pred, regularization_losses=self.losses)  # CHANGE 2: replaced y by x as label
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(x, y_pred)  # CHANGE 3: like change 2
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        # CHANGED in the same way
        x = data
        # Compute predictions
        y_pred = self(x, training=False)
        # Updates the metrics tracking the loss
        self.compiled_loss(x, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(x, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}
This is for the functional API (tf.keras.Model). In case you are using a Sequential model, you should inherit from that instead. You can use this as a direct replacement for the normal model constructor.
Another option could be to use train_zipped = tf.data.Dataset.zip((train_ds, train_ds)) to create an input, target dataset that you can put directly into the usual model and loss function. Personally, I don't like the duplication. Also, I'm not sure if this will behave correctly for the shuffled data (will both copies of train_ds be shuffled in the same way?).
You could circumvent this by setting shuffle=False in image_dataset_from_directory, and then use train_zipped = train_zipped.shuffle(buffer_size) instead. However, in my experience this is very slow.
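A third option sidesteps the shuffling question entirely. You map each element of the dataset to an (input, target) pair with tf.data.Dataset.map, so the target is literally the same tensor as the input. Below is a minimal sketch. A small synthetic image dataset stands in for the output of image_dataset_from_directory(..., labels=None), and the tiny dense autoencoder is only a placeholder. With real images from image_dataset_from_directory, pixel values are in [0, 255], so you would also rescale them (for example x / 255.0 inside the same map) before using binary_crossentropy.

```python
import numpy as np
import tensorflow as tf

# Stand-in for image_dataset_from_directory(..., labels=None): a dataset that
# yields unlabeled image batches (assumption: 64 random 8x8 RGB images in [0, 1])
images = np.random.default_rng(0).random((64, 8, 8, 3)).astype("float32")
train_ds = tf.data.Dataset.from_tensor_slices(images).shuffle(64, seed=123).batch(16)

# Turn every batch x into the pair (x, x): input and target are the same tensor,
# so they stay aligned no matter how the dataset was shuffled upstream
identity_ds = train_ds.map(lambda x: (x, x))

# Hypothetical tiny dense autoencoder, only to show that plain fit() now works
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8 * 8 * 3, activation="sigmoid"),
    tf.keras.layers.Reshape((8, 8, 3)),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
history = autoencoder.fit(identity_ds, epochs=2, verbose=0)
```

Because the pairing happens inside the input pipeline, the standard model class and the stock loss function work unchanged, with no custom train_step and no second copy of the dataset.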

How to apply a function to network output before passing it to the loss?

I'm trying to implement a network in tensorflow and I need to apply a function f to the network output and use the returned value as the prediction to be used in the loss.
Is there a simple way to make it or which part of tensorflow should I study to achieve that ?
You should study how to write custom training loops in TensorFlow: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
A simplified and short version could look similar to the code below:
# Repeat for several epochs
for epoch in range(epochs):
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        # Start tracing your forward pass to calculate gradients
        with tf.GradientTape() as tape:
            prediction = model(x_batch_train, training=True)
            # HERE YOU PLACE YOUR FUNCTION f
            transformed_prediction = f(prediction)
            loss_value = loss_fn(y_batch_train, transformed_prediction)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
(...)
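Below is a self-contained version of that loop. The key requirement is that f is written with TensorFlow ops and called inside the GradientTape context. That way the tape records it, and gradients flow back through f into the model weights. The cumulative-sum f, the synthetic data, and the one-layer model are placeholders chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

def f(prediction):
    # Hypothetical post-processing of the network output; any differentiable
    # composition of TensorFlow ops works here
    return tf.math.cumsum(prediction, axis=-1)

# Synthetic data whose targets are already "f-shaped" (assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4)).astype("float32")
y = np.cumsum(x @ rng.normal(size=(4, 3)), axis=-1).astype("float32")
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(3)])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
loss_fn = tf.keras.losses.MeanSquaredError()

epoch_losses = []
for epoch in range(20):
    batch_losses = []
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            prediction = model(x_batch_train, training=True)
            transformed_prediction = f(prediction)  # recorded by the tape
            loss_value = loss_fn(y_batch_train, transformed_prediction)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        batch_losses.append(float(loss_value))
    epoch_losses.append(float(np.mean(batch_losses)))
```

If you would rather keep using fit(), the same idea works by wrapping f inside a custom loss function that returns loss_fn(y_true, f(y_pred)).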

Using training weights on non-training data to design a new loss function

I would like to access the training point(s) at a training iteration and incorporate a soft constraint into my loss function by using data points not included in the training set. I will use this post as a reference.
import numpy as np
import keras.backend as K
from keras.layers import Dense, Input
from keras.models import Model

# Some random training data and labels
features = np.random.rand(100, 5)
labels = np.random.rand(100, 2)

# Simple neural net; input and output sizes match the data above
input_layer = Input((5,))
hidden_layer = Dense(16)(input_layer)
output_layer = Dense(2)(hidden_layer)

# Model
model = Model(inputs=input_layer, outputs=output_layer)

# Each training point has another data pair. In the real example, I will have
# multiple supporters. That is why I am using a dict.
holder = np.random.rand(100, 5)
iter = np.arange(start=1, stop=features.shape[0], step=1)
supporters = {}
for i, j in zip(iter, holder):  # i represents the ith training data
    supporters[i] = j

# Write a custom loss function
def custom_loss(y_true, y_pred):
    # Normal MSE loss
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    new_constraint = ....
    return mse + new_constraint

model.compile(loss=custom_loss, optimizer='sgd')
model.fit(features, labels, epochs=1, batch_size=1)
For simplicity, let us assume that I'd like to minimize the minimum absolute value difference between the prediction value and the prediction of the paired data stored in supporters, using the fixed network weights. Also, assume that I pass one training point in each batch. However, I could not figure out how to perform this operation. I've tried something like the following, but clearly it is not correct.
new_constraint = K.sum(y_pred - model.fit(supporters))
fit() is the procedure for training the model, not for evaluating it. I think it would be better for your problem to load a new instance of your model with your current weights and evaluate the support data with it in order to calculate the loss of the main model.
import tensorflow as tf
import keras.backend as K

main_model = Model()  # This is your main training model

def custom_loss_1(y_true, y_pred):  # Avoid recursive calls
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    return mse

def custom_loss(y_true, y_pred):
    support_model = tf.keras.models.clone_model(main_model)  # You copy the main model but the weights are uninitialized
    support_model.build((None, 5))  # You build with inputs shaped like your support data (batch dimension first)
    support_model.compile(loss=custom_loss_1, optimizer='sgd')
    support_model.set_weights(main_model.get_weights())  # You load the weights of the main model
    mse = custom_loss_1(y_true, y_pred)
    # You just want to evaluate the model, not to train it. If you have more
    # metrics than just the loss, then use support_model.evaluate(supporters)[0]
    new_constraint = K.sum(y_pred - support_model.predict(supporters))  # predict to get the output, evaluate to get the metrics
    return mse + new_constraint
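A caveat on the approach above. model.predict returns NumPy arrays, so no gradient flows through it. Recent TF 2.x versions also refuse to run Model.predict inside a traced tf.function, such as the graph fit() builds around the loss. On top of that, cloning the model on every loss call is expensive. An alternative is a custom training loop, sketched below. It runs the same model on the paired support point inside the GradientTape and wraps that output in tf.stop_gradient, which treats the support prediction as computed with the current weights held fixed. The sketch rests on three assumptions: one support point per training point stored as an array aligned with features (instead of the dict), a mean absolute difference as the constraint, and a hypothetical weighting factor lam.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
features = rng.random((100, 5)).astype("float32")
labels = rng.random((100, 2)).astype("float32")
# Assumption: one support point per training point, stored as an array aligned
# with `features` (row i of `supporters` pairs with training point i)
supporters = rng.random((100, 5)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(2),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)
mse = tf.keras.losses.MeanSquaredError()
lam = 0.1  # hypothetical weight of the soft constraint

# batch(1): one training point (and its supporter) per step, as in the question
dataset = tf.data.Dataset.from_tensor_slices((features, labels, supporters)).batch(1)

for x_batch, y_batch, s_batch in dataset:
    with tf.GradientTape() as tape:
        y_pred = model(x_batch, training=True)
        # Same network on the paired point; stop_gradient treats this output as
        # a constant, i.e. as computed with the current weights held fixed
        s_pred = tf.stop_gradient(model(s_batch, training=False))
        constraint = tf.reduce_mean(tf.abs(y_pred - s_pred))
        loss = mse(y_batch, y_pred) + lam * constraint
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

Dropping the tf.stop_gradient call would let the constraint also move the support prediction. Which behavior you want depends on what "fixed network weights" should mean in your problem.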

tensorflow, compute gradients with respect to weights that come from two models (encoder, decoder)

I have an encoder model and a decoder model (RNN).
I want to compute the gradients and update the weights.
I'm somewhat confused by what I've seen so far on the web.
Which block is the best practice? Is there any difference between the two options? Gradients seem to converge faster in Block 1, and I do not know why.
# BLOCK 1, in two operations
encoder_gradients,decoder_gradients = tape.gradient(loss,[encoder_model.trainable_variables,decoder_model.trainable_variables])
myoptimizer.apply_gradients(zip(encoder_gradients,encoder_model.trainable_variables))
myoptimizer.apply_gradients(zip(decoder_gradients,decoder_model.trainable_variables))
# BLOCK 2, in one operation
gradients = tape.gradient(loss,encoder_model.trainable_variables + decoder_model.trainable_variables)
myoptimizer.apply_gradients(zip(gradients,encoder_model.trainable_variables +
decoder_model.trainable_variables))
You can manually verify this.
First, let's simplify the model. Let the encoder and decoder each be a single dense layer. This is mostly for simplicity, and it lets you print out the weights before applying the gradients, the gradients themselves, and the weights after applying the gradients.
import tensorflow as tf
import numpy as np
from copy import deepcopy

# create a simple model with one encoder and one decoder layer.
class custom_net(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.encoder = tf.keras.layers.Dense(3, activation='relu')
        self.decoder = tf.keras.layers.Dense(3, activation='relu')

    def call(self, inp):
        return self.decoder(self.encoder(inp))

net = custom_net()

# create dummy input/output (gt has the same shape as the network output)
inp = np.random.randn(1, 1).astype('float32')
gt = np.random.randn(1, 3).astype('float32')

# set persistent to true since we will be accessing the gradient 2 times
with tf.GradientTape(persistent=True) as tape:
    out = net(inp)
    loss = tf.keras.losses.mean_squared_error(gt, out)

# get the gradients as mentioned in the question
enc_grad, dec_grad = tape.gradient(loss,
                                   [net.encoder.trainable_variables,
                                    net.decoder.trainable_variables])
gradients = tape.gradient(loss,
                          net.encoder.trainable_variables + net.decoder.trainable_variables)
First, let's use a stateless optimizer, plain SGD without momentum, which updates the weights with the following formula, and compare it with the two approaches mentioned in the question.
new_weights = weights - learning_rate * gradients.
# Block 1
myoptimizer = tf.keras.optimizers.SGD(learning_rate=1)
# Newer Keras optimizers (TF >= 2.11, Keras 3) must be built with every variable
# they will update before being called on subsets of those variables
all_vars = net.encoder.trainable_variables + net.decoder.trainable_variables
if hasattr(myoptimizer, 'build'):
    myoptimizer.build(all_vars)
# store weights before updating the weights based on the gradients
old_enc_weights = deepcopy(net.encoder.get_weights())
old_dec_weights = deepcopy(net.decoder.get_weights())
myoptimizer.apply_gradients(zip(enc_grad, net.encoder.trainable_variables))
myoptimizer.apply_gradients(zip(dec_grad, net.decoder.trainable_variables))
# manually calculate the weights after gradient update
# since the learning rate is 1, new_weights = weights - grad
cal_enc_weights = []
for weights, grad in zip(old_enc_weights, enc_grad):
    cal_enc_weights.append(weights - grad)
cal_dec_weights = []
for weights, grad in zip(old_dec_weights, dec_grad):
    cal_dec_weights.append(weights - grad)
# each printed norm should be (close to) 0
for weights, man_calc_weight in zip(net.encoder.get_weights(), cal_enc_weights):
    print(np.linalg.norm(weights - man_calc_weight))
for weights, man_calc_weight in zip(net.decoder.get_weights(), cal_dec_weights):
    print(np.linalg.norm(weights - man_calc_weight))

# Block 2
old_weights = [v.numpy() for v in all_vars]  # snapshot the values, not the variables
myoptimizer.apply_gradients(zip(gradients, all_vars))
cal_weights = []
for weight, grad in zip(old_weights, gradients):
    cal_weights.append(weight - grad)
for weight, man_calc_weight in zip(all_vars, cal_weights):
    print(np.linalg.norm(weight.numpy() - man_calc_weight))
You will see that both the methods update the weights in the exact same way.
I think you used an optimizer like Adam or RMSProp, which is stateful. For such optimizers, each call to apply_gradients also advances the optimizer's internal state, for example the iteration counter that Adam uses for bias correction. In the first case, that state is advanced twice per training step; in the second case, only once.
I would stick to the second option if I were you, since you are performing just one step of optimization here.
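To see the stateful case directly, the sketch below runs both blocks with Adam and with plain SGD on tiny encoder/decoder Dense layers with fixed initial weights. With Adam, Block 1 advances opt.iterations twice per training step. Since Adam's bias correction depends on that counter, the resulting weights differ from Block 2, while with SGD they coincide. The sketch assumes TF >= 2.11 or Keras 3, where optimizer.build(var_list) exists. In Keras 3 that call is also required before apply_gradients is called on subsets of the variables. The layer sizes, data, and learning rates are arbitrary.

```python
import numpy as np
import tensorflow as tf

def run(block, make_optimizer, steps=3):
    """Train a tiny encoder/decoder pair with Block 1 or Block 2 updates."""
    init = tf.keras.initializers.Constant(0.3)  # identical start for every run
    encoder = tf.keras.layers.Dense(2, kernel_initializer=init)
    decoder = tf.keras.layers.Dense(2, kernel_initializer=init)
    x = tf.constant([[1.0, -2.0]])
    target = tf.constant([[0.5, 1.5]])
    decoder(encoder(x))  # build both layers so their variables exist
    enc_vars = encoder.trainable_variables
    dec_vars = decoder.trainable_variables

    opt = make_optimizer()
    opt.build(enc_vars + dec_vars)  # register every variable up front

    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(decoder(encoder(x)) - target))
        enc_grads, dec_grads = tape.gradient(loss, [enc_vars, dec_vars])
        if block == 1:  # Block 1: two apply_gradients calls per step
            opt.apply_gradients(zip(enc_grads, enc_vars))
            opt.apply_gradients(zip(dec_grads, dec_vars))
        else:  # Block 2: one apply_gradients call per step
            opt.apply_gradients(zip(enc_grads + dec_grads, enc_vars + dec_vars))
    return int(opt.iterations.numpy()), [v.numpy() for v in enc_vars + dec_vars]

def same(ws1, ws2):
    return all(np.allclose(a, b) for a, b in zip(ws1, ws2))

def adam():
    return tf.keras.optimizers.Adam(learning_rate=0.1)

def sgd():
    return tf.keras.optimizers.SGD(learning_rate=0.1)

it1, w1 = run(1, adam)
it2, w2 = run(2, adam)
# Adam: the iteration counter ends at 6 for Block 1 and 3 for Block 2
print("Adam iterations:", it1, it2, "same weights:", same(w1, w2))
print("SGD same weights:", same(run(1, sgd)[1], run(2, sgd)[1]))
```

The iteration counter is the simplest visible piece of state. Per-variable moment estimates are unaffected by the split, so with Adam the difference comes from the bias correction being computed with a step count that no longer matches how often each variable was actually updated.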

Tensorflow 2.0 Custom loss function with multiple inputs

I am trying to optimize a model with the following two loss functions
def loss_1(pred, weights, logits):
    weighted_sparse_ce = kls.SparseCategoricalCrossentropy(from_logits=True)
    # the per-sample weights are the `weights` argument
    policy_loss = weighted_sparse_ce(pred, logits, sample_weight=weights)
    return policy_loss
and
def loss_2(y_pred, y):
    return kls.mean_squared_error(y_pred, y)
However, because TensorFlow 2 (Keras) expects a loss function to be of the form
def fn(y_true, y_pred):
    ...
I am using a workaround for loss_1 where I pack pred and weights into a single tensor before passing them to loss_1 in the call to model.fit, and then unpack them in loss_1. This is inelegant and nasty because pred and weights have different data types, so this requires an additional cast, pack, unpack, and un-cast each time I call model.fit.
Furthermore, I am aware of the sample_weight argument to fit, which is kind of like the solution to this question. This might be a workable solution were it not for the fact that I am using two loss functions and I only want the sample_weight applied to one of them. Also, even if this were a solution, it would not be generalizable to other types of custom loss functions.
All that being said, my question, said concisely, is:
What is the best way to create a loss function with an arbitrary number of
arguments in TensorFlow 2?
Another thing I have tried is passing a tf.tuple but that also seems to violate TensorFlow's desires for a loss function input.
This problem can be easily solved using custom training in TF2. You only need to compute your two-component loss function within a GradientTape context and then call an optimizer with the produced gradients. For example, you could create a function custom_loss which computes both losses given the arguments to each:
def custom_loss(model, loss1_args, loss2_args):
    # model: tf.keras.Model
    # loss1_args: arguments to loss_1, as tuple.
    # loss2_args: arguments to loss_2, as tuple.
    # Note: any model outputs among these arguments must come from a forward
    # pass recorded by this tape, otherwise the gradients will be None.
    with tf.GradientTape() as tape:
        l1_value = loss_1(*loss1_args)
        l2_value = loss_2(*loss2_args)
        loss_value = [l1_value, l2_value]
    # Gradient of a list of targets = gradient of their sum
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# In training loop:
loss_values, grads = custom_loss(model, loss1_args, loss2_args)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
In this way, each loss function can take an arbitrary number of eager tensors, regardless of whether they are inputs or outputs of the model. The sets of arguments to each loss function need not be disjoint, as shown in this example.
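Below is a self-contained sketch of that idea applied to the question's setting. One loss (sparse categorical cross-entropy on policy logits) takes a per-sample weight, and the other (MSE on a value head) does not. Both are summed inside one GradientTape, and the forward pass happens inside the tape so the gradients reach the weights. The two-headed model, the argument names (actions, advantages, returns), and the random data are hypothetical stand-ins.

```python
import numpy as np
import tensorflow as tf

# Hypothetical two-headed model: policy logits and a value estimate
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
logits = tf.keras.layers.Dense(3)(hidden)
value = tf.keras.layers.Dense(1)(hidden)
model = tf.keras.Model(inputs, [logits, value])

sparse_ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
mse = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def loss_1(actions, policy_logits, advantages):
    # Per-sample weights are applied to this loss only
    return sparse_ce(actions, policy_logits, sample_weight=advantages)

def loss_2(returns, value_pred):
    return mse(returns, value_pred)

@tf.function
def train_step(states, actions, advantages, returns):
    with tf.GradientTape() as tape:
        # The forward pass happens inside the tape, so gradients reach the weights
        logits_out, value_out = model(states, training=True)
        l1 = loss_1(actions, logits_out, advantages)
        l2 = loss_2(returns, value_out)
        total = l1 + l2
    grads = tape.gradient(total, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return l1, l2

# Random stand-in batch
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 4)).astype("float32")
actions = rng.integers(0, 3, size=(32,)).astype("int32")
advantages = rng.normal(size=(32,)).astype("float32")
returns = rng.normal(size=(32, 1)).astype("float32")

l1, l2 = train_step(states, actions, advantages, returns)
```

Each loss keeps its own natural signature, so nothing has to be packed into a single tensor and unpacked again.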
To expand on Jon's answer: in case you still want the benefits of a Keras Model, you can subclass the model class and write your own custom train_step:
import tensorflow as tf
from tensorflow import keras

# custom loss function that takes two outputs of the model
# as input parameters, which would otherwise not be possible
def custom_loss(gt, x, y):
    return tf.reduce_mean(x) + tf.reduce_mean(y)

class CustomModel(keras.Model):
    def compile(self, optimizer, my_loss):
        super().compile(optimizer)
        self.my_loss = my_loss

    def train_step(self, data):
        # Public helper instead of the private data_adapter module; the old
        # data_adapter.expand_1d step is dropped, so reshape 1-D inputs yourself
        input_data, gt, sample_weight = keras.utils.unpack_x_y_sample_weight(data)
        with tf.GradientTape() as tape:
            y_pred = self(input_data, training=True)
            loss_value = self.my_loss(gt, y_pred[0], y_pred[1])
        grads = tape.gradient(loss_value, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss_value": loss_value}

...
model = CustomModel(inputs=input_tensor0, outputs=[x, y])
model.compile(optimizer=tf.keras.optimizers.Adam(), my_loss=custom_loss)
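A runnable end-to-end version of that pattern is sketched below. The two-headed functional model and the synthetic data are assumptions for illustration. The concrete custom_loss is also an assumption: an MSE of each head against the same target, used instead of the placeholder above. train_step unpacks each batch with tf.keras.utils.unpack_x_y_sample_weight, a public helper.

```python
import numpy as np
import tensorflow as tf

def custom_loss(gt, x, y):
    # Hypothetical loss combining both model outputs with the target
    return tf.reduce_mean(tf.square(gt - x)) + tf.reduce_mean(tf.square(gt - y))

class CustomModel(tf.keras.Model):
    def compile(self, optimizer, my_loss, **kwargs):
        super().compile(optimizer=optimizer, **kwargs)
        self.my_loss = my_loss

    def train_step(self, data):
        input_data, gt, _ = tf.keras.utils.unpack_x_y_sample_weight(data)
        with tf.GradientTape() as tape:
            out_x, out_y = self(input_data, training=True)  # both output heads
            loss_value = self.my_loss(gt, out_x, out_y)
        grads = tape.gradient(loss_value, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss_value": loss_value}

# Hypothetical functional model with two output heads
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
head_x = tf.keras.layers.Dense(1, name="head_x")(hidden)
head_y = tf.keras.layers.Dense(1, name="head_y")(hidden)
model = CustomModel(inputs=inputs, outputs=[head_x, head_y])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2), my_loss=custom_loss)

# Synthetic data: the target is the sum of the features
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 4)).astype("float32")
targets = features.sum(axis=1, keepdims=True)
history = model.fit(features, targets, epochs=5, batch_size=32, verbose=0)
```

Because fit() still drives training, callbacks, validation data and the progress bar keep working, while the loss itself can take as many model outputs as you like.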
In TF 1.x we have the tf.nn.weighted_cross_entropy_with_logits function, which allows us to trade off recall and precision by adding extra positive weights for each class. In multi-label classification, the weight should be an (N,) tensor or NumPy array, with one entry per class. However, in TF 2.0 I haven't found a similar loss function yet, so I wrote my own loss function with an extra argument, pos_w_arr.
import tensorflow as tf
from tensorflow.keras.backend import epsilon

def pos_w_loss(pos_w_arr):
    """
    Define positive weighted loss function
    """
    def fn(y_true, y_pred):
        # clip probabilities away from 0 and 1 so the logs stay finite
        _epsilon = tf.convert_to_tensor(epsilon(), dtype=y_pred.dtype.base_dtype)
        _y_pred = tf.clip_by_value(y_pred, _epsilon, 1. - _epsilon)
        cost = tf.multiply(tf.multiply(y_true, tf.math.log(_y_pred)), pos_w_arr) + \
            tf.multiply((1 - y_true), tf.math.log(1 - _y_pred))
        return -tf.reduce_mean(cost)
    return fn
I'm not sure what you mean by saying it wouldn't work when using eager tensors or NumPy arrays as inputs, though. Please correct me if I'm wrong.
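As a sanity check, the sketch below compares pos_w_loss with tf.nn.weighted_cross_entropy_with_logits. That function is still available in TF 2 with the arguments labels, logits and pos_weight, and it works on logits rather than probabilities. Feeding sigmoid(logits) to pos_w_loss should give the same mean loss, as long as no probability is close enough to 0 or 1 to be clipped by epsilon. The labels, logits and weights are made-up values.

```python
import numpy as np
import tensorflow as tf

def pos_w_loss(pos_w_arr):
    """Positive-weighted binary cross-entropy on probabilities."""
    def fn(y_true, y_pred):
        eps = tf.convert_to_tensor(tf.keras.backend.epsilon(), dtype=y_pred.dtype)
        p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        cost = y_true * tf.math.log(p) * pos_w_arr + (1.0 - y_true) * tf.math.log(1.0 - p)
        return -tf.reduce_mean(cost)
    return fn

# Made-up multi-label batch: 2 samples, 3 classes
y_true = tf.constant([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
logits = tf.constant([[0.5, -1.0, 2.0], [-0.3, 1.2, 0.1]])
pos_w = tf.constant([2.0, 1.0, 3.0])  # one positive weight per class

custom = pos_w_loss(pos_w)(y_true, tf.sigmoid(logits))
builtin = tf.reduce_mean(
    tf.nn.weighted_cross_entropy_with_logits(labels=y_true, logits=logits, pos_weight=pos_w))
print(float(custom), float(builtin))
```

If your model can output raw logits, the built-in version is usually the safer choice, because it avoids clipping probabilities and stays numerically stable for large logits.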