Regression custom loss return value in Keras with and without custom loop - tensorflow

When a custom loss is defined in a Keras model, online sources seem to indicate that the loss should return an array of values (a loss for each sample in the batch). Something like this:
def custom_loss_function(y_true, y_pred):
    squared_difference = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_difference, axis=-1)
model.compile(optimizer='adam', loss=custom_loss_function)
In the example above, I have no idea when, or whether, the model takes the batch sum or mean with tf.reduce_sum() or tf.reduce_mean().
In another situation when we want to implement a custom training loop with a custom function, the template to follow according to Keras documentation is this
for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            y_batch_pred = model(x_batch_train, training=True)
            loss_value = custom_loss_function(y_batch_train, y_batch_pred)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
So by the book, if I understand correctly, we are supposed to take the mean of the batch gradients. Therefore, the loss value above should be a single value per batch.
However, the example will work with both of the following variations:
tf.reduce_mean(squared_difference, axis=-1) # array of loss for each sample
tf.reduce_mean(squared_difference) # mean loss for batch
So, why does the first option (array loss) above still work? Is apply_gradients applying small changes for each value sequentially? Is this wrong although it works?
What is the correct way without a custom loop, and with a custom loop?

Good question. In my opinion, this is not well documented in the TensorFlow/Keras API. In the custom loop, if you pass a non-scalar loss_value to tape.gradient, TensorFlow differentiates the sum of its elements, so the array variant is equivalent to summing the per-sample losses along the batch axis. The weights are still updated once per batch; apply_gradients does not apply the per-sample values sequentially.
Currently, the losses in the TensorFlow API include a reduction argument (for example, tf.losses.MeanSquaredError) that allows specifying how to aggregate the loss along the batch axis.
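To make the difference between summing and averaging concrete, here is a minimal NumPy sketch (not TensorFlow code; the one-weight linear model and all variable names are assumptions made for illustration). It computes the gradient of the summed per-sample losses and the gradient of the batch mean by hand. The two differ only by a factor equal to the batch size, so a summed loss behaves like a mean loss with the learning rate multiplied by the batch size.

```python
import numpy as np

# Hypothetical toy problem: model y_pred = w * x with a single weight w.
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([2.0, 4.0, 6.0, 8.0])
w = 0.5

# Per-sample squared error and its derivative with respect to w.
residual = y_true - w * x
per_sample_loss = residual ** 2
per_sample_grad = -2.0 * x * residual

# Gradient of the summed per-sample losses (the "array loss" case) ...
grad_sum = per_sample_grad.sum()
# ... versus the gradient of the batch mean (the scalar loss case).
grad_mean = per_sample_grad.mean()

print(grad_sum, grad_mean, grad_sum / grad_mean)
```

Both variants point in the same direction, which is why training "works" either way; only the effective step size changes.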

Related

Implementing custom loss that makes use of training batch data with Keras

I am trying to implement a GAN called the SimGAN proposed by Apple researchers. The SimGAN is used to refine labelled synthetic images so that they look more like the unlabelled real images.
The link to the paper can be found on arXiv here.
In the paper, the loss function of the combined model, which comprises the generator and the discriminator, has a self-regularization component in the form of an L1 loss that penalizes too great a difference between the synthetic images and the images after refinement. In other words, the refinement should not be too drastic.
I would like to know how I can implement this self-regularization loss in Keras. Here is what I tried:
def self_regularization_loss(refined_images, syn_images):
    def l1loss(y_true, y_pred):
        return keras.metrics.mean_absolute_error(refined_images, syn_images)
    return l1loss
However, I do not think I can compile the model in the way below as the batches of refined and synthetic images change during training time.
model.compile(loss=[self_regularization_loss(current_batch_of_refined, current_batch_of_synthetic),
                    local_adversarial_loss],
              optimizer=opt)
What is the way to implement this loss?
Try using the tf.function decorator and tf.GradientTape():
@tf.function
def train_step(model, batch):
    with tf.GradientTape() as tape:
        refined_images, syn_images = batch
        loss = self_regularization_loss(model, refined_images, syn_images)
    gradients = tape.gradient(loss, model.trainable_variables)
    # optimizer is assumed to be defined outside this function
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Your training loop can look something like:
for image_batch in dataset:
    train_step(model, image_batch)
Here it is assumed that model is of type tf.keras.Model. More details on the model class can be found here. Note that model is also passed to self_regularization_loss. In this function, your model receives both images as inputs and gives you the respective output, from which you then calculate your loss.
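As a rough illustration of the loss itself, the following NumPy sketch computes an L1 self-regularization term from the current batch. The refiner function is a hypothetical stand-in for the refiner network, and the image shapes are made up; the point is only that the loss is recomputed from each batch inside the training step rather than fixed at compile time.

```python
import numpy as np

def refiner(images):
    # Hypothetical stand-in for the refiner network: a small brightness shift.
    return images + 0.1

def self_regularization_loss(refined_images, syn_images):
    # L1 loss: mean absolute per-pixel difference between the two batches.
    return np.mean(np.abs(refined_images - syn_images))

syn_batch = np.zeros((2, 4, 4, 1))   # batch of 2 synthetic 4x4 grayscale images
refined_batch = refiner(syn_batch)   # recomputed for every batch
loss = self_regularization_loss(refined_batch, syn_batch)
print(loss)
```

Inside a TensorFlow training step, the same expression could be written with tf.reduce_mean and tf.abs within the GradientTape block so that gradients flow through the refiner.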

Training RNN with error evaluation at every time step

I have a simpleRNN / LSTM that I'm trying to train on a sequential classification task using tensorflow. There is a sequence of data (300 time steps) that predicts a label at t=300. For my task I would like for the RNN to evaluate the error at every timestep (not just at the final time point) and propagate it backwards (as figure below).
After some responses below, it seems I need to do a few things: use the return_sequences flag; use the TimeDistributed layer to access the output from the LSTM/RNN; and also define a custom loss function.
model = Sequential()
layer1 = LSTM(n_neurons, input_shape=(length, 1), return_sequences=True)
model.add(layer1)
layer2 = TimeDistributed(Dense(1))
model.add(layer2)
# Define custom loss
def custom_loss(layer1):
    # Create a loss function
    def loss(y_true, y_pred):
        # access layer1 at every time point and compute mean error
        # UNCLEAR HOW TO RUN AT EVERY TIME STEP
        err = K.mean(layer1(X) - y_true, axis=-1)
        return err
    # Return a function
    return loss
# Compile the model
model.compile(optimizer='adam', loss=custom_loss(layer1), metrics=['accuracy'])
For now I'm a bit confused by the custom_loss function, as it's not clear how I can pass in layer1 and compute the error inside the innermost loss function.
Does anyone have a suggestion, or can anyone point me to a more detailed answer?
The question is not easy to answer since it is not clear what you're trying to achieve (it shouldn't be the same using a FFNN or a RNN, and what works best depends definitely on the application).
Anyway, you might be confusing the training steps (say, the forward- and back- propagation over a minibatch of sequences) with the "internal" steps of the RNN. A single sequence (or a single minibatch) will always "unroll" entirely through time during the forward pass before any output is made available: only after (thus, at the end of the training step), you can use the predictions and compute the losses to backpropagate.
What you can do is return sequences of outputs (one y_predicted for every internal time step) by passing the argument return_sequences=True to SimpleRNN(...). This will give you a sequence of 300 predictions, each of which depends only on the inputs up to the considered internal time step. You can then use the outputs that you need to compute the loss, possibly in a custom loss function.
I hope I've been clear enough. Otherwise, let me know if I can help further.
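One possible reading of "error at every time step" is sketched below in plain NumPy. It assumes a binary task, that the model already outputs one probability per time step (as with return_sequences=True followed by a sigmoid), and that the sequence label is the target at every step. The function name and shapes are hypothetical, not part of Keras.

```python
import numpy as np

def per_timestep_bce(y_true, y_pred, eps=1e-7):
    # y_true: (batch,) one label per sequence; y_pred: (batch, time) probabilities.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Repeat each sequence label along the time axis so every step has a target.
    targets = np.repeat(y_true[:, None], y_pred.shape[1], axis=1)
    step_loss = -(targets * np.log(y_pred) + (1.0 - targets) * np.log(1.0 - y_pred))
    # Average over time steps, giving one loss value per sequence.
    return step_loss.mean(axis=1)

y_true = np.array([1.0, 0.0])
y_pred = np.array([[0.5, 0.7, 0.9],
                   [0.5, 0.3, 0.1]])
losses = per_timestep_bce(y_true, y_pred)
print(losses)
```

In Keras, a comparable effect can often be had without a custom loss by repeating the labels along the time axis so that y_true has the same shape as the sequence output, and then using a standard loss.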

How to make use of class_weights to calculate a custom loss function while using a custom training loop (i.e. not using .fit)

I have written my custom training loop using tf.GradientTape(). My data has 2 classes. The classes are not balanced; class1 contributes almost 80% of the data and class2 the remaining 20%. To compensate for this imbalance, I was trying to write a custom loss function that applies the corresponding class weights when calculating the loss, i.e. I want to use class_weights = [0.2, 0.8].
However, all the examples I am seeing use the model.fit approach, where it's easier to pass the class_weights. I am not able to find an example that uses class_weights with a custom training loop using tf.GradientTape.
I did go through the suggestions of using sample_weight; however, I don't have data in which I can specify weights for individual samples, so my preference is to use class weights.
I am using BinaryCrossentropy as the loss function, but I want to change the loss based on the class_weights. That's where I am stuck: how do I tell BinaryCrossentropy to consider the class_weights?
Is my approach of using a custom loss function correct, or is there a better way to make use of class_weights while training with a custom training loop (not using model.fit)?
You can write your own loss function. In that loss function, call BinaryCrossentropy, multiply the result by the weight you want, and return that.
Here's an implementation that should work for n classes instead of just 2.
For your example of 80:20 split, calculate weights as below (assuming 100 samples in total).
Weight calculation (ref: Handling Class Imbalance: TensorFlow):
weight_class_0 = (1/count_for_class_0) * (total_samples / num_classes) # (80%) 0.625
weight_class_1 = (1/count_for_class_1) * (total_samples / num_classes) # (20%) 2.5
class_wts = tf.constant([weight_class_0, weight_class_1])
Loss function: Requires labels to be sparse and logits unscaled (no activations applied).
# Example logits=[[-3.2, 2.0], [1.2, 0.5], ...], (sparse)labels=[0, 1, ...]
def weighted_sparse_categorical_crossentropy(labels, logits, weights):
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    class_weights = tf.gather(weights, labels)
    return tf.reduce_mean(class_weights * loss)
You can supply this loss function to custom training loops.
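The weight formula and the weighting step can be checked by hand with a small NumPy sketch. This is a framework-free illustration, not the TensorFlow implementation above: both function names are hypothetical, and the loss takes predicted probabilities for class 1 rather than logits.

```python
import numpy as np

def class_weights_from_counts(counts):
    # weight_c = (1 / count_c) * (total_samples / num_classes)
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_binary_crossentropy(y_true, y_prob, weights, eps=1e-7):
    # y_true: integer labels in {0, 1}; y_prob: predicted probability of class 1.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    bce = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1.0 - y_prob))
    # Look up each sample's weight from its class label, then average.
    return np.mean(weights[y_true] * bce)

weights = class_weights_from_counts([80, 20])   # 80:20 split of 100 samples
y_true = np.array([0, 0, 1])
y_prob = np.array([0.5, 0.5, 0.5])
loss = weighted_binary_crossentropy(y_true, y_prob, weights)
print(weights, loss)
```

With these counts the weights come out as 0.625 for the majority class and 2.5 for the minority class, matching the comments in the weight calculation above.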

can't reproduce model.fit with GradientTape

I've been trying to investigate (e.g. by checking weights, gradients and activations during training) why SGD with a 0.001 learning rate worked in training while Adam fails to do so. (Please see my previous post, "Why is my loss (binary cross entropy) converging on ~0.6? (Task: Natural Language Inference)".)
Note: I'm using the same model from my previous post here as well.
Using tf.keras, I trained the neural network using model.fit():
model.compile(optimizer=SGD(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x=ds,
          epochs=80,
          validation_data=ds_val)
This resulted in the epoch loss graphed below. Within the first epoch it reached a train loss of 0.46, ultimately ending with a train_loss of 0.1241 and a val_loss of 0.2849.
I would have used tf.keras.callbacks.TensorBoard(histogram_freq=1) to train the network with both SGD(0.001) and Adam to investigate, but it throws an InvalidArgumentError on Variable:0, which I can't decipher. So I tried to write a custom training loop using GradientTape and plot the values.
Using tf.GradientTape(), I tried to reproduce the results with the exact same model and dataset. However, the epoch loss decreases incredibly slowly, reaching a train loss of 0.676 after 15 epochs (see graph below). Is there something wrong with my implementation? (Code below.)
@tf.function
def compute_grads(train_batch: Dict[str,tf.Tensor], target_batch: tf.Tensor,
                  loss_fn: Loss, model: tf.keras.Model):
    with tf.GradientTape(persistent=False) as tape:
        # forward pass
        outputs = model(train_batch)
        # calculate loss
        loss = loss_fn(y_true=target_batch, y_pred=outputs)
    # calculate gradients for each param
    grads = tape.gradient(loss, model.trainable_variables)
    return grads, loss
BATCH_SIZE = 8
EPOCHS = 15
bce = BinaryCrossentropy()
optimizer = SGD(learning_rate=0.001)
for epoch in tqdm(range(EPOCHS), desc='epoch'):
    # - accumulators
    epoch_loss = 0.0
    for (i, (train_batch, target_dict)) in tqdm(enumerate(ds_train.shuffle(1024).batch(BATCH_SIZE)), desc='step'):
        (grads, loss) = compute_grads(train_batch, target_dict['target'], bce, model)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        epoch_loss += loss
    avg_epoch_loss = epoch_loss/(i+1)
    tensorboard_scalar(writer, name='epoch_loss', data=avg_epoch_loss, step=epoch) # custom helper function
    print("Epoch {}: epoch_loss = {}".format(epoch, avg_epoch_loss))
Thanks in advance!
If you shuffle your dataset, the problem may come from shuffling with the tf.data.Dataset method, which only shuffles within one buffer at a time. Keras Model.fit may have yielded better results because it probably adds another shuffle.
Adding a shuffle with numpy.random.shuffle may improve the training performance (from this reference).
An example of applying it when generating the dataset:
numpy_data = np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1), index_data.reshape(-1, 1)])
np.random.shuffle(numpy_data)
indexes = np.array(numpy_data[:, :2], dtype=np.uint32)
labels = np.array(numpy_data[:, 2].reshape(-1, 1), dtype=np.float32)
train_ds = data.Dataset.from_tensor_slices(
    (indexes, labels)
).shuffle(100000).batch(batch_size, drop_remainder=True)
If this does not work, you may need to use the Dataset methods .repeat(epochs_number) and .shuffle(..., reshuffle_each_iteration=True):
train_ds = data.Dataset.from_tensor_slices(
    (np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1)]), index_data)
).shuffle(100000, reshuffle_each_iteration=True
).batch(batch_size, drop_remainder=True
).repeat(epochs_number)
for ix, (examples, labels) in train_ds.enumerate():
    train_step(examples, labels)
    current_epoch = ix // (len(index_data) // batch_size)
This workaround is neither beautiful nor natural, but for the moment you can use it to reshuffle each epoch. It's a known issue and will be fixed; in the future you can use for epoch in range(epochs_number) instead of .repeat().
The solution provided here may also help a lot. You might want to check it out.
If this is not the case, you may want to speed up the TF2.0 GradientTape. This can be the solution:
TensorFlow 2.0 introduces the concept of functions, which translate eager code into graph code.
The usage is pretty straightforward. The only change needed is that all relevant functions (like compute_loss and apply_gradients) have to be annotated with @tf.function.

Best practices in Tensorflow 2.0(Training step)

In TensorFlow 2.0 you don't have to worry about the training phase (batch size, number of epochs, etc.), because everything can be defined in the fit method: model.fit(X_train, Y_train, batch_size=64, epochs=100).
But I have seen the following code style:
optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
@tf.function
def train_step(inputs, labels):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        regularization_loss = tf.math.add_n(model.losses)
        pred_loss = loss_fn(labels, predictions)
        total_loss = pred_loss + regularization_loss
    gradients = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
for epoch in range(NUM_EPOCHS):
    for inputs, labels in train_data:
        train_step(inputs, labels)
    print("Finished epoch", epoch)
So here you can observe "more detailed" code, where you manually define your training procedure with for loops.
I have the following question: what is the best practice in TensorFlow 2.0? I haven't found any complete tutorial.
Use what is best for your needs.
Both methods are documented in Tensorflow tutorials.
If you don't need anything special, no extra losses, strange metrics or intricate gradient computation, just use a model.fit() or a model.fit_generator(). This is totally ok and makes your life easier.
A custom training loop might come in handy when you have complicated models with non-trivial loss/gradients calculation.
Up to now, two applications I tried were easier with this:
Training a GAN's generator and discriminator simultaneously without having to do the generation step twice. (It's complicated because you have a loss function that applies to different y_true values, and each case should update only a part of the model.) The other option would require having a few separate models, each with its own trainable=True/False configuration, and training them in separate phases.
Training inputs (good for style transfer models) -- Alternatively, create a custom layer that takes dummy inputs and that outputs its own trainable weights. But it gets complicated to compile several loss functions for each of the outputs of the base and style networks.