change loss function during training - tensorflow

Suppose my loss function is of the following form:
loss = a*loss_1 + (1-a)*loss_2
Suppose also I am training for 100 steps. How can I dynamically change the loss function in tensorflow so that I gradually change "a" from 1 to 0 during the 100 steps of training?
To be precise, I want my loss to be
loss = 1*loss_1+0*loss_2 = loss_1
at the beginning of training (at step 1)
and
loss = 0*loss_1+1*loss_2 = loss_2 at the end (step 100)
with some kind of gradual (doesn't have to be continuous) decrease in between.

Assuming that the value of a does not depend on the computation done at the current step, create a placeholder for a, then pass the value you want through the feed dictionary.
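A minimal sketch of that approach (TF1-style graph mode; loss_1 and loss_2 are dummy stand-ins for your real loss tensors):

import tensorflow as tf

w = tf.Variable(1.0)
loss_1 = tf.square(w - 2.0)  # dummy stand-in
loss_2 = tf.abs(w - 2.0)     # dummy stand-in

a = tf.placeholder(tf.float32, shape=(), name="a")
loss = a * loss_1 + (1 - a) * loss_2
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # linearly anneal a from 1 at step 0 down to 0 at step 99
        sess.run(train_op, feed_dict={a: 1.0 - step / 99.0})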

You can use tf.train.polynomial_decay.
tf.train.polynomial_decay(learning_rate=1, global_step=step_from_placeholder,
                          decay_steps=100, end_learning_rate=0,
                          power=1.0, cycle=False, name=None)
This computes
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * \
(1 - global_step / decay_steps) ** (power) + end_learning_rate
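For example, a sketch that drives a from the global step so no feed dict is needed (loss_1 and loss_2 are again dummy stand-ins):

import tensorflow as tf

w = tf.Variable(1.0)
loss_1 = tf.square(w - 2.0)  # dummy stand-in
loss_2 = tf.abs(w - 2.0)     # dummy stand-in

global_step = tf.train.get_or_create_global_step()
a = tf.train.polynomial_decay(learning_rate=1.0, global_step=global_step,
                              decay_steps=100, end_learning_rate=0.0,
                              power=1.0)
loss = a * loss_1 + (1.0 - a) * loss_2

# passing global_step makes the optimizer increment it on every update,
# so a decays from 1 to 0 automatically over the 100 steps
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=global_step)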

Related

What Loss function to use for binary classification in CNN using float labels?

So I am building a CNN that takes images with labels that go from 0 to 1.
What I mean is that I am trying to detect one kind of event in the image, and each image has a label between 0 and 1 that stands for the probability of that type of event being in that image.
I want to output this probability, so I am using a sigmoid activation function in the output layer, but I am having trouble deciding what loss function makes sense in this situation. If my labels were 0s and 1s I would use binary cross-entropy, but does that still make sense when my labels are floats ranging from 0 to 1?
Cheers.
This solution is for logits (the output of the last linear layer), not for output probabilities:
def loss(logits, soft_labels):
    anti_soft_labels = 1 - soft_labels
    return soft_labels * tf.nn.softplus(-logits) \
        + anti_soft_labels * tf.nn.softplus(logits)

loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 6.93147e-01, 4.53989e-05]
If you need the minimal loss value to be 0 for any soft label, use
def loss(logits, soft_labels):
    anti_soft_labels = 1 - soft_labels
    return soft_labels * tf.nn.softplus(-logits) \
        + anti_soft_labels * tf.nn.softplus(logits) \
        + tf.math.xlogy(soft_labels, soft_labels) \
        + tf.math.xlogy(anti_soft_labels, anti_soft_labels)

loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 0.00000e+00, 4.53989e-05]
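The extra xlogy terms subtract the entropy of the soft labels, which turns the cross-entropy into a KL divergence; that divergence is 0 exactly when sigmoid(logits) equals soft_labels, so the minimum loss value becomes 0.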

What does y_pred look like when making a custom loss function in Keras?

I am training a UNet shaped CNN and have to deal with data imbalances. I want to minimise false negatives, so I want to implement a custom loss function that does so. I created the following loss function:
from tensorflow.keras import backend as K

def fbeta_loss(y_true, y_pred, beta=2., epsilon=K.epsilon()):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    tp = K.sum(y_true_f * y_pred_f)
    predicted_positive = K.sum(y_pred_f)
    actual_positive = K.sum(y_true_f)
    precision = tp / (predicted_positive + epsilon)  # calculating precision
    recall = tp / (actual_positive + epsilon)        # calculating recall
    # calculating fbeta
    beta_squared = K.square(beta)
    fb = (1 + beta_squared) * precision * recall / (beta_squared * precision + recall + epsilon)
    return 1 - fb
However, I am not sure whether y_pred is binary or a float between 0 and 1. In my final layer I use a sigmoid activation. Does that mean that, in a custom loss function, y_pred is a float between 0 and 1, and I should add a step that maps every value higher than a threshold (0.5) to 1 and every lower value to 0? Or is that step already included in the Keras model? In similar custom loss implementations that step is often not included.
Hopefully this is sort of clear; I am relatively new to Stack Overflow. Let me know if anything is missing! Thanks in advance.
The output of the sigmoid activation function is always between 0 and 1.
As x tends towards infinity, S(x) converges to 1, and as x tends towards negative infinity, S(x) converges to 0. Converges here means that S(x) never actually reaches 0 or 1; it only approaches them.
So the output of S(x) is always a float strictly between 0 and 1.
Range of S(x):
0 < S(x) < 1
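A quick check in TF2 eager mode, confirming that y_pred is continuous rather than binary:

import tensorflow as tf

logits = tf.constant([-10.0, 0.0, 10.0])
print(tf.sigmoid(logits).numpy())
# [4.5397868e-05 5.0000000e-01 9.9995458e-01]

So there is no built-in thresholding step in the model: the loss function sees the raw sigmoid outputs directly.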

Changing the learning rate after every step in Keras

I want to increase the learning rate from batch to batch inside one epoch, so the first data the net sees in an epoch has a low learning rate and the last data it sees has a high learning rate. How do I do this in tf.keras?
To modify the learning rate after every epoch, you can use tf.keras.callbacks.LearningRateScheduler, as mentioned in the docs.
But in our case, we need to modify the learning rate after every batch is passed to the model. We'll use tf.keras.optimizers.schedules.LearningRateSchedule for this purpose. This modifies the learning rate after each step, i.e. after each gradient update.
Suppose I have 100 samples in my training dataset and my batch size is 5. The number of steps will be 100 / 5 = 20. In other words, in a single epoch, 20 batches will be passed to the model, and gradient updates will also occur 20 times.
Using the code given in the docs,
import tensorflow as tf

batch_size = 5
num_train_samples = 100

class MyLRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, initial_learning_rate):
        self.initial_learning_rate = initial_learning_rate

    def __call__(self, step):
        return self.initial_learning_rate / (step + 1)

optimizer = tf.keras.optimizers.SGD(learning_rate=MyLRSchedule(0.1))
The value of step will go from 0 to 19 for the 1st epoch, considering our example. For the 2nd epoch, it will go from 20 to 39. For your use case, we can modify the above like this:
batch_size = 5
num_train_samples = 100
num_steps = num_train_samples // batch_size  # integer count of steps per epoch

class MyLRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, initial_learning_rate):
        self.initial_learning_rate = initial_learning_rate

    def __call__(self, step):
        # step arrives as the optimizer's iteration counter (an integer tensor)
        step = tf.cast(step, tf.float32)
        step_in_epoch = step - ((step // num_steps) * num_steps)  # i.e. step % num_steps
        # Update the LR according to step_in_epoch; for example, a linear ramp
        # that increases from batch to batch within each epoch:
        return self.initial_learning_rate * (step_in_epoch + 1.0)

optimizer = tf.keras.optimizers.SGD(learning_rate=MyLRSchedule(0.1))
The value of step_in_epoch will go from 0 to 19 for the 1st epoch. For the 2nd epoch, it will go from 0 to 19 again, and likewise for all epochs. Update the LR accordingly; the linear ramp above is just one possible increasing schedule.
Make sure that num_train_samples is perfectly divisible by the batch size; this eases the calculation of the number of steps.
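A quick way to sanity-check the schedule (a sketch; the one-layer model and random data are made up, reusing batch_size, num_train_samples and MyLRSchedule from the block above):

import numpy as np

x = np.random.rand(num_train_samples, 4).astype("float32")
y = np.random.rand(num_train_samples, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=MyLRSchedule(0.1)),
              loss="mse")
model.fit(x, y, batch_size=batch_size, epochs=2)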

Creating custom metrics in tensorflow estimators

I am training a classification problem using tensorflow estimators.
I want to calculate the f1 score for each batch of data, along with precision and recall.
I calculate precision and recall using the code below and log them for evaluation and training.
I also calculate the fscore using the formula, but while logging the fscore I get an error.
pre = tf.metrics.precision(labels=labels,predictions=pred,name="precision")
rec = tf.metrics.recall(labels=labels,predictions=pred,name="recall")
fscore_val = tf.reduce_mean((2*pre[0]*rec[0]) / (pre[0] + rec[0] + 1e-5))
fscore_update = tf.group(pre[1], rec[1])
fscore = (fscore_val, fscore_update)
# logging metric at evaluation time
metrics['precision'] = pre
metrics['recall'] = rec
metrics['fscore'] = fscore
# logging metric at training time
tf.summary.scalar('precision', pre[1])
tf.summary.scalar('recall', rec[1])
tf.summary.scalar('fscore', fscore)
This is the error that I get.
TypeError: Expected float32, got <tf.Operation 'metrics_Left_Lane_Type/group_deps' type=NoOp> of type 'Operation' instead.
I understand why I am getting this error.
It is because fscore should be two values, similar to precision and recall.
Can someone please help me on how to do this in tensorflow estimators?
First of all, TensorFlow has its own f1 score, tf.contrib.metrics.f1_score, and it is rather straightforward to use. The only possible downside is that it hides the threshold value from the user, choosing the best from a specified number of possible thresholds.
predictions = tf.sigmoid(logits)
tf.contrib.metrics.f1_score(labels, predictions, num_thresholds=20)
If, for any reason, you want a custom implementation, you need to group the update ops. Every TensorFlow metric has an update op that increments its value. You can set the threshold manually when defining the predictions:
predictions = tf.greater(tf.sigmoid(logits), 0.5)
def f1_score(labels, predictions):
    precision, update_op_precision = tf.metrics.precision(labels, predictions)
    recall, update_op_recall = tf.metrics.recall(labels, predictions)
    eps = 1e-5  # small constant for numerical stability
    f1 = 2 * precision * recall / (precision + recall + eps)
    f1_upd = 2 * update_op_precision * update_op_recall / (update_op_precision + update_op_recall + eps)
    return f1, f1_upd

f1_score = f1_score(labels, predictions)
Then you can add it to the eval_metric_ops dict or pass it to tf.summary.scalar:
eval_metric_ops = {'f1': f1_score}
tf.summary.scalar('f1', f1_score[1])
It actually gives results very close to the metric from the contrib module.

How to handle log(0) when using cross entropy

In order to make the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed using the sigmoid; the logits can be thought of as the output of a neural network before the classification step
predY = sigmoid(logits)  # binary case

def sigmoid(X):
    return 1 / (1 + np.exp(-X))
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is the number of examples and 5 is the feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is set up to overfit. When it overfits, we can perfectly predict the probabilities for the training examples; in other words, the sigmoid outputs exactly 1 or 0 because the exponential saturates in floating point. If this is the case, we would have np.log(0), which is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
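A quick check (made-up Y and predY) that the xlogy form stays finite even for fully saturated predictions:

import numpy as np
from scipy.special import xlogy, xlog1py

Y = np.array([1.0, 0.0])
predY = np.array([1.0, 0.0])  # perfectly confident predictions would break np.log
m = len(Y)

loss = xlogy(Y, predY) + xlog1py(1 - Y, -predY)
cost = -np.sum(loss) / m
print(cost)  # -0.0, finite instead of nan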
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY; this doesn't change the predictions much, and it solves the log(0) issue.
BTW, if your algorithm outputs zeros and ones, it might be useful to check the histogram of the returned probabilities: when the algorithm is so sure that something is happening, it can be a sign of overfitting.
One common way to deal with log(x) and y / x, where x is always non-negative but can become 0, is to add a small constant (as written by Jakub).
You can also clip the value (e.g. tf.clip_by_value or np.clip), as in the sketch below.
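A minimal sketch of the clipping approach, plugged into the loss from the question (Y, predY and m are made-up stand-ins):

import numpy as np

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.5])  # saturated values would make np.log blow up
m = len(Y)

eps = 1e-15  # keeps predY away from exactly 0 and 1
predY = np.clip(predY, eps, 1 - eps)
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
cost = -np.sum(loss) / m  # finite even for fully confident predictions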