Creating custom metrics in tensorflow estimators - tensorflow

I am training a classification problem using tensorflow estimators.
I want to calculate the f1 score for each batch to data along with precision and recall.
I calculate precision and recall using the code below and log them for evaluation and training.
I also calculate the fscore using the formula, but while logging the fscore I get an error.
pre = tf.metrics.precision(labels=labels,predictions=pred,name="precision")
rec = tf.metrics.recall(labels=labels,predictions=pred,name="recall")
fscore_val = tf.reduce_mean((2*pre[0]*rec[0]) / (pre[0] + rec[0] + 1e-5))
fscore_update = tf.group(pre[1], rec[1])
fscore = (fscore_val, fscore_update)
# logging metric at evaluation time
metrics['precision'] = pre
metrics['recall'] = rec
metrics['fscore'] = fscore
# logging metric at training time
tf.summary.scalar('precision', pre[1])
tf.summary.scalar('recall', rec[1])
tf.summary.scalar('fscore', fscore)
This is the error that I get.
TypeError: Expected float32, got <tf.Operation 'metrics_Left_Lane_Type/group_deps' type=NoOp> of type 'Operation' instead.
I understand why I am getting this error.
It is because fscore should be two values, similar to precision and recall.
Can someone please help me on how to do this in tensorflow estimators?

First of all, TensorFlow has it's own f1 score tf.contrib.metrics.f1_score and it is rather straightforward to use. The only possible downside is that it hides threshold value from user, choosing the best from specified quantity of possible thresholds.
predictions = tf.sigmoid(logits)
tf.contrib.metrics.f1_score(labels, predictions, num_thresholds=20)
If, for any reason, you want a custom implementation, you need to group update_ops. Every TensorFlow metric has operation that increments its value. You can set threshold manually when defining predictions
predictions = tf.greater(tf.sigmoid(logits), 0.5)
def f1_score(labels, predictions):
precision, update_op_precision = tf.metrics.precision(labels, predictions)
recall, update_op_recall = tf.metrics.recall(labels, predictions)
eps = 1e-5 #small constant for numerical stability
f1 = 2 * precision * recall / (precision + recall + eps)
f1_upd = 2 * update_op_precision * update_op_recall / (update_op_precision + update_op_recall + eps)
return f1, f1_upd
f1_score = f1_score(labels, predictions)
Then you can add it to eval_metric_ops dict or pass to summary.scalar
eval_metric_ops = {'f1': f1_score}
tf.summary.scalar('f1', f1_score[1])
It actually gives very close results with metric from contrib module

Related

Why does GradientTape behave differently when watching loop operations as opposed to array operations?

There is something about the workings of GradientTape that escapes my understanding.
Suppose we want to train an agent on the classic bandit problem using an actor-critic RL framework. There are two bandits, A and B, and the agent must learn to select A, which yields higher returns on average. The training consists of, say, 1000 epochs, in each of which the agent draws, say, 100 samples from each bandit. The reward is 1 every time the agent selects A, and 0 otherwise.
Let's see how the agent learns by observing rewards over 10 training simulations. Here is the code defining the agent and the environment (neither needs to be more complicated than below).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from keras import Model
from keras.optimizers import Adam
n_sims = 10 # number of simulations
for n in range(n_sims):
# define actors and optimizers for each simulation
actor_input = Input(shape=(2,))
actor_output = Dense(2, activation='softmax')(actor_input)
globals()[f'actor_{n}'] = Model(inputs=actor_input, outputs=actor_output)
globals()[f'actor_opt_{n}'] = Adam(learning_rate=.1)
# define critics and optimizers for each simulation
critic_input = Input(shape=(2,))
critic_output = Dense(1, activation='softmax')(critic_input)
globals()[f'critic_{n}'] = Model(inputs=critic_input, outputs=critic_output)
globals()[f'critic_opt_{n}'] = Adam(learning_rate=.1)
globals()[f'mean_rewards_{n}'] = [] # list to store rewards over training epochs for each simulation
A = np.random.normal(loc=10, scale=15, size=int(1e5)) # bandit A
B = np.random.normal(loc=0, scale=1, size=int(1e5)) # bandit B
n_training_epochs = 1000
n_samples = 100
Let's consider two alternative codes for the training loop using GradientTape, both based on a simple 'vanilla' loss function.
The first is the slow one and literally involves a for loop over the samples drawn in each epoch. Cumulative actor and critic's losses are iteratively computed, and then their means are used to update their respective network weights.
for _ in range(n_training_epochs):
A_samples = np.random.choice(A, size=n_samples)
B_samples = np.random.choice(B, size=n_samples)
for n in range(n_sims):
cum_actor_loss, cum_critic_loss, cum_reward = 0, 0, 0
with tf.GradientTape() as actor_tape, tf.GradientTape() as critic_tape:
for A_sample, B_sample in zip(A_samples, B_samples):
probs = globals()[f'actor_{n}'](tf.reshape([A_sample, B_sample], (1,-1)))[0]
action = np.random.choice(['A','B'], p=np.squeeze(probs))
reward = 1 if action == 'A' else 0
cum_reward += reward
action_prob = probs[['A','B'].index(action)]
value = globals()[f'critic_{n}'](tf.reshape([A_sample, B_sample], (1,-1)))[0]
advantage = reward - value
cum_actor_loss += -tf.math.log(action_prob)*advantage
cum_critic_loss += advantage**2
mean_actor_loss = cum_actor_loss/n_samples
mean_critic_loss = cum_critic_loss/n_samples
globals()[f'mean_rewards_{n}'].append(cum_reward/n_samples)
actor_grads = actor_tape.gradient(mean_actor_loss, globals()[f'actor_{n}'].trainable_variables)
globals()[f'actor_opt_{n}'].apply_gradients(zip(actor_grads, globals()[f'actor_{n}'].trainable_variables))
critic_grads = critic_tape.gradient(mean_critic_loss, globals()[f'critic_{n}'].trainable_variables)
globals()[f'critic_opt_{n}'].apply_gradients(zip(critic_grads, globals()[f'critic_{n}'].trainable_variables))
If you plot the average training rewards over each epoch, you'll probably get something like this figure
In the second option, instead of using an explicit for loop over samples in each epoch, we perform operations on arrays. This alternative is much faster in terms of computation time.
for _ in range(n_training_epochs):
A_samples = np.random.choice(A, size=n_samples)
B_samples = np.random.choice(B, size=n_samples)
for n in range(n_sims):
with tf.GradientTape() as actor_tape, tf.GradientTape() as critic_tape:
probs = globals()[f'actor_{n}'](tf.reshape([[A_sample, B_sample] for A_sample, B_sample in zip(A_samples, B_samples)], (n_samples,-1)))
actions = np.array([np.random.choice(['A','B'], p=np.squeeze(probs[i])) for i in range(len(probs))]).reshape(n_samples, -1)
rewards = np.array([1.0 if action == 'A' else 0.0 for action in actions]).reshape(n_samples, -1)
globals()[f'mean_rewards_{n}'].append(np.mean(rewards))
values = globals()[f'critic_{n}'](tf.reshape([[A_sample, B_sample] for A_sample, B_sample in zip(A_samples, B_samples)], (n_samples,-1)))
advantages = rewards + tf.math.negative(values)
actions_num = [['A','B'].index(action) for action in actions]
action_probs = tf.reduce_sum(tf.one_hot(actions_num, len(['A','B'])) * probs, axis=1)
mean_actor_loss = -tf.reduce_mean(advantages * tf.math.log(action_probs))
mean_critic_loss = tf.reduce_mean(tf.pow(advantages, 2))
actor_grads = actor_tape.gradient(mean_actor_loss, globals()[f'actor_{n}'].trainable_variables)
globals()[f'actor_opt_{n}'].apply_gradients(zip(actor_grads, globals()[f'actor_{n}'].trainable_variables))
critic_grads = critic_tape.gradient(mean_critic_loss, globals()[f'critic_{n}'].trainable_variables)
globals()[f'critic_opt_{n}'].apply_gradients(zip(critic_grads, globals()[f'critic_{n}'].trainable_variables))
Let's plot the average reward over epochs, to obtain something like this
As you can see the agent tends to learn earlier and more stably in the first case than in the second (where learning may not even happen), although the two training loops are in theory mathematically equivalent. How is that? The reason has probably something to do with the fact that, in the first option, GradientTape is watching the trainable variables several times per epoch before applying the gradient, whereas in the second option it does so only once. Even so, I can't figure out why exactly this produces the observed results. Can you help me understand?

What Loss function to use for binary classification in CNN using float labels?

So I am building a CNN that gets images using labels that go from 0 to 1.
What I mean is that I am trying to perform detection of one thing in the image and each image has a label between 0 and 1 that stands for the probability of said type of event being in that image.
I want to output this probability so I am using a sigmoid activation function in the output layer but I am having trouble in deciding what loss function makes sense in this situation. If my labels were 0 and 1s I would use Binary CrossEntropy but does that still make sense when my labels are floats ranging from 0 to 1?
Cheers.
This solution is for logits (output of last linear layer) not for output probabilities
def loss(logits, soft_labels):
anti_soft_labels = 1 - soft_labels
return soft_labels * tf.nn.softplus(-logits)
+ anti_soft_labels * tf.nn.softplus(logits) + tf.math.xlogy(soft_labels, soft_labels) + tf.math.xlogy(anti_soft_labels, anti_soft_labels)
loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 0.00000e+00, 4.53989e-05]
If you need to have 0 as minimal loss value for any soft label use
def loss(logits, soft_labels):
anti_soft_labels = 1 - soft_labels
return soft_labels * tf.nn.softplus(-logits) \
+ anti_soft_labels * tf.nn.softplus(logits) \
+ tf.math.xlogy(soft_labels, soft_labels) \
+ tf.math.xlogy(anti_soft_labels, anti_soft_labels)
loss(logits=tf.constant([10., 0, -10]), soft_labels=tf.constant([1., 0.5, 0.]))
# [4.53989e-05, 0.00000e+00, 4.53989e-05]```

How does y_pred look like when making a custom loss function in keras?

I am training a UNet shaped CNN and have to deal with data imbalances. I want to minimise false negatives, so I want to implement a custom loss function that does so. I created the following loss function:
from tensorflow.keras import backend as K
def fbeta_loss(y_true, y_pred, beta=2., epsilon=K.epsilon()):
y_true_f = K.flatten(y_true)
y_pred_f = K.flatten(y_pred)
tp = K.sum(y_true_f * y_pred_f)
predicted_positive = K.sum(y_pred_f)
actual_positive = K.sum(y_true_f)
precision = tp/(predicted_positive+epsilon) # calculating precision
recall = tp/(actual_positive+epsilon) # calculating recall
# calculating fbeta
beta_squared = K.square(beta)
fb = (1+beta_squared)*precision*recall / (beta_squared*precision + recall + epsilon)
return 1-fb
However, I am not sure if y_pred is binary, or a float number between 0 and 1. In my final layer I use a sigmoid activation. Does that mean if I create a custom loss function y_pred is a float between 0 and 1, and I should add a step that maps every value higher then a threshold(0.5) to 1 and lower to 0? Or is that step already included in the Keras model? Since in similar custom loss implementations that step is often not included, e.g. .
Hopefully this is sort of clear, I am relatively new to stackoverflow. Let me know if anything is missing! Thanks in advance.
The output of sigmoid activation function is always between 0 and 1.
In the limit of x tending towards infinity, S(x) converges to 1, and in the limit of x tending towards negative infinity, S(x) converges to 0. Here, the word converges does not mean that S(x) reach any of 0 or 1 but it converges to 0 and 1.
And so the output of S(x) is always a float between 0 and 1.
Range of S(x):
0 < S(x) < 1

How to apply bounds on a variable when performing optimisation in Pytorch?

I am trying to use Pytorch for non-convex optimisation, trying to maximise my objective (so minimise in SGD). I would like to bound my dependent variable x > 0, and also have the sum of my x values be less than 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but am struggling with the bounding of the x variable. In Pytorch you can set the bounds using clamp but it doesn't seem appropriate in this case. I think this is because optim needs the gradients free under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np
def objective(x, a, b, c): # Want to maximise this quantity (so minimise in SGD)
d = 1 / (1 + torch.exp(-a * (x)))
# Checking constraint
exceeded_limit = constraint(x).item()
#print(exceeded_limit)
obj = torch.sum(d * (b * c - x))
# If overlimit add ramp penalty
if exceeded_limit < 0:
obj = obj - (exceeded_limit * 10)
print("Exceeded limit")
return - obj
def constraint(x, limit = 1000): # Must be > 0
return limit - x.sum()
N = 1000
# x is variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
# Would like to include the clamp
# x = torch.clamp(x, min=0)
# Non-convex methodf
opt = torch.optim.SGD([x], lr=.01)
for i in range(10000):
# Zeroing gradients
opt.zero_grad()
# Evaluating the objective
obj = objective(x, a, b, c)
# Calculate gradients
obj.backward()
opt.step()
if i%1000==0: print("Objective: %.1f" % -obj.item())
print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x<0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions as to how to impose the bounds would be appreciated, either using clamp or otherwise. Or generally advice on non-convex optimisation using Pytorch. This is a much simpler and scaled down version of the problem I'm working so am trying to find a lightweight solution if possible. I am considering using a workaround such as transforming the x variable using an exponential function but then you'd have to scale the function to avoid the positive values becoming infinite, and I want some flexibility with being able to set the constraint.
I meet the same problem with you.
I want to apply bounds on a variable in PyTorch, too.
And I solved this problem by the below Way3.
Your example is a little compliex but I am still learning English.
So I give a simpler example below.
For example, there is a trainable variable v, its bounds is (-1, 1)
v = torch.tensor((0.5, ), require_grad=True)
v_loss = xxxx
optimizer.zero_grad()
v_loss.backward()
optimizer.step()
Way1. RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
v.clamp_(-1, 1)
Way2. RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.
v = torch.clamp(v, -1, +1) # equal to v = v.clamp(-1, +1)
Way3. NotError. I solved this problem in Way3.
with torch.no_grad():
v[:] = v.clamp(-1, +1) # You must use v[:]=xxx instead of v=xxx

change loss function during training

Suppose my loss function is of the following form:
loss = a*loss_1 + (1-a)*loss_2
Suppose also I am training for 100 steps. How can I dynamically change the loss function in tensorflow so that I gradually change "a" from 1 to 0 during the 100 steps of training?
To be precise, I want my loss to be
loss = 1*loss_1+0*loss_2 = loss_1
at the beginning of training (at step 1)
and
loss = 0*loss_1+1*loss_2 = loss_2 at the end (step 100)
with some kind of gradual (doesn't have to be continuous) decrease in between.
Assuming that the value of a does not depend on the computation done at the current step, create a placeholder for a then pass the value you want using the feed dictionary.
You can use tf.train.polynomial_decay.
tf.train.polynomial_decay(learning_rate=1, global_step=step_from_placeholder,
decay_steps=100, end_learning_rate=0,
power=1.0, cycle=False, name=None)
This computes
global_step = min(global_step, decay_steps)
decayed_learning_rate = (learning_rate - end_learning_rate) * \
(1 - global_step / decay_steps) ** (power) + end_learning_rate