I am trying to write a custom Keras loss function but I am having issues with implementing and debugging my code. My target vector is:
y_pred = [p_conf, p_class_1, p_class_2]
where, p_conf = confidence an event of interest was detected
y_true examples:
[0, 0, 0] = no event of interest
[1, 1, 0] = first class event
[1, 0, 1] = second class event
I get relatively good results using multi-label classification (i.e. using a sigmoid activation in my final layer and binary_crossentropy loss function) but I want to experiment and improve my results using a custom loss function that calculates the:
binary_crossentropy loss for when y_true = [0, ..., ...]
categorical_crossentropy loss for when y_true = [1, ..., ...]
This is a simplified loss function used by the YOLO object detection algorithm. I tried adapting an existing Keras / TensorFlow implementation of the YOLO loss function but have not been successful.
Here is my current working code. It runs but produces unstable results, i.e. both loss and accuracy decrease over time. Any assistance would be greatly appreciated.
import tensorflow as tf
from keras import losses
def custom_loss(y_true, y_pred):
    # Initialisation
    mask_shape = tf.shape(y_true)[:0]
    conf_mask = tf.zeros(mask_shape)
    class_mask = tf.zeros(mask_shape)
    # Labels
    true_conf = y_true[..., 0]
    true_class = tf.argmax(y_true[..., 1:], -1)
    # Predictions
    pred_conf = tf.sigmoid(y_pred[..., 0])
    pred_class = y_pred[..., 1:]
    # Masks for selecting rows based on confidence = {0, 1}
    conf_mask = conf_mask + (1 - y_true[..., 0])
    class_mask = y_true[..., 0]
    nb_class = tf.reduce_sum(tf.to_float(class_mask > 0.0))
    # Calculate loss
    loss_conf = losses.binary_crossentropy(true_conf, pred_conf)
    loss_class = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_class, logits=pred_class)
    loss_class = tf.reduce_sum(loss_class * class_mask) / nb_class
    loss = loss_conf + loss_class
    return loss
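One way to read the loss described in this question is: a binary cross-entropy term on the confidence output for every sample, plus a categorical cross-entropy term over the class outputs that only counts for samples where an event is present. A minimal sketch of that reading follows. It assumes TensorFlow 2.x and, importantly, that the final layer is linear, so that y_pred holds raw logits rather than sigmoid outputs. The name conf_class_loss is made up for illustration.

```python
import tensorflow as tf


def conf_class_loss(y_true, y_pred):
    """Per-sample loss for targets laid out as [conf, class_1, class_2, ...].

    Assumption: y_pred contains raw logits (linear final layer).
    """
    y_true = tf.cast(y_true, y_pred.dtype)
    true_conf = y_true[..., 0]

    # Confidence term: binary cross-entropy on the confidence logit.
    loss_conf = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=true_conf, logits=y_pred[..., 0])

    # Class term: softmax cross-entropy over the class logits.
    loss_class = tf.nn.softmax_cross_entropy_with_logits(
        labels=y_true[..., 1:], logits=y_pred[..., 1:])

    # Only rows with an event (conf == 1) contribute a class term.
    return loss_conf + true_conf * loss_class
```

If this is passed as model.compile(loss=conf_class_loss, ...), Keras averages the per-sample values over the batch. At inference time the confidence logit would then need tf.sigmoid and the class logits tf.nn.softmax to turn them into probabilities.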
As stated in https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer?hl=en#minimize, the first parameter of minimize should satisfy the following requirement:
Tensor or callable. If a callable, loss should take no arguments and return the value to minimize. If a Tensor, the tape argument must be passed.
The first piece of code passes a tensor to minimize(), which requires a gradient tape, but I don't know how to supply one.
The second piece of code passes a callable to minimize(), which is straightforward.
import numpy as np
import tensorflow as tf
from tensorflow import keras
x_train = [1, 2, 3]
y_train = [1, 2, 3]
W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.random.normal([1]), name='bias')
hypothesis = W * x_train + b
def cost():
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error
optimizer = tf.optimizers.SGD(learning_rate=0.01)
cost_value = cost()
train = tf.keras.optimizers.Adam().minimize(cost_value, var_list=[W, b])
How do I add the gradient tape? I know that the following code works.
import numpy as np
import tensorflow as tf
from tensorflow import keras
x_train = [1, 2, 3]
y_train = [1, 2, 3]
W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.random.normal([1]), name='bias')
hypothesis = W * x_train + b
def cost():
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error
optimizer = tf.optimizers.SGD(learning_rate=0.01)
cost_value = cost()
train = tf.keras.optimizers.Adam().minimize(cost, var_list=[W, b])
Please help me revise the first piece of code and let it run, thanks!
This occurs because .minimize() expects a callable unless a gradient tape is supplied. cost_value (the result of calling cost()) is a tf.Tensor object, whereas cost is a function. You can pass your loss function directly to minimize: tf.keras.optimizers.Adam().minimize(cost, var_list=[W, b]).
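Changed part for the gradient tape (the loss has to be computed while the tape is recording, so that the tape can trace it back to W and b):
with tf.GradientTape() as tape:
    cost_value = cost()
train = tf.keras.optimizers.Adam().minimize(cost_value, var_list=[W, b], tape=tape)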
This is a late answer (Hakan basically got it for you), but I write this in hopes that it will help people in the future that are stuck and googling this exact question (like I was). This is also an alternate implementation using the tf.GradientTape() directly.
import numpy as np
import tensorflow as tf
from tensorflow import keras
x_train = [1, 2, 3]
y_train = [1, 2, 3]
W = tf.Variable(tf.random.normal([1]), trainable = True, name='weight')
b = tf.Variable(tf.random.normal([1]), trainable = True, name='bias')
def cost(W, b):
    y_model = W * x_train + b
    error = tf.reduce_mean(tf.square(y_train - y_model))
    return error
optimizer = tf.optimizers.SGD(learning_rate=0.01)
trainable_vars = [W,b]
epochs = 100 #(or however many iterations you want it to run)
for _ in range(epochs):
    with tf.GradientTape() as tp:
        #your loss/cost function must always be contained within the gradient tape instantiation
        cost_fn = cost(W, b)
    gradients = tp.gradient(cost_fn, trainable_vars)
    optimizer.apply_gradients(zip(gradients, trainable_vars))
This should give you the values of your weight and bias after the number of epochs you ran.
You must compute the loss every time a new gradient tape is opened. You then take the gradient of the loss with respect to the trainable variables and call optimizer.apply_gradients to perform the update, as described in the TensorFlow documentation: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer#apply_gradients.
I want to create a tensor that is a kind of transformation matrix (a rotation matrix, for instance).
My model predicts 2 parameters: x1 and x2,
so the output is a tensor of shape (B, 2), where B is the batch size.
However, when I write my loss, I have to know this "B", since I want to iterate over it:
def get_rotation_tensor(x):
    roll_mat = K.stack([ [[1, 0, 0],
                          [0, K.cos(x[i, 0]), -K.sin(x[i, 0])],
                          [0, K.sin(x[i, 0]), K.cos(x[i, 0])]] for i in range(BATCH_SIZE)])
    pitch_mat = K.stack([ [[K.cos(x[i, 1]), 0, K.sin(x[i, 1])],
                           [0, 1, 0],
                           [-K.sin(x[i, 1]), 0, K.cos(x[i, 1])]] for i in range(BATCH_SIZE)])
    return K.batch_dot(pitch_mat, roll_mat)
The only solution I could think of is to define BATCH_SIZE in advance, but is there a way to write a general loss function that works for any batch size?
I found a solution
def get_rotation_tensor(x):
    ones = K.ones_like(x[:, 0])
    zeros = K.zeros_like(x[:, 0])
    roll_mat = K.stack([[ones, zeros, zeros],
                        [zeros, K.cos(x[:, 0]), -K.sin(x[:, 0])],
                        [zeros, K.sin(x[:, 0]), K.cos(x[:, 0])]])
    pitch_mat = K.stack([[K.cos(x[:, 1]), zeros, K.sin(x[:, 1])],
                         [zeros, ones, zeros],
                         [-K.sin(x[:, 1]), zeros, K.cos(x[:, 1])]])
    return K.batch_dot(K.permute_dimensions(pitch_mat, (2, 0, 1)),
                       K.permute_dimensions(roll_mat, (2, 0, 1)))
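The same idea can probably be written without the permute step by stacking along explicit axes, so that the batch dimension stays first throughout. The following is a sketch assuming TensorFlow 2.x; rotation_tensor_tf is a hypothetical name, and column 0 of x is taken as roll and column 1 as pitch, as in the code above.

```python
import tensorflow as tf


def rotation_tensor_tf(x):
    """Build a (B, 3, 3) tensor pitch_mat @ roll_mat from x of shape (B, 2)."""
    roll, pitch = x[:, 0], x[:, 1]
    ones, zeros = tf.ones_like(roll), tf.zeros_like(roll)
    cr, sr = tf.cos(roll), tf.sin(roll)
    cp, sp = tf.cos(pitch), tf.sin(pitch)

    # Each inner stack builds one matrix row of shape (B, 3);
    # the outer stack on axis=1 assembles the rows into (B, 3, 3).
    roll_mat = tf.stack([
        tf.stack([ones, zeros, zeros], axis=-1),
        tf.stack([zeros, cr, -sr], axis=-1),
        tf.stack([zeros, sr, cr], axis=-1),
    ], axis=1)
    pitch_mat = tf.stack([
        tf.stack([cp, zeros, sp], axis=-1),
        tf.stack([zeros, ones, zeros], axis=-1),
        tf.stack([-sp, zeros, cp], axis=-1),
    ], axis=1)

    # tf.matmul multiplies the last two dimensions for every batch element.
    return tf.matmul(pitch_mat, roll_mat)
```

Because every entry is derived from x itself (via ones_like and zeros_like), the batch size never has to be known when the function is written.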
Perhaps I'm not fully understanding your issue, but can't you just determine the batch size from the shape of the tensors passed into the loss function? Below is an example that shows the idea. I hope this helps.
# Install TensorFlow
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf
# Setup repro section from Keras FAQ with TF1 to TF2 adjustments
import numpy as np
import random as rn
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                                        inter_op_parallelism_threads=1)
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/set_random_seed
sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
# Rest of code follows ...
# Custom Loss
def my_custom_loss(y_true, y_pred):
    tf.print('inside my_custom_loss:')
    tf.print('y_true column 0:')
    tf.print(y_true[:, 0])
    tf.print('y_true column 1:')
    tf.print(y_true[:, 1])
    # get length/batch size
    batch_size = tf.shape(y_pred)[0]
    tf.print(batch_size)
    y_zeros = tf.zeros_like(y_pred)
    y_mask = tf.math.greater(y_pred, y_zeros)
    res = tf.boolean_mask(y_pred, y_mask)
    logres = tf.math.log(res)
    finres = tf.math.reduce_sum(logres)
    return finres
# Define model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', input_dim=1, name="Dense1"))
model.compile(optimizer='rmsprop', loss=my_custom_loss)
# Generate dummy data
data = np.array([[2.0],[1.0],[1.0],[3.0],[4.0]])
labels = np.array([[[2.0],[1.0]],
# Train the model.
print('training the model:')
model.fit(data, labels, epochs=1, batch_size=3)
print('done training the model.')
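Stripped of the reproducibility setup, the core of this suggestion is reading the batch size from the tensor at run time with tf.shape. A minimal sketch, assuming TensorFlow 2.x; the function name and the choice of a summed squared error are illustrative only:

```python
import tensorflow as tf


def sse_per_sample(y_true, y_pred):
    """Summed squared error divided by the batch size found at run time."""
    y_true = tf.cast(y_true, y_pred.dtype)
    # tf.shape returns a tensor, so this also works when the static
    # batch dimension is None (as it is inside model.fit).
    batch_size = tf.shape(y_pred)[0]
    total = tf.reduce_sum(tf.square(y_true - y_pred))
    return total / tf.cast(batch_size, y_pred.dtype)
```

The same function then handles a final, smaller batch (for example 5 samples with batch_size=3) without any hard-coded constant.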
I want to write a custom loss function that would penalize underestimation of positive target values with weights. It would work like mean square error, with the only difference that square errors in said case would get multiplied with a weight greater than 1.
I wrote it like this:
def wmse(ground_truth, predictions):
    square_errors = np.square(np.subtract(ground_truth, predictions))
    weights = np.ones_like(square_errors)
    weights[np.logical_and(predictions < ground_truth, np.sign(ground_truth) > 0)] = 100
    weighted_mse = np.mean(np.multiply(square_errors, weights))
    return weighted_mse
However, when I supply it to my Sequential model in keras with tensorflow as backend:
I get the following error:
raise TypeError("Using a `tf.Tensor` as a Python `bool` is not allowed.
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
The traceback points to this line in wmse:
weights[np.logical_and(predictions < ground_truth, np.sign(ground_truth) > 0)] = 100
I have never worked with keras nor tensorflow until now, so I'd appreciate if someone helped me to adapt this loss function to keras/tensorflow framework. I tried to replace np.logical_and with tensorflow.logical_and, but to no avail, the error is still there.
As @nuric mentioned, you have to implement your loss using only Keras / TensorFlow operations that have derivatives, because these frameworks cannot back-propagate through other operations (such as NumPy ones).
A Keras only implementation could look like this:
from keras import backend as K
def wmse(ground_truth, predictions):
    square_errors = (ground_truth - predictions) ** 2
    weights = K.ones_like(square_errors)
    # Underestimated positive targets get a weight of 100, everything else 1
    mask = K.less(predictions, ground_truth) & K.greater(K.sign(ground_truth), 0)
    weights = K.switch(mask, weights * 100, weights)
    weighted_mse = K.mean(square_errors * weights)
    return weighted_mse
gt = K.constant([-2, 2, 1, -1, 3], dtype="int32")
pred = K.constant([-2, 1, 1, -1, 1], dtype="int32")
loss = wmse(gt, pred)  # wmse returns a single tensor
sess = K.get_session()
print(sess.run(loss))
# 100
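For readers on TensorFlow 2.x, where K.get_session() is not available in eager mode, roughly the same loss can be sketched with tf.where. This is an illustrative variant, not the code from the answer above; the name wmse_tf2 and the weight parameter are assumptions.

```python
import tensorflow as tf


def wmse_tf2(ground_truth, predictions, weight=100.0):
    """MSE in which underestimated positive targets are weighted by `weight`."""
    ground_truth = tf.cast(ground_truth, tf.float32)
    predictions = tf.cast(predictions, tf.float32)
    square_errors = tf.square(ground_truth - predictions)

    # True where the target is positive and the prediction falls below it.
    mask = tf.logical_and(predictions < ground_truth, ground_truth > 0)
    ones = tf.ones_like(square_errors)
    weights = tf.where(mask, weight * ones, ones)
    return tf.reduce_mean(square_errors * weights)


loss = wmse_tf2([-2, 2, 1, -1, 3], [-2, 1, 1, -1, 1])
print(float(loss))  # 100.0, matching the example values above
```

The squared errors are [0, 1, 0, 0, 4] and the weights [1, 100, 1, 1, 100], so the mean is (100 + 400) / 5 = 100.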
I need to extract the high frequencies from an image in TensorFlow.
Basically the functionality from ndimage.gaussian_filter(img, sigma)
The following code works as expected:
import numpy as np
import tensorflow as tf
import cv2
img = cv2.imread(imgpath, cv2.IMREAD_GRAYSCALE)
img = cv2.normalize(img.astype('float32'), None, 0.0, 1.0, cv2.NORM_MINMAX)
# Gaussian Filter
K = np.array([[0.003765,0.015019,0.023792,0.015019,0.003765],
[0.003765,0.015019,0.023792,0.015019,0.003765]], dtype='float32')
# as tensorflow constants with correct shapes
x = tf.constant(img.reshape(1,img.shape[0],img.shape[1], 1))
w = tf.constant(K.reshape(K.shape[0],K.shape[1], 1, 1))
with tf.Session() as sess:
    # get low/high pass ops
    lowpass = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
    highpass = x-lowpass
    # get high pass image
    l = sess.run(highpass)
    l = l.reshape(img.shape[0],img.shape[1])
However, I don't know how to get the Gaussian weights from within TensorFlow for a given sigma.
Just refer to the tflearn data augmentation page, http://tflearn.org/data_augmentation/. There you can find add_random_blur(sigma_max=5.0), which randomly blurs an image by applying a Gaussian filter with a random sigma in (0., sigma_max).
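If pulling in tflearn is not an option, the kernel weights can also be computed directly from sigma with ordinary TensorFlow ops. The sketch below assumes TensorFlow 2.x and samples the Gaussian at pixel centres, so its values will differ slightly from tables that integrate the Gaussian over each pixel. The function names gaussian_kernel and highpass are made up for illustration.

```python
import tensorflow as tf


def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel shaped (size, size, 1, 1) for tf.nn.conv2d."""
    # Sample positions centred on zero, e.g. [-2, -1, 0, 1, 2] for size=5.
    ax = tf.range(size, dtype=tf.float32) - (size - 1) / 2.0
    g1d = tf.exp(-tf.square(ax) / (2.0 * sigma ** 2))
    g2d = tf.tensordot(g1d, g1d, axes=0)  # outer product -> (size, size)
    g2d = g2d / tf.reduce_sum(g2d)        # weights sum to 1
    return g2d[:, :, tf.newaxis, tf.newaxis]


def highpass(img, size=5, sigma=1.0):
    """Subtract the Gaussian-blurred image from a 2-D grayscale image."""
    x = tf.cast(img, tf.float32)[tf.newaxis, :, :, tf.newaxis]
    lowpass = tf.nn.conv2d(x, gaussian_kernel(size, sigma),
                           strides=[1, 1, 1, 1], padding='SAME')
    return (x - lowpass)[0, :, :, 0]
```

Note that padding='SAME' pads with zeros, so pixels near the border see a darker low-pass value than they would with the reflecting boundary that ndimage.gaussian_filter uses by default.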
Trying to implement a minimal toy RNN example in tensorflow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: We're getting there. The only part remaining is to make it converge (and less convoluted). Could someone help to turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1
with tf.Graph().as_default(), tf.Session() as session:
    # Placeholder for the inputs and target of the net
    # inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    input1 = tf.placeholder(tf.float32, [batch_size, 1])
    inputs = [input1 for _ in range(num_steps)]
    outputs = tf.placeholder(tf.float32, [batch_size, num_steps])
    gru = rnn_cell.GRUCell(num_units)
    initial_state = state = tf.zeros([batch_size, num_units])
    loss = tf.constant(0.0)
    # setup model: unroll
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        step_ = inputs[time_step]
        output, state = gru(step_, state)
        loss += tf.reduce_sum(abs(output - target)) # all norms work equally well? NO!
    final_state = state
    optimizer = tf.train.AdamOptimizer(0.1) # CONVERGEs sooo much better
    train = optimizer.minimize(loss) # let the optimizer train
    numpy_state = initial_state.eval()
    for epoch in range(10): # now
        for i in range(7): # feed fake 2D matrix of 1 byte at a time ;)
            feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]} # no
            numpy_state, current_loss,_ = session.run([final_state, loss,train], feed_dict=feed_dict)
        print(current_loss) # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in range(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but an RNN cell works on float vectors; that's what embedding_lookup is for.
And finally you'll need to adapt your feed to put in the input list.
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
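The code in this thread uses the TensorFlow 1.x graph API (tf.placeholder, rnn_cell). For readers on TensorFlow 2.x, a rough equivalent of the toy problem can be sketched with the Keras GRU layer. This is not the approach from the answer above: it feeds the numbers in as floats instead of using an embedding lookup, and the layer sizes, learning rate, epoch count, and the name train_toy_gru are all assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf


def train_toy_gru(epochs=200, seed=0):
    """Fit a tiny GRU to map [1..7] onto [2, 3, 4, 5, 6, 7, 7] step by step."""
    tf.random.set_seed(seed)
    # Shape (batch=1, time steps=7, features=1).
    x = np.array([1, 2, 3, 4, 5, 6, 7], dtype="float32").reshape(1, 7, 1)
    y = np.array([2, 3, 4, 5, 6, 7, 7], dtype="float32").reshape(1, 7, 1)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(7, 1)),
        # return_sequences=True yields one output vector per time step.
        tf.keras.layers.GRU(7, return_sequences=True),
        # Linear read-out, because GRU activations are bounded in [-1, 1].
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),
                  loss="mse")
    history = model.fit(x, y, epochs=epochs, verbose=0)
    return model, x, history.history["loss"]


if __name__ == "__main__":
    model, x, losses = train_toy_gru()
    print(losses[0], losses[-1])
    print(model.predict(x, verbose=0).reshape(-1))
```

The linear Dense layer on top matters here: the raw GRU output cannot reach targets such as 7, which is one plausible reason a loss computed directly on the cell output stays stuck.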