Manipulating data in keras custom loss function for CNN - numpy

I'm trying to write a custom loss function in Keras for a CNN I'm working on. Y_true and Y_pred will both be tensors of grayscale images, so I expect a shape of [a, x, y, 1], where x and y are the dimensions of my images and a is the batch size.
The plan is to:
Threshold each image of Y_true by its mean pixel intensity
Use the non-zero elements of this mask to get an array of pixel values from Y_true and Y_pred
Measure the cosine similarity (using the built-in Keras loss function) of these arrays and return the average result of the batch as the loss
My main question is: how can I efficiently implement this process?
Does the cosine_similarity function work on 1D arrays?
I know that I should avoid for loops to maintain efficiency, but it's the only way I can think of implementing this function. Is there a more efficient way to implement it using the Keras backend or numpy?
EDIT
Basic implementation and an unexpected error when compiling the model with this function:
def masked_cosine_similarity(y_true, y_pred):
    loss = 0
    for i in range(y_true.shape[0]):
        true_y = y_true[i, :, :, 0]
        pred_y = y_pred[i, :, :, 0]
        mask = true_y > np.mean(true_y)
        elements = np.nonzero(mask)
        true_vals = np.array([true_y[x, y] for x, y in zip(elements[0], elements[1])])
        pred_vals = np.array([pred_y[x, y] for x, y in zip(elements[0], elements[1])])
        loss += cosine_similarity(true_vals, pred_vals)
    return loss / y_true.shape[0]
Error message:
64 loss = 0
---> 65 for i in range(y_true.shape[0]):
66 true_y = y_true[i,:,:,0]
67 pred_y = y_pred[i,:,:,0]
TypeError: 'NoneType' object cannot be interpreted as an integer

The shape of a tensor in Keras/TF is usually [None, height, width, channels].
The leading None is there to support an arbitrary batch size: you don't want to build a model that only works for one specific batch size. That is why your code fails on:
for i in range(y_true.shape[0]):
since y_true.shape[0] == None.
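If you genuinely need the batch size inside a loss (you usually don't, as explained next), you can read the dynamic shape at runtime instead of the static one. A minimal sketch of my own:
import tensorflow as tf

def loss_with_dynamic_batch_size(y_true, y_pred):
    # tf.shape returns the runtime shape as a tensor, so this works even
    # though y_true.shape[0] is None while the graph is being traced.
    batch_size = tf.cast(tf.shape(y_true)[0], tf.float32)
    per_example = tf.reduce_sum(tf.square(y_true - y_pred), axis=[1, 2, 3])
    return tf.reduce_sum(per_example) / batch_size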
Why loop over the batch at all? You don't need to.
For example, given some element-wise loss function (MSE, cosine loss, etc.) you can do something like:
def my_loss(y_true, y_pred):
    mask = tf.keras.backend.cast(y_true >= tf.math.reduce_mean(y_true, axis=[1, 2], keepdims=True), 'float32')
    masked_loss = K.sum(mask * elementwise_loss(y_true, y_pred), axis=-1)
    num_valid_pixels = K.maximum(1.0, K.cast(K.sum(mask), 'float32'))
    return masked_loss / num_valid_pixels
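Building on that idea, here is a vectorized sketch of the masked cosine-similarity loss from the question (my own adaptation, assuming tf.keras). Because both tensors are multiplied by the same mask, the zeroed pixels contribute nothing to the dot product or the norms, so this equals the cosine similarity computed over the masked pixels only:
import tensorflow as tf

def masked_cosine_similarity(y_true, y_pred):
    # Per-image mean threshold -> binary mask of shape (batch, x, y, 1)
    mean = tf.reduce_mean(y_true, axis=[1, 2, 3], keepdims=True)
    mask = tf.cast(y_true > mean, y_true.dtype)

    t = y_true * mask
    p = y_pred * mask

    dot = tf.reduce_sum(t * p, axis=[1, 2, 3])
    norm_t = tf.sqrt(tf.reduce_sum(tf.square(t), axis=[1, 2, 3]))
    norm_p = tf.sqrt(tf.reduce_sum(tf.square(p), axis=[1, 2, 3]))
    cos = dot / (norm_t * norm_p + tf.keras.backend.epsilon())

    # Keras' cosine_similarity loss returns -cos(angle), so follow that
    # convention and average over the batch.
    return -tf.reduce_mean(cos)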

Related

When writing a custom loss function, should I use tf.reduce_mean, and if so how? Does it ever matter?

The sample code below shows that all the following give the same (correct) results when
writing a custom loss function (calculating mean_squared_error) for
a simple linear regression model.
Do not use tf.reduce_mean() (so returning a loss for each example)
Use tf.reduce_mean() (so returning a single scalar loss)
Use tf.reduce_mean(..., axis=-1)
Is there any reason to prefer one approach to another, and are there any circumstances
where it makes a difference?
(There is, for example, sample code at Make a custom loss function in keras that suggests axis=-1 should be used.)
import numpy as np
import tensorflow as tf

# Create simple dataset to do linear regression on
# The mean squared error (~ best achievable MSE loss after fitting linear regression) for this dataset is 0.01
xtrain = np.random.randn(5000)                  # Already normalized
ytrain = xtrain + np.random.randn(5000) * 0.1   # Close enough to being normalized

# Function to create model and fit linear regression, and report final loss
def cre_and_fit(loss="mean_squared_error", lossdescription="", epochs=20):
    model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(loss=loss, optimizer="RMSProp")
    history = model.fit(xtrain, ytrain, epochs=epochs, verbose=False)
    print(f"Final loss value for {lossdescription}: {history.history['loss'][-1]:.4f}")

# Result from standard MSE loss ~ 0.01
cre_and_fit("mean_squared_error", "Keras standard MSE")

# This gives the right result, not reducing. Return shape = (batch_size,)
cre_and_fit(lambda y_true, y_pred: (y_true - y_pred) * (y_true - y_pred),
            "custom loss, not reducing over batch items")

# This also gives the right result, reducing over batch items. Return shape = ()
cre_and_fit(lambda y_true, y_pred: tf.reduce_mean((y_true - y_pred) * (y_true - y_pred)),
            "custom loss, reducing over batch items")

# How about using axis=-1? Also gives the same result
cre_and_fit(lambda y_true, y_pred: tf.reduce_mean((y_true - y_pred) * (y_true - y_pred), axis=-1),
            "custom loss, reducing with axis=-1")
When you pass a lambda (or a callable in general) to compile and call fit, TF will wrap it inside a LossFunctionWrapper, which is a subclass of Loss, with a default reduction type of ReductionV2.AUTO. Note that a Loss object always has a reduction type representing how it will reduce the loss tensor to a single scalar.
Under most circumstances, ReductionV2.AUTO translates to ReductionV2.SUM_OVER_BATCH_SIZE which, despite its name, actually takes the mean over all axes of the underlying lambda's output.
import tensorflow as tf
from keras import losses as losses_mod
from keras.utils import losses_utils
a = tf.random.uniform((10,2))
b = tf.random.uniform((10,2))
l_auto = losses_mod.LossFunctionWrapper(fn=lambda y_true, y_pred : tf.square(y_true - y_pred), reduction=losses_utils.ReductionV2.AUTO)
l_sum = losses_mod.LossFunctionWrapper(fn=lambda y_true, y_pred : tf.square(y_true - y_pred), reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE)
l_auto(a,b).shape.rank == l_sum(a,b).shape.rank == 0 # rank 0 means scalar
l_auto(a,b) == tf.reduce_mean(tf.square(a - b)) # True
l_sum(a,b) == tf.reduce_mean(tf.square(a - b)) # True
So to answer your question, the three options are equivalent since they all eventually result in a single scalar that is the mean of all elements in the raw tf.square(a - b) loss tensor. However, should you wish to perform an operation other than reduce_mean (e.g., reduce_sum) inside the lambda, then the three will yield different results:
l1 = losses_mod.LossFunctionWrapper(fn=lambda y_true, y_pred: tf.square(y_true - y_pred),
                                    reduction=losses_utils.ReductionV2.AUTO)
l2 = losses_mod.LossFunctionWrapper(fn=lambda y_true, y_pred: tf.reduce_sum(tf.square(y_true - y_pred)),
                                    reduction=losses_utils.ReductionV2.AUTO)
l3 = losses_mod.LossFunctionWrapper(fn=lambda y_true, y_pred: tf.reduce_sum(tf.square(y_true - y_pred), axis=-1),
                                    reduction=losses_utils.ReductionV2.AUTO)
l1(a,b) == tf.reduce_mean(tf.square(a-b)) # True
l2(a,b) == tf.reduce_sum(tf.square(a-b)) # True
l3(a,b) == tf.reduce_mean(tf.reduce_sum(tf.square(a-b), axis=-1)) # True
Concretely, l2(a,b) == tf.reduce_mean(tf.reduce_sum(tf.square(a-b))), but that is just tf.reduce_sum(tf.square(a-b)) since mean of a scalar is itself.
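For reference, the built-in losses expose the same choice through an explicit reduction argument. A small illustration of my own (not part of the code above):
import tensorflow as tf

a = tf.random.uniform((10, 2))
b = tf.random.uniform((10, 2))

mse_mean = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)
mse_sum = tf.keras.losses.MeanSquaredError(
    reduction=tf.keras.losses.Reduction.SUM)

print(mse_mean(a, b))  # mean of the per-example MSEs
print(mse_sum(a, b))   # sum of the per-example MSEs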

Feeding weight maps into a CNN (UNET network) in keras

I have implemented the UNET network described here.
The network is working fine, but in the paper they mention adding weight maps to the loss for better boundary separation. As far as I understand, the weight maps are calculated this way:
def unet_weight_map(y, wc=None, w0=10, sigma=5):
    """
    Parameters
    ----------
    y: Numpy array
        2D array of shape (image_height, image_width) representing binary mask
        of objects.
    wc: dict
        Dictionary of weight classes.
    w0: int
        Border weight parameter.
    sigma: int
        Border width parameter.

    Returns
    -------
    Numpy array
        Training weights. A 2D array of shape (image_height, image_width).
    """
    labels = label(y)
    no_labels = labels == 0
    label_ids = sorted(np.unique(labels))[1:]
    if len(label_ids) > 1:
        distances = np.zeros((y.shape[0], y.shape[1], len(label_ids)))
        for i, label_id in enumerate(label_ids):
            distances[:, :, i] = distance_transform_edt(labels != label_id)
        distances = np.sort(distances, axis=2)
        d1 = distances[:, :, 0]
        d2 = distances[:, :, 1]
        w = w0 * np.exp(-1/2 * ((d1 + d2) / sigma)**2) * no_labels
    else:
        w = np.zeros_like(y)
    if wc:
        class_weights = np.zeros_like(y)
        for k, v in wc.items():
            class_weights[y == k] = v
        w = w + class_weights
    return w
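For reference, a hypothetical usage sketch. The function above appears to rely on skimage.measure.label and scipy.ndimage.distance_transform_edt (an assumption on my part), and wc maps class value to class weight:
import numpy as np
from skimage.measure import label
from scipy.ndimage import distance_transform_edt

# Toy binary mask with two objects separated by a small gap
y = np.zeros((64, 64), dtype=np.uint8)
y[10:25, 10:25] = 1
y[30:45, 28:45] = 1

w = unet_weight_map(y, wc={0: 1, 1: 5})
print(w.shape)  # (64, 64): one training weight per pixel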
Up to here everything is fine. My question is how I can make use of these weight maps in the network. I have a weighted binary cross-entropy loss defined as below:
def weighted_binary_crossentropy(y_true, y_pred, weight=[1., 2.]):
    y_true = K.clip(y_true, K.epsilon(), 1 - K.epsilon())
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    logloss = -(y_true * K.log(y_pred) * weight[0] + (1 - y_true) * K.log(1 - y_pred) * weight[1])
    return K.mean(logloss, axis=-1)
But here I pass the weights into the loss as an [a, b] array of class weights and then give this loss to the network when compiling. Should I instead feed those weight maps into this custom loss function? If so, how? If not, what other way can I use in Keras? I have read many Stack Overflow questions related to this problem, but I could not find an answer. I can provide any information about my network if needed.
There are two ways to pass your own parameters to a custom loss function: subclass tf.keras.losses.Loss, or use a wrapper function.
For example, you can set up a wrapper function like this:
def wrapper_loss(weights=[1., 2.]):
    def weighted_binary_crossentropy(y_true, y_pred):
        y_true = K.clip(y_true, K.epsilon(), 1 - K.epsilon())
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        logloss = -(y_true * K.log(y_pred) * weights[0] + (1 - y_true) * K.log(1 - y_pred) * weights[1])
        return K.mean(logloss, axis=-1)
    return weighted_binary_crossentropy
Then, pass it to the model.compile() like this:
model.compile(loss=wrapper_loss(weights=[1.,2.]), optimizer=...)
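The subclassing route mentioned above would look roughly like this (a sketch of my own, assuming tf.keras; the clipping mirrors the wrapper version):
import tensorflow as tf
from tensorflow.keras import backend as K

class WeightedBinaryCrossentropy(tf.keras.losses.Loss):
    def __init__(self, class_weights=(1., 2.), **kwargs):
        super().__init__(**kwargs)
        self.class_weights = class_weights

    def call(self, y_true, y_pred):
        y_true = K.clip(y_true, K.epsilon(), 1 - K.epsilon())
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        logloss = -(y_true * K.log(y_pred) * self.class_weights[0]
                    + (1 - y_true) * K.log(1 - y_pred) * self.class_weights[1])
        return K.mean(logloss, axis=-1)

# then: model.compile(loss=WeightedBinaryCrossentropy(class_weights=(1., 2.)), optimizer=...)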
P.S: You may need to check these out:
tf.nn.weighted_cross_entropy_with_logits
class_weight argument for model.fit()
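For completeness, the built-in weighted cross entropy mentioned in the P.S. operates on logits (raw, pre-sigmoid outputs); a minimal sketch:
import tensorflow as tf

labels = tf.constant([[1.0, 0.0, 1.0]])
logits = tf.constant([[2.0, -1.0, 0.5]])  # raw model outputs, before sigmoid

# pos_weight > 1 penalises missed positives more heavily than false positives
loss = tf.nn.weighted_cross_entropy_with_logits(labels=labels, logits=logits, pos_weight=2.0)
print(loss)  # element-wise loss with the same shape as the inputs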
I realized how to use those maps. First, I define an Input with the same shape as the ground-truth labels, the same way we define an Input for the input images. Something like:
weights = Input(shape=(shape_of_groundtruth_labels))
I define the customized loss with the same structure as the wrapper_loss defined above, except that it now takes the weight maps instead of the class weights [1, 2]. Then, when defining the model, I pass both the input images and the input weights as inputs. Something like:
model = Model(inputs=[images, weights], outputs=...)
where weights is the tensor defined in the Input layer above. In model.compile(), I pass my customized loss (wrapper_loss) with the weights input. Something like:
model.compile(optimizer=..., loss=wrapper_loss(weight = weights), ...)
where the second 'weights' is again the one defined in the Input layer.
The last thing to do is to do the same in model.fit(): I pass the weight maps along with the images, in the same structure as above.
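Putting those pieces together, here is a condensed sketch of the pattern as I understand it. The names build_unet, image_input and weight_input are placeholders of my own, as are the shapes; on newer TF versions you may need to register the loss via model.add_loss instead of closing over the Input tensor:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

def wrapper_loss(weight_map):
    # weight_map is the symbolic tensor coming from the extra Input layer
    def weighted_binary_crossentropy(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
        return K.mean(weight_map * bce, axis=[1, 2, 3])
    return weighted_binary_crossentropy

image_input = Input(shape=(256, 256, 1))   # placeholder sizes
weight_input = Input(shape=(256, 256, 1))  # same spatial shape as the labels
segmentation = build_unet(image_input)     # hypothetical model-building function

model = Model(inputs=[image_input, weight_input], outputs=segmentation)
model.compile(optimizer='adam', loss=wrapper_loss(weight_input))
# model.fit([train_images, train_weight_maps], train_labels, ...)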

ValueError: No gradients provided for any variable when defining custom loss function

Below you can find my custom loss function.
def custom_loss_function(y_true, y_pred):
    y_pred_bool = tf.math.less_equal(y_pred, tf.constant(0.5))
    y_pred_float = 1 - tf.cast(y_pred_bool, dtype=tf.int32)
    y_true = tf.cast(y_true, dtype=tf.int32)
    mask_bool_loss = tf.math.less(y_true, tf.constant(0))
    mask_loss = 1 - tf.cast(mask_bool_loss, dtype=tf.int32)
    mask = tf.math.reduce_min(mask_loss, axis=2)
    y_multiply = tf.math.multiply(y_true, y_pred_float)
    y = tf.math.reduce_sum(y_multiply, axis=2)
    y_loss = 1 - y
    y_loss = tf.math.multiply(y_loss, mask)
    return y_loss
I know some TensorFlow functions are not differentiable, but I don't know which ones, or how to get around them. Any suggestions?
I get this error:
ValueError: No gradients provided for any variable: ['bidirectional_7/forward_lstm_7/lstm_cell_22/kernel:0', ...
As soon as you cast your variables to int or bool, all gradient information is lost, so the gradient is already broken at this first line:
y_pred_bool = tf.math.less_equal(y_pred, tf.constant(0.5))
That is why we usually use something like binary cross-entropy: it gives us a differentiable approximation to the non-differentiable 0-1 loss.
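As a generic illustration (not a drop-in replacement for the loss above), a hard threshold on y_pred can be replaced by a smooth surrogate so that gradients keep flowing. Note that casting y_true alone is harmless, since no gradient flows through the labels; only the operations applied to y_pred need to stay differentiable:
import tensorflow as tf

def soft_threshold_loss(y_true, y_pred, sharpness=10.0):
    # Hard, gradient-breaking version: tf.cast(y_pred > 0.5, tf.float32)
    # Smooth surrogate: a steep sigmoid centred at 0.5 remains differentiable.
    soft_decision = tf.sigmoid(sharpness * (y_pred - 0.5))
    return tf.reduce_mean(tf.square(tf.cast(y_true, tf.float32) - soft_decision))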

A custom layer in Keras returns NaN as a gradient. What are some potential issues causing this?

I work on a project where we try to reconstruct a 2D image from geometric primitives. To this end, I have developed a custom Keras layer which outputs the image of a cone given its geometric characteristics.
Its input is a tensor of shape batch_size * 5, where the five numbers are the xy coordinates of the apex of the cone, the xy coordinates of the unit vector describing the axis of the cone, and the angle at the top of the cone.
The goal is to use this layer as a non-trainable decoder in an encoder-decoder architecture. We would then feed the neural network with cone images. The expected behavior is that the neural network should then learn a latent representation similar to the one described above.
When I incorporate this layer in a larger network and try to optimize it, invariably some weights end up being updated to NaN. This happens even with a network as simple as a two-neuron hidden layer without activation functions.
I have thoroughly tested my layer. Its output is consistent with what I expect it to be. I can't find any trivial mistake in the implementation (but you should be warned I am still fairly new to tensorflow and keras). I have narrowed the issue down to the automatic differentiation of the layer.
The gradient appears to be equal either to 0.0 or to NaN. My understanding is that some numerical instability causes the gradient to diverge.
The question is twofold:
What is the underlying cause here?
How can I fix it?
Below is a minimum working example showing how the gradient winds up equal to 0.0 or NaN for specific values.
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer
import tensorflow as tf
import numpy.random as rnd

class Cones(Layer):
    def __init__(self, output_dim, **kwargs):
        super(Cones, self).__init__(**kwargs)
        self.output_dim = output_dim
        coordinates = np.zeros((self.output_dim, self.output_dim, 2))
        for i in range(self.output_dim):
            for j in range(self.output_dim):
                coordinates[i, j, :] = np.array([i, j])
        coordinates = K.constant(coordinates)
        self.coordinates = tf.Variable(initial_value=coordinates, trainable=False)
        self.smooth_sign_width = tf.Variable(initial_value=output_dim, dtype=tf.float32, trainable=False)
        self.grid_width = tf.Variable(initial_value=output_dim, dtype=tf.float32, trainable=False)

    def build(self, input_shape):
        super(Cones, self).build(input_shape)

    def call(self, x):
        center = self.grid_width * x[:, :2]
        center = K.expand_dims(center, axis=1)
        center = K.expand_dims(center, axis=1)
        direction = x[:, 2:4]
        direction = K.expand_dims(direction, 1)
        direction = K.expand_dims(direction, 1)
        direction = K.l2_normalize(direction, axis=-1)
        aperture = np.pi * x[:, 4:]
        aperture = K.expand_dims(aperture)
        u = self.coordinates - center
        u = K.l2_normalize(u, axis=-1)
        angle = K.sum(u * direction, axis=-1)
        angle = K.minimum(angle, K.ones_like(angle))
        angle = K.maximum(angle, -K.ones_like(angle))
        angle = tf.math.acos(angle)
        output = self.smooth_sign(aperture - angle)
        output = K.expand_dims(output, -1)
        return output

    def smooth_sign(self, x):
        return tf.math.sigmoid(self.smooth_sign_width * x)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim, self.output_dim, 1)

geom = K.constant([[0.34015268, 0.31530404, -0.6827047, 0.7306944, 0.8521315]])
image = Cones(Nx)(geom)
x0 = geom
y0 = image

with tf.GradientTape() as t:
    t.watch(x0)
    cone = Cones(Nx)(x0)
    error = cone - y0
    error_squared = error * error
    mse = tf.math.reduce_mean(error_squared)
print(t.gradient(mse, x0))

geom = K.constant([[0.742021, 0.25431857, 0.90899783, 0.4168009, 0.58542883]])
image = Cones(Nx)(geom)
x0 = geom
y0 = image

with tf.GradientTape() as t:
    t.watch(x0)
    cone = Cones(Nx)(x0)
    error = cone - y0
    error_squared = error * error
    mse = tf.math.reduce_mean(error_squared)
print(t.gradient(mse, x0))
First of all, I answer my own question and leave it there in case it may help someone in the future. I don't know if this is the generally agreed upon etiquette at StackOverflow.
By commenting out the successive steps of the call function, I found out that the issue was with tf.math.acos. In the code above, I already had an issue with acos, which led me to clip the values I fed it between -1 and 1: numerical issues meant that the dot product of two unit vectors sometimes fell slightly outside this range, the domain on which acos is defined. However, by clipping, I ended up evaluating acos at exactly 1 and -1, where it is not differentiable, hence the NaN in the gradient.
To fix this issue, I first changed my method for calculating the angle between two vectors, using this scicomp Stack Exchange answer. Then, I clipped the range on which I perform the computation to avoid the non-differentiability of sqrt at 0. More precisely, whenever I have c > 1.95, I round the angle to pi, and whenever I have c < 0.05, I round the angle to 0.
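An alternative workaround (my own suggestion, not the approach described above) is to keep acos but clip its argument strictly inside the open interval (-1, 1), so it is never evaluated where its derivative blows up:
import tensorflow as tf

def safe_acos(x, eps=1e-6):
    # d/dx acos(x) = -1 / sqrt(1 - x^2), which is infinite at x = +/-1 and
    # therefore produces NaN gradients; clipping slightly inside avoids this.
    return tf.math.acos(tf.clip_by_value(x, -1.0 + eps, 1.0 - eps))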

Keras: Predict model within custom loss function

I am trying to use some_model.predict(x) within a custom loss function.
I found this custom loss function:
_EPSILON = K.epsilon()

def _loss_tensor(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(out, axis=-1)
But the problem is that model.predict() is expecting a numpy array.
So I looked for how to convert a tensor (y_pred) to a numpy array.
I found tmp = K.tf.round(y_true) but this returns a tensor.
I have also found: x = K.eval(y_true) which takes a Keras variable and returns a numpy array.
This produces the error: You must feed a value for placeholder tensor 'dense_78_target' with dtype float.....
Some people suggested setting the learning phase to true. I did that, but it did not help.
What I just want to do:
def _loss_tensor(y_true, y_pred):
    y_tmp_true = first_decoder.predict(y_true)
    y_tmp_pred = first_decoder.predict(y_pred)
    return keras.losses.binary_crossentropy(y_tmp_true, y_tmp_pred)
Any help would be appreciated.
This works:
sess = K.get_session()
with sess.as_default():
    tmp = K.tf.constant([1, 2, 3]).eval()
    print(tmp)
I also tried this now:
tmp = first_decoder(y_true)
This fails the assertion:
assert input_shape[-1]
Maybe someone knows how to resolve this?
Update:
I can now feed it through the model with:
y_t = first_decoder(K.reshape(y_true, (1,512)))
y_p = first_decoder(K.reshape(y_pred, (1,512)))
But when I try to return the binary cross entropy the shape is not right:
Input to reshape is a tensor with 131072 values, but the requested shape has 512
I figured out that 131072 is the product of my batch size and input size (256*512). I then adapted my code to reshape to size (256, 512). The first batch runs fine, but then I get another error saying that the passed size was (96, 512).
[SOLVED] Update:
It works now:
def _loss_tensor(y_true, y_pred):
    num_ex = K.shape(y_true)[0]
    y_t = first_decoder(K.reshape(y_true, (num_ex, 512)))
    y_p = first_decoder(K.reshape(y_pred, (num_ex, 512)))
    return keras.losses.binary_crossentropy(y_t, y_p)