Keras cross entropy loss with missing labels in multi-objective training - tensorflow

I have a Keras neural network, using the Functional API, that has multiple outputs and multiple loss functions (some regression, some multi-class classification). I will always have a label for at least one of the outputs in training but commonly at least one will be missing.
I'm trying to write a custom categorical cross entropy loss function:
def custom_error_function(y_true, y_pred):
    bool_finite = y_true != -1
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)
    one_hotted = one_hot(np.int(boolean_mask(y_true, bool_finite)), depth=5)
    return loss(one_hotted, boolean_mask(y_pred, bool_finite, axis=1))
where y_pred and y_true should have the same shape ([n_samples_in_batch, n_classes (5)]), and a value of -1 in y_true indicates a missing label.
But when I run this, I get
ValueError: in user code:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/var/folders/pn/c0hwfk8n7q9442628b1g_p1r0000gp/T/ipykernel_13239/802342025.py", line 12, in custom_error_function *
return loss(one_hotted, boolean_mask(y_pred, bool_finite, axis=1))
ValueError: Shapes (5,) and (None, 1) are incompatible
I'm a bit flummoxed and would appreciate any assistance. Thanks!

The problem comes from the axis=1 in the loss call; the following should work:
def custom_error_function(y_true, y_pred):
    bool_finite = y_true != -1
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)
    return loss(tf.boolean_mask(y_true, bool_finite), tf.boolean_mask(y_pred, bool_finite))
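One caveat worth noting: tf.boolean_mask with a mask of the same rank as the tensor flattens its result to 1-D, so when more than one labeled sample survives the mask, the loss no longer sees an (n_samples, n_classes) matrix. A minimal variant sketch, assuming each y_true row is either a full one-hot vector or all -1, that reshapes back to 5 columns so the cross entropy stays per-sample:

import tensorflow as tf
from tensorflow import keras

def custom_error_function(y_true, y_pred):
    # Rows whose entries are all -1 are treated as missing labels.
    bool_finite = y_true != -1
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)
    # boolean_mask flattens to 1-D; restore the (n_kept, 5) shape so the
    # cross entropy is computed per sample, not over one long vector.
    y_true_kept = tf.reshape(tf.boolean_mask(y_true, bool_finite), (-1, 5))
    y_pred_kept = tf.reshape(tf.boolean_mask(y_pred, bool_finite), (-1, 5))
    return loss(y_true_kept, y_pred_kept)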

Related

Problem with tensor shape when implementing a custom loss function for my model in Tensorflow

I picked up the idea of triplet loss and global orthogonal regularization from this paper: http://cs230.stanford.edu/projects_fall_2019/reports/26251543.pdf. However, I keep getting caught up in a tensor shape error.
After I define modelv1 as the base model (modelv1 takes input of shape (None,224,224,3) and returns a tensor of shape (None,64)), the complete model is defined as follows:
input_shape=(3,224,224,3)
input_all=Input(shape=input_shape)
input_anchor=input_all[:,0,:]
input_pos=input_all[:,1,:]
input_neg=input_all[:,2,:]
output_anchor=modelv1(input_anchor)
output_pos=modelv1(input_pos)
output_neg=modelv1(input_neg)
model=Model(inputs=input_all,outputs=[output_anchor,output_pos,output_neg])
The formula for triplet loss with global orthogonal regularization, as provided in the paper mentioned above, is:
[image: formula for the loss function]
I implemented this formula as follows:
def triplet_loss_with_margin(margin=0.4, d=64, alpha=1.1):
    def triplet_loss(y_true, y_pred):
        """
        Implementation of the triplet loss as defined by formula (3)
        Arguments:
        y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
        y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 64)
            positive -- the encodings for the positive images, of shape (None, 64)
            negative -- the encodings for the negative images, of shape (None, 64)
        Returns:
        loss -- real number, value of the loss
        """
        anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
        # Step 1: Compute the (encoding) distance between the anchor and the positive
        pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, positive)), axis=-1)
        # Step 2: Compute the (encoding) distance between the anchor and the negative
        neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, negative)), axis=-1)
        # Step 3: subtract the two previous distances and add the margin.
        basic_loss = tf.math.add(tf.math.subtract(pos_dist, neg_dist), margin)
        # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
        loss = tf.math.reduce_sum(tf.math.maximum(basic_loss, 0.0))
        # add regularization term
        dot_product = tf.matmul(anchor, tf.transpose(negative))
        multiply_2_vectors_value = tf.linalg.diag_part(dot_product)
        M1 = tf.math.reduce_sum(multiply_2_vectors_value, axis=-1)
        M2 = tf.math.square(multiply_2_vectors_value)
        M2 = tf.math.maximum(tf.math.subtract(M2, 1/d), 0.0)
        M2 = tf.math.reduce_sum(M2, axis=-1)
        loss += alpha * (tf.math.square(M1) + M2)
        return loss
    return triplet_loss
I assumed that since anchor and negative both have shape (None,64), this approach should work. However, when I trained the model, I encountered the error below:
ValueError: in user code:
/opt/conda/lib/python3.7/site-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/tmp/ipykernel_24/1319124991.py:34 triplet_loss *
dot_product=tf.matmul(anchor,tf.transpose(negative))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper **
return target(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3655 matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py:5714 mat_mul
name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:601 _create_op_internal
compute_device)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3569 _create_op_internal
op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2042 __init__
control_input_ops, op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1883 _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 1 for '{{node triplet_loss/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false](triplet_loss/strided_slice, triplet_loss/transpose)' with input shapes: [64], [64].
From what I understand, the error is caused because in dot_product=tf.matmul(anchor,tf.transpose(negative)), anchor and negative only have shape (64,). But shouldn't anchor and negative be of shape (batch_size,64)? I really could not understand what I did wrong. Could you please enlighten me about this? Thank you.
I tried to debug by implementing an independent function to test:
def triplet_loss(y_pred):
    """
    Implementation of the triplet loss as defined by formula (3)
    Arguments:
    y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 64)
        positive -- the encodings for the positive images, of shape (None, 64)
        negative -- the encodings for the negative images, of shape (None, 64)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two previous distances and add the margin.
    basic_loss = tf.math.add(tf.math.subtract(pos_dist, neg_dist), 0.4)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.math.reduce_sum(tf.math.maximum(basic_loss, 0.0))
    # add regularization term
    print("anchor shape: ", anchor.shape)
    print("neg shape: ", negative.shape)
    dot_product = tf.matmul(anchor, tf.transpose(negative))
    multiply_2_vectors_value = tf.linalg.diag_part(dot_product)
    M1 = tf.math.reduce_sum(multiply_2_vectors_value, axis=-1)
    M2 = tf.math.square(multiply_2_vectors_value)
    mask = tf.math.maximum(tf.math.subtract(M2, 1/64), 0.0)
    M2 = tf.math.reduce_sum(M2, axis=-1)
    loss += 1.1 * (tf.math.square(M1) + M2)
    return loss
And it works fine with a dummy tensor I pass to it:
dummy=tf.random.uniform((1,3,224,224,3))
re_dum=model.predict(dummy)
test=triplet_loss(re_dum)
re_dum is a list of 3 elements, each a tensor of shape (1,64), and test is a number. So this little test suggests there is no problem with my implementation. But why does the error keep showing up?
Besides, when I replace
dot_product=tf.matmul(anchor,tf.transpose(negative))
with
dot_product=tf.matmul(tf.expand_dims(anchor,axis=0),tf.transpose(tf.expand_dims(negative,axis=0)))
the error disappears, but it is quite perplexing to me why this works.
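A likely explanation, offered as a side note: with outputs=[output_anchor,output_pos,output_neg], Keras applies the loss function to each output head separately, so inside triplet_loss the y_pred it receives is a single (batch, 64) tensor, and y_pred[0] slices out one sample of shape (64,), which is exactly the rank-1 input that MatMul complains about. A small demonstration of the effect, plus a hedged workaround sketch (stacking the encodings is an assumption, not the original model definition):

import tensorflow as tf

# What one per-output loss call actually receives: a single head's output.
y_pred = tf.random.uniform((8, 64))
print(y_pred[0].shape)  # (64,) -- rank 1, hence the MatMul shape error

# Hedged workaround: merge the three encodings into one output of shape
# (batch, 3, 64) so a single loss call sees all of them per sample:
#   merged = tf.stack([output_anchor, output_pos, output_neg], axis=1)
#   model = Model(inputs=input_all, outputs=merged)
# and inside the loss:
#   anchor, positive, negative = y_pred[:, 0], y_pred[:, 1], y_pred[:, 2]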

Tensorflow 2.0 Custom loss function with multiple inputs

I am trying to optimize a model with the following two loss functions
def loss_1(pred, weights, logits):
    weighted_sparse_ce = kls.SparseCategoricalCrossentropy(from_logits=True)
    policy_loss = weighted_sparse_ce(pred, logits, sample_weight=advantages)
and
def loss_2(y_pred, y):
    return kls.mean_squared_error(y_pred, y)
however, because TensorFlow 2 expects a loss function to be of the form
def fn(y_true, y_pred):
    ...
I am using a work-around for loss_1 where I pack pred and weights into a single tensor before passing to loss_1 in the call to model.fit and then unpack them in loss_1. This is inelegant and nasty because pred and weights are of different data types and so this requires an additional cast, pack, un-pack and un-cast each time I call model.fit.
Furthermore, I am aware of the sample_weight argument to fit, which is something like a solution to this question. It might be a workable solution were it not for the fact that I am using two loss functions and I only want the sample_weight applied to one of them. Also, even if this were a solution, it would not generalize to other types of custom loss functions.
All that being said, my question, said concisely, is:
What is the best way to create a loss function with an arbitrary number of arguments in TensorFlow 2?
Another thing I have tried is passing a tf.tuple but that also seems to violate TensorFlow's desires for a loss function input.
This problem can be easily solved using custom training in TF2. You need only compute your two-component loss function within a GradientTape context and then call an optimizer with the produced gradients. For example, you could create a function custom_loss which computes both losses given the arguments to each:
def custom_loss(model, loss1_args, loss2_args):
    # model: tf.keras.Model
    # loss1_args: arguments to loss_1, as a tuple.
    # loss2_args: arguments to loss_2, as a tuple.
    with tf.GradientTape() as tape:
        l1_value = loss_1(*loss1_args)
        l2_value = loss_2(*loss2_args)
        loss_value = [l1_value, l2_value]
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

# In the training loop:
loss_values, grads = custom_loss(model, loss1_args, loss2_args)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
In this way, each loss function can take an arbitrary number of eager tensors, regardless of whether they are inputs or outputs to the model. The sets of arguments to each loss function need not be disjoint, as shown in this example.
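To make the wiring concrete, here is a hedged sketch of a full step, under the assumption that the forward pass runs inside the tape (it must, for the gradients to reach model.trainable_variables); the names and the two-argument loss signatures are illustrative:

import tensorflow as tf

def train_step(model, optimizer, x, y1_true, y2_true):
    with tf.GradientTape() as tape:
        # Run the forward pass inside the tape so the loss arguments are
        # connected to the model's trainable variables.
        pred1, pred2 = model(x, training=True)
        loss_value = loss_1(pred1, y1_true) + loss_2(pred2, y2_true)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss_value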
To expand on Jon's answer: in case you still want the benefits of a Keras Model, you can extend the model class and write your own custom train_step:
from tensorflow.python.keras.engine import data_adapter

# custom loss function that takes two outputs of the model
# as input parameters, which would otherwise not be possible
def custom_loss(gt, x, y):
    return tf.reduce_mean(x) + tf.reduce_mean(y)

class CustomModel(keras.Model):
    def compile(self, optimizer, my_loss):
        super().compile(optimizer)
        self.my_loss = my_loss

    def train_step(self, data):
        data = data_adapter.expand_1d(data)
        input_data, gt, sample_weight = data_adapter.unpack_x_y_sample_weight(data)
        with tf.GradientTape() as tape:
            y_pred = self(input_data, training=True)
            loss_value = self.my_loss(gt, y_pred[0], y_pred[1])
        grads = tape.gradient(loss_value, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss_value": loss_value}

...

model = CustomModel(inputs=input_tensor0, outputs=[x, y])
model.compile(optimizer=tf.keras.optimizers.Adam(), my_loss=custom_loss)
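A minimal usage sketch under the same setup; the toy input shape, layer sizes, and dummy data are assumptions for illustration:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy two-headed functional graph feeding the CustomModel above.
input_tensor0 = keras.Input(shape=(8,))
x = keras.layers.Dense(4)(input_tensor0)
y = keras.layers.Dense(4)(input_tensor0)

model = CustomModel(inputs=input_tensor0, outputs=[x, y])
model.compile(optimizer=tf.keras.optimizers.Adam(), my_loss=custom_loss)

# gt is forwarded to custom_loss, which ignores it here, so zeros suffice.
model.fit(np.random.rand(32, 8).astype("float32"),
          np.zeros((32,), dtype="float32"), epochs=1)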
In tf 1.x we have the tf.nn.weighted_cross_entropy_with_logits function, which allows us to trade off recall and precision by adding extra positive weights for each class. In multi-label classification, it should be an (N,) tensor or numpy array. However, in tf 2.0 I haven't found a similar loss function yet, so I wrote my own loss function with an extra argument pos_w_arr.
from tensorflow.keras.backend import epsilon

def pos_w_loss(pos_w_arr):
    """
    Define positive weighted loss function
    """
    def fn(y_true, y_pred):
        _epsilon = tf.convert_to_tensor(epsilon(), dtype=y_pred.dtype.base_dtype)
        _y_pred = tf.clip_by_value(y_pred, _epsilon, 1. - _epsilon)
        cost = tf.multiply(tf.multiply(y_true, tf.math.log(_y_pred)), pos_w_arr) \
            + tf.multiply((1 - y_true), tf.math.log(1 - _y_pred))
        return -tf.reduce_mean(cost)
    return fn
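A hedged usage sketch of this closure pattern; the model, label shapes, and weight values are illustrative assumptions:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Illustrative multi-label head: 5 sigmoid outputs, with positives in the
# last two classes up-weighted 3x.
pos_w_arr = tf.constant([1., 1., 1., 3., 3.])

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    keras.layers.Dense(5, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=pos_w_loss(pos_w_arr))
model.fit(np.random.rand(32, 10).astype("float32"),
          np.random.randint(0, 2, (32, 5)).astype("float32"), epochs=1)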
I'm not sure what you mean by it not working when using eager tensors or numpy arrays as inputs, though. Please correct me if I'm wrong.

InvalidArgumentError: In[0] is not a matrix. Instead it has shape []

I'm not able to train the network using Keras; I get the following error at epoch 1, on the first batch:
InvalidArgumentError: In[0] is not a matrix. Instead it has shape []
[[{{node training/SGD/gradients/dense_1/MatMul_grad/MatMul}}]]
I'm trying to solve a regression problem using Keras and a custom function provided by https://github.com/farrell236/DeepPose
The network is a fairly simple VGG-like CNN.
I think the problem is the loss function. In particular, I suppose that the weight initialization is the issue (take a look at the Tensorflow example: https://github.com/farrell236/DeepPose/blob/master/tensorflow/example)
That's my loss function:
def custom_loss(y_true, y_pred):
    loss = SE3GeodesicLoss(np.ones((1, 6)))
    tf.initializers.constant([loss])
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    y_true = tf.cast(y_true, dtype=tf.float32)
    loss = SE3GeodesicLoss(np.ones(6))
    geodesic_loss = loss.geodesic_loss(y_pred, y_true)
    geodesic_loss = tf.cast(geodesic_loss, dtype=tf.float32)
    return geodesic_loss
What's strange is that I'm able to use this function as a metric for the training.
Further information:
What I'm trying to do is estimate the position of an object, having images as input and the relative Eulerian angles and distance of the target as labels (which means 6 parameters: [r_x, r_y, r_z, t_x, t_y, t_z]). I'm trying to implement this loss function in order to solve the attitude estimation problem. Other losses (namely MSE and MAE) are not effective enough at solving the attitude regression problem.
Do you have any suggestion?

slicing inputs for loss function in keras with tensorflow

In Keras I have a target vector y_true that fits a network with one output neuron: y_true = [0, 1, 0, 1, 1, ...], and I have some payoffs [1, 1, 1, -5, 1, ...].
I'm trying to pass the payoffs as extra parameters into a custom Keras loss function. Keras only allows two parameters to be passed into it (y_true and y_pred), but I would also like to pass the payoff assigned to each sample. To that end I have added a second column to y_true that contains those values.
I then try to separate the actual y_true (first column) and the payoffs (second column) again in the loss function by doing the following:
def custom_loss(y_true, y_pred):
    # y_true has the payoffs in the second column
    payoffs = y_true[:, 1]
    payoffs = K.expand_dims(payoffs, 1)
    y_true = y_true[:, 0]
    y_true = K.expand_dims(y_true, 1)
    loss = K.binary_crossentropy(y_true, y_pred)
    return loss
This is a simplified version of what I want to do (in the real version I will integrate the payoffs into the loss function). But for the example above, I would expect the loss function to behave identically to calling binary_crossentropy directly with a y_true that contains only the labels (without any payoffs).
However, the result is not as expected: the accuracy values are around half with the custom loss function above.
What could be the cause of this error? Am I not slicing y_true correctly?
The problem is related to what is described in this post (curiale's comment on 12 Dec 2017 suggests using slice_stack, but the problem is the same).
I think the problem was that I needed to customize the metric function as well.
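If the metric was indeed the missing piece, here is a hedged sketch of a companion accuracy metric (the name custom_accuracy and the binary rounding are illustrative assumptions) that slices off the payoff column the same way the loss does:

import tensorflow.keras.backend as K

def custom_accuracy(y_true, y_pred):
    # First column holds the real labels; the second holds the payoffs.
    labels = K.expand_dims(y_true[:, 0], 1)
    return K.mean(K.cast(K.equal(labels, K.round(y_pred)), "float32"))

# model.compile(loss=custom_loss, metrics=[custom_accuracy], ...)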

Can you process a tensor in chunks in a custom Keras loss function?

I am trying to write a custom Keras loss function in which I process the tensors in sub-vector chunks. For example, if an output tensor represents a concatenation of quaternion coefficients (i.e. w,x,y,z,w,x,y,z...), I might wish to normalize each quaternion before calculating the mean squared error, in a loss function like:
def norm_quat_mse(y_true, y_pred):
    diff = y_pred - y_true
    dist = 0
    for i in range(0, 16, 4):
        dist += K.sum(K.square(diff[i:i+4] / K.sqrt(K.sum(K.square(diff[i:i+4])))))
    return dist / 4
While Keras will accept this function without error and use it in training, it outputs a different loss value from when it is applied as an independent function to the output of model.predict(), so I suspect it is not working properly. None of the built-in Keras loss functions use this per-chunk processing approach; is it possible to do this within Keras' auto-differentiation framework?
Try:
def norm_quat_mse(y_true, y_pred):
    diff = y_pred - y_true
    dist = 0
    for i in range(0, 16, 4):
        dist += K.sum(K.square(diff[:, i:i+4] / K.sqrt(K.sum(K.square(diff[:, i:i+4])))))
    return dist / 4
You need to know that the shape of y_true and y_pred is (batch_size, output_size), so you need to skip the first dimension during the computations.
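A quick hedged sanity check of the batch-axis fix, running the corrected function above on a toy batch; the shapes and random data are assumptions:

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K

# Two samples, each the concatenation of four 4-component quaternions.
y_true = tf.constant(np.random.rand(2, 16), dtype=tf.float32)
y_pred = tf.constant(np.random.rand(2, 16), dtype=tf.float32)

# Should now also match the value obtained by calling the function
# independently on the outputs of model.predict().
print(norm_quat_mse(y_true, y_pred))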