Is the output scaled twice in the Keras categorical_crossentropy code? - tensorflow

I see categorical_crossentropy is implemented in Keras as follows:
def categorical_crossentropy(target, output, from_logits=False, axis=-1):
    """Categorical crossentropy between an output tensor and a target tensor.
    # Arguments
        target: A tensor of the same shape as `output`.
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.
        axis: Int specifying the channels axis. `axis=-1`
            corresponds to data format `channels_last`,
            and `axis=1` corresponds to data format
            `channels_first`.
    # Returns
        Output tensor.
    # Raises
        ValueError: if `axis` is neither -1 nor one of
            the axes of `output`.
    """
    output_dimensions = list(range(len(output.get_shape())))
    if axis != -1 and axis not in output_dimensions:
        raise ValueError(
            '{}{}{}'.format(
                'Unexpected channels axis {}. '.format(axis),
                'Expected to be -1 or one of the axes of `output`, ',
                'which has {} dimensions.'.format(len(output.get_shape()))))
    # Note: tf.nn.softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # scale preds so that the class probas of each sample sum to 1
        output /= tf.reduce_sum(output, axis, True)
        # manual computation of crossentropy
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1. - _epsilon)
        return - tf.reduce_sum(target * tf.log(output), axis)
I don't understand the code from
output_dimensions = list(range(len(output.get_shape())))
to
output /= tf.reduce_sum(output, axis, True).
I understand that output is a tensor of probabilities resulting from a softmax, which means the class probabilities of each sample already sum to 1. Why do they need to scale the predictions so that the class probabilities of each sample sum to 1 again? Please explain this.

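Because the function cannot assume that output really came from a softmax. Dividing by the sum along axis makes the (non-negative) class scores of each sample sum to 1, so each probability lies between 0 and 1. Without this step the cross-entropy computation would be incorrect. If output is already a softmax result, the division is a division by 1 and changes nothing (up to floating-point rounding), so the double scaling does no harm. The step also guards against user errors when someone passes (unnormalized) probabilities outside that range. The lines before it, starting at output_dimensions = ..., only validate axis: they list the valid axis indices of output and raise a ValueError if axis is neither -1 nor one of them.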
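The NumPy check below mirrors the from_logits=False branch above, with NumPy standing in for the TensorFlow ops. EPS = 1e-7 matches Keras' default epsilon(). Rescaling leaves a genuine softmax output unchanged. It also turns scores with the same ratios into the same distribution, so both calls give the same loss, -log(0.7).

```python
import numpy as np

EPS = 1e-7  # Keras' default epsilon()

def keras_style_categorical_crossentropy(target, output, axis=-1):
    # Same steps as the from_logits=False branch, written with NumPy.
    # 1. Rescale so each sample's class scores sum to 1 (a no-op for softmax output).
    output = output / np.sum(output, axis=axis, keepdims=True)
    # 2. Clip away exact 0s and 1s so log() stays finite.
    output = np.clip(output, EPS, 1.0 - EPS)
    # 3. Cross-entropy summed over the class axis: one value per sample.
    return -np.sum(target * np.log(output), axis=axis)

target = np.array([[0.0, 1.0, 0.0]])
softmaxed = np.array([[0.2, 0.7, 0.1]])     # already sums to 1
unnormalized = np.array([[2.0, 7.0, 1.0]])  # same ratios, sums to 10

print(keras_style_categorical_crossentropy(target, softmaxed))
print(keras_style_categorical_crossentropy(target, unnormalized))
```

Without the rescaling step, every entry of the unnormalized row would be clipped to 1 - EPS, and its loss would come out close to 0 instead of -log(0.7).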

Related

Problem with tensor shape when implementing a custom loss function for my model in Tensorflow

I picked up the idea of triplet loss and global orthogonal regularization from this paper: http://cs230.stanford.edu/projects_fall_2019/reports/26251543.pdf. However, I keep running into a tensor shape error.
After defining modelv1 as the base model (modelv1 takes input of shape (None,224,224,3) and returns a tensor of shape (None,64)), I define the complete model as follows:
input_shape=(3,224,224,3)
input_all=Input(shape=input_shape)
input_anchor=input_all[:,0,:]
input_pos=input_all[:,1,:]
input_neg=input_all[:,2,:]
output_anchor=modelv1(input_anchor)
output_pos=modelv1(input_pos)
output_neg=modelv1(input_neg)
model=Model(inputs=input_all,outputs=[output_anchor,output_pos,output_neg])
The formula for triplet loss with global orthogonal regularization, as given in the paper mentioned above, is:
[Image: formula for the loss function]
I implemented this formula as follows:
def triplet_loss_with_margin(margin=0.4,d=64,alpha=1.1):
    def triplet_loss(y_true,y_pred):
        """
        Implementation of the triplet loss as defined by formula (3)
        Arguments:
        y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
        y_pred -- python list containing three objects:
                anchor -- the encodings for the anchor images, of shape (None, 64)
                positive -- the encodings for the positive images, of shape (None, 64)
                negative -- the encodings for the negative images, of shape (None, 64)
        Returns:
        loss -- real number, value of the loss
        """
        anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
        # Step 1: Compute the (encoding) distance between the anchor and the positive
        pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor,positive)),axis=-1)
        # Step 2: Compute the (encoding) distance between the anchor and the negative
        neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor,negative)),axis=-1)
        # Step 3: subtract the two previous distances and add the margin.
        basic_loss = tf.math.add(tf.math.subtract(pos_dist,neg_dist),margin)
        # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
        loss = tf.math.reduce_sum(tf.math.maximum(basic_loss,0.0))
        # add regularization term
        dot_product=tf.matmul(anchor,tf.transpose(negative))
        multiply_2_vectors_value=tf.linalg.diag_part(dot_product)
        M1=tf.math.reduce_sum(multiply_2_vectors_value,axis=-1)
        M2=tf.math.square(multiply_2_vectors_value)
        M2=tf.math.maximum(tf.math.subtract(M2,1/d),0.0)
        M2=tf.math.reduce_sum(M2,axis=-1)
        loss+=alpha*(tf.math.square(M1)+M2)
        return loss
    return triplet_loss
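The regularization term above only uses the diagonal of anchor @ negative^T, which holds the dot product of each anchor with its own negative. The NumPy sketch below uses random stand-in encodings of shape (batch, 64) to show that a row-wise dot product gives the same numbers without building the (batch, batch) matrix. The TensorFlow form tf.reduce_sum(anchor * negative, axis=-1) is the assumed counterpart, not something taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 64))    # stand-in for a (batch, 64) anchor batch
negatives = rng.normal(size=(8, 64))  # stand-in for a (batch, 64) negative batch

# What the loss computes: build the full (batch, batch) matrix, keep its diagonal.
via_matmul = np.diag(anchors @ negatives.T)

# Row-wise dot product: same numbers, no (batch, batch) intermediate.
# In TensorFlow this would be tf.reduce_sum(anchor * negative, axis=-1).
via_rowwise = np.sum(anchors * negatives, axis=-1)

print(via_matmul.shape, np.allclose(via_matmul, via_rowwise))
```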
I assumed that since anchor and negative both have shape (None,64), this approach should work. However, when I trained the model, I encountered the error below:
ValueError: in user code:
/opt/conda/lib/python3.7/site-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/tmp/ipykernel_24/1319124991.py:34 triplet_loss *
dot_product=tf.matmul(anchor,tf.transpose(negative))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper **
return target(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3655 matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py:5714 mat_mul
name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:601 _create_op_internal
compute_device)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3569 _create_op_internal
op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2042 __init__
control_input_ops, op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1883 _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 1 for '{{node triplet_loss/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false](triplet_loss/strided_slice, triplet_loss/transpose)' with input shapes: [64], [64].
From what I understand, the error occurs because, when dot_product=tf.matmul(anchor,tf.transpose(negative)) is evaluated, anchor and negative only have shape (64,). But shouldn't anchor and negative have shape (batch_size,64)? I really can't see what I did wrong. Could you please enlighten me about this? Thank you.
To debug, I implemented a standalone function to test:
def triplet_loss(y_pred):
    """
    Implementation of the triplet loss as defined by formula (3)
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 64)
            positive -- the encodings for the positive images, of shape (None, 64)
            negative -- the encodings for the negative images, of shape (None, 64)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor,positive)),axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor,negative)),axis=-1)
    # Step 3: subtract the two previous distances and add the margin (0.4).
    basic_loss = tf.math.add(tf.math.subtract(pos_dist,neg_dist),0.4)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.math.reduce_sum(tf.math.maximum(basic_loss,0.0))
    # add regularization term
    print("anchor shape: ",anchor.shape)
    print("neg shape: ",negative.shape)
    dot_product=tf.matmul(anchor,tf.transpose(negative))
    multiply_2_vectors_value=tf.linalg.diag_part(dot_product)
    M1=tf.math.reduce_sum(multiply_2_vectors_value,axis=-1)
    M2=tf.math.square(multiply_2_vectors_value)
    mask=tf.math.maximum(tf.math.subtract(M2,1/64),0.0)
    M2=tf.math.reduce_sum(M2,axis=-1)
    loss+=1.1*(tf.math.square(M1)+M2)
    return loss
And it works fine with a dummy tensor I passed to it:
dummy=tf.random.uniform((1,3,224,224,3))
re_dum=model.predict(dummy)
test=triplet_loss(re_dum)
re_dum is a list of 3 elements, each a tensor of shape (1,64), and test is a number. So this little test shows that there is no problem with my implementation. But why does the error keep showing up?
Besides, when I replace
dot_product=tf.matmul(anchor,tf.transpose(negative))
with
dot_product=tf.matmul(tf.expand_dims(anchor,axis=0),tf.transpose(tf.expand_dims(negative,axis=0)))
The error disappears, but I find it very perplexing why this works.
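One likely explanation, stated as an assumption: when a multi-output model is compiled with a single loss, Keras calls that loss once per output. In that case y_pred is one (batch, 64) tensor, not the list returned by model.predict. The NumPy sketch below, using random stand-in encodings, reproduces the resulting shapes. It also shows that the expand_dims version runs but only multiplies two rows of the same output.

```python
import numpy as np

batch_size, dim = 8, 64
rng = np.random.default_rng(0)

# Debug test: model.predict returns a list with one (batch, 64) array per output,
# so y_pred[0], y_pred[1], y_pred[2] are the three full encoding batches.
predict_result = [rng.normal(size=(batch_size, dim)) for _ in range(3)]
print(predict_result[0].shape)

# Assumed training-time behaviour: the loss is called once per output and
# receives a single (batch, 64) array, so y_pred[0] is just the first sample.
y_pred = predict_result[0]
anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
print(anchor.shape)  # rank 1, hence "Shape must be rank 2 but is rank 1"

# expand_dims turns (64,) into (1, 64), so the matmul becomes (1, 64) @ (64, 1):
# it runs, but only yields the dot product of two rows of the same output.
product = np.expand_dims(anchor, 0) @ np.expand_dims(negative, 0).T
print(product.shape, np.isclose(product[0, 0], anchor @ negative))
```

If this is what happens, the error disappearing does not mean the loss is correct. One common way to get all three encodings into a single loss call is to concatenate or stack the three outputs into one model output and split them inside the loss.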

Tensorflow loss function no gradient provided

I am currently trying to write my own loss function, but when I return the result (a tensor built from a list of loss values) I get the following error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0'].
However, the tutorials and docs also use tf.reduce_mean, and when I use it the way they do (they show how to write an MSE loss function) I don't get the error, so it seems that I am missing something.
My code:
gl = tfa.losses.GIoULoss()
def loss(y_true, y_pred):
    batch_size = y_true.shape[0]
    # now contains 32 lists (a batch) of bbxs -> shape is (32, 7876)
    bbx_true = y_true.numpy()
    # now contains 32 lists (a batch) of bbxs here we have to double access [0] in order to get the entry itself
    # -> shape is (32, 1, 1, 7876)
    bbx_pred = y_pred.numpy()
    losses = []
    curr_true = []
    curr_pred = []
    for i in range(batch_size):
        curr_true = bbx_true[i]
        curr_pred = bbx_pred[i][0][0]
        curr_true = [curr_true[x:x+4] for x in range(0, len(curr_true), 4)]
        curr_pred = [curr_pred[x:x+4] for x in range(0, len(curr_pred), 4)]
        if len(curr_true) == 0:
            curr_true.append([0., 0.,0.,0.])
        curr_loss = gl(curr_true, curr_pred)
        losses.append(curr_loss)
    return tf.math.reduce_mean(losses, axis=-1)
Basically, I want to do bounding box regression, which is why I want to use the GIoU loss function. My model outputs 7896 neurons (the maximum number of bounding boxes I want to predict, according to my training set, times 4). The GIoU loss function needs its input as an array of lists with 4 elements each, so I have to perform this transformation.
How do I have to change my code so that a gradient can be computed?
NumPy doesn't provide automatic differentiation, so your loss must be built exclusively from TensorFlow tensors and operations (otherwise the gradient is lost during backpropagation). So avoid calling .numpy() and use TensorFlow operators and slicing on TensorFlow tensors instead.
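The sketch below shows the "slicing on tensors" idea. It assumes the flat layout in the question: four consecutive values per box and 7876 values per sample, as in the code comments. Under that assumption, one reshape replaces the per-sample Python loop. It is written in NumPy so it can be checked standalone. Inside the loss you would apply tf.reshape to y_true and y_pred instead. Whether tfa.losses.GIoULoss accepts a [batch, num_boxes, 4] tensor directly is an assumption to verify against the TensorFlow Addons documentation.

```python
import numpy as np

batch_size, n_values = 32, 7876  # shapes taken from the question's comments
rng = np.random.default_rng(0)
y = rng.uniform(size=(batch_size, n_values)).astype(np.float32)

# Loop-based chunking, as in the question (one sample at a time).
looped = np.array([[row[x:x + 4] for x in range(0, len(row), 4)] for row in y])

# Vectorized equivalent: one reshape for the whole batch. The tensor version,
# tf.reshape(y_pred, (-1, n_values // 4, 4)), stays differentiable and also
# absorbs the extra (1, 1) axes of a (32, 1, 1, 7876) prediction.
reshaped = y.reshape(-1, n_values // 4, 4)

print(reshaped.shape, np.array_equal(looped, reshaped))
```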

Explanation of an implementation of the categorical_crossentropy

The formula for the categorical cross-entropy is the following:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C}\mathbf{1}_{y_i = j}\,\log p_{i,j}$$
What should the output of the last layer be? Should it be the probabilities of classes from a softmax layer?
What is the target?
How does the following code implement 1/N, the summation and p_{i,j}?
def categorical_crossentropy(output, target, from_logits=False):
    """Categorical crossentropy between an output tensor and a target tensor.
    # Arguments
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        target: A tensor of the same shape as `output`.
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.
    # Returns
        Output tensor.
    """
    # Note: tf.nn.softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # scale preds so that the class probas of each sample sum to 1
        output /= tf.reduce_sum(output,
                                reduction_indices=len(output.get_shape()) - 1,
                                keep_dims=True)
        # manual computation of crossentropy
        epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
        output = tf.clip_by_value(output, epsilon, 1. - epsilon)
        return - tf.reduce_sum(target * tf.log(output),
                               reduction_indices=len(output.get_shape()) - 1)
    else:
        return tf.nn.softmax_cross_entropy_with_logits(labels=target,
                                                       logits=output)
What should the output of the last layer be? Should it be the probabilities of classes from a softmax layer?
It can be either the output of the softmax layer or the raw logits (the input to the softmax layer). The output vector of the softmax layer contains the probabilities of each class. If output is the output of a softmax, set from_logits=False. If output contains the logits, set from_logits=True. Internally, the from_logits=True branch calls tf.nn.softmax_cross_entropy_with_logits, which computes the softmax probabilities and the cross-entropy at the same time. Computing them together allows for some math tricks that improve numerical stability.
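The NumPy sketch below illustrates the kind of math trick meant here, using the standard log-sum-exp approach. It is not TensorFlow's actual kernel. Computing the log-softmax directly from the logits, after subtracting the row maximum, stays finite where a naive softmax-then-log overflows.

```python
import numpy as np

def naive_ce_from_logits(target, logits):
    # softmax first, then log: exp() overflows to inf for large logits.
    exp = np.exp(logits)
    probs = exp / np.sum(exp, axis=-1, keepdims=True)
    return -np.sum(target * np.log(probs), axis=-1)

def stable_ce_from_logits(target, logits):
    # log-softmax via log-sum-exp: subtracting the row maximum does not change
    # the result mathematically, but keeps every exp() argument <= 0.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(target * log_probs, axis=-1)

target = np.array([[0.0, 1.0, 0.0]])
moderate = np.array([[1.0, 2.0, 0.5]])
huge = moderate * 1000.0

print(naive_ce_from_logits(target, moderate), stable_ce_from_logits(target, moderate))
with np.errstate(all="ignore"):  # silence the overflow/NaN warnings
    print(naive_ce_from_logits(target, huge))  # nan
print(stable_ce_from_logits(target, huge))  # finite
```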
What is the target?
The target is a one-hot vector. This means that a class n is represented by a vector v where v[n] = 1 and v is 0 everywhere else; here n is the class of the label. TensorFlow has a function for this encoding called tf.one_hot. For example, tf.one_hot([3], 5) results in [[0, 0, 0, 1, 0]] (class indices are zero-based).
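The NumPy sketch below builds the same encoding as a standalone illustration, by indexing rows of an identity matrix. It only mirrors tf.one_hot's default on/off values (1 and 0) for valid indices 0 to depth - 1.

```python
import numpy as np

def one_hot(indices, depth):
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(depth, dtype=np.float32)[np.asarray(indices)]

print(one_hot([3], 5))  # class indices are zero-based
```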
How does the following code implement 1/N, the summation and p_{i,j}?
The code above does not average over the samples (there is no "1/N"). For example, if the input has shape [10, 5], the output has shape [10], one loss value per sample. You would have to call tf.reduce_mean on the result to get the average. So, for each sample i, the equation is essentially:
$$L_i = -\sum_{j}\mathbf{1}_{y_i = j}\,\log p_{i,j}$$
The above equation is implemented in the line
return - tf.reduce_sum(target * tf.log(output),
                       reduction_indices=len(output.get_shape()) - 1)
The "Σ" is tf.reduce_sum, "p_{i,j}" is output, and the indicator function (i.e. the bolded 1) is the one-hot encoded target.
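Here is a NumPy sketch that makes the mapping between the formula and the reductions concrete, using the [10, 5] shape from the example. The logits and labels are random and purely illustrative. The sum over the class axis yields one loss per sample, and the 1/N average is a separate mean. With a one-hot target, each per-sample term reduces to -log p_{i, y_i}.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 10, 5

logits = rng.normal(size=(n_samples, n_classes))
output = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)  # p_{i,j}
labels = rng.integers(0, n_classes, size=n_samples)
target = np.eye(n_classes)[labels]  # one-hot indicator

# Sum over classes j (the reduce_sum over the last axis): one value per sample.
per_sample = -np.sum(target * np.log(output), axis=-1)
print(per_sample.shape)

# The 1/N average over samples is a separate step (tf.reduce_mean in TensorFlow).
loss = np.mean(per_sample)

# Because target is one-hot, each per-sample term is just -log p_{i, y_i}.
print(np.allclose(per_sample, -np.log(output[np.arange(n_samples), labels])))
```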
Side Note
You should use tf.nn.softmax_cross_entropy_with_logits_v2 on the logits, because the code you provided (with from_logits=False) could result in numerical errors. The combined function takes care of those numerical issues.

What is the Tensorflow loss equivalent of "Binary Cross Entropy"?

I'm trying to rewrite a Keras graph as a TensorFlow graph, but I wonder which loss function is the equivalent of "Binary Cross Entropy". Is it tf.nn.softmax_cross_entropy_with_logits_v2?
Thanks a lot!
No. The implementation of binary_crossentropy with the TensorFlow backend is defined as follows:
#tf_export('keras.backend.binary_crossentropy')
def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.
    Arguments:
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.
    Returns:
        A tensor.
    """
    # Note: nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        epsilon_ = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = clip_ops.clip_by_value(output, epsilon_, 1 - epsilon_)
        output = math_ops.log(output / (1 - output))
    return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
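Therefore, it uses sigmoid cross-entropy (nn.sigmoid_cross_entropy_with_logits), not softmax cross-entropy. The TensorFlow equivalent is tf.nn.sigmoid_cross_entropy_with_logits, applied to logits.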
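The NumPy sketch below checks that equivalence. sigmoid_ce_with_logits uses the numerically stable expression max(x, 0) - x * z + log(1 + exp(-|x|)) given in the TensorFlow documentation for tf.nn.sigmoid_cross_entropy_with_logits. Feeding it the logits log(p / (1 - p)), as the Keras code does, should reproduce the textbook binary cross-entropy -[z log p + (1 - z) log(1 - p)]. EPS = 1e-7 matches Keras' default epsilon().

```python
import numpy as np

EPS = 1e-7  # Keras' default epsilon()

def sigmoid_ce_with_logits(labels, logits):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

def keras_style_binary_crossentropy(target, output):
    # Mirror of the from_logits=False branch: clip, convert back to logits,
    # then apply the sigmoid cross-entropy on those logits.
    output = np.clip(output, EPS, 1 - EPS)
    logits = np.log(output / (1 - output))
    return sigmoid_ce_with_logits(target, logits)

target = np.array([1.0, 0.0, 1.0, 0.0])
probs = np.array([0.9, 0.2, 0.3, 0.6])

textbook = -(target * np.log(probs) + (1 - target) * np.log(1 - probs))
print(np.allclose(keras_style_binary_crossentropy(target, probs), textbook))
```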

Keras - custom loss function - chamfer distance

I am attempting object segmentation using a custom loss function as defined below:
def chamfer_loss_value(y_true, y_pred):
    # flatten the batch
    y_true_f = K.batch_flatten(y_true)
    y_pred_f = K.batch_flatten(y_pred)
    # ==========
    # get chamfer distance sum
    # error here
    y_pred_mask_f = K.cast(K.greater_equal(y_pred_f,0.5), dtype='float32')
    finalChamferDistanceSum = K.sum(y_pred_mask_f * y_true_f, axis=1, keepdims=True)
    return K.mean(finalChamferDistanceSum)

def chamfer_loss(y_true, y_pred):
    return chamfer_loss_value(y_true, y_pred)
y_pred_f is the result of my U-Net. y_true_f is the result of a Euclidean distance transform on the ground-truth label mask x, as shown below:
distTrans = ndimage.distance_transform_edt(1 - x)
To compute the Chamfer distance, you multiply the predicted image (ideally, a mask of 1s and 0s) by the ground-truth distance transform and simply sum over all pixels. To do this, I threshold y_pred_f to get a mask y_pred_mask_f, multiply it by y_true_f, and sum over all pixels.
y_pred_f contains continuous values in [0,1], and I get the error "None type not supported" at the evaluation of y_pred_mask_f. I know the loss function has to be differentiable, and greater_equal and cast are not. But is there a way to circumvent this in Keras? Perhaps using some workaround in TensorFlow?
Well, this was tricky. The reason behind your error is that there is no continuous dependence between your loss and your network. To compute gradients of your loss w.r.t. the network, the gradient has to pass through the indicator of whether your output is greater than 0.5, because this is the only connection between your final loss value and the output y_pred of your network. This is impossible, because the indicator is piecewise constant and not continuous. Its gradient is zero everywhere except at the threshold, where it is undefined.
Possible solution - smooth your indicator:
def chamfer_loss_value(y_true, y_pred):
    # flatten the batch
    y_true_f = K.batch_flatten(y_true)
    y_pred_f = K.batch_flatten(y_pred)
    y_pred_mask_f = K.sigmoid(y_pred_f - 0.5)
    finalChamferDistanceSum = K.sum(y_pred_mask_f * y_true_f, axis=1, keepdims=True)
    return K.mean(finalChamferDistanceSum)
This works because sigmoid is a continuous version of a step function. If your output already comes from a sigmoid, you could simply use y_pred_f instead of y_pred_mask_f.
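One caveat is worth checking. Because y_pred_f lies in [0, 1], K.sigmoid(y_pred_f - 0.5) only ranges from about 0.38 to 0.62, so it is a very soft mask. A common variant, sketched below in NumPy, multiplies the shifted value by a steepness factor. That factor is a hypothetical parameter, not part of the answer above. In Keras it would read K.sigmoid(k * (y_pred_f - 0.5)).

```python
import numpy as np

def soft_mask(y_pred, threshold=0.5, steepness=1.0):
    # Smooth stand-in for (y_pred >= threshold); a larger steepness gives a
    # mask closer to a hard 0/1 step. `steepness` is a hypothetical knob.
    return 1.0 / (1.0 + np.exp(-steepness * (y_pred - threshold)))

y = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(np.round(soft_mask(y), 3))                  # the answer's K.sigmoid(y - 0.5)
print(np.round(soft_mask(y, steepness=50.0), 3))  # much closer to a 0/1 mask
```

The trade-off is that a steeper sigmoid has gradients that vanish for pixels far from the threshold, so very large values can make training stall.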