How does Keras compute its loss function for matrix-valued outputs? - tensorflow

I am trying to predict the next several video frames given a collection of previous frames, i.e. I have a deep neural network that directly outputs a small video clip of dimension (samples, frames, m, n, channels). I train the network using Keras' mean squared error loss function.
Keras' implementation of the mean squared error loss function is
K.mean(K.square(y_pred - y_true), axis=-1)
In my case the computed loss value is therefore still a rank-4 tensor (which I checked is indeed the case).
As the loss should be a scalar, I imagined this would cause a problem, but surprisingly Keras issues no warning and I do get meaningful results.
Any clue as to how Keras is doing its back-propagation in this case? Is there an internal conversion to a scalar loss function that Keras is doing that I am not aware of?
Thank you!
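For reference, here is a minimal sketch (with made-up shapes and no real model) illustrating the behaviour I am asking about: the per-element loss is still rank 4, and, as far as I can tell, Keras simply takes a further mean over all remaining axes to obtain the scalar it back-propagates.
import numpy as np
from keras import backend as K

# Hypothetical shapes matching (samples, frames, m, n, channels)
y_true = K.constant(np.random.rand(2, 4, 8, 8, 3))
y_pred = K.constant(np.random.rand(2, 4, 8, 8, 3))

# Keras' mean_squared_error only reduces the last axis ...
per_element = K.mean(K.square(y_pred - y_true), axis=-1)
print(K.int_shape(per_element))   # (2, 4, 8, 8) -- still rank 4

# ... and the training machinery then seems to take a plain mean over
# everything that is left, yielding the scalar that is actually optimised
scalar_loss = K.mean(per_element)
print(K.int_shape(scalar_loss))   # () -- a scalar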

Related

Intersection over Union used as Metric or Loss

I'm currently struggling to understand the use of IoU. Is IoU just a metric to monitor the quality of a network, or is it used as a loss function whose value has some impact on backprop?
For a measure to be used as a loss function, it must be differentiable, with non-trivial gradients.
For instance, in image classification, accuracy is the most common measure of success. However, if you try to differentiate accuracy, you'll see that the gradients are zero almost everywhere and therefore one cannot train a model with accuracy as a loss function.
Similarly, IoU, in its native form, also has meaningless gradients and cannot be used as a loss function. However, extensions to IoU that preserve gradients exist and can be effectively used as a loss function for training.
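For illustration, a minimal sketch (not a canonical formulation) of one such extension: a "soft" IoU loss built from Keras backend ops, where the hard intersection and union are replaced by sums over predicted probabilities so that non-trivial gradients exist.
from keras import backend as K

def soft_iou_loss(y_true, y_pred, smooth=1e-6):
    # y_pred is assumed to be a per-pixel probability map (e.g. a sigmoid output)
    intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
    union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3]) - intersection
    # 1 - soft IoU is differentiable, so it can be used as a training loss
    return 1.0 - (intersection + smooth) / (union + smooth)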

An efficient way to calculate the loss function batchwise?

I am using autoencoders to do anomaly detection. I have finished training my model and now I want to calculate the reconstruction loss for each entry in the dataset, so that I can flag data points with a high reconstruction loss as anomalies.
This is my current code to calculate the reconstruction loss.
But this is really slow. By my estimation, it should take 5 hours to go through the dataset, whereas training one epoch takes approximately 55 minutes.
I feel that the convert-to-tensor operation is bottlenecking the code, but I can't find a better way to do it.
I've tried changing the batch sizes but it does not make much of a difference. I have to use the convert-to-tensor part because K.eval throws an error if I do it normally.
import numpy as np
import tensorflow as tf
from keras import backend as K

for i in range(0, encoded_dataset.shape[0], batch_size):
    y_true = tf.convert_to_tensor(encoded_dataset[i:i + batch_size].values,
                                  np.float32)
    y_pred = tf.convert_to_tensor(ae1.predict(encoded_dataset[i:i + batch_size].values),
                                  np.float32)
    # Append the batch losses (a numpy array) to the list
    reconstruction_loss_transaction.append(K.eval(loss_function(y_true, y_pred)))
I was able to train in 55 minutes per epoch, so I feel prediction should not take 5 hours per epoch. encoded_dataset is a variable that holds the entire dataset in main memory as a data frame.
I am using an Azure VM instance.
K.eval(loss_function(y_true, y_pred)) is there to find the loss for each row of the batch. So y_true will be of size (batch_size, 2000), and so will y_pred. K.eval(loss_function(y_true, y_pred)) will give me an output of shape (batch_size, 1), evaluating binary cross-entropy on each row of y_true and y_pred.
Moved from comments:
My suspicion is that ae1.predict and K.eval(loss_function) are behaving in unexpected ways. ae1.predict should normally be used to output the loss function value as well as y_pred. When you create the model, specify that the loss value is another output (you can have a list of multiple outputs), then just call predict once here to get both y_pred and the loss value in one call.
But I want the loss for each row. Won't the loss returned by the predict method be the mean loss for the entire batch?
That depends on how the loss function is implemented. Both ways produce perfectly valid and identical results in TF under the hood: you can either average the loss over the batch before taking the gradient, or take the gradient with respect to a vector of losses. In the latter case, TF's gradient operation performs the reduction over the per-sample losses for you (see the SO articles on obtaining per-sample gradients, which is actually hard to do).
If Keras implements the loss with reduce_mean built into it, you can just define your own loss. If you're using squared loss, replace 'mean_squared_error' with lambda y_true, y_pred: tf.square(y_pred - y_true). That produces the squared error instead of the MSE (no difference to the gradient), but look here for the variant that includes the mean.
In any case this produces a per-sample loss as long as you don't use tf.reduce_mean, which is purely optional in the loss. Another option is simply to compute the loss separately from what you optimize for and make it an output of the model; that is perfectly valid as well.
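For instance, a minimal sketch of computing the per-row loss entirely outside the training loop (reusing the question's ae1, encoded_dataset and batch_size, and assuming a per-row binary cross-entropy averaged over the 2000 features): a single predict pass plus a vectorized NumPy computation, with no K.eval calls inside a Python loop.
import numpy as np

x = encoded_dataset.values.astype(np.float32)        # (n_rows, 2000), from the question
y_pred = ae1.predict(x, batch_size=batch_size)       # one predict pass over the whole set

eps = 1e-7                                           # avoid log(0)
y_pred = np.clip(y_pred, eps, 1.0 - eps)
# Per-row binary cross-entropy, vectorized in NumPy
per_row_loss = -np.mean(x * np.log(y_pred) + (1.0 - x) * np.log(1.0 - y_pred), axis=1)
# per_row_loss has shape (n_rows,); the largest values flag the likely anomalies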

How to use tensorflow pairwise loss

In TensorFlow there is a pairwise mean squared error function which takes in "predictions", but it is not documented whether this should be a sigmoid/softmax output or logits: https://www.tensorflow.org/api_docs/python/tf/losses/mean_pairwise_squared_error
I am looking to see if the predictions must be in a certain form, or if there is a better pairwise loss function available.
The logits layer, in a deep learning context, is the layer on which the softmax function is applied. The softmax function is used when we want to perform multi-class classification, and for classification the most common error measure is cross-entropy. The mean pairwise squared error, on the other hand, is used in the context of regression: when we perform regression we want to predict a real value, as opposed to classification where we want to predict a class. With that said, the layer that generates the outputs won't be a logits layer but an ordinary linear layer. Moreover, the most common error measure when you want to perform regression is the mean squared error.
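To make this concrete, a minimal sketch (TF 1.x style, with made-up shapes) of feeding the function an ordinary linear output layer rather than logits or a softmax/sigmoid output:
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])
labels = tf.placeholder(tf.float32, [None, 3])

# An ordinary linear (regression) output layer -- no softmax/sigmoid on top
predictions = tf.layers.dense(x, 3, activation=None)

# Pairwise MSE penalizes differences between pairs of output components per sample
loss = tf.losses.mean_pairwise_squared_error(labels=labels, predictions=predictions)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)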

Get values of tensors in loss function

I would like to get the values of the y_pred and y_true tensors in this Keras backend function. I need this to be able to perform some custom calculations and change the loss; these calculations are only possible with the real array values.
def mean_squared_error(y_true, y_pred):
    # some code here
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is there a way to do this in Keras? Or in any other ML framework (TF, PyTorch, Theano)?
No, in general you can't compute the loss that way, because Keras is based on frameworks that do automatic differentiation (like Theano, TensorFlow) and they need to know which operations you are doing in between in order to compute the gradients of the loss.
You need to implement your loss computations using keras.backend functions, else there is no way to compute gradients and optimization won't be possible.
Try including this within the loss function:
y_true = keras.backend.print_tensor(y_true, message='y_true')
Following is an excerpt from the Keras documentation (https://keras.io/backend/):
print_tensor
keras.backend.print_tensor(x, message='')
Prints message and the tensor value when evaluated.
Note that print_tensor returns a new tensor identical to x which should be used in the later parts of the code. Otherwise, the print operation is not taken into account during evaluation.
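Putting the two pieces together, a minimal sketch of a custom MSE loss that prints both tensors every time the loss is evaluated (note that the tensors returned by print_tensor are the ones that must be used afterwards):
from keras import backend as K

def mse_with_printing(y_true, y_pred):
    # print_tensor returns new tensors; use the returned values below,
    # otherwise the print ops are dropped from the graph
    y_true = K.print_tensor(y_true, message='y_true = ')
    y_pred = K.print_tensor(y_pred, message='y_pred = ')
    return K.mean(K.square(y_pred - y_true), axis=-1)

# model.compile(optimizer='adam', loss=mse_with_printing)  # used like any built-in loss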

Convolutional Neural Network Loss

While calculating the loss function, can I manually calculate the loss like
Loss = tf.reduce_mean(tf.square(np.array(Prediction) - np.array(Y)))
and then optimize this loss using the Adam optimizer?
No.
TensorFlow loss functions typically accept tensors as input and also output a tensor, so np.array() wouldn't work.
In the case of CNNs, you'd generally come across loss functions like cross-entropy, softmax cross-entropy, sigmoid cross-entropy, etc. These are already built into the tf.losses module, so you can use them directly.
The loss function that you're trying to apply looks like a mean-squared loss. This is built into tf.losses as well: tf.losses.mean_squared_error.
Having said that, I've also implemented a few loss functions like cross-entropy using a hand-coded formula such as -tf.reduce_mean(tf.reduce_sum(targets * logProb)). This works equally well, as long as the inputs targets and logProb are computed as tensors and not as numpy arrays.
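To make that concrete, a minimal sketch (the placeholder shapes and the dense layer are hypothetical stand-ins for the CNN) of computing the same loss on tensors and optimizing it with Adam:
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 64])
Y = tf.placeholder(tf.float32, [None, 10])
Prediction = tf.layers.dense(X, 10)                 # stand-in for the CNN's output tensor

# Same formula as in the question, but on tensors instead of numpy arrays
loss = tf.reduce_mean(tf.square(Prediction - Y))
# equivalently: loss = tf.losses.mean_squared_error(labels=Y, predictions=Prediction)
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)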
No, you actually need to use tensors for the loss, not numpy arrays (np.array(Prediction)), since TensorFlow evaluates these tensors in the TensorFlow engine.