CNTK loss and error metric functions for multi-label classification

Other than squared_error, what other loss functions / error functions would I be able to use?
I looked through https://cntk.ai/pythondocs/cntk.losses.html
and wasn't able to find anything that helps.
I found documentation for BrainScript, but not for Python.
Any help would be amazing :)

The best source of documentation (IMHO) is the Python documentation. If you need to write your own loss function, I found this post very helpful. Try using a sigmoid function at the output layer and binary cross-entropy loss or cosine loss.
target = cntk.input_variable(input_dim)
loss = cntk.binary_cross_entropy(z, target)
This way your output nodes will produce probabilities that are independent of each other, e.g. [0.73, 0.02, 0.05, 0.26, 0.68].

For multi-class (single-label) classification we typically use cross_entropy_with_softmax.
If you are trying to attribute two or more classes to every sample, there is no native multi-label loss in CNTK; the usual approach is a sigmoid output layer with binary_cross_entropy, as in the sketch below.
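A minimal sketch of such a multi-label setup, assuming hypothetical sizes (input_dim, num_labels) and a simple dense network; the per-label error metric is improvised here, since CNTK has no built-in multi-label metric:
import cntk as C

# Hypothetical sizes, for illustration only
input_dim = 500
num_labels = 5

features = C.input_variable(input_dim)
labels = C.input_variable(num_labels)   # multi-hot targets, e.g. [1, 0, 0, 1, 0]

# Sigmoid output so each label probability is independent of the others
z = C.layers.Sequential([
    C.layers.Dense(128, activation=C.relu),
    C.layers.Dense(num_labels, activation=C.sigmoid)
])(features)

loss = C.binary_cross_entropy(z, labels)

# Improvised per-label error: fraction of labels on the wrong side of 0.5
metric = C.reduce_mean(C.not_equal(C.round(z), labels), axis=-1)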

Related

TensorFlow initial_weights: what does it mean?

Why do we have to initialize weights for model.predict? I can't understand it.
You can refer to: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#checkpoint_the_initial_weights
initial_weights = os.path.join(tempfile.mkdtemp(), 'initial_weights')
model.save_weights(initial_weights)
This tutorial appears to be referring to imbalanced data. You do not need to provide initial weights if you don't want to when using TensorFlow's predict method. See this link describing the potential inputs to that method.
Deep learning uses gradient descent and its variants to find optimal weights. If you don't initialize the weights, training may take a long time to converge or may not converge at all.
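For context, the linked tutorial saves the freshly initialized weights to a checkpoint so that several training runs (for example, with and without class weights) can all start from the same initialization. A minimal sketch of that pattern, assuming a small hypothetical Keras model:
import os
import tempfile

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Save the freshly initialized weights once...
initial_weights = os.path.join(tempfile.mkdtemp(), "initial_weights")
model.save_weights(initial_weights)

# ...train, experiment with class weights, etc. ...

# ...then reload them so the next run starts from the same initialization
model.load_weights(initial_weights)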

Loss function variational Autoencoder in Tensorflow example

I have a question regarding the loss function in a variational autoencoder. I followed the TensorFlow example https://www.tensorflow.org/tutorials/generative/cvae to create an LSTM-VAE for sampling a sine function.
My encoder input is a set of points (x_i, sin(x_i)) over a specific range (randomly sampled), and as the output of the decoder I expect similar values.
In the TensorFlow guide, cross-entropy is used to compare the encoder input with the decoder output.
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
This makes sense because the input and output are treated as probabilities. But in reality these values represent points of my sine function.
Can't I simply use a mean squared error instead of the cross-entropy (I tried it and it works well), or does this cause wrong behaviour of the architecture at some point?
Best regards and thanks for your help!
Well, such questions happen when you work too much and stop thinking properly. For the sake of solving this, it makes sense to think about what I'm trying to do.
p(x|z) is the decoder reconstruction, which means that by sampling from z the value x is generated with probability p(x|z). In the TensorFlow example, image classification/generation is the task, and in that case cross-entropy makes sense. I simply want to minimize the distance between my input and output, so the use of MSE is quite logical.
Hope that helps someone at some point.
Regards.
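For reference, a rough sketch of how the tutorial's loss could be adapted by replacing the sigmoid cross-entropy reconstruction term with MSE for real-valued targets; the model methods (encode, reparameterize, decode) and the log_normal_pdf helper are assumed to be defined as in the CVAE tutorial:
import tensorflow as tf

def compute_loss_mse(model, x):
    # Encode, sample a latent z, and decode, as in the CVAE tutorial
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_recon = model.decode(z)

    # Reconstruction term: mean squared error instead of sigmoid
    # cross-entropy, suitable for real-valued targets such as sine samples
    recon = tf.reduce_sum(tf.square(x - x_recon), axis=-1)

    # KL-related terms, exactly as in the tutorial (Monte Carlo estimate)
    logpz = log_normal_pdf(z, 0.0, 0.0)
    logqz_x = log_normal_pdf(z, mean, logvar)

    return tf.reduce_mean(recon - logpz + logqz_x)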

Where can I find what type of loss metric a model uses?

I am trying to figure out what loss functions are used in ssdlite_mobilenet_v2_coco, but I can't seem to find where they are specified in the repo or online. This is my TensorBoard plot of the loss while training, but it doesn't specify what the loss metric is.
Link to repo: https://github.com/tensorflow/models/tree/master/research/object_detection
I am following this tutorial:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10#8-use-your-newly-trained-object-detection-classifier
I've tried looking through the config file and some parts of the code but didn't see what type of metric it uses.
Is there a location where I can find the loss metric used for each type of model architecture on the repo? Thanks!
You can see the loss functions used in the config file. The loss consists of two terms: weighted_sigmoid for classification and weighted_smooth_l1 for localization, each with a weight of 1. To see the code of these losses you can check the loss functions here. Note that the total loss has another component for regularization.
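For orientation, the loss block of the pipeline config looks roughly like the excerpt below (an approximation; exact fields and values can differ between config versions, and SSD configs typically also contain a hard_example_miner block inside loss):
loss {
  classification_loss {
    weighted_sigmoid {
    }
  }
  localization_loss {
    weighted_smooth_l1 {
    }
  }
  classification_weight: 1.0
  localization_weight: 1.0
}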

How to wrap a custom TensorFlow loss function in Keras?

This is my third attempt to get a deep learning project off the ground. I'm working with protein sequences. First I tried TFLearn, then raw TensorFlow, and now I'm trying Keras.
The previous two attempts taught me a lot, and gave me some code and concepts that I can re-use. However there has always been an obstacle, and I've asked questions that the developers can't answer (in the case of TFLearn), or I've simply gotten bogged down (TensorFlow object introspection is tedious).
I have written this TensorFlow loss function, and I know it works:
def l2_angle_distance(pred, tgt):
    with tf.name_scope("L2AngleDistance"):
        # Scaling factor
        count = tgt[..., 0, 0]
        scale = tf.to_float(tf.count_nonzero(tf.is_finite(count)))
        # Mask NaN in tgt
        tgt = tf.where(tf.is_nan(tgt), pred, tgt)
        # Calculate L1 losses
        losses = tf.losses.cosine_distance(pred, tgt, -1, reduction=tf.losses.Reduction.NONE)
        # Square the losses, then sum, to get L2 scalar loss.
        # Divide the loss result by the scaling factor.
        return tf.reduce_sum(losses * losses) / scale
My target values (tgt) can include NaN, because my protein sequences are passed in a 4D Tensor, despite the fact that the individual sequences differ in length. Before you ask, the data can't be resampled like an image. So I use NaN in the tgt Tensor to indicate "no prediction needed here." Before I calculate the L2 cosine loss, I replace every NaN with the matching values in the prediction (pred) so the loss for every NaN is always zero.
Now, how can I re-use this function in Keras? It appears that the Keras Lambda core layer is not a good choice, because a Lambda only takes a single argument, and a loss function needs two arguments.
Alternatively, can I rewrite this function with the Keras backend? I shouldn't ever need to use the Theano or CNTK backend, so a backend-agnostic rewrite isn't strictly necessary for me. I'll use whatever works.
I just looked at the Keras losses.py file to get some clues. I imported keras.backend and had a look around. I also found https://keras.io/backend/. I don't seem to find wrappers for ANY of the TensorFlow function calls I happen to use: to_float(), count_nonzero(), is_finite(), where(), is_nan(), cosine_distance(), or reduce_sum().
Thanks for your suggestions!
I answered my own question. I'm posting the solution for anyone who may come across this same problem.
I tried using my TF loss function directly in Keras, as was independently suggested by Matias Valdenegro. I did not provoke any errors from Keras by doing so; however, the loss value went immediately to NaN.
Eventually I identified the problem. The calling convention for a Keras loss function is first y_true (which I called tgt), then y_pred (my pred). But the calling convention for my TensorFlow loss function is pred first, then tgt. So if you want to keep a TensorFlow-native version of the loss function around, this fix works:
def keras_l2_angle_distance(tgt, pred):
    return l2_angle_distance(pred, tgt)

<snip>

model.compile(loss = keras_l2_angle_distance, optimizer = "something")
Maybe Theano or CNTK uses the same parameter order as Keras, I don't know. But I'm back in business.
You don't need to use keras.backend: since your loss is written directly in TensorFlow, you can use it directly in Keras (with the TensorFlow backend). The backend functions are an abstraction layer so you can code a loss or layer that will work with the multiple backends available in Keras.
You just have to put your loss in the model.compile call:
model.compile(loss = l2_angle_distance, optimizer = "something")

What is the average log-perplexity in seq2seq modules in tensorflow?

The output of the following TensorFlow function should give the average log-perplexity. I went through the source code, but I don't understand how they calculate that loss.
tf.contrib.legacy_seq2seq.sequence_loss(logits, targets, weights, average_across_timesteps=True, average_across_batch=True, softmax_loss_function=None, name=None)
I went through the TensorFlow implementation. Although perplexity has a broader general meaning, here in this function perplexity means
two to the power of your total cross-entropy loss.
Please refer to the first answer of this question.
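For intuition, here is a rough sketch (an approximation, not the exact library code) of what sequence_loss computes when both averaging flags are True: the weighted per-timestep cross-entropies are summed, divided by the total weight per example, and then averaged over the batch. Since TensorFlow's cross-entropy uses the natural logarithm, the corresponding perplexity is exp of this value (the "two to the power of" phrasing applies when the loss is measured in bits).
import tensorflow as tf

def sequence_loss_sketch(logits, targets, weights):
    # logits: list of [batch, vocab] tensors, one per timestep
    # targets: list of [batch] int tensors; weights: list of [batch] float tensors
    weighted_crossents = []
    for logit, target, weight in zip(logits, targets, weights):
        crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logit)
        weighted_crossents.append(crossent * weight)
    # Average across timesteps by the total weight per example...
    log_perps = tf.add_n(weighted_crossents) / (tf.add_n(weights) + 1e-12)
    # ...then average across the batch
    return tf.reduce_mean(log_perps)

# perplexity = tf.exp(average_log_perplexity)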